{"id":15485,"date":"2024-07-09T09:16:22","date_gmt":"2024-07-09T13:16:22","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=15485"},"modified":"2025-01-02T16:42:33","modified_gmt":"2025-01-02T21:42:33","slug":"local-llms-with-matlab","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2024\/07\/09\/local-llms-with-matlab\/","title":{"rendered":"Local LLMs with MATLAB"},"content":{"rendered":"<h6><\/h6>\r\nLocal large language models (LLMs), such as llama, phi3, and mistral, are now available in the <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/163796-large-language-models-llms-with-matlab\">Large Language Models (LLMs) with MATLAB<\/a> repository through <a href=\"https:\/\/ollama.com\/\">Ollama\u2122<\/a>! This is such exciting news that I can\u2019t think of a better introduction than to share this development with you. If you are too eager to try out local LLMs with MATLAB to read any further (though I hope you will), you can access the repository in two ways:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li><a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/163796-large-language-models-llms-with-matlab\">File Exchange - Large Language Models (LLMs) with MATLAB<\/a><\/li>\r\n \t<li><a href=\"https:\/\/github.com\/matlab-deep-learning\/llms-with-matlab\/\">GitHub - Large Language Models (LLMs) with MATLAB<\/a><\/li>\r\n<\/ul>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15558 \" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2024\/07\/local_llms-1.png\" alt=\"local llms with matlab\" width=\"464\" height=\"289\" \/>\r\n<h6><\/h6>\r\nI am glad you decided to keep reading. In the previous blog post <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2024\/01\/22\/large-language-models-with-matlab\/\">Large Language Models with MATLAB<\/a>, I showed you how to connect MATLAB to the OpenAI API. 
In this blog post, I am going to show you:\r\n<h6><\/h6>\r\n<ol>\r\n \t<li>How to access local LLMs, including llama3 and mixtral, by connecting MATLAB to a local Ollama server.<\/li>\r\n \t<li>How to use llama3 for retrieval-augmented generation (RAG) with the help of <a href=\"https:\/\/www.mathworks.com\/solutions\/artificial-intelligence\/natural-language-processing.html\">MATLAB NLP tools<\/a>.<\/li>\r\n \t<li>Why RAG is so useful when you want to use your own data for <a href=\"https:\/\/www.mathworks.com\/discovery\/natural-language-processing.html\">natural language processing<\/a> (NLP) tasks.<\/li>\r\n<\/ol>\r\nFor more examples of RAG, chatbot creation, real-time text processing, and other NLP applications, see <a href=\"https:\/\/github.com\/matlab-deep-learning\/llms-with-matlab\/tree\/main\/examples\">Examples: LLMs with MATLAB<\/a>.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 20px;\"><strong>Set Up Ollama Server<\/strong><\/p>\r\nFirst, go to <a href=\"https:\/\/ollama.com\/\">https:\/\/ollama.com\/<\/a> and follow the download instructions. To use local models with Ollama, you need to install and start an Ollama server and then pull models into the server. For example, to pull llama3, run this command in your terminal:\r\n<pre class=\"brush: python\" style=\"background-color: white;\">ollama pull llama3\r\n<\/pre>\r\n<h6><\/h6>\r\nSome of the other supported LLMs are llama2, codellama, phi3, mistral, and gemma. For the full list of LLMs supported by the Ollama server, see <a href=\"https:\/\/ollama.com\/library\">Ollama models<\/a>. To learn more about connecting to Ollama from MATLAB, see <a href=\"https:\/\/github.com\/matlab-deep-learning\/llms-with-matlab\/blob\/main\/doc\/Ollama.md\">LLMs with MATLAB - Ollama<\/a>.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 20px;\"><strong>Initialize Chat for RAG<\/strong><\/p>\r\nRAG is a technique for enhancing the results achieved with an LLM by using your own data. 
The following figure shows the RAG workflow. Retrieving information from trusted sources can improve both accuracy and reliability. For example, the prompt fed to the LLM can be enhanced with more up-to-date or technical information.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15503 \" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2024\/07\/rag.png\" alt=\"Retrieval-Augmented Generation (RAG) workflow\" width=\"719\" height=\"315\" \/>\r\n<h6><\/h6>\r\nInitialize the chatbot with the specified model (llama3) and instructions. The system prompt tells the chatbot to expect a user query that may or may not be enhanced by additional context, which means that RAG may or may not be applied.\r\n<pre>% Instruct the model to use the context only when it is relevant\r\nsystem_prompt = \"You are a helpful assistant. You might get a \" + ...\r\n    \"context for each question, but only use the information \" + ...\r\n    \"in the context if that makes sense to answer the question. \";\r\n% Create a chat object that uses the llama3 model on the local Ollama server\r\nchat = ollamaChat(\"llama3\",system_prompt);\r\n<\/pre>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 20px;\"><strong>Ask Simple Question<\/strong><\/p>\r\nFirst, let\u2019s check how the model performs when prompted with a general-knowledge question.\r\n<h6><\/h6>\r\nDefine the prompt for the chatbot. 
Notice that the prompt does not include any context, which means that RAG is not applied.\r\n<pre>query_simple = \"What is the most famous Verdi opera?\";\r\nprompt_simple = \"Answer the following question: \"+ query_simple;\r\n<\/pre>\r\n<h6><\/h6>\r\nDefine a function that wraps the generated text to make it easier to read.\r\n<pre>% In a script, local functions must be defined at the end of the file.\r\nfunction wrappedText = wrapText(text)\r\n    % Split the text into sentences and put each sentence on its own line\r\n    wrappedText = splitSentences(text);\r\n    wrappedText = join(wrappedText,newline);\r\nend\r\n<\/pre>\r\n<h6><\/h6>\r\nGenerate a response.\r\n<pre>response_simple = generate(chat,prompt_simple);\r\nwrapText(response_simple)\r\n<\/pre>\r\n<h6><\/h6>\r\n<pre class=\"brush: python\" style=\"background-color: white; border: white;\">ans = \r\n    \"That's an easy one!\r\n     The most famous Verdi opera is undoubtedly \"La Traviata\".\r\n     It was written in 1853 and is based on Alexandre Dumas' novel \"La Dame aux Cam\u00e9lias\".\r\n     It tells the tragic story of Violetta Val\u00e9ry, a courtesan who falls in love with Alfredo Germont, but ultimately sacrifices her life to preserve his reputation.\r\n     The opera features some of Verdi's most beloved music, including the iconic duet \"Ah, fors'\u00e8 lui\" and the show-stopping aria \"Sempre libera\".\"\r\n<\/pre>\r\n<h6><\/h6>\r\nThe generated response is coherent, accurate, and quite helpful.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 20px;\"><strong>Ask Technical Question<\/strong><\/p>\r\nNow, let\u2019s see if the chatbot performs as well when asked a technical question without the help of RAG. More specifically, I want to know how to import a deep learning model from PyTorch\u00ae into MATLAB.\r\n<h6><\/h6>\r\nDefine the prompt for the chatbot and generate a response. 
Wrap the text for easier visualization.\r\n<pre>query_tech = \"How to import a PyTorch model into MATLAB?\";\r\nprompt_tech = \"Answer the following question: \"+ query_tech;\r\nresponse_tech = generate(chat,prompt_tech);\r\nwrapText(response_tech)\r\n<\/pre>\r\n<h6><\/h6>\r\n<pre class=\"brush: python\" style=\"background-color: white; border: white;\">ans = \r\n    \"To import a PyTorch model into MATLAB, you can use the `torch2matlab` tool provided by MathWorks.\r\n     This tool is specifically designed for converting PyTorch models into MATLAB's deep learning framework, known as Deep Learning Toolbox.\r\n     Here are the general steps to follow:\r\n     1.\r\n     First, make sure you have the `torch2matlab` tool installed on your system.\r\n     You can download it from the MathWorks website.\r\n     2.\r\n     Save your PyTorch model using the `torch.\r\n     jit.\r\n     save()` function, which will convert your model into a binary file (e.g., `.\r\n     pt` file).\r\n     3.\r\n     Open MATLAB and navigate to the directory where you saved the PyTorch model.\r\n     4.\r\n     Run the following command in the MATLAB Command Window:\r\n     ```matlab\r\n     model = torch2matlab('your_model_file.\r\n     pt');\r\n     ```\r\n     Replace `'your_model_file.\r\n     pt'` with the actual name of your PyTorch model file (e.g., `mymodel.pt`).\r\n     5.\r\n     The `torch2matlab` tool will convert your PyTorch model into a MATLAB-compatible format and store it in the `model` variable.\r\n     6.\r\n     You can then use this converted model in your MATLAB code, just like you would with any other Deep Learning Toolbox model.\r\n     Note that the conversion process may not always be perfect, as the two frameworks have different architectures and representations of neural networks.\r\n     However, the `torch2matlab` tool should handle most common cases and allow you to import and use your PyTorch models in MATLAB.\"\r\n<\/pre>\r\n<h6><\/h6>\r\nBefore you start 
looking into how to use torch2matlab, I have to inform you that such a tool does not exist. Even though the generated response contains some accurate elements, it is also clear that the model hallucinated, which is the most widely known pitfall of LLMs. The model didn\u2019t have enough data to generate an informed response but generated one anyway.\r\n<h6><\/h6>\r\nHallucinations might be more prevalent when querying technical or domain-specific topics. For example, if you are an engineer who wants to use LLMs for daily tasks, feeding additional technical information to the model with RAG can yield much better results, as you will see later in this post.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 20px;\"><strong>Download and Preprocess Document<\/strong><\/p>\r\nLuckily, I know just the right technical document to feed to the chatbot to enhance its accuracy: a <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2024\/04\/22\/convert-deep-learning-models-between-pytorch-tensorflow-and-matlab\/\">previous blog post<\/a>.\r\n<h6><\/h6>\r\nSpecify the URL of the blog post.\r\n<pre>url = \"https:\/\/blogs.mathworks.com\/deep-learning\/2024\/04\/22\/convert-deep-learning-models-between-pytorch-tensorflow-and-matlab\/\";\r\n<\/pre>\r\n<h6><\/h6>\r\nDefine a local folder for the post, download the post from the URL, and save it in that folder.\r\n<pre>% Create the local data folder if it does not already exist\r\nlocalpath = \".\/data\/\";\r\nif ~exist(localpath, 'dir')\r\n    mkdir(localpath);\r\nend\r\n\r\n% Download the blog post and save it as an HTML file\r\nfilename = \"blog.html\";\r\nwebsave(localpath+filename,url);\r\n<\/pre>\r\n<h6><\/h6>\r\nRead the text from the downloaded file by first creating a FileDatastore object.\r\n<pre>% Use extractFileText to read the text of each HTML file in the folder\r\nfds = fileDatastore(localpath,\"FileExtensions\",\".html\",\"ReadFcn\",@extractFileText);\r\n\r\n% Collect the text of each file into a string array\r\nstr = [];\r\nwhile hasdata(fds)\r\n    textData = read(fds);\r\n    str = [str; textData];\r\nend\r\n<\/pre>\r\n<h6><\/h6>\r\nDefine a function for text preprocessing.\r\n<pre>% In a script, local functions must be defined at the end of the file.\r\nfunction allDocs 
= preprocessDocuments(str)\r\n    % Join the text, split it into paragraphs, and tokenize each paragraph\r\n    paragraphs = splitParagraphs(join(str));\r\n    allDocs = tokenizedDocument(paragraphs);\r\nend\r\n<\/pre>\r\n<h6><\/h6>\r\nSplit the text data into paragraphs and tokenize them.\r\n<pre>document = preprocessDocuments(str);\r\n<\/pre>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 20px;\"><strong>Retrieve Document<\/strong><\/p>\r\nIn this section, I am going to show you an integral part of RAG: how to retrieve and filter the content of the saved document based on the technical query.\r\n<h6><\/h6>\r\nTokenize the query and compute the BM25 similarity score between the query and each paragraph of the document.\r\n<pre>embQuery = bm25Similarity(document,tokenizedDocument(query_tech));\r\n<\/pre>\r\n<h6><\/h6>\r\nSort the paragraphs in descending order of similarity score, and set a word limit for the retrieved context.\r\n<pre>[~, idx] = sort(embQuery,\"descend\");\r\nlimitWords = 1000;\r\nselectedDocs = [];\r\ntotalWords = 0;\r\n<\/pre>\r\n<h6><\/h6>\r\nIterate over the sorted paragraph indices, adding paragraphs to the context until the word limit is reached.\r\n<pre>i = 1;\r\nwhile totalWords &lt;= limitWords &amp;&amp; i &lt;= length(idx)\r\n    % Count the tokens in the next most similar paragraph\r\n    totalWords = totalWords + size(tokenDetails(document(idx(i))),1);\r\n    % Add the paragraph text to the selected context\r\n    selectedDocs = [selectedDocs; joinWords(document(idx(i)))];\r\n    i = i + 1;\r\nend\r\n<\/pre>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 20px;\"><strong>Generate Response with RAG<\/strong><\/p>\r\nDefine the prompt for the chatbot with added technical context, and generate a response.\r\n<pre>prompt_rag = \"Context:\" + join(selectedDocs, \" \") ...\r\n    + newline +\"Answer the following question: \"+ query_tech;\r\nresponse_rag = generate(chat, prompt_rag);\r\nwrapText(response_rag)\r\n<\/pre>\r\n<h6><\/h6>\r\n<pre class=\"brush: python\" style=\"background-color: white; border: white;\">ans = \r\n    \"To import a PyTorch model into MATLAB, you can use the `importNetworkFromPyTorch` function.\r\n     This function requires the name of the PyTorch model file and the input sizes as name-value arguments.\r\n     For 
example:\r\n     net = importNetworkFromPyTorch(\"mnasnet1_0.\r\n     pt\", PyTorchInputSizes=[NaN, 3,224,224]);\r\n     This code imports a PyTorch model named \"mnasnet1_0\" from a file called \"mnasnet1_0.\r\n     pt\" and specifies the input sizes as NaN, 3, 224, and 224.\r\n     The `PyTorchInputSizes` argument is used to automatically create and add the input layer for a batch of images.\"\r\n<\/pre>\r\n<h6><\/h6>\r\nThe chatbot\u2019s response is now accurate!\r\n<h6><\/h6>\r\nIn this example, I used web content to enhance the accuracy of the generated response. You can replicate this RAG workflow to improve the accuracy of responses to your queries, using one or more sources of your choice, such as technical reports, design specifications, or academic papers.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 22px; color: #c04c0b;\"><strong>Key Takeaways<\/strong><\/p>\r\n\r\n<ol>\r\n \t<li>The <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/163796-large-language-models-llms-with-matlab\">Large Language Models (LLMs) with MATLAB<\/a> repository has been updated to support local LLMs through an Ollama server.<\/li>\r\n \t<li>Local LLMs are great for NLP tasks, such as RAG, and now you can use the most popular LLMs from MATLAB.<\/li>\r\n \t<li>Take advantage of MATLAB tools, and more specifically\u00a0<a href=\"https:\/\/www.mathworks.com\/products\/text-analytics.html\">Text Analytics Toolbox<\/a>\u00a0functions, to extend LLM workflows with functionality such as retrieving, managing, and processing text.<\/li>\r\n<\/ol>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2024\/07\/local_llms-1.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>\r\nLocal large language models (LLMs), such as llama, phi3, and mistral, are now available in the Large Language Models (LLMs) 
with MATLAB repository through Ollama\u2122! This is such exciting news that I... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2024\/07\/09\/local-llms-with-matlab\/\">read more >><\/a><\/p>","protected":false},"author":194,"featured_media":15558,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9,5],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/15485"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/194"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=15485"}],"version-history":[{"count":24,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/15485\/revisions"}],"predecessor-version":[{"id":16763,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/15485\/revisions\/16763"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/15558"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=15485"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=15485"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=15485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}