

Build a RAG Pipeline in MATLAB: From Document Ingestion to LLM-Driven Insights

The following post is from Yuchen Dong, Senior Finance Application Engineer at MathWorks.

The example featured in the blog can be found on GitHub here.

Retrieval-Augmented Generation (RAG) has emerged as a powerful architecture to ground large language models (LLMs) in trusted, domain-specific data. In finance, pairing LLMs with curated sources, such as Federal Open Market Committee (FOMC) minutes, can drive more reliable insight generation.

In this blog, we’ll walk through how to build a RAG pipeline using MATLAB, from preprocessing FOMC documents all the way to generating responses with LLMs of your choice. You can tailor the database with hundreds of diverse, finance-related documents to suit your specific needs.

Why It Matters

With just a few lines of code, you can:

  • Store vectorized insights from regulatory reports
  • Search them efficiently using semantic similarity
  • Provide LLM responses anchored in real-world financial data

This RAG architecture is a powerful tool for compliance analysts, economists, and financial engineers seeking to extract value from unstructured documents.

Step 1: Load Meeting Documents

To begin, we load a document that reflects real-world financial discourse: minutes from an FOMC meeting, where U.S. monetary policy decisions are made. In this example, we use a single FOMC document in PDF format.
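
A minimal sketch of this step is shown below, assuming the minutes have been downloaded locally; the file name fomc_minutes.pdf is a placeholder, not necessarily the one used in the GitHub example.

    str = extractFileText("fomc_minutes.pdf");   % read the PDF into a single string (Text Analytics Toolbox)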


Step 2: Preprocess the Text

Raw text data is often messy. It may include stop words, punctuation, and inconsistent word forms that can reduce the accuracy of downstream tasks like embedding and search. To prepare our FOMC meeting notes for analysis, we use the Preprocess Text Data live task in MATLAB to clean and normalize the content.

This step includes tokenization, lemmatization, and removal of stop words and punctuation. The result is a cleaner, more structured text suitable for vectorization.
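Under the hood, the live task generates code roughly equivalent to the sketch below, built from standard Text Analytics Toolbox functions (the exact options the task emits may differ).

    documents = tokenizedDocument(str);
    documents = addPartOfSpeechDetails(documents);           % part-of-speech tags are needed for lemmatization
    documents = removeStopWords(documents);                  % drop stop words
    documents = normalizeWords(documents, Style="lemma");    % reduce words to their lemmas
    documents = erasePunctuation(documents);                 % strip punctuation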

To visualize what terms dominate the discussion, we generate a word cloud of the preprocessed content.
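
A single call is enough here, assuming the preprocessed documents from the sketch above are stored in documents:

    figure
    wordcloud(documents);
    title("FOMC Meeting Minutes")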


This highlights dominant themes, such as inflation, policy rates, and employment.


Step 3: Chunk the Text

Before embedding the text, we apply two key steps (sketched in code after the list):

  1. Filter out short fragments that contain fewer than three tokens—these often represent incomplete or uninformative sentences.
  2. Split the cleaned document into fixed-size chunks (in this case, ~128 tokens each), ensuring the resulting segments are manageable for vectorization and LLM input limits.
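
A hand-rolled version of both steps might look like the following; the GitHub example may implement chunking differently, and documents carries over from the preprocessing sketch above.

    documents = documents(doclength(documents) >= 3);        % 1) drop fragments with fewer than 3 tokens

    chunkSize = 128;                                         % 2) target ~128 tokens per chunk
    allTokens = split(join(string(documents), " "));         % flatten every token into one list
    numChunks = ceil(numel(allTokens)/chunkSize);
    chunks    = strings(numChunks, 1);
    for k = 1:numChunks
        idx = (k-1)*chunkSize + 1 : min(k*chunkSize, numel(allTokens));
        chunks(k) = join(allTokens(idx), " ");               % chunk k as one string
    end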

Step 4: Vectorize the Document Chunks

This step requires the Text Analytics Toolbox Model for all-MiniLM-L6-v2 Network or all-MiniLM-L12-v2 Network support package.

If the support package is not installed, you can download it from the Add-On Explorer.

With our text chunks ready, we now convert them into numerical vectors using a pretrained embedding model. These embeddings form the semantic backbone of our Retrieval-Augmented Generation (RAG) system.
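
As a sketch, the documentEmbedding interface that ships with the support packages above (see the Information Retrieval with Document Embeddings documentation, reference [2]) can embed every chunk in two lines; the choice of the smaller L6 variant here is just one option.

    emb = documentEmbedding(Model="all-MiniLM-L6-v2");   % pretrained sentence-embedding network
    chunkEmbeddings = embed(emb, chunks);                 % one 384-dimensional row vector per chunk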

Step 5: Store Embeddings in a Vector Database

To support fast and accurate retrieval for RAG workflows, we store the document embeddings in a vector database: specifically, PostgreSQL with the pgvector extension.

The pgvector extension enables you to store and query high-dimensional embedding vectors directly within PostgreSQL, along with any related structured data. This streamlines your architecture by combining traditional relational data and semantic search into one system.

Once pgvector is installed, you can use the Database Explorer App in MATLAB to connect to your PostgreSQL instance and view or query the embedded vectors—such as those we generated from the FOMC meeting notes.
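
The same connection can also be made programmatically with the Database Toolbox. The sketch below is illustrative only: the credentials, the table name fomc_chunks, and the column names are placeholders rather than the ones used in the GitHub example.

    % Placeholder credentials; adjust for your own PostgreSQL instance.
    conn = postgresql("username", "password", DatabaseName="ragdb", ...
                      Server="localhost", PortNumber=5432);

    execute(conn, "CREATE EXTENSION IF NOT EXISTS vector");
    execute(conn, "CREATE TABLE IF NOT EXISTS fomc_chunks (" + ...
                  "id serial PRIMARY KEY, chunk text, embedding vector(384))");

    % pgvector accepts vectors written as text literals such as '[0.1,0.2,...]'
    for k = 1:numel(chunks)
        vecLiteral = jsonencode(chunkEmbeddings(k,:));
        execute(conn, sprintf("INSERT INTO fomc_chunks (chunk, embedding) VALUES ('%s','%s')", ...
            replace(chunks(k), "'", "''"), vecLiteral));
    end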


Step 6: Retrieve Documents Based on a Query

Suppose we want to ask:

“Will the Federal Funds rate decrease in the next 3 months?”

We embed this question with the same model and search the vector database for its nearest neighbors; the results are the most semantically relevant document snippets related to the query.
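
Continuing the hedged pgvector setup from Step 5, that search can be expressed as a cosine-distance query (pgvector's <=> operator):

    query = "Will the Federal Funds rate decrease in the next 3 months?";
    queryEmbedding = embed(emb, query);                       % embed the question with the same model
    sqlquery = sprintf("SELECT chunk FROM fomc_chunks ORDER BY embedding <=> '%s' LIMIT 5", ...
                       jsonencode(queryEmbedding));
    topChunks = fetch(conn, sqlquery);                        % table of the five closest chunks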


Step 7: Visualize and Validate Similarity

To evaluate the results, we compute cosine similarity and plot a t-SNE map to show how close the query is to the returned documents in vector space.
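
A compact sketch of this validation, reusing the embeddings from the earlier steps (the Perplexity cap simply keeps tsne valid when the number of chunks is small):

    scores = cosineSimilarity(chunkEmbeddings, queryEmbedding);       % one similarity score per chunk

    allVectors = [chunkEmbeddings; queryEmbedding];
    Y = tsne(allVectors, Perplexity=min(30, size(allVectors,1)-1));   % 2-D projection of chunks and query
    figure
    scatter(Y(1:end-1,1), Y(1:end-1,2), "filled"); hold on
    scatter(Y(end,1), Y(end,2), 100, "red", "filled")                 % the query point
    legend(["Document chunks", "Query"])
    title("Query vs. document chunks (t-SNE)")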

Step 8: Ask the LLM to Generate a Response

With the top matching chunks, we construct a prompt and generate a grounded answer using an LLM. The output provides a detailed response supported by FOMC context—grounded, explainable, and tailored to financial analysis.
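
As a final sketch, assuming the Large Language Models with MATLAB add-on (reference [1]) is installed and an OpenAI API key is set in the environment, the prompt construction and call might look like this (the prompt wording is illustrative):

    context = strjoin(string(topChunks.chunk), newline);
    prompt  = "Answer the question using only the FOMC excerpts below." + newline + ...
              "Excerpts:" + newline + context + newline + ...
              "Question: " + query;

    chat   = openAIChat("You are a financial analyst assistant.");    % from the LLMs with MATLAB add-on
    answer = generate(chat, prompt);
    disp(answer)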

The example featured in the blog can be found on GitHub here.

References

[1] Large Language Models with MATLAB: https://www.mathworks.com/matlabcentral/fileexchange/163796-large-language-models-llms-with-matlab

[2] Information Retrieval with Document Embeddings: http://mathworks.com/help/textanalytics/ug/information-retrieval-with-document-embeddings.html

[3] Board of Governors of the Federal Reserve System: https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm
