Model Context Protocol for RAG
So, you’ve heard about Model Context Protocol (MCP) — it’s the new hotness that lets AI agents control tools like Blender, browsers, or your local PDF collection. MCP gives models like Claude or Cursor the ability to “talk to” real tools on your machine.
In this hands-on guide, we’ll build a RAG (Retrieval-Augmented Generation) setup using FastMCP and LangChain. The goal? Let Claude or Cursor query your local text files like a boss.
In this short tutorial, we'll set up the RAG MCP server step by step and then let Cursor or Claude talk to your documents through it.
We'll do this with a custom MCP server built on mcp[cli] and FastMCP.
Before We Begin
This guide assumes you:
Know what RAG is (Quick refresher: You augment LLMs with external data like documents or databases so they can answer stuff they weren’t trained on.)
Have Python installed
Know how to run Ollama (your local LLM engine)
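If you want the RAG refresher in code form, here's a toy sketch of the idea in plain Python: retrieve the snippet most relevant to a question, then prepend it to the prompt. This illustration ranks documents by word overlap; real pipelines, including the one we build below, use embeddings and a vector store instead, and all names here are made up for the example.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set for crude overlap scoring."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    return max(docs, key=lambda d: len(tokens(query) & tokens(d)))

def augment(query: str, docs: list[str]) -> str:
    """Build the augmented prompt the LLM would actually receive."""
    return f"Context: {retrieve(query, docs)}\n\nQuestion: {query}"

docs = [
    "Zackerkaky is a fictional character described in dummy.txt.",
    "Construction Man wears a yellow hard hat.",
]
print(augment("Who is Zackerkaky?", docs))
```

The "augmentation" is literally just prompt construction: the model answers from the context you retrieved, not from its training data.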
Let's get started
1. Install uv
pip install uv
2. Initialise a new project with uv named ‘rag’
uv init rag
cd rag
3. Add all packages you need for your RAG application. In this case, we will add the following Python packages
uv add 'mcp[cli]' langchain langchain-community langchain-ollama chromadb
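The server below assumes two models are already available in your local Ollama: nomic-embed-text for embeddings and qwen2.5 for generation. If you haven't pulled them yet (model names are the ones used in the code; swap in any models you prefer):

```shell
# Pull the embedding and chat models the server expects
ollama pull nomic-embed-text
ollama pull qwen2.5
```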
4. Create a server.py file with the following code (it builds a separate QA chain per text file, so the server can answer from multiple documents)
# server.py
from mcp.server.fastmcp import FastMCP
from langchain.chains import RetrievalQA
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_ollama.llms import OllamaLLM
from langchain_ollama import OllamaEmbeddings

# Create an MCP server
mcp = FastMCP("RAG")

# Local Ollama models for embeddings and generation
embeddings = OllamaEmbeddings(
    model="nomic-embed-text:latest", base_url="http://127.0.0.1:11434"
)
model = OllamaLLM(model="qwen2.5", base_url="http://127.0.0.1:11434")

def build_qa(path: str) -> RetrievalQA:
    """Load a text file, split it, index it in Chroma, and wrap it in a QA chain."""
    data = TextLoader(path).load()
    # Document transformer
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = splitter.split_documents(data)
    # Vector DB
    docsearch = Chroma.from_documents(texts, embeddings)
    # Retriever
    return RetrievalQA.from_chain_type(llm=model, retriever=docsearch.as_retriever())

# One QA chain per document
qa = build_qa("dummy.txt")
qa2 = build_qa("dummy2.txt")
qa3 = build_qa("dummy3.txt")

@mcp.tool()
def retrieve11(prompt: str) -> str:
    """get information on Zackerkaky"""
    return qa.run(prompt)

@mcp.tool()
def retrieve12(prompt: str) -> str:
    """get information on Construction Man"""
    return qa2.run(prompt)

@mcp.tool()
def retrieve13(prompt: str) -> str:
    """get information on History"""
    return qa3.run(prompt)

if __name__ == "__main__":
    mcp.run()
The above code snippet
Uses Ollama (local LLMs); change the models to suit your needs
Creates tools using @mcp.tool()
Adds three tools, retrieve11, retrieve12 and retrieve13, to the MCP server. If you only have one document, you can remove the other two tools.
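Before wiring the server into a client, you can optionally smoke-test it with the MCP Inspector that ships with the mcp[cli] package (run from the project directory):

```shell
# Launch the server under the MCP Inspector for interactive tool testing
uv run mcp dev server.py
```

This opens a local web UI where you can call retrieve11/retrieve12/retrieve13 by hand and check the answers before involving Claude or Cursor.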
Once done, just add this MCP server to the Claude/Cursor MCP config. Make sure to replace the paths with the actual location of your project and your uv executable.
{
  "mcpServers": {
    "RAG": {
      "command": "C:\\Users\\datas\\anaconda3\\Scripts\\uv.exe",
      "args": [
        "--directory",
        "C:\\Users\\datas\\OneDrive\\Desktop\\rag",
        "run",
        "server.py"
      ]
    }
  }
}
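Note that JSON treats the backslash as an escape character, so Windows paths in the config must be written with doubled backslashes ("C:\\Users\\…"), or the client will reject the file. A quick sanity check, using the example paths from above, is to run the config through Python's json module:

```python
import json

# Windows paths need "\\" inside JSON strings; json.loads raises an
# error if the escaping is wrong, so a successful parse is a good sign.
config = r'''
{
  "mcpServers": {
    "RAG": {
      "command": "C:\\Users\\datas\\anaconda3\\Scripts\\uv.exe",
      "args": [
        "--directory",
        "C:\\Users\\datas\\OneDrive\\Desktop\\rag",
        "run",
        "server.py"
      ]
    }
  }
}
'''
parsed = json.loads(config)
print(parsed["mcpServers"]["RAG"]["command"])
```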
5. Restart Claude/Cursor, and the tools should be available in the AI client.
Hope this is useful and that you try creating a RAG MCP server of your own.
Conclusion
Congrats! You’ve just wired up your own RAG MCP server and plugged it into Claude or Cursor. This isn’t just a toy — it’s a stepping stone to building AI workflows tailored to your data.
Want to add PDFs? Make the system dynamic? Add voice commands?
This is just the beginning.
RAG MCP Server tutorial was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.