The obvious solution to research overload seemed to be: let an AI help. Upload everything to ChatGPT, ask it questions, get answers.
Except that felt wrong. And honestly, it wouldn’t work anyway. My research notes aren’t meant for OpenAI’s servers. Neither are my half-formed thoughts in Obsidian or the collection of bookmarked web articles I’ve been meaning to read.
I wanted the help without the handover.
Enter the Model Context Protocol
The Model Context Protocol (MCP) is a new standard that allows AI models to connect directly to your apps and data. Instead of uploading files or copy-pasting content into a chat window, the AI acts as a client that simply “asks” your Zotero library—or your notes, or your saved articles—for information only when it needs it.
It serves as a standardized interface to give AI context without sending your entire digital life to the cloud.
First Attempt: Entirely Offline
I started with OpenAI’s open-weight gpt-oss language model in the LM Studio desktop app. The promise was appealing: chat with my Zotero library completely offline, no internet required. And it worked—sort of.
The problem was the context window. MCP loads retrieved information into the AI’s working memory, and with smaller, more efficient local models, that memory fills up fast. Imagine trying to have a conversation while someone reads you the contents of an entire filing cabinet. The model simply couldn’t keep up.
It only worked properly with the larger, resource-hungry models, which rather defeated the point of running things locally on my hardware.
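Some rough arithmetic shows why the window fills so quickly. All the numbers below are illustrative assumptions, not measurements of any particular model:

```python
# Back-of-the-envelope context budget for a small local model.
# Every number here is an assumption for illustration only.

CONTEXT_WINDOW = 8_192        # tokens a small local model might support
SYSTEM_AND_TOOLS = 1_500      # system prompt plus MCP tool definitions
TOKENS_PER_SNIPPET = 600      # one retrieved abstract or note chunk
RESERVED_FOR_ANSWER = 1_000   # room the model needs to respond

available = CONTEXT_WINDOW - SYSTEM_AND_TOOLS - RESERVED_FOR_ANSWER
snippets_that_fit = available // TOKENS_PER_SNIPPET

print(f"Room for roughly {snippets_that_fit} retrieved snippets")
```

Under these assumptions, fewer than ten retrieved passages exhaust the budget—and a single indexed PDF can easily produce more chunks than that.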
The Pivot to Claude Desktop
I switched to Claude Desktop, which handles MCP more gracefully. To be clear, this isn’t entirely offline—Claude itself runs in the cloud—but the data flow is strictly controlled. My files stay on my machine unless the AI explicitly requests specific snippets to answer a query.
I continued to use a community-built Zotero MCP server (created by 54yyyu on GitHub) that gives Claude (and large language models from other providers) several ways to interact with my library:
- Basic Search: Lets Claude find papers by title, author, keywords, or tags—essentially an automated version of Zotero’s built-in search.
- Metadata Retrieval: Pulls detailed information about any item: full citations, abstracts and publication details. It can even export BibTeX citations directly.
- Annotation Extraction: This is particularly clever. It can pull my highlights and notes directly from PDFs, even ones that Zotero hasn’t fully indexed yet. I can ask, “What did I highlight in that assessment paper?” and get actual quotes.
But the real power lies in semantic search.
How Semantic Search Actually Works
Traditional keyword search is literal. If I search for “formative assessment,” I’ll only find papers that use those exact words. But what if a paper discusses the same concept using different terminology—like “ongoing feedback” or “learning-oriented evaluation”?
Semantic search solves this by understanding concepts, not just matching literal strings.
Here is the workflow:
- Embedding: When you first set it up, the Zotero MCP server reads through your library—titles, abstracts, notes, and optionally full text. It creates a vector embedding for each item. Think of this as translating your research into a conceptual map, where similar ideas end up close together in a mathematical space.
- Storage: These embeddings are stored in a local vector database (this server uses ChromaDB). The database doesn’t contain your actual papers; it contains numerical representations of their meaning.
- Retrieval: When you search, your query is converted into the same kind of vector. The system finds items whose vectors are “close” to your query vector.
You can choose different embedding models:
- Default (all-MiniLM-L6-v2): Runs entirely locally. Efficient and works well for most cases. This is what I use.
- OpenAI / Gemini Models: Offer higher fidelity understanding but require an API key and incur costs.
Because the database sits locally, updates happen on your terms—either manually or on a schedule.
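The retrieval step above can be sketched in miniature. A real setup uses an embedding model (such as all-MiniLM-L6-v2) and a vector store like ChromaDB; in this toy version the “embeddings” are hand-made three-dimensional vectors, just to show how cosine similarity ranks conceptually close items:

```python
import math

# Toy library: titles mapped to hand-made "embedding" vectors.
# Real embeddings have hundreds of dimensions and come from a model.
library = {
    "Formative assessment in large lectures": [0.9, 0.1, 0.0],
    "Ongoing feedback and learning-oriented evaluation": [0.8, 0.2, 0.1],
    "Medieval manuscript digitisation": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, k=2):
    # Rank items by conceptual closeness to the query vector.
    ranked = sorted(library.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [(title, round(cosine(query_vec, vec), 3))
            for title, vec in ranked[:k]]

# A query vector in the "assessment" region of the space finds both
# assessment papers, even though their titles share no keywords.
print(search([0.85, 0.15, 0.05]))
```

Notice that “ongoing feedback” ranks nearly as high as the exact-phrase match—this is the behavior keyword search cannot deliver.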
Under the Hood: From Discovery to Action
What makes this powerful for research is conceptual matching. I can paste a paragraph from a book and ask, “Find papers similar to this argument.”
Here is what a search actually looks like technically. If I ask: “Find research about generative AI’s positive impact on teaching and learning in higher education,” the system sends a structured request:
```json
{
  "limit": 15,
  "query": "generative AI artificial intelligence positive outcomes impact higher education teaching learning"
}
```
The MCP server returns results ranked by a Similarity Score (a number between 0 and 1 indicating conceptual closeness):
```markdown
# Semantic Search Results

Found 15 similar items:

## 1. 2023 EDUCAUSE Horizon Action Plan: Generative AI
**Similarity Score:** 0.466
**Type:** webpage
**Item Key:** 49TMAJID
**matched_content:** This report describes an ideal future for generative AI...

## 2. Students' voices on generative AI: Perceptions, benefits, and challenges
**Similarity Score:** 0.458
**Type:** journalArticle
**Item Key:** WHBZLTNQ

...
```
The “Agentic” Behavior
What happens next is where the magic is. The AI doesn’t just dump a list; it uses this data to decide which sources to investigate further. It might:
- Fetch the full text of the top three results to extract arguments.
- Pull reference lists to identify influential cited works.
- Retrieve my specific annotations to see what I found important previously.
It is not just retrieving information—it is navigating my research library the way I would, only far faster. In other words, it becomes a new way of re-exploring what I have read and thought through but may have forgotten, or never connected in the first place.
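The search-then-investigate pattern can be sketched as a short loop. The tool functions below are stubs standing in for real MCP calls—their names, arguments, and return shapes are my own illustration, not the Zotero server’s actual API:

```python
# Toy sketch of agentic tool use: search broadly, then follow up only
# on the hits judged relevant enough. All functions are hypothetical
# stand-ins for real MCP tool calls.

def semantic_search(query, limit=5):
    # Stub: a real call would hit the vector database.
    return [
        {"item_key": "49TMAJID", "title": "Horizon Action Plan", "score": 0.466},
        {"item_key": "WHBZLTNQ", "title": "Students' voices on generative AI", "score": 0.458},
        {"item_key": "XXXXXXXX", "title": "Unrelated hit", "score": 0.210},
    ][:limit]

def get_annotations(item_key):
    # Stub: a real call would pull my PDF highlights for this item.
    return [f"(highlight from {item_key})"]

def investigate(query, threshold=0.4):
    # Follow up only on results above a relevance threshold.
    findings = {}
    for hit in semantic_search(query):
        if hit["score"] >= threshold:
            findings[hit["title"]] = get_annotations(hit["item_key"])
    return findings

print(investigate("positive impact of generative AI in higher education"))
```

The key design point is the threshold: the model spends its limited context only on the sources worth reading in depth, rather than dumping everything it found.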
Building My Own Servers
Curiosity got the better of me. If someone built an MCP server for Zotero, could I build one for my other tools?
Using FastMCP, a Python framework that abstracts much of the complexity, I managed to create two working servers despite my limited coding knowledge:
- For Obsidian: Now Claude can search through my personal knowledge base, surfacing connections between notes and helping me spot patterns I’d missed. It can even write new notes, such as handy overviews or summaries.
- For Readeck: Instead of digging through hundreds of bookmarks, I can ask Claude to identify themes across my saved articles.
Both run locally. Both keep my data mainly on my machine.
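For a sense of why this was feasible with limited coding knowledge: the core pattern FastMCP provides is simply “decorate a plain Python function to expose it as a named tool.” The stdlib-only stand-in below mimics that shape—the registry, function names, and dispatch logic are my own sketch, not FastMCP’s actual internals:

```python
# Stdlib-only sketch of the tool-registration pattern that FastMCP
# wraps in a real MCP server. Names here are illustrative.

TOOLS = {}

def tool(fn):
    """Register a function as a callable tool (mimics a decorator API)."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_notes(query: str) -> list:
    # A real server would search the Obsidian vault on disk;
    # this stub searches a tiny in-memory list.
    notes = ["MCP keeps data local", "Semantic search notes"]
    return [n for n in notes if query.lower() in n.lower()]

def handle_request(name, **kwargs):
    # The server dispatches a client's tool call by name.
    return TOOLS[name](**kwargs)

print(handle_request("search_notes", query="local"))
```

With FastMCP, the decorator also generates the tool’s schema from the function signature and docstring, which is what lets Claude discover and call it—that plumbing is exactly what I didn’t have to write myself.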
The Reality Check
The setup isn’t perfect. Getting MCP servers running currently requires editing configuration files and running terminal commands. If that makes you nervous, this ecosystem might not be ready for you yet.
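To give a concrete sense of the configuration editing involved: Claude Desktop reads its MCP servers from a JSON file (`claude_desktop_config.json`), where each entry names a command to launch the server. The server name and path below are placeholders, not my actual setup:

```json
{
  "mcpServers": {
    "obsidian-notes": {
      "command": "python",
      "args": ["/path/to/obsidian_server.py"]
    }
  }
}
```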
You also need decent hardware. While the semantic search runs efficiently, managing the local database and running the environment requires a capable machine (my desktop handles it, but a lightweight laptop might struggle).
There is also a learning curve for prompting. Asking “Tell me everything relevant” will still overwhelm the context window. You learn to be specific, asking the AI to act as a librarian rather than an oracle. The LLM’s context window can also get crowded if you connect too many MCP servers, each exposing many capabilities for the model to choose from.
The Privacy Trade-off
Here is what I value about this setup: Control.
My research notes don’t live on a third-party server. I’m not paying a subscription for a “PDF Chat” wrapper that might disappear. If Anthropic shuts down tomorrow, my MCP servers still work with any other AI that supports the protocol.
I am not dogmatic about it—I use the cloud-based Claude because the inference is currently superior to fully local models. However, I disable the setting that allows Anthropic to train on my data. It’s about finding the right balance between the convenience of a smart model and the control of local data storage.
Where This is Heading
MCP is young. The specification changes, servers break, and documentation is still growing. But the core philosophy is correct: bring the model to the data, not the data to the model.
Frameworks like FastMCP are democratizing these connections, allowing non-developers to build custom workflows. I’ve been experimenting with AI tools for years, and this one has staying power. Not because it’s flashy, but because it solves a real problem in a way that respects how researchers actually work.
If you’re protective of your data but curious about AI assistance, MCP is worth watching.