
Key learnings about LLMs:
- They come in all shapes and sizes. Lightweight models are cost-effective and energy-efficient but less capable than the giant, power-hungry models requiring massive hardware. Choosing the right model involves striking a balance between environmental impact, cost and output quality.
- They have limited context windows, meaning they can only handle a certain amount of text for input and output at one time.
- They are often paired with Retrieval Augmented Generation (RAG). Instead of processing an entire document at once, an application using RAG splits it into smaller chunks. Based on your prompt, it retrieves the chunks it judges relevant and formulates an answer from those. Chunks that are not selected never reach the model, which can result in incomplete answers.
- They don’t work ‘line-by-line’. Instead, they view the entire context window at once, using an “attention” mechanism (popularised by the seminal “Attention Is All You Need” paper) to weigh the importance of different words. A significant drawback of this non-linear approach is a vulnerability to errors over long inputs. While the model’s output may seem perfectly fine for the initial entries of a dataset, its quality can degrade sharply as it reaches items further down the list.
- They often default to simple code for data processing. When given open-ended answers from a survey, a standard chatbot often uses code libraries for basic quantitative analysis (like word counts) rather than leveraging its language capabilities for true qualitative insight. Need simple word clouds or charts, independent of any contextual meaning? No problem at all.
- They lack traceable insights. As a black box, a neural network struggles to accurately explain how it reached a conclusion. Ask how many times a specific theme appears in your data and it likely can’t tell you, or it will confidently give you a wrong number. While it can provide a useful summary, the lack of a verifiable process makes it unreliable for deeper analysis.
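To make the RAG point above concrete, here is a minimal sketch of how retrieval can leave data unselected. It is a toy under stated assumptions: the function names (`chunk_by_sentence`, `retrieve`) and the sample text are made up for illustration, and the ranking uses plain word overlap, whereas real RAG applications typically compare embedding vectors.

```python
import re

def chunk_by_sentence(text: str) -> list[str]:
    """Split a document into one chunk per sentence (a deliberately simple chunker)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def tokens(text: str) -> set[str]:
    """Lowercase word set, used as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z']+", text.lower()))

def retrieve(chunks: list[str], prompt: str, top_k: int = 1) -> list[str]:
    """Rank chunks by word overlap with the prompt and keep only the top_k."""
    prompt_tokens = tokens(prompt)
    ranked = sorted(chunks, key=lambda c: len(tokens(c) & prompt_tokens), reverse=True)
    return ranked[:top_k]

# Hypothetical sample data, not taken from the real survey.
document = (
    "Students worry that AI makes assignments too easy to pass. "
    "Several students ask for clear rules on AI per course. "
    "Others fear unfair grading when classmates rely on AI."
)

chunks = chunk_by_sentence(document)
selected = retrieve(chunks, "What do students say about unfair grading?", top_k=1)
dropped = [c for c in chunks if c not in selected]

print("Sent to the model:", selected)
print("Never seen by the model:", dropped)
```

Only the chunk about grading is passed on. The other two sentences are silently left out, even though a human reader might consider them part of the answer.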
In short, while LLMs are powerful, general-purpose chatbots like ChatGPT, Gemini and Copilot are currently inadequate for qualitative data analysis. Furthermore, calling LLMs directly instead of going through general chatbots greatly improves the security of the data you send to them. It won’t be used for training, nor will it be seen by individuals who are not supposed to see it. So I don’t recommend giving your entire dataset to ChatGPT and the like. Use their underlying engines, the LLMs, instead.
What I built
LLMs do excel at developing code for web applications. I don’t consider myself a developer. My coding knowledge is just enough to leverage generative AI to quickly build my own applications. So, what did I build? Last year, we added two GenAI-related questions to the annual university student questionnaire. The first question gauged students’ exposure to GenAI in their curriculum. The second asked students to estimate how many assignments they could pass using GenAI with minimal personal input. While the multiple-choice results were eye-opening, the detailed open-ended answers held the most value. With over 1,600 open answers to work through, using AI to analyse them started to pique my interest.
My local web app processes a CSV (or Excel sheet) of these open answers. It uses large chunks of the data to formulate and refine a codebook (a set of codes and themes), then applies it to each answer individually. The app creates a traceable record, adding the LLM’s reasoning for each code applied. This prepares the data for further analysis. The app can use major commercial models or run smaller models locally on my desktop. The local route is slower and less accurate, but it offers independence from big tech and can run on my home’s solar power—a satisfying bonus.
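For the curious, the two-phase flow described above can be sketched roughly as follows. This is not the app's actual code: the function names, the prompts, the `answer` column name and the JSON response format are all assumptions for illustration, and `fake_llm` is a stand-in so the sketch runs offline. In practice you would swap it for a call to a commercial API or a local model.

```python
import csv
import io
import json
from collections import Counter
from typing import Callable

LLM = Callable[[str], str]  # prompt in, raw model text out

def build_codebook(answers: list[str], llm: LLM, chunk_size: int = 200) -> list[str]:
    """Phase 1: show the model large chunks of answers and let it refine the codebook."""
    codebook: list[str] = []
    for start in range(0, len(answers), chunk_size):
        chunk = answers[start:start + chunk_size]
        prompt = (
            "Refine this codebook so it covers the answers. Return a JSON list of codes.\n"
            f"Codebook: {json.dumps(codebook)}\nAnswers: {json.dumps(chunk)}"
        )
        codebook = json.loads(llm(prompt))
    return codebook

def code_answer(answer: str, codebook: list[str], llm: LLM) -> dict:
    """Phase 2: code one answer at a time and keep the model's stated reasoning."""
    prompt = (
        'Apply the codebook to the answer. Return JSON {"codes": [...], "reasoning": "..."}.\n'
        f"Codebook: {json.dumps(codebook)}\nAnswer: {json.dumps(answer)}"
    )
    result = json.loads(llm(prompt))
    # Drop any code the model invented outside the agreed codebook.
    codes = [c for c in result["codes"] if c in codebook]
    return {"answer": answer, "codes": codes, "reasoning": result["reasoning"]}

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned JSON."""
    if prompt.startswith("Refine"):
        return json.dumps(["cheating concerns", "wants guidance"])
    answer = prompt.rsplit("Answer: ", 1)[1]
    if "rules" in answer:
        return json.dumps({"codes": ["wants guidance"], "reasoning": "Asks for rules."})
    return json.dumps({"codes": ["cheating concerns"], "reasoning": "Mentions passing unfairly."})

# Hypothetical two-row CSV with a single 'answer' column.
raw_csv = "answer\nWe need clear rules for AI use.\nSome classmates pass without doing the work.\n"
answers = [row["answer"] for row in csv.DictReader(io.StringIO(raw_csv))]

codebook = build_codebook(answers, fake_llm)
records = [code_answer(a, codebook, fake_llm) for a in answers]

# Theme counts come from the stored records, not from asking the model to count.
theme_counts = Counter(code for r in records for code in r["codes"])
print(theme_counts)
```

Because every answer gets its own record with codes and reasoning, the counts are computed in ordinary code and each one can be traced back to the answers behind it.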
The results, while not perfect and potentially different from human analysis, are nevertheless usable. However, this tool is unsuitable for providing student feedback—that’s a whole different beast and too much to dive into in this post. This is something we attempt to guide our teachers on, though: how, when and if AI is meaningful for feedback in education.
My takeaway is this: don’t settle for the limitations of off-the-shelf applications. Using generative AI to build your own custom tools, which in turn utilise AI models, is a remarkably effective way to learn about the technology and create powerful solutions.



For those interested in my methods, here is what I now tend to use whenever an idea for a tool arises:
- Claude or Websim to design a nice interface for the web app or for quick iterations (free version)
- Google AI Studio > ‘Build apps’ to actually develop the app (free)
- Visual Studio Code & GitHub Copilot for tweaking and debugging (free tier for education)
- Ollama for running local LLMs (‘free’ if you have beefy hardware already)
- OpenAI and Gemini APIs to use their LLMs directly (not free, pay per usage)