Mastering LLM Effectiveness in Development
I’m always looking for ways to get more done in less time, and the rise of AI is finally bringing real benefits to my development process. However, many developers don’t see - or don’t want to see - the opportunities opening up for us, and keep working the old-fashioned way, inadvertently putting themselves at risk of falling behind the competition.

As an engineering manager, I interact with developers from all around the world, and I witness their vastly different attitudes towards AI. Generally, I break it down into these categories:
- Inspired and actively using AI in their work (up to 20%);
- Interested, experimenting, but using AI for roughly 20% of tasks (about 60%);
- Skeptics who either tried it and weren’t impressed, or never tried it at all (around 20%).

These numbers are quite subjective and mostly reflect my own experience rather than the industry as a whole.
There are studies showing that AI penetration in development is about 60%, which might actually be true. But for now, this figure mostly reflects the portion of developers who use ChatGPT, Perplexity, and similar AI-powered tools superficially, without deep practical integration. In practice, LLMs are mostly used at a surface level, even though their potential is enormous.
About My Experience
Over the past year, I’ve experimented with various editors and AI-powered IDE plugins, but I currently rely on Cursor AI, GitHub Copilot, and JetBrains Junie as my daily drivers for development.
I haven’t studied the internal workings of LLMs in depth or read up on their architecture, but by constantly switching between different tools, I’ve started to notice certain patterns. This eventually led me to conclusions that have made my use of LLMs in indie development much more effective (and I see no reason why this wouldn’t also apply in a company setting - more on that in future posts).
My takeaways might not be rigorously accurate from a Data Science perspective (I welcome any feedback!), but this is how things work in simplified terms. I hope this will be useful to a wide range of developers and help clarify why models sometimes glitch, how they work, and what you can do about it.
What Developers Complain About
In this section, I’ll break down a few main issues developers raise, and explain why these problems occur.
I’ll only cover fundamental principles and limitations - the ones that aren’t going anywhere for the foreseeable future. In the meantime, various clever workarounds keep emerging, and we’re using them ourselves.
Problem #1: LLMs hallucinate
I liked this quote: “LLMs hallucinate 100% of the time, but 80% of the time they’re right.” Every LLM I know always tries to provide you with an answer - even if it’s wrong. Fundamentally, this is hard to solve. However, it’s important to realize that hallucinations can stem from several causes, and some of those can be tackled individually. In this section I’ll discuss one of these causes.
Starting with theory: every model has a limited context window - the amount of text the LLM can consider, or “remember,” at one time when generating a reply. The context window is measured in tokens, but tokens vary in length. As a rough estimate, one English token is about 4 characters, and 100 tokens correspond to roughly 75 English words.
Context window sizes for various models:
- gpt-3 – 2,048 tokens (~1,500 English words)
- gpt-4o – 128,000 tokens (~96,000 English words)
- gpt-4.1 – 1,000,000 tokens (~750,000 English words)
You can calculate token counts for your text at https://platform.openai.com/tokenizer. A key point: context size doesn’t directly determine how “smart” the model is. But having a larger context window is clearly beneficial.
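If you’d rather check token counts locally, OpenAI’s open-source tiktoken library gives the same kind of estimate. A minimal sketch (the model-to-encoding mapping changes over time, so treat the encoding fallback as an assumption):

```python
# pip install tiktoken
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens the way the given OpenAI model would split the text."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fallback for models tiktoken doesn't know about yet.
        encoding = tiktoken.get_encoding("o200k_base")
    return len(encoding.encode(text))

sample = "The context window is measured in tokens, not characters."
print(count_tokens(sample))  # roughly len(sample) / 4 for English text
```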
When you chat in ChatGPT, there’s a limit for each conversation. Once it’s exceeded, the model starts to forget (or “evict”) facts from the conversation and to hallucinate.
The creators of popular models use a workaround for this limit. Instead of passing the entire conversation history to the model with each new message, they generate a short summary of the key facts after each message and use that as the context for future prompts. This lets conversations run longer, but some facts do get lost. The summaries are, of course, produced by the same models, with the same constraints. This hack was vital in the early days of LLMs. With today’s larger context windows the trick remains, but for a different reason: it saves computational resources (the bigger the context, the more resources are needed to process it and generate an answer).
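To make the mechanics concrete, here is a minimal sketch of a rolling-summary approach using the OpenAI Python SDK. The token budget, the number of turns kept verbatim, and the summarization prompt are my own illustrative choices, not how any particular vendor actually implements it:

```python
# pip install openai tiktoken
import tiktoken
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
encoding = tiktoken.get_encoding("o200k_base")
MAX_HISTORY_TOKENS = 8_000  # arbitrary budget for this illustration

def history_tokens(messages: list[dict]) -> int:
    return sum(len(encoding.encode(m["content"])) for m in messages)

def compact(messages: list[dict]) -> list[dict]:
    """Once the budget is exceeded, replace older messages with a short summary."""
    if history_tokens(messages) <= MAX_HISTORY_TOKENS:
        return messages
    old, recent = messages[:-4], messages[-4:]  # keep the last few turns verbatim
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Summarize the key facts and decisions from this conversation."},
            *old,
        ],
    ).choices[0].message.content
    # The summary becomes the new "memory"; anything it dropped is effectively forgotten.
    return [{"role": "system",
             "content": f"Summary of the earlier conversation: {summary}"}, *recent]
```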
For development, the same principle applies. However, I suspect Cursor and other editors do NOT summarize the code you share with them, to preserve response accuracy. As a result, your context window fills up even faster.
Problem #2: LLMs use outdated information
Every model is trained on specific data sets, which might not include, for example, the latest version of your favorite framework - or may not even have heard of it! For less popular technologies, there’s scant information online, so the training set is poorer, which also leads to hallucinations, since models always try to provide some answer.
General-purpose models (GPT, Claude, Gemini, etc.) are very popular, but so are specialized ones trained on domain-specific data. For instance, code-specific models like OpenAI Codex or medical models. Since these are trained on relevant materials, their results often stand out. Still, their knowledge is limited by the training data.
Imagine being marooned on a desert island for 20 years - the world moves on, but your knowledge is frozen at some moment in time. LLMs are the same. They need regular re-training on fresh data to stay relevant.
Another crucial point: models don’t have access to the internet by default. However, all the major AI players have added the ability to search the web and use the results in your conversation. This way, you get up-to-date info even if the model was trained on old data. How exactly each vendor does this is hard to say, but one approach is building an agent on top of MCP (Model Context Protocol). Without going into detail: think of MCP as a standard that lets models interact with external services, discover their available functions, and call them.
Problem #3: You have to explain everything all over again every time
Out of the box, a model has no memory: it can’t remember what you sent it earlier. But the big players have emulated “memory” in their official clients. For example, while you chat, ChatGPT tracks facts about you that are worth saving and later injects those facts into your chat’s context. This creates the illusion that the model “knows” something about you. Memory can also be implemented via the previously mentioned MCP, or via RAG (retrieval-augmented generation).
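In simplified form, that emulation boils down to something like the following Python sketch. The file name, prompts, and model choices are hypothetical; this shows the general idea, not how OpenAI actually implements its memory feature:

```python
# pip install openai
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
MEMORY_FILE = Path("memory.json")  # hypothetical local store for this sketch

def load_memory() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def remember(user_message: str) -> None:
    """Ask the model to extract facts worth keeping and append them to the store."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract durable facts about the user from this message, "
                        "one per line. Reply with nothing if there are none."},
            {"role": "user", "content": user_message},
        ],
    ).choices[0].message.content or ""
    facts = [line for line in reply.splitlines() if line.strip()]
    MEMORY_FILE.write_text(json.dumps(load_memory() + facts))

def ask(question: str) -> str:
    """Every new conversation starts with the stored facts injected as context."""
    memory = "\n".join(load_memory())
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Known facts about the user:\n{memory}"},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content
```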
So if you’re tired of repeating yourself in development, you’ll need to find a way to emulate “memory” for the model. And there are solutions - more on that below.
How to Solve These Issues in Cursor AI
These problems and workarounds might seem obvious, and I’m sure many of you have already encountered them. Still, since I have to explain these things over and over again, I want to fill that gap here.
Let’s use Cursor AI as a concrete example of solving these problems in development.
1. Context Window Limitations
All resources are finite, and LLMs are no different - there’s no magic here.
Here are some practices that will help you run into context-window-related hallucinations less often:
Start New Chats as Often as Possible
Try to stick to “one problem/task/question = one chat.” If a chat is getting long, ask for a summary, start a new thread, and paste the summary in. Also, think of the times in a conversation when you suddenly changed the topic and the other person gave a completely off-topic reply. Models do the same when you mix topics - start a new chat for unrelated issues.
Only Pass Necessary Context to the Chat
You can select your whole project as context and sometimes get good results - usually only for small projects. For medium and large ones, you’ll have to be selective.
Switch to a Model with a Larger Context Window
Sometimes you need to load more context. In that case, try switching to a model with a bigger context window. But that means you’ll need to get familiar with the models available in Cursor. Here you can find descriptions of OpenAI’s models.
Limit Project Indexing
Cursor has a .cursorignore mechanism. It lets Cursor ignore and skip indexing certain parts of your project, shrinking the scope and making searches faster. This is especially useful for monorepos and large projects. More details here.
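The file uses the same pattern syntax as .gitignore. An illustrative example (the paths are made up - adapt them to your own repository):

```
# .cursorignore - keep noise out of Cursor's index
node_modules/
dist/
build/
coverage/
*.min.js

# large generated artifacts
**/generated/
docs/legacy/
```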
2. Accessing Current Information
It’s annoying when you’re already using React 19 but Cursor only gives suggestions for version 18, because that’s what it was trained on.
Cursor’s developers suggest several ways to solve this:
Use the @docs Tag
When you use the @docs <url> tag and give it a link to current documentation, Cursor will index those pages. Once indexing is done, it’ll be able to help you with the new framework features and APIs.
This works just as well for internal (private) documentation - including, say, your internal Swagger specs.
Use the @web Tag
To search the web for other info right from your IDE, use the @web tag followed by your query. Sure, you could Google it yourself, but this way you can work with the results directly in the same chat - pulling in StackOverflow answers, for example.
Use MCP
Cursor lets you connect any MCP server, which can then be used in your chats as needed. One example is context7, an MCP server that helps you pull in up-to-date documentation.
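As of this writing, MCP servers in Cursor are configured in a JSON file (a project-level .cursor/mcp.json, or a global equivalent). A hedged example for context7 - the exact file location and package name are assumptions on my part, so double-check them against the current Cursor and context7 docs:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```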
3. Teaching Cursor to Remember
What’s more fun than explaining to Cursor, every single time, how your files and methods are organized? Or how to create a new API in your project? Or whether to use external libraries or stick to the standard library? Or what the microservice architecture looks like and what the project’s main goal is?
Refer to Old Chats
Cursor has recently added the ability to reference your old chats. Use them as in, “do it the same way as in that chat,” or as a continuation of a related task.
Refer to Files
It’s not obvious, but you can also ask: “Analyze this file/folder and do it the same way.”
Cursor Rules
Cursor and other IDEs already have a solution for persistent “memory”: cursor rules.
This is a file (or a set of files, in Cursor’s case) that is automatically (or manually) passed along as part of your context. With your documentation captured in cursor rules, you won’t have to repeat yourself, and Cursor will follow your standards.
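For illustration, a project rule might look roughly like this. The frontmatter fields follow Cursor’s .mdc rule format as I understand it, and the conventions themselves are invented for the example:

```
---
description: Conventions for new API endpoints
globs: src/api/**/*.ts
alwaysApply: false
---

- Every endpoint lives in its own file under src/api/<domain>/.
- Validate request bodies with the shared validation helpers; do not hand-roll checks.
- Prefer the standard library; a new external dependency needs a comment explaining why.
- Return errors in the { "error": { "code", "message" } } envelope used across the project.
```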
Yes, this is just documentation - and yes, you still have to write it. Cursor, like a junior developer, needs guidance and explanations. Some agents (like JetBrains Junie) are making strides in analyzing existing code to try to “do the same thing.” That’s handy, but not always effective, especially with legacy code.
The good news: you can always ask Cursor to generate and save rules based on your chat (just ask it directly). In fact, they’ve recently added a /Generate Cursor Rules command that does exactly that. Sure, you’ll want to review the result and maybe ask for tweaks, but it dramatically reduces the effort of writing specs.
As a bonus: newcomers to your team can read your cursor rules as documentation and coding standards.
In Conclusion
To wrap up, I invite you to think about and answer the following questions:
- How many tokens are in this article?
- How might Cursor behave when working with a 10,000-line file?
- What should you do to refactor efficiently using Cursor?
- How can you have Cursor automatically run linters or builds at the end of each chat and fix errors?
Effective use is entirely possible if you understand the limitations of LLMs.