This post reflects the state of Claude Skills, MCP, and subagents as of February 2026. AI moves fast, so some details may be outdated by the time you read this. The concepts this post focuses on, however, are timeless.
If you’ve been building with LLMs for a while, you’ve probably lived through this loop over and over: you take your time crafting a great prompt that produces excellent results, and a few days later you need the same behavior again, so you start prompting from scratch. After a few repetitions you notice the inefficiency and store the prompt’s template somewhere for later, but even then you have to find it, paste it in, and tweak it for that particular conversation. It’s so tedious.
This is what I call the prompt engineering hamster wheel. And it’s a fundamentally broken workflow.
Claude Skills are Anthropic’s answer to this “reusable prompt” problem, and more. Beyond just saving you from repetitive prompting, they introduce a fundamentally different approach to context management, token economics, and the architecture of AI-powered development workflows.
In this post, I’ll unpack what skills and subagents actually are, how they differ from traditional MCP, and where the skill / MCP / subagent mix is heading.
What are Skills?
At their core, skills are reusable instruction sets that AI Agents, like Claude, can automatically access when they’re relevant to a conversation. You write a skill.md file with some metadata and a body of instructions, drop it into a .claude/skills/ directory, and Claude takes it from there.
Their looks
In its simplest form, a skill is a markdown file with a name, description, and body of instructions, like this (placeholder values):

---
name: my-skill
description: When the agent should invoke this skill
---
The instruction body the agent follows once the skill is invoked.
Their strengths
The main strength of skills lies in the auto-invocation. When starting a new conversation, the agent only reads each skill’s name and description, to save on tokens. When it determines a skill is relevant, it loads the body. If the body references additional files or folders, the agent reads those too, but only when it decides they are needed. In essence, skills are lazy-loaded context. The agent doesn’t consume the full instruction set upfront. It progressively discloses information to itself, pulling in only what’s needed for the current step.
This progressive disclosure operates across three levels, each with its own context budget:
- Metadata (loaded at startup): The skill’s name (max 64 characters) and description (max 1,024 characters). This costs roughly ~100 tokens per skill, negligible overhead even with hundreds of skills registered.
- Skill body (loaded on invocation): The full instruction set inside skill.md, up to ~5,000 tokens. This only enters the context window when the agent determines the skill is relevant.
- Referenced files (loaded on demand): Additional markdown files, folders, or scripts within the skill directory. There’s practically no limit here, and the agent reads these only when the instructions reference them and the current task requires it.
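To make the three levels concrete, here is a minimal sketch of what progressive disclosure could look like if you implemented it yourself. The function names and the naive frontmatter parsing are illustrative assumptions, not Claude’s actual loader:

```python
from pathlib import Path

def load_metadata(skill_dir: Path) -> dict:
    """Level 1: read only the YAML frontmatter (name + description)."""
    text = (skill_dir / "skill.md").read_text()
    frontmatter = text.split("---", 2)[1]  # naive parse, assumes simple key: value pairs
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def load_body(skill_dir: Path) -> str:
    """Level 2: read the full instruction body, only once the skill is invoked."""
    return (skill_dir / "skill.md").read_text().split("---", 2)[2].strip()

def load_referenced_file(skill_dir: Path, filename: str) -> str:
    """Level 3: read a referenced file only when the current step needs it."""
    return (skill_dir / filename).read_text()
```

The point of the sketch: nothing beyond `load_metadata` runs until the agent decides it should, so the context cost tracks actual usage.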

Insight: Skills are reusable, lazy-loaded, and auto-invoked instruction sets that use progressive disclosure across three levels: metadata, body, and referenced files. This minimizes the upfront cost by not dumping everything into the context window (looking at you, MCP 👀).
The problem in token economics
Cost factors
It’s no secret: an agent’s context window space isn’t free, and filling it has compounding costs. Every token in your context window costs you in three ways:
- Actual cost: the obvious one is that you’re paying per token. This can be directly through API usage, or indirectly through usage limits.
- Latency: you’re also paying with your time, since more input tokens mean slower responses. And latency doesn’t scale well with context length, due to the attention mechanism.
- Quality: finally, there’s also a degradation in quality due to long context windows. LLMs demonstrably perform worse when their context is cluttered with irrelevant information.
The costly overhead of MCPs
Let’s put this into perspective, through a quick back-of-the-envelope calculation. My go-to MCP picks for programming are:
- AWS for infrastructure deployment. Three servers (aws-mcp, aws-official, aws-docs) combined yield a cost of around ~8,500 tokens (13 tools).
- Context7 for documentation. Metadata is around ~750 tokens (2 tools).
- Figma for bringing design to frontend development. Metadata is around ~500 tokens (2 tools).
- GitHub for searching code in other repositories. Metadata is around ~2,000 tokens (26 tools).
- Linear for project management. Metadata is around ~3,250 tokens (33 tools).
- Serena for code search. Metadata is around ~4,500 tokens (26 tools).
- Sentry for error tracking. Metadata is around ~12,500 tokens (22 tools).
That’s a total of roughly ~32,000 tokens of tool metadata, loaded into every single message, whether you’re interacting with the tool or not.
To put a dollar figure on this: Claude Opus 4.6 charges $5 per million input tokens. Those 32K tokens of idle MCP metadata add $0.16 to every message you send. That sounds small, until you realize that even a simple 5-message conversation already adds $0.80 in pure overhead. And most developers don’t send just 5 messages; add some short clarifications and context-gathering questions and you quickly reach tens if not hundreds of messages. Say you average 50 messages a day over a 20-day work month: that’s $8/day, or ~$160/month* in pure overhead, just for tool descriptions sitting in context. And that’s before you account for the latency and quality impact.
*A small asterisk: most models charge significantly less for cached input tokens (90% discount). An asterisk to this asterisk is that some of them charge extra when enabling caching, and they don’t always enable (API) caching by default (cough Claude cough).
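The arithmetic is easy to reproduce. A quick sanity check, using this post’s rough token estimates (not measured values) and the message counts assumed above:

```python
MCP_METADATA_TOKENS = 32_000             # idle tool metadata resent with every message
PRICE_PER_INPUT_TOKEN = 5 / 1_000_000    # Claude Opus 4.6: $5 per million input tokens

cost_per_message = MCP_METADATA_TOKENS * PRICE_PER_INPUT_TOKEN
cost_per_day = cost_per_message * 50      # ~50 messages per day
cost_per_month = cost_per_day * 20        # 20-day work month
cached_per_month = cost_per_month * 0.10  # with a 90% cached-input discount

print(f"${cost_per_message:.2f}/message, ${cost_per_day:.0f}/day, ${cost_per_month:.0f}/month")
# → $0.16/message, $8/day, $160/month
```

Even with caching, that’s still ~$16/month of pure idle overhead, before latency and quality costs.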
The cost-effective approach of skills
The loading pattern of skills fundamentally changes all three cost factors. At the outset, the agent only sees each skill’s name and a short description, roughly ~100 tokens per skill. At that rate, I could register 300 skills and still consume fewer tokens than my MCP setup does. The full instruction body (~5,000 tokens) only loads when the agent decides it’s relevant, and referenced files only load when the current step needs them.
In practice, a typical conversation might invoke one or two skills while the rest remain invisible to the context window. That’s the key difference: MCP cost scales with the number of registered tools (across all servers), while skills’ cost scales more closely with actual usage.

Insight: MCP is “eager” and loads all tool metadata upfront regardless of whether it’s used. Skills are “lazy” and load context progressively and only when relevant. The difference matters for cost, latency, and output quality.
Wait, that’s misleading? Skills and MCP are two completely different things!
If the above reads like skills are the new and better MCPs, then allow me to correct that framing. The intent was to zoom in on their loading patterns and the impact they have on token consumption. Functionally, they are quite different.
MCP (Model Context Protocol) is an open standard that gives any LLM the ability to interact with external applications. Before MCP, connecting M models to N tools required M * N custom integrations. MCP collapses that to M + N: each model implements the protocol once, each tool exposes it once, and they all interoperate. It’s a simple infrastructural change, but it’s genuinely powerful (no wonder it took the world by storm).
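The integration math is worth seeing with numbers plugged in (the counts here are arbitrary, purely for illustration):

```python
models, tools = 6, 40

point_to_point = models * tools  # every model paired with every tool: custom integrations
via_mcp = models + tools         # each side implements the protocol exactly once

print(point_to_point, via_mcp)  # → 240 46
```

Six models and forty tools would need 240 bespoke integrations; with a shared protocol, 46 implementations cover every combination.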
Skills, on the other hand, are somewhat “glorified prompts”, and I mean that in the best possible way. They give an agent expertise and direction on how to approach a task, what conventions to follow, when to use which tool, and how to structure its output. They’re reusable instruction sets fetched on-demand when relevant, nothing more, nothing less.
Insight: MCP gives an agent capabilities (the “what”). Skills give it expertise (the “how”) and thus they’re complementary.
Here’s an example to make this concrete. Say you connect GitHub’s MCP server to your agent. MCP gives the agent the ability to create pull requests, list issues, and search repositories. But it doesn’t tell the agent, for example, how your team structures PRs, that you always include a testing section, that you tag by change type, that you reference the Linear ticket in the title. That’s what a skill does. The MCP provides the tools, the skill provides the playbook.
So, when earlier I showed that skills load context more efficiently than MCP, the real takeaway isn’t “use skills instead of MCP”, it’s that lazy-loading as a pattern works. Hence, it’s worth asking: why can’t MCP tool access be lazy-loaded too? That’s where subagents come in.
Subagents: best of both worlds
Subagents are specialized child agents with their own isolated context window and tools connected. Two properties make them powerful:
- Isolated context: A subagent starts with a clean context window, pre-loaded with its own system prompt and only the tools assigned to it. Everything it reads, processes, and generates stays in its own context, the main agent only sees the final result.
- Isolated tools: Each subagent can be equipped with its own set of MCP servers and skills. The main agent doesn’t need to know about (or pay for) tools it never directly uses.
Once a subagent finishes its task, its entire context is discarded. The tool metadata, the intermediate reasoning, the API responses: all gone. Only the result flows back to the main agent. This is actually a great thing. Not only do we avoid bloating the main agent’s context with unnecessary tool metadata, we also prevent unnecessary reasoning tokens from polluting the context. As an illustrative example, imagine a subagent that researches a library’s API. It might search across multiple documentation sources, read through dozens of pages, and try several queries before finding the right answer. You still pay for the subagent’s own token usage, but all of that intermediate work, the dead ends, the irrelevant pages, the search queries, gets discarded once the subagent finishes. The key benefit is that none of it compounds into the main agent’s context, so every subsequent message in your conversation stays clean and cheap.
This means you can design your setup so that MCP servers are only accessible through specific subagents, never loaded on the main agent at all. Instead of carrying ~32,000 tokens of tool metadata in every message, the main agent carries nearly zero. When it needs to open a pull request, it spins up a GitHub subagent, creates the PR, and returns the link. Similar to skills being lazy-loaded context, subagents are lazy-loaded workers: the main agent knows what specialists it can call on, and only spins one up when a task demands it.
A practical example
Let’s make this tangible. One workflow I use daily is a “feature branch wrap-up” that automates most of a very tedious part of my development cycle: opening a pull request. Here’s how skills, MCP, and subagents play together.
After the main agent and I finish the coding work, I ask it to wrap up the feature branch. The main agent doesn’t handle this itself; it delegates the entire PR workflow to a dedicated subagent. This subagent is equipped with the GitHub MCP server and a change-report skill that defines how my team structures PRs. Its skill.md looks roughly like this:
---
name: change-report
description: Use when generating a change report for a PR. Defines the team's PR structure, categorization rules, and formatting conventions.
---
1. Make sure there are no staged changes left; otherwise, report back to the main agent.
2. Run `git diff dev...HEAD --stat` and `git log dev..HEAD --oneline` to gather all changes on this feature branch.
3. Analyze the diff and categorize the most crucial changes by their type (new features, refactors, bug fixes, or config changes).
4. Generate a structured change report following the template in `pr-template.md`.
5. Open the PR via GitHub MCP, populating the title and body from the generated report.
6. Answer with the PR link.
The pr-template.md file in the same directory defines my team’s PR structure: sections for summary, changes breakdown, and testing notes. This is level 3 of progressive disclosure: the subagent only reads it when step 4 tells it to.
Here’s what makes this setup work. The skill provides the expertise on how my team reports on changes, the GitHub MCP provides the capability to actually create the PR, and the subagent provides the context boundary to perform all of this work. The main agent, on the other hand, only calls the subagent, waits for it to complete, and gets either a confirmation back or a message of what went wrong.

Insight: skills, MCPs, and subagents work in harmony. The skill provides expertise and instruction, MCP provides the capability, the subagent provides the context boundary (keeping the main agent’s context clean).
The bigger picture
In the early days of LLMs, the race was about better models: fewer hallucinations, sharper reasoning, more creative output. That race hasn’t stopped completely, but the center of gravity has certainly shifted. MCP and Claude Code were genuinely revolutionary. Upgrading Claude Sonnet from 3.5 to 3.7 honestly was not. The incremental model improvements we’re getting today matter far less than the infrastructure we build around them. Skills, subagents, and multi-agent orchestration are all part of this shift: from “how do we make the model smarter” to “how do we get the most value out of what’s already here”.
Insight: the value in AI development has shifted from better models to better infrastructure. Skills, subagents, and multi-agent orchestration aren’t just developer experience improvements; they’re the architecture that makes agentic AI economically and operationally viable at scale.
Where we are today
Skills solve the prompt engineering hamster wheel by turning your best prompts into reusable, auto-invoked instruction sets. Subagents solve the context bloat problem by isolating tool access and intermediate reasoning into dedicated workers. Together, they make it possible to codify your expertise once and have it automatically applied across every future interaction. This is what engineering teams following the state-of-the-practice already do with documentation, style guides, and runbooks. Skills and subagents just make those artifacts machine-readable.
The subagent pattern is also unlocking multi-agent parallelism. Instead of one agent working through tasks sequentially, you can spin up multiple subagents concurrently, have them work independently, and collect their results. Anthropic’s own multi-agent research system already does this: Claude Opus 4.6 orchestrates while Claude Sonnet 4.6 subagents execute in parallel. This naturally leads to heterogeneous model routing, where an expensive frontier model orchestrates and plans, while smaller, cheaper models handle execution. The orchestrator reasons, the workers execute. This can dramatically reduce costs while maintaining output quality.
There’s an important caveat here. While parallelism works well for read tasks, it gets much harder for write tasks that touch shared state. Say, for example, you spin up a backend and a frontend subagent in parallel. The backend agent refactors an API endpoint, while the frontend agent, working from a snapshot taken before that change, generates code that calls the old endpoint. Neither agent is wrong in isolation, but together they produce an inconsistent result. It’s a classic concurrency problem resurfacing in near-future AI workflows, and to date it remains open.
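One way such conflicts could at least be detected (if not solved) is optimistic concurrency: each subagent records the revision it started from, and a merge is rejected if the state moved underneath it. A toy sketch, with the scenario collapsed onto a single shared file for brevity; `snapshot` and `merge` are hypothetical helpers, not part of any real agent framework:

```python
state = {"api.py": ("GET /v1/users", 1)}  # path -> (content, revision)

def snapshot(path: str):
    """A subagent reads the file and remembers which revision it saw."""
    return state[path]

def merge(path: str, new_content: str, base_rev: int):
    """Accept a subagent's edit only if the file hasn't changed since its snapshot."""
    _, current_rev = state[path]
    if current_rev != base_rev:
        raise RuntimeError(f"{path} moved past revision {base_rev}; re-run the subagent")
    state[path] = (new_content, current_rev + 1)

# Both subagents snapshot revision 1 before working in parallel.
_, backend_base = snapshot("api.py")
_, frontend_base = snapshot("api.py")

merge("api.py", "GET /v2/users", backend_base)  # backend's refactor lands first
try:
    merge("api.py", "caller of GET /v1/users", frontend_base)  # stale snapshot
except RuntimeError as err:
    print(err)  # the frontend subagent must redo its work against v2
```

Detection is the easy half; automatically reconciling the two edits is the part that remains open.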
Where it’s heading
I expect skill composition to become more sophisticated. Today, skills are relatively flat: a markdown file with optional references. But the architecture naturally supports layered skills that reference other skills, creating something like an inheritance hierarchy of expertise. Think a base “code review” skill extended by language-specific variants, further extended by team-specific conventions.
Most multi-agent systems today are strictly hierarchical: a main agent delegates to a subagent, the subagent finishes, and control returns. There’s currently not much peer-to-peer collaboration between subagents yet. Anthropic’s recently launched “agent teams” feature for Opus 4.6 is an early step towards this, allowing multiple agents to coordinate directly rather than routing everything through an orchestrator. On the protocol side, Google’s A2A (Agent-to-Agent Protocol) could standardize this pattern across providers; where MCP handles agent-to-tool communication, A2A would handle agent-to-agent communication. That said, A2A’s adoption has been slow compared to MCP’s explosive growth. One to watch, not one to bet on yet.
Agents will become the new functions
There’s a broader abstraction emerging here that’s worth stepping back to appreciate. Andrej Karpathy’s famous tweet “The hottest new programming language is English” captured something real about how we interact with LLMs. But skills and subagents take this abstraction one level further: agents are becoming the new functions.
A subagent is a self-contained unit of work: it takes an input (a task description), has its own internal state (context window), uses specific tools (MCP servers), follows specific instructions (skills), and returns an output. It can be called from multiple places, it’s reusable, and it’s composable. That’s a function. The main agent becomes the execution thread: orchestrating, branching, delegating, and synthesizing results from specialized workers.
Beyond being a neat analogy, it can have the same practical implications that functions had for software engineering. Isolation limits the blast radius when an agent fails, rather than corrupting the entire system, and failures can be caught through try-except mechanisms. Specialization means each agent can be optimized for its specific task. Composability means you can build increasingly complex workflows from simple, testable parts. And observability follows naturally; since each agent is a discrete unit with clear inputs and outputs, tracing “why did the system do X” becomes inspecting a call stack rather than staring at a 200K-token context dump.
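The analogy maps directly onto code. Here is the earlier feature-branch workflow expressed as a function call, with `run_subagent` as a hypothetical stand-in for a real delegation API (in this sketch it just echoes the task so the shape is visible):

```python
def run_subagent(name: str, task: str, tools: list[str]) -> str:
    """Hypothetical delegation helper. In reality this would spin up an
    isolated child agent; here it simply echoes its input."""
    return f"[{name}] completed: {task}"

def wrap_up_feature_branch(branch: str) -> str:
    """The main agent treats the subagent like a function: input in, result out."""
    try:
        # The subagent carries its own tools and skills; its intermediate
        # context is discarded, and only this return value flows back.
        return run_subagent(
            name="pr-writer",
            task=f"Open a PR for {branch} using the change-report skill",
            tools=["github-mcp"],
        )
    except RuntimeError as err:
        # Isolation limits the blast radius: a failing worker surfaces as a
        # catchable error instead of corrupting the main agent's context.
        return f"PR workflow failed: {err}"

print(wrap_up_feature_branch("feat/login"))
```

Input, isolated internal state, a return value, and a catchable failure mode: exactly the contract we expect from a function.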

Conclusion
Skills look like simple “reusable prompts” on the surface, but they actually represent a thoughtful answer to some of the hardest problems in AI tooling: context management, token efficiency, and the gap between raw capability and domain expertise.
If you haven’t experimented with skills yet, start small. Pick your most-repeated prompting pattern, extract it into a skill.md, and see how it changes your workflow. Once that clicks, take the next step: identify which MCP tools don’t need to live on your main agent, or which subprocesses involve heavy intermediate reasoning you no longer need once the answer is found, and scope them to dedicated subagents instead. You’ll be surprised how much cleaner your setup becomes when each agent only carries what it actually needs.
Key insights from this post
- Skills are reusable, lazy-loaded, and auto-invoked instruction sets that use progressive disclosure across three levels: metadata, body, and referenced files. This minimizes the upfront cost by not dumping everything into the context window (looking at you, MCP 👀).
- MCP is “eager” and loads all tool metadata upfront regardless of whether it’s used. Skills are “lazy” and load context progressively and only when relevant. The difference matters for cost, latency, and output quality.
- MCP gives an agent capabilities (the “what”). Skills give it expertise (the “how”) and thus they’re complementary.
- Skills, MCPs, and subagents work in harmony. The skill provides expertise and instruction, MCP provides the capability, the subagent provides the context boundary (keeping the main agent’s context clean).
- The value in AI development has shifted from better models to better infrastructure. Skills, subagents, and multi-agent orchestration aren’t just developer experience improvements; they’re the architecture that makes agentic AI economically and operationally viable at scale.
Final insight: The prompt engineering hamster wheel is optional. It’s time to step off.
Found this useful? Follow me on LinkedIn, TDS, or Medium to see my next explorations!
All images shown in this article were created by myself, the author.