// AI Lessons

Building AI Skills

If you built custom GPTs, you already know how to build skills — here's the SKILL.md file that turns one-off chatbots into portable, agent-ready capabilities.

A few months ago, Boston Consulting Group quietly disclosed a number that should have rearranged many AI strategy decks: 36,000 custom GPTs built for 32,000 consultants. More custom GPTs than employees. At the time, most of the takes online were about the number — isn't that overkill? My read was different. BCG wasn't overbuilding chatbots. They were running the largest prompt-engineering scrimmage in the industry, and what they were really doing was teaching a generation of consultants how to package three things together: instructions, knowledge, and a refined output shape. That's the recipe for a custom GPT. It's also the recipe for a skill. If you've built a custom GPT, you already know how to build a skill — you just don't know the file format yet.

Your custom GPTs were rehearsal. Skills are the show.

// The Takeaway: A skill is a folder with a SKILL.md file — name, description, instructions, and optional bundled files. Anthropic shipped the format in October 2025 and released it as an open standard in December 2025. It now works in Claude.ai, Claude Code, OpenAI Codex, Cursor, Gemini CLI, VS Code Copilot, and Windsurf. Build it once, use it in your chatbot today, hand it to an agent tomorrow. Add an evals.md so it grades its own work and a memory.md so it remembers what you taught it, and you've got a self-improving capability — not a saved prompt with a fancy name.

See the open spec at agentskills.io → Free. Open standard. Works across vendors.

BCG's number wasn't a vanity stat. It was a leading indicator. The firm has publicly claimed the title of the world's #1 creator of custom GPTs, and they've built an internal three-layer assembly line for it — private firm data at the bottom, consultants prototyping on live cases in the middle, a central R&D team hardening the winners and shipping them firm-wide. McKinsey is doing the same shape of thing with a stated goal of one AI agent per employee — roughly 45,000. When two of the most disciplined operations on the planet both decide the unit of AI productivity is one packaged capability per person, the rest of us should pay attention.

// The real shift: Custom GPTs taught us that the useful unit of AI work isn't the model — it's the packaged capability: a name, a clear job description, the right knowledge, and a consistent output shape. Skills are that same idea, finally written down as an open file format. The 2023 game was prompt engineering. The 2025 game was custom GPTs. The 2026 game is portable, agent-ready skills that travel between your chatbot and your agents without requiring any rewriting.

What a Skill Actually Is

A skill is a folder. Inside the folder is a single required file — SKILL.md — and any optional supporting files the skill needs (reference docs, templates, scripts, examples). That's it. No SaaS. No vendor lock-in. No platform-specific JSON. Just a folder a model can read.

Here's what one looks like on disk:

my-skill/
├── SKILL.md            ← required. name + description + instructions.
├── examples/           ← optional. reference outputs the model can load on demand.
│   ├── example-1.md
│   └── example-2.md
├── references/         ← optional. long-form docs (voice guides, style rules, schemas).
│   └── voice-rules.md
├── templates/          ← optional. fill-in-the-blank scaffolds.
│   └── section.md
└── scripts/            ← optional. small scripts the skill can run (Python, shell, etc.).
    └── format.py

Only SKILL.md is required. Everything else is loaded on demand when the body references it by relative path.
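If you'd rather not create that layout by hand, the sketch below scaffolds it. It is an illustration, not part of the spec: the function name `scaffold_skill` and the starter template are hypothetical, and the four subfolder names are simply the ones shown in the tree above.

```python
from pathlib import Path

# Starter SKILL.md; the placeholder body text is illustrative only.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

Describe what the skill does, when to use it, and the output shape.
"""

# The optional subfolders shown in the tree above.
OPTIONAL_DIRS = ("examples", "references", "templates", "scripts")


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/ with a starter SKILL.md and empty optional subfolders."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    for sub in OPTIONAL_DIRS:
        (skill_dir / sub).mkdir(exist_ok=True)

    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():  # never overwrite hand-written instructions
        title = name.replace("-", " ").title()
        skill_md.write_text(
            SKILL_TEMPLATE.format(name=name, description=description, title=title),
            encoding="utf-8",
        )
    return skill_dir
```

Calling `scaffold_skill(Path("."), "my-skill", "Use when ...")` leaves you with the folder above, minus the example files, ready for you to write the real instructions.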

Anthropic introduced the format in October 2025 alongside the launch of Agent Skills in Claude, then released the spec as an open standard in December 2025. The spec lives at agentskills.io. Today the same SKILL.md file runs in Claude.ai, Claude Code, the Claude API, OpenAI Codex, Cursor, Gemini CLI, Antigravity, VS Code Copilot, and Windsurf — with minimal or no tweaks. That portability is the whole point. Your custom GPT is trapped in ChatGPT. Your skill is not trapped anywhere.

Three levels of detail load progressively, which is the design choice that makes skills cheap to keep around:

Level 1 — Metadata. The name and description in the YAML frontmatter. Always in the model's context. This is what the model reads to decide whether to use the skill.
Level 2 — The SKILL.md body. Loaded only when the skill is triggered. Contains the actual instructions, workflow, and examples.
Level 3 — Bundled files. Loaded only when the SKILL.md body references them. Long reference docs, code snippets, templates, datasets. They cost almost nothing until they're needed.

The practical result: you can keep dozens of skills available without burning context window. The model only pays the token cost for the one it's actually using.
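To make the three levels concrete, here is a minimal sketch of how a host application might load a skill progressively. This is an assumption about the general shape, not how any particular product implements it: the names `SkillMetadata`, `load_metadata`, `load_body`, and `load_bundled` are hypothetical, and the frontmatter parser only handles flat `key: value` lines rather than full YAML.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillMetadata:
    name: str
    description: str
    path: Path  # folder that contains SKILL.md


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter dict, body). Flat key: value lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is missing its closing '---'")
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)  # split once so values may contain colons
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def load_metadata(skill_dir: Path) -> SkillMetadata:
    """Level 1: keep only name and description (what stays in context)."""
    meta, _ = split_frontmatter((skill_dir / "SKILL.md").read_text(encoding="utf-8"))
    return SkillMetadata(meta["name"], meta["description"], skill_dir)


def load_body(skill: SkillMetadata) -> str:
    """Level 2: read the instructions only once the skill is triggered."""
    _, body = split_frontmatter((skill.path / "SKILL.md").read_text(encoding="utf-8"))
    return body


def load_bundled(skill: SkillMetadata, relative_path: str) -> str:
    """Level 3: read a bundled file only when the body points at it."""
    return (skill.path / relative_path).read_text(encoding="utf-8")
```

The design choice to notice is that only `SkillMetadata` objects need to sit in memory for every installed skill; the body and bundled files are read at the moment they are needed.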

Anatomy of a SKILL.md File

Here's the minimum viable skill, end to end:

---
name: ai-tangle-section-draft
description: Drafts a single AI Tangle newsletter story section from a URL or short summary. Use when the user wants to turn one news item into a publication-ready Tangle section with a headline, a two-paragraph body, a why-it-matters line, and a source link, in the AIE house voice.
---

# AI Tangle Section Draft

Draft one AI Tangle story section from a single source.

## When to use this skill

Trigger when the user provides a URL or summary of a single AI news item and asks for a Tangle-style write-up. Do not use for full editions, multi-story roundups, or non-news commentary.

## Output structure

1. **Headline** — 6–10 words, declarative, no clickbait.
2. **Body** — two short paragraphs, conversational prose, no bullets.
3. **Why it matters** — one sentence, leads with the implication for enterprise AI buyers.
4. **Source** — single inline link with publication name.

## Voice rules

- Never use *delve, landscape, unleash, paradigm, game-changer*.
- Start with the story, not with throat-clearing.
- Default to prose, not bullets.
- Refer to the audience as "operators" or "buyers," never "users."

## Examples

See `examples/tangle-section-examples.md` for three reference sections written in the target voice.

Four parts are doing all the work here.

YAML frontmatter (name + description). This is the most important part of the file, and the part most people undercook. The model reads only the name and description when it's deciding whether to fire the skill. A vague description ("Helps with newsletters") won't trigger reliably. A specific description with concrete trigger conditions ("Use when the user provides a URL or short summary of a single AI news item and asks for a Tangle-style write-up") will. Treat the description as a routing decision, not a label.

The body. Plain Markdown. Tell the model what the skill does, when to use it, what the output should look like, what rules to follow, and what not to do. Short and concrete beats long and abstract. Skills that read like a checklist outperform skills that read like a manifesto.

Bundled files. Anything the skill needs but doesn't need to keep in active context — examples/, references/, templates/, scripts/. Reference them by relative path from inside SKILL.md. The model fetches them on demand.

Optional fields. Some platforms support allowed-tools (restrict which tools the skill can call), version, and other extensions. Start without them. Add when you need them.
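A quick way to catch the two most common mistakes in those four parts (a description with no trigger condition, and a bundled file that is referenced but missing) is a small lint pass. The sketch below is a house heuristic, not a rule from the spec: `lint_skill` is a hypothetical helper, the "use when" check is an assumed convention, and it assumes bundled files are referenced in backticks under the four folder names used earlier.

```python
import re
from pathlib import Path

# Assumed convention: bundled files appear in backticks under one of these folders.
BUNDLED_REF = re.compile(r"`((?:examples|references|templates|scripts)/[^`\s]+)`")


def read_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (flat frontmatter dict, body); an empty dict if there is no frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            pairs = (ln.split(":", 1) for ln in lines[1:i] if ":" in ln)
            meta = {key.strip(): value.strip() for key, value in pairs}
            return meta, "\n".join(lines[i + 1:])
    return {}, text  # no closing fence found


def lint_skill(skill_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    meta, body = read_frontmatter(text)

    for field in ("name", "description"):
        if not meta.get(field):
            problems.append(f"frontmatter is missing '{field}'")

    description = meta.get("description", "")
    if description and "use when" not in description.lower():
        problems.append("description has no 'Use when ...' trigger condition")

    for ref in sorted(set(BUNDLED_REF.findall(body))):
        if not (skill_dir / ref).is_file():
            problems.append(f"referenced file not found: {ref}")
    return problems
```

Run against the example skill above, this would flag the skill if `examples/tangle-section-examples.md` had not been added to the folder yet.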

Three Ways to Build One

1. By hand. Make a folder. Create SKILL.md. Write the frontmatter. Write the instructions. Drop it in the correct directory for your platform (e.g., ~/.claude/skills/ for Claude Code, the Skills tab in Claude.ai, or your project's .skills/ folder for Codex). Done. This is the fastest path once you've seen a few examples — start with the Anthropic skills repo and copy the shape of the closest match.

2. With the Skill Creator skill (the meta move). Anthropic ships a Skill Creator skill whose only job is building, editing, and benchmarking other skills. Turn it on, open a fresh chat, say "use skill-creator to build me a skill that drafts AI Tangle story sections," and answer its questions. It will generate SKILL.md, suggest a folder structure, and, in the v2 release, run evals against a baseline to tell you whether your description is actually triggering correctly. Yes, it is a skill that writes skills. The meta-ness is the point: the same format describes the work. (You can also type /skill creator at the end of a successful chat to codify that workflow into a reproducible skill.)

3. By converting a custom GPT you already have. This is where BCG's 36,000 prototypes pay off for the rest of us. A custom GPT is already the three ingredients of a skill in disguise:

The GPT's Instructions → the body of your SKILL.md.
The GPT's Knowledge files → bundled reference files in the skill folder.
The GPT's conversation starters and example outputs → the Examples section and trigger language in your description.

Open a custom GPT you actually use. Copy the instructions into a new SKILL.md. Add a tight description that names the trigger condition. Drop the knowledge files alongside as references/. You just shipped a skill. Anything you built as a custom GPT in the last 18 months is a candidate.
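The conversion is mechanical enough to script. The sketch below assumes you have pasted the GPT's instructions into a string and downloaded its knowledge files yourself; it does not read any ChatGPT export format. The function name `gpt_to_skill` and the "Example requests" and "References" headings are hypothetical choices, not part of the spec.

```python
import shutil
from pathlib import Path
from typing import Iterable


def gpt_to_skill(
    dest: Path,
    name: str,
    description: str,
    instructions: str,
    knowledge_files: Iterable[Path] = (),
    starters: Iterable[str] = (),
) -> Path:
    """Write <dest>/<name>/SKILL.md and copy knowledge files into references/."""
    skill_dir = dest / name
    refs_dir = skill_dir / "references"
    refs_dir.mkdir(parents=True, exist_ok=True)

    # Knowledge files become bundled references, listed by relative path.
    ref_lines = []
    for src in knowledge_files:
        shutil.copy2(src, refs_dir / src.name)
        ref_lines.append(f"- `references/{src.name}`")

    # Frontmatter first, then the GPT's instructions as the body.
    sections = [
        f"---\nname: {name}\ndescription: {description}\n---",
        instructions.strip(),
    ]
    starter_lines = [f"- {s}" for s in starters]
    if starter_lines:  # conversation starters double as example requests
        sections.append("## Example requests\n\n" + "\n".join(starter_lines))
    if ref_lines:
        sections.append("## References\n\n" + "\n".join(ref_lines))

    (skill_dir / "SKILL.md").write_text("\n\n".join(sections) + "\n", encoding="utf-8")
    return skill_dir
```

The one thing the script cannot do for you is write the description. That still needs a human deciding exactly when the skill should fire.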

Make the Skill Grade Its Own Work (and Remember What It Learned)

A skill that ships and never improves is just a saved prompt with better packaging. The version of the format that earns its place in your workflow has two more files in the folder: evals.md and memory.md. Peter Yang walked through this exact loop in a recent tutorial and the pattern is worth stealing wholesale.

Eval loop — the skill checks its own output. Drop an evals.md next to SKILL.md with 8–12 pass/fail checks grouped by what actually matters for that output. For the AI Tangle section skill above, that looks like: hook lands in the first sentence (pass/fail), no banned voice words (pass/fail), "why it matters" leads with the buyer implication (pass/fail), source link present and inline (pass/fail). Pass/fail beats a 1–5 score because the model can't reliably tell a 3 from a 4 — but it can tell a yes from a no all day. Then add one line to SKILL.md:

"When you run the evals, spin up a separate agent with a clean context window. If any check fails, send the draft back to revise. Loop until every check passes."

The clean context window is the whole trick. Agent A writes. Agent B grades without seeing A's reasoning. If B fails it, A revises. They go back and forth until B passes everything. Kick it off, get coffee, come back to a draft that already cleared the bar you set. You still edit the final pass — but you're editing a clean draft, not a first draft.
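The write-grade-revise loop can be sketched in a few lines. Everything here is a stand-in: in practice both the writer and the grader are model calls, while this sketch uses plain Python functions so the control flow is visible. The two checks are examples taken from the list above, the Markdown-link pattern is an assumption about how the source link is formatted, and the `max_rounds` cap is my addition so the loop cannot run forever.

```python
import re
from typing import Callable

BANNED = ("delve", "landscape", "unleash", "paradigm", "game-changer")


def no_banned_words(draft: str) -> bool:
    lowered = draft.lower()
    return not any(re.search(rf"\b{re.escape(word)}\b", lowered) for word in BANNED)


def has_inline_link(draft: str) -> bool:
    # Assumes the source link is written as a Markdown inline link.
    return re.search(r"\[[^\]]+\]\(https?://[^)\s]+\)", draft) is not None


# Each eval is a yes/no question, never a 1-5 score.
CHECKS: dict[str, Callable[[str], bool]] = {
    "no banned voice words": no_banned_words,
    "source link present and inline": has_inline_link,
}


def grade(draft: str) -> list[str]:
    """Grader: sees only the draft, never the writer's reasoning."""
    return [name for name, check in CHECKS.items() if not check(draft)]


def eval_loop(
    write: Callable[[str, list[str]], str],
    task: str,
    max_rounds: int = 5,
) -> tuple[str, list[str]]:
    """Call write(task, failures) until grade() passes or max_rounds is reached."""
    draft, failures = "", []
    for _ in range(max_rounds):
        draft = write(task, failures)  # writer gets the failed check names back
        failures = grade(draft)
        if not failures:
            break
    return draft, failures
```

The grader receives nothing but the draft text, which is the code-level version of the clean context window: it cannot be talked into a pass by the writer's explanation of what it meant.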

Memory file — the skill learns between runs. Evals catch the things you can write a rule for. memory.md catches the things you can't. Add a third file in the folder and tell the skill in SKILL.md:

"After each use, append a one-line note to memory.md with anything the user corrected that wasn't already covered by evals. Keep it reverse-chronological and concise."

This is where "make the voice more authentic," "stop opening with throat-clearing," and "never use the word delve" get logged. After a few weeks, the memory file becomes the most valuable artifact in the folder — a written record of your taste, in a format the next agent can read without you in the room.
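If you want the memory file kept tidy by a script rather than by the model, a sketch like the one below would do it. The `remember` helper, the `# Memory` header, and the `- date: note` line format are all hypothetical conventions chosen for illustration. Note that it rewrites the file from its note lines only, so any free-form text you added by hand would be dropped.

```python
from datetime import date
from pathlib import Path
from typing import Optional

HEADER = "# Memory"


def remember(skill_dir: Path, note: str, today: Optional[date] = None) -> None:
    """Add a one-line note to memory.md, newest first (reverse-chronological)."""
    memory = skill_dir / "memory.md"
    stamp = (today or date.today()).isoformat()
    entry = f"- {stamp}: {' '.join(note.split())}"  # collapse whitespace to one line

    existing = memory.read_text(encoding="utf-8").splitlines() if memory.exists() else []
    notes = [line for line in existing if line.startswith("- ")]  # keep prior notes only

    memory.write_text("\n".join([HEADER, "", entry, *notes]) + "\n", encoding="utf-8")
```

Putting the newest note at the top means an agent that reads only the first few lines of the file still sees your most recent corrections.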

A skill that edits your skills. Once you have three or four skills running, build one more: a skill-editor skill whose only job is to read another skill's folder and strip slop, dedupe instructions, tighten the description, and flag eval gaps. Run it on every skill once a month. Yang ships his alongside a no-ai-slop skill that hunts em-dashes and "X, not Y" phrasing — adapt the banned-words list to your own voice file and you have a self-cleaning library.
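Part of that cleanup, the "dedupe instructions" step, can be approximated without a model at all. The sketch below is a rough heuristic and not Yang's implementation: the hypothetical `find_duplicate_instructions` treats two bullet or numbered lines as duplicates when they match after lowercasing and stripping punctuation, so it will miss rules that are reworded.

```python
import re
from collections import defaultdict

# Matches "- item", "* item", and "1. item" list lines.
LIST_ITEM = re.compile(r"(?:[-*]|\d+\.)\s+(.*)")


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def find_duplicate_instructions(skill_md_text: str) -> dict[str, list[int]]:
    """Map each repeated instruction to the 1-based line numbers where it appears."""
    seen = defaultdict(list)
    for number, line in enumerate(skill_md_text.splitlines(), start=1):
        match = LIST_ITEM.match(line.strip())
        if not match:
            continue  # only list items are treated as instructions
        key = normalize(match.group(1))
        if key:
            seen[key].append(number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}
```

A skill-editor skill could run a check like this first and hand only the flagged lines to the model, which keeps the monthly cleanup cheap.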

The folder for a mature skill ends up looking like this:

ai-tangle-section-draft/
├── SKILL.md
├── evals.md
├── memory.md
├── examples/
│   └── tangle-section-examples.md
└── references/
    └── voice-rules.md

Five files. One folder. Same open format. The skill now writes, grades itself, remembers what you taught it, and gets cleaned up by another skill on a schedule. That's the difference between a saved prompt and a capability you can actually hand to an agent.

From Chatbot Today to Agent Tomorrow

The payoff isn't just portability across vendors. It's portability across modes of use. The same SKILL.md file does two jobs without modification:

In a chatbot. You (or a teammate) open Claude, ChatGPT with Codex, Cursor — pick your interface — and the skill loads on demand when its description matches what you're asking for. Same workflow you have today with custom GPTs, just not stuck inside one vendor's walled garden.
In an agent. Hand the same folder to an autonomous agent — a Claude Code agent running overnight, a Codex workspace agent on a schedule, a custom workflow you've wired up with MCP. The agent uses the skill the same way the chatbot does. It reads the description to decide whether to fire, loads the body when it does, and pulls in bundled files as needed. The skill doesn't know or care whether a human is in the loop.

That's the architectural shift hidden inside the boring file format. Custom GPTs locked your packaged capability to a chat surface. Skills decouple the capability from the surface. The same instructions that make your assistant sharper today are what make your agent reliable tomorrow.

The Twenty-Minute Move This Week

If you build one skill this week, do it like this:

Pick a workflow you've already repeated five times. A weekly status email, a meeting recap format, a code review checklist, a draft format for one of your newsletter sections. If you keep pasting the same instructions into chat, that's a skill in waiting.
Write the description first, body second. The description is what triggers it. Spend more time there than feels reasonable. Be specific about when to use it and when not to.
Test the trigger before you test the output. Open a fresh chat, describe a real task in your own words without naming the skill, and see if it fires. If it doesn't, the description is wrong, not the body.
Then test the output. Run three real inputs through it. Edit the body. Run them again. This is the loop.
Then add evals.md and memory.md. Once the body is solid, write 8–12 pass/fail checks in evals.md so a second agent with a clean context window can grade each draft and send it back to the first agent until every check passes. Add a memory.md so anything you correct gets logged for next time. This is the move that turns a saved prompt into a skill that gets better while you sleep.

BCG built 36,000 of these the hard way, inside one vendor. You don't have to. The file format is open, the standard is portable, and the muscle you already have from custom GPTs transfers in an afternoon.

Your AI Sherpa,

Mark R. Hinkle
Founding Publisher, The AIE Network

If you want to get in contact or give me feedback, reply to this email. I read every single one of them.