Chapter 10: Techniques

Usage Economics

As your team of bots scales, so does the cost to operate them.

That makes economic usage engineering another useful skill, up there with prompt engineering and context engineering. Every time you interact with a bot, it costs something. And those costs can compound fast.

We introduced tokens when discussing how bots work: LLMs process and generate tokens. Token counts get large enough that they're hard to picture. If you're holding a printed copy, this will be easier to grasp:

The book you are reading is about 17,000 tokens. [TODO: 👈 update that sometimes]

Token Costs (Cloud and APIs)

When you rely on hosted models, you pay per token. And because bots have amnesia, every turn in a conversation re-sends the entire history of that session.

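  • Context Accumulation: Long chat sessions are disproportionately expensive. A 15-turn session isn't 15 times the cost of a 1-turn session; because each turn re-sends everything before it, total input grows roughly with the square of the number of turns. With equal-sized turns, 15 turns means sending 1 + 2 + ... + 15 = 120 turns' worth of input.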
  • Tool Taxes: When your bot uses tools to read files or search the web, that text gets dumped into the context window. And it stays there, taxing every subsequent prompt.
  • Optimization: Prune your context ruthlessly. Archive completed work. Prefer short, single-purpose sessions over long, sprawling ones. If you have permanent scaffolding files, recognize that their size is a recurring tax on every single turn.
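
To put rough numbers on context accumulation and the scaffolding tax, here is a minimal sketch. It assumes every turn adds a fixed number of tokens to the history and that the full history is billed as input on each turn. The 500-token turn size and 2,000-token scaffold are made-up figures, and the sketch ignores output-token pricing and any prompt caching a provider may offer.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            scaffold_tokens: int = 0) -> int:
    """Total input tokens sent over a session that re-sends its full history."""
    total = 0
    history = scaffold_tokens  # permanent scaffolding rides along on every turn
    for _ in range(turns):
        history += tokens_per_turn  # this turn's text joins the history
        total += history            # the whole history is sent as input
    return total


# Hypothetical figures, for illustration only.
one_turn = cumulative_input_tokens(1, 500)
fifteen_turns = cumulative_input_tokens(15, 500)
with_scaffold = cumulative_input_tokens(15, 500, scaffold_tokens=2000)

print(fifteen_turns // one_turn)       # how many times costlier 15 turns is
print(with_scaffold - fifteen_turns)   # extra tokens spent on the scaffold
```

Under these assumptions the 15-turn session sends 120 times the input of a single turn, and the scaffold alone adds 30,000 tokens over the session.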

Managing Context Incrementally (Kaizen)

Here's a concrete example of a token-saving technique from my own experience.

When we want a bot to read a lot of files, our first instinct is to cram them all into one giant prompt. It's simple to code, and it bleeds tokens.

In addition to the token waste, LLMs tend to overlook details buried in the middle of long prompts. And a huge context is more likely to contain conflicting information, which the bot may paper over with a hallucinated compromise.

graph TD
    A[File 1] --> E(Huge Text Blob)
    B[File 2] --> E
    C[File ...] --> E
    D[File 100] --> E

    E -->|Huge Input| F["Single LLM Call<br>Analyze & Merge"]
    F --> G["Token Limit Error<br>OR Hallucinated Summary"]
    G --> H[(Knowledge Base)]

    style G fill:#f96,stroke:#333,stroke-width:2px

Instead of tossing everything into one pot, borrow a concept from continuous improvement: incremental synthesis (or Kaizen).

Rather than batch-processing everything at once, feed the bot one new document at a time alongside your existing core knowledge. Tell the bot to extract only the high-value facts and carefully merge them into the central repository.

The bot reads the new file, cleanly updates the central knowledge base, and moves on to the next file.

---
title: Loop for Each File
---
graph RL
    KB[(Knowledge<br>Base)]
    File[File n] --> LLM[Bot: Analyze & Merge]
    LLM -->|Writes Updates| KB
    KB -.->|"Provides context<br>for File n"| LLM

If you're updating an existing project, the bot doesn't write a new file from scratch. It performs a smart merge.

It's an iterative loop of polishing, getting smarter with every pass. Because the bot is only ever looking at the polished core knowledge and one new document, it doesn't get overwhelmed. You can nurture your bot's knowledge growth from massive datasets without blowing your token budget.
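
The loop described above can be sketched in a few lines. This is an illustration, not the exact code I run: `merge` is a hypothetical function that would wrap a call to your model, taking the current knowledge base plus one new document and returning the updated knowledge base. `stub_merge` stands in for the bot so the example runs without an API.

```python
from typing import Callable, Iterable

# Hypothetical signature: (knowledge_base, new_document) -> updated knowledge base.
# In practice this would wrap a call to whatever model you use.
MergeFn = Callable[[str, str], str]


def incremental_synthesis(documents: Iterable[str], merge: MergeFn,
                          knowledge_base: str = "") -> str:
    """Feed documents one at a time; each pass sees only the KB and one file."""
    for document in documents:
        knowledge_base = merge(knowledge_base, document)
    return knowledge_base


def stub_merge(knowledge_base: str, document: str) -> str:
    """Stand-in for the bot: keep each document's first line, skip duplicates."""
    stripped = document.strip()
    if not stripped:
        return knowledge_base
    fact = stripped.splitlines()[0]
    known = knowledge_base.splitlines()
    if fact in known:
        return knowledge_base  # nothing new, so the knowledge base is unchanged
    return "\n".join(known + [fact])


docs = [
    "Pump A runs at 40 psi.\nLonger maintenance notes...",
    "Valve B is normally closed.\nInstallation history...",
    "Pump A runs at 40 psi.\nA duplicate report.",
]
kb = incremental_synthesis(docs, stub_merge)
print(kb)
```

The point of the shape is that the input to each call stays bounded: one document plus the polished core, no matter how many files are in the pile.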

Hardware Costs (Local Inference)

As we discussed regarding Local Inference [TODO link to local inference section], running bots on your own hardware flips the script. You trade recurring operational expenses (API tokens) for upfront capital expenses (silicon, GPUs).

When you run models locally, the marginal cost of a query drops to near zero. It's just electricity. You can let agents loop endlessly without dreading a massive API bill at the end of the month.

But staying on the bleeding edge of AI hardware? That can quickly become a financial sinkhole.
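
One rough way to weigh recurring token spend against upfront hardware is a break-even calculation. The sketch below is back-of-the-envelope only, and every figure in it is invented for illustration. It ignores hardware depreciation, your own setup time, and any quality gap between local and hosted models.

```python
from typing import Optional


def break_even_months(hardware_cost: float, monthly_api_cost: float,
                      monthly_power_cost: float) -> Optional[float]:
    """Months until local hardware pays for itself, or None if it never does."""
    monthly_savings = monthly_api_cost - monthly_power_cost
    if monthly_savings <= 0:
        return None  # the API is cheaper than your electricity bill
    return hardware_cost / monthly_savings


# Hypothetical numbers: a $3,000 rig, $300/month of API usage, $50/month power.
heavy_user = break_even_months(3000, 300, 50)
light_user = break_even_months(3000, 40, 50)
print(heavy_user, light_user)
```

With these made-up numbers the heavy user breaks even after 12 months, while the light user never does. Upgrade the rig every year to chase the bleeding edge and the break-even point keeps moving away from you.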

[expand on how usage economics dictate when to use cloud tools vs local bots]

Implementation Specifics

While this is one of the more technical chapters, I have intentionally avoided going deep into the technical execution of the ideas. In a period of rapid technological change, a static object like a book can't keep up with the latest and greatest.

If you want to know how to execute, I'd recommend this: Talk to a Bot about it.

With that disclaimer, I will suggest a few things...

MapReduce for Massive Tasks

Earlier we looked at managing context incrementally—building a knowledge base one file at a time. That Kaizen approach is your strategy when nurturing a long-lived machine brain.

But sometimes you just need to process a massive amount of data in a hurry. When that happens, you can try MapReduce.

Instead of an iterative loop, MapReduce splits the work into two phases.

First is the Map phase. You spin up a thousand bots simultaneously, handing each bot exactly one document. Their only job is to read that file and extract the key facts.

Second is the Reduce phase. Put all those extracted facts into thematic buckets, and hand each bucket to a bot that merges the scattered notes into a clean summary.

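The map phase scales out almost as far as you like, because each document is processed independently; the practical ceilings are rate limits and budget. It's your strategy when strip-mining a mountain of unstructured documents for your bot to reason about.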

graph TD
    subgraph Map Phase [Map Phase: Parallel Extraction]
        Doc1[Document 1] --> Bot1((Bot))
        Doc2[Document 2] --> Bot2((Bot))
        Doc3[Document 3] --> Bot3((Bot))

        Bot1 --> Fact1(Extracted Facts)
        Bot2 --> Fact2(Extracted Facts)
        Bot3 --> Fact3(Extracted Facts)
    end

    subgraph Reduce Phase [Reduce Phase: Consolidation]
        Fact1 --> Bucket1{Topic Bucket A}
        Fact2 --> Bucket1
        Fact2 --> Bucket2{Topic Bucket B}
        Fact3 --> Bucket2

        Bucket1 --> BotR1((Bot))
        Bucket2 --> BotR2((Bot))

        BotR1 --> Sum1[Clean Summary A]
        BotR2 --> Sum2[Clean Summary B]
    end

    style Bot1 fill:#e1f5fe,stroke:#01579b
    style Bot2 fill:#e1f5fe,stroke:#01579b
    style Bot3 fill:#e1f5fe,stroke:#01579b
    style BotR1 fill:#e1f5fe,stroke:#01579b
    style BotR2 fill:#e1f5fe,stroke:#01579b
    style Sum1 fill:#c8e6c9,stroke:#388e3c
    style Sum2 fill:#c8e6c9,stroke:#388e3c
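
Here is a minimal sketch of the two phases, using Python's standard thread pool for the parallelism. `extract` and `summarize` are hypothetical functions that would each wrap a model call; the stubs below stand in for them so the example runs offline. They assume each document line looks like `topic: fact`, which real documents won't.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Fact = tuple[str, str]  # (topic, fact)


def map_reduce(documents: list[str],
               extract: Callable[[str], list[Fact]],
               summarize: Callable[[str, list[str]], str],
               max_workers: int = 8) -> dict[str, str]:
    # Map phase: one bot per document, run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        extracted = list(pool.map(extract, documents))

    # Shuffle: group the extracted facts into thematic buckets.
    buckets: dict[str, list[str]] = defaultdict(list)
    for facts in extracted:
        for topic, fact in facts:
            buckets[topic].append(fact)

    # Reduce phase: one bot per bucket merges the notes into a summary.
    topics = list(buckets)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = list(pool.map(lambda t: summarize(t, buckets[t]), topics))
    return dict(zip(topics, summaries))


def stub_extract(document: str) -> list[Fact]:
    """Stand-in for a map bot: parse 'topic: fact' lines."""
    facts = []
    for line in document.splitlines():
        topic, sep, fact = line.partition(":")
        if sep:
            facts.append((topic.strip(), fact.strip()))
    return facts


def stub_summarize(topic: str, facts: list[str]) -> str:
    """Stand-in for a reduce bot: join the bucket's facts."""
    return f"{topic}: " + "; ".join(facts)


docs = [
    "pumps: A runs at 40 psi",
    "pumps: B is offline\nvalves: C is closed",
    "valves: D is open",
]
result = map_reduce(docs, stub_extract, stub_summarize)
print(result)
```

Note that the second document feeds two buckets, just like Document 2 in the diagram. With real model calls, `max_workers` is the knob to turn down when you start hitting rate limits.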

[suggest a few other things regarding hardware, software, context creation, etc.]

[easiest possible bots? probably gemini gems (etc.). vs code for persistent local bots. cron jobs.]

Bot Interfaces

[ this is exclusively the human input side. but "Bot Inputs" seems odd]

[jotting ideas. Each to be expanded]

You can type to a bot.

Talk to a bot. Right now that mostly means voice-to-text, which, as far as I know, drops parts of vocal communication such as tone and inflection. Pure audio input could come later.

Gestures. Video input with facial expression and gestures recognition etc.

Grab a bot arm and move it where you want to go. Teleop. With physical bots mostly.

Screen sharing workflows.

RTS Controls

Controlling lots of bots is probably best done with RTS-style (real-time strategy) controls.

[talk about Army C2ORE project. talk Starcraft. ]

