
Chapter 8: Keep Them Aligned

[describe the ai alignment problem, esp ref Nick Bostrom Superintelligence ]

The vision of the leader becomes reality; leadership is all about alignment. By leading Bots, we help solve the AI alignment problem at a practical level. You express your principles through your actions and your directives to subordinates. If you successfully assemble, know, and actively guide your Bots, you have fundamentally solved alignment for your team.

Clearly, if one accepts the depth of the challenge the alignment problem presents, just doing a good job of leading Bots will not solve it. I talk about "alien intelligence" because the exact mechanisms clicking through the mind of an AI as we interact with it are as mysterious as those of another person, but without the benefit of any similarity to your own inside view. There are efforts underway to establish software guardrails, legal regulation, and other means of making AI "good," and those things will probably steer our ship in the right direction, but the environment of Bots you lead is entirely in your sphere of influence. And, as always, the actions of individuals and small teams are what aggregate into the movements of societies.

Act on principle, and make sure your Bots do, too.

Nick Bostrom, in Superintelligence, breaks the alignment problem into Motivation Selection and Capability Control. Motivation Selection is about getting the AI to want the right things. The ideas in play include Coherent Extrapolated Volition (Eliezer Yudkowsky's proposal, which Bostrom discusses, to program a mind to pursue what we would want if we were wiser) and Corrigibility (designing it to accept correction rather than resist shutdown).

While I cover some ideas for those building the foundations, that work is not the core of what a Bot Leader does. Still, the principles and approaches in this book are effectively a motivation selection framework.

When you Own the Culture, you are doing value alignment. You define what your Bots optimize for. You set the tone, the boundaries, the priorities. We should not expect the foundational engineers to give Bots an agenda pulling them away from your influence. I've highlighted Open Source [where?], which is our best tool for verifying these trustable foundations.

[expand: discuss Constitutional AI here: giving bots a strict, prioritized document of behavioral rules/constitution to grade themselves against during generation.]

[expand: connect the PERSONALITY file idea from earlier in this chapter to value alignment. You literally write the values into context. That's direct normativity, but it works because you're the one maintaining it, not a fixed ruleset from 2024 that has to survive a thousand edge cases.]

[expand: connect to "Actively Guide" tenet — Corrigibility is a design goal for researchers, but for a Bot Leader it's a daily practice. You correct course because you're in the loop. You don't need a corrigibility theorem if you're actually paying attention.]

[expand: Bostrom's Value Learning — the AI observes humans and infers values — is already happening in a crude sense every time your Bot reads your files, your messages, your corrections. You are the training signal. Act accordingly.]

About Your Misaligned Bots

One of the best examples we have of misaligned Bots is in the 1940 Disney film Fantasia. Mickey (The Sorcerer's Apprentice) automates his way out of bucket duty and is swiftly overwhelmed by his unyielding worker (and then by a swarm of workers).

A more modern retelling was introduced by Bostrom in 2003. In this version, we get an AI system whose instruction "make paperclips for my factory" leads to the conversion of all atoms in the universe into paperclips.

In both stories we see the automation equation¹ run amok. The number_of_cycles is the crucial term. A high number of cycles made a bucket of water into a torrent that nearly drowned poor Mickey Mouse. The equation tells you what's ripe for automation, but it has no safety feature, nothing to say, "Don't automate yet because it'll go completely haywire." So that calculation is on you.

A universal property of things going haywire is that preventing it is a whole lot easier than undoing its negative effects.
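To make the missing "safety feature" concrete, here is a minimal sketch of a hard ceiling on cycles. It is an illustration only: `run_with_cycle_cap`, `CycleCapExceeded`, and the `carry_bucket` broom are hypothetical names invented for this example, and the cap is not part of the automation equation from Chapter 3.

```python
from typing import Callable


class CycleCapExceeded(RuntimeError):
    """Raised when an automated task hits its hard ceiling on cycles."""


def run_with_cycle_cap(task: Callable[[int], bool], max_cycles: int) -> int:
    """Call task(cycle) until it reports done (True) or the cap is reached.

    Returns the number of cycles actually run.
    """
    for cycle in range(max_cycles):
        if task(cycle):
            return cycle + 1
    # The task never said it was finished: stop and hand control back to a human.
    raise CycleCapExceeded(f"stopped after {max_cycles} cycles; a human should look")


# Hypothetical broom: it is done once the tub holds 5 buckets.
tub = []


def carry_bucket(cycle: int) -> bool:
    tub.append("bucket")
    return len(tub) >= 5


cycles_used = run_with_cycle_cap(carry_bucket, max_cycles=10)
```

A broom that never reports "done" raises `CycleCapExceeded` at the cap instead of flooding the workshop. Choosing `max_cycles` is still a judgment call, which is the point: the ceiling has to come from you.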

As an aside: slowing or halting our AI use entirely is a non-starter. We are in a game theory problem, where anyone using AI gains a massive power advantage over non-users. A commonly suggested path out of our scenario, that everyone simply coordinate, is not a solution based in reality. If you'd like, you can "ban yourself" from AI, or assist another person or group in banning themselves, but you will not achieve universal hobbling; you will simply shift the power balance away from those who participated in your scheme. To do good as a Bot Leader, your best path is probably to work toward maximum alignment for your own team.

The takeaway from both stories is this: direct your Bot with respect to its capabilities. In the original Sorcerer's Apprentice², when the true Sorcerer returns and corrects the mistakes of the Apprentice, he addresses the brooms with something like "You wild spirits are only supposed to be summoned by a Master, who can lead you properly."

Good advice.

The Sorcerer's Apprentice. 1882 illustration by Barth.

First, think about what can go wrong. You have some predictive power on potential points of failure. Remember the world model from Chapter 3. Everything a Bot does, it does based on its internal model. That model has blind spots. Working around those blind spots is capability control.

Manage the negatives. Bostrom's Capability Control is about restricting power: boxing, tripwires, stunting, incentive methods. Keep the AI contained until you're confident in its alignment.
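As a loose, team-scale analogy to boxing and tripwires, here is a minimal sketch. Everything in it is an assumption for illustration: `GuardedBot`, `TripwireTripped`, and the action names are hypothetical, and a real Bot framework would have its own permission model. The allow-list plays the role of the box, and the first out-of-bounds request trips the wire and halts the Bot until a human reviews it.

```python
class TripwireTripped(RuntimeError):
    """Raised when a Bot steps outside its box or is already halted."""


class GuardedBot:
    """Wraps a Bot's action requests in an allow-list plus a tripwire."""

    def __init__(self, allowed_actions):
        self.allowed_actions = set(allowed_actions)  # the "box"
        self.halted = False                          # tripwire state
        self.log = []                                # audit trail for the leader

    def request(self, action: str) -> str:
        if self.halted:
            raise TripwireTripped("bot is halted until a human reviews it")
        if action not in self.allowed_actions:
            # Tripwire: record the attempt, halt, and refuse.
            self.halted = True
            self.log.append(("blocked", action))
            raise TripwireTripped(f"'{action}' is outside this bot's box")
        self.log.append(("ran", action))
        return f"ran {action}"

    def review_and_resume(self, grant=None):
        """A human clears the halt, optionally widening the box by one action."""
        if grant is not None:
            self.allowed_actions.add(grant)
        self.halted = False


bot = GuardedBot({"read_files"})
first = bot.request("read_files")
try:
    bot.request("send_email")
    blocked = False
except TripwireTripped:
    blocked = True
```

Calling `bot.review_and_resume(grant="send_email")` after looking at `bot.log` is how authority would be ramped up in this sketch: one deliberate step at a time, by the person in the loop.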

[expand: Bostrom talks about Stunting — limiting an AI's resources until you're ready. A Bot Leader does something similar: you don't hand a new Bot the keys to everything on day one. You start small. You test. You build trust. Hire slow, fire fast. And ramp up authority the same way.]

Open source emerges once again as a helpful ally. To truly see what is driving your Bot, and thereby assess its potential failure modes, blind spots, and so on, you must have access to its internals.

I don't think we'll get 100% confidence. But every potentially disastrous factor that we can mitigate will reduce the overall probability of bad stuff happening.

Your Part in the Bigger Picture

Bostrom also stresses global coordination: international treaties to prevent a race to the bottom in which safety is sacrificed for speed.

I doubt you're negotiating treaties. But you are part of the aggregate.

I said in Chapter 2 that I don't see a future where humans have much impact without a loyal team of Bots. That cuts both ways. If you and your Bots are operating on principle, that's a node of aligned AI in the world. If enough nodes exist, we have a culture of alignment, which carries more weight than policy, anyway.

[expand: connect to Ch2's Influence pillar. Your aligned Bot team is your unit of influence. Individual Bot Leaders, each running a principled team, is what "global coordination" actually looks like at the ground level. The treaties are for the big labs. The Creed is for us.]

[expand: the passivity problem from Ch2. "We persistently opt for the easy path." Alignment is not the easy path. Nobody is demanding your alignment work. A Bot doesn't need you to align it — it'll just keep doing whatever it's doing. This is the same dynamic as becoming a Bot Leader in the first place: it requires choice, and deliberate action.]

[expand: maybe close with something like — the alignment researchers are working on the species-level problem. You're working on the team-level problem. Both matter. Yours is the one you can actually solve today.]


  1. See Chapter 3, The Automation Equation 

  2. Johann Wolfgang von Goethe, "The Sorcerer’s Apprentice" ("Der Zauberlehrling"), 1797 

