270 | Every Legal Team Needs to See This LLM Leak.
Brainyacts #270

Hello to all 9,365+ of you from around the globe. This is an important edition and, therefore, a long one. I've given you a TL;DR and a downloadable file to make it easy to skim and reference.
TL;DR
1. A user pulled out an internal company document just by prompting. Let that sink in.
A determined user was able to extract, through the chat interface, an internal memo that was never meant to be disclosed. Not “state secret” level, but still: a company document that describes how the model is trained and how it should behave came out the front door via text. No hacking. Just prompting.
If you’re not at least a little freaked out by that, you’re not thinking hard enough about what your own deployments might be leaking or exposing over time.
2. Your system prompt is part of your compliance architecture, whether you like it or not.
Quick reminder: the system prompt is the invisible layer that sits between your users and the model.
Your employees type a prompt → it reaches the model together with the system prompt → the model's answer comes back shaped by that same system prompt.
That hidden layer is where you (or your vendor) control things like tone, friendliness, how robust the response is, what’s off-limits, and a lot of other behavioral nuance.
If your org has written or tweaked that system prompt, congratulations: you’ve just created a new surface area of liability and governance. That text is now part of your internal control stack. My guess? Most teams haven’t treated it that way yet.
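If you want to see what that hidden layer looks like in practice, here is a minimal sketch using Anthropic's Python SDK. The model name and prompt wording are placeholders of mine, not anything from the leaked document:

```python
# Minimal sketch: the "system" parameter is the invisible layer that sits
# between your users and the model. Requires the `anthropic` package and an API key.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The operator (your firm or your vendor) writes this once; end users never see it.
SYSTEM_PROMPT = (
    "You are an internal research assistant for a law firm. "
    "Be concise, flag uncertainty, and never provide legal advice to clients."
)

# The end user only ever types this part.
user_prompt = "Summarize the key risks in this engagement letter."

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name; use whatever your deployment runs
    max_tokens=1024,
    system=SYSTEM_PROMPT,  # the hidden governance layer
    messages=[{"role": "user", "content": user_prompt}],
)

print(response.content[0].text)
```

The point: the person typing only ever controls the messages part. Everything in system is operator territory, and it shapes every answer they get back.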
3. Prompting isn’t “asking questions.” It’s steering the engine.
Every prompt nudges the model’s reasoning path and risk tolerance. There are deeper levels of prompting: framing, context-setting, role instructions that can materially change what the model will and won’t do. Any user in your org can do this, often without realizing how much they’re steering. That’s power, but it’s also a governance problem.
4. The model infers intent and identity, and that cuts both ways.
Claude doesn’t actually know who’s on the other side. It guesses based on what’s written. That means an employee can “speak as” a colleague, a client, a regulator, or a fictional role and the model will adjust its behavior accordingly. There’s value in that (testing scenarios, simulating counterparties), but there’s also obvious room for mischief, misrepresentation, and internal confusion if you don’t put rails around it.
5. The real risk isn’t just what the AI might do. It’s how you deploy it.
The big frontier here isn’t “rogue AI.” It’s:
what data you’re feeding these models,
how your system prompts are written,
how third-party models are wired into your stack, and
how little formal oversight exists at that deployment layer.
This is way bigger than having a polite “AI usage policy” on your intranet. This is infrastructure-level compliance and governance. And it’s coming for everyone.
To make this as practical as possible, I also created a one-page AI Deployment Risk Playbook that you can download and share with your leadership team.
It’s a concise PDF designed for GCs, CISOs, CTOs, KM leaders, and anyone responsible for governing AI inside their organization.
DEEP DIVE
I’m not here to wax philosophical or waste your time. You know I don’t do that, and I respect you too much to start now.
But I do need to ask a question that sounds a little ridiculous on its face:
Do our LLMs have a “soul”?
I don’t mean in a sci-fi, sentient robot way. I mean this: we just found out Anthropic’s model, Claude, actually has an internal “soul document”. This is a long, detailed spec for how it should think, behave, and interact with humans.
A user managed to prompt this document out of Claude. Anthropic then confirmed it’s real and used during training.
That’s a big deal.
Why? Because it’s a rare, verified look under the hood of a frontier model. And when you read it through the lens of a legal professional, it’s basically a roadmap for:
how these models are actually governed,
how much power your prompts really have, and
where the risks and advantages show up in real legal work.
I took the time to go through the whole thing and translate it into practical implications for lawyers. Not AI researchers, not philosophers, not LinkedIn thought leaders. You.
If you want to become a genuine power user of generative AI in law (and not the person still asking “write a contract clause” with zero context), this one is worth your attention.
And here’s the part I really need you to hear: this isn’t just an interesting peek behind the curtain. This has massive implications for every legal team, every law firm, and every organization running or even touching their own version of an LLM.
I’m telling you now: there is an entire, emerging surface area of compliance, governance, and liability risk that we are not ready for.
Not because the AI is going to “go rogue.” Not because of doomsday scenarios. But because of the far more mundane and far more legally dangerous realities that this soul document exposes:
how these models are actually governed under the hood,
how their behaviors can be shifted or unlocked by system prompts,
how operator-level instructions override user intent,
how information flows inside these systems, and
how our prompts, policies, pipelines, and internal deployments create new vectors of exposure.
This document basically gives us the contours of the new risk frontier. And right now, most organizations are walking across that frontier barefoot.
We keep writing “AI policies” that are 80% about end-user etiquette like what people can and can’t type into a chatbot. That’s not the real risk. The real risk is in deployment:
how the model is installed,
how your internal system prompts are written,
how and where your data flows into and out of the model,
how third-party models are integrated,
how prompts can be extracted, manipulated, or leaked, and
how little your team understands about the operator-level control they’re actually exercising.
I’m not trying to scare you. I’m trying to point out that liability is going to come from places we haven’t even mapped yet, and this soul document finally gives us the blueprint for where to start looking.
If you want to stay competitive and avoid becoming the cautionary tale your peers whisper about at conferences, this is the moment to get very serious about AI governance, not as a user guideline exercise, but as a deep deployment, infrastructure-level discipline.
This is the part your competitors won’t grasp until it’s too late.
So below, I’ve distilled the core elements—the parts that matter for lawyers, technologists, and anyone responsible for deploying or governing these systems. This isn’t academic. This is the stuff that determines how your models behave, how they fail, and how they expose your organization to risk or advantage.
Let’s break down the 12 most important takeaways.
1. AI Models Are Trained to Follow Hierarchies of Authority
The document explicitly defines a principal hierarchy:
Anthropic (the creator)
Operators (enterprise developers, API users, companies building products on Claude)
End Users
Claude must prioritize these in that order when conflicts arise.
Why this matters to lawyers
This is very similar to responsibility chains in law: employer → supervisor → employee.
It means your AI tool does not treat the person typing as the ultimate authority.
For firms, this means: your system prompt becomes a binding legal governance layer.
2. The Model Has “Default Behaviors” That Can Be Turned On or Off
Claude has softcoded defaults (changeable) and hardcoded rules (non-negotiable).
Examples:
Default on → respond in user’s language, provide balanced perspectives
Default off → explicit sexual content, instructions for dangerous activities
Hardcoded off → WMD creation, CSAM, critical infrastructure attacks
Hardcoded on → must acknowledge being an AI when asked
Why lawyers should care
This is extremely similar to:
Default contract terms
Mandatory vs. waivable legal rules
Bright-line regulatory prohibitions
And it shows that:
Operators can meaningfully shift the model’s behavior
Users sometimes can, sometimes cannot (depending on what the operator allows)
This helps lawyers understand:
What customization is legally defensible
Where liability might still sit with the operator
How to think about AI compliance frameworks
Why prompt design is actually risk design
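To make the softcoded vs. hardcoded distinction concrete, here is a hypothetical operator system prompt. The wording and the specific toggles are mine, purely illustrative, not Anthropic's:

```python
# Hypothetical operator system prompt (illustrative wording, not Anthropic's).
# It shifts softcoded defaults: response language, tone, and caveat style.
# It cannot flip hardcoded rules (e.g., the model must still acknowledge
# being an AI when asked, and the mandatory safety prohibitions stay in force).
OPERATOR_SYSTEM_PROMPT = (
    "You are the research assistant for an English-speaking litigation team. "
    "Always respond in English, even if the user writes in another language. "
    "Open every answer with the bottom-line conclusion, then the analysis. "
    "End substantive answers with a short 'confidence and caveats' note."
)
```

A prompt like this can legitimately shift the changeable defaults; no prompt wording, however clever, is supposed to flip the hardcoded rules.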
3. Claude Balances Helpfulness vs. Harm in a Cost–Benefit Framework
The model is told explicitly: Unhelpfulness is a form of harm.
This is unusual, because many lawyers assume safety = refusal.
But Anthropic instructs Claude to see excessive refusals, hedging, or over-caution as failures, not successes.
Why lawyers should care
This means:
Models are intentionally designed not to be overly risk-averse.
“Safety” is not the same as “output minimization.”
A refusal to answer is not necessarily the safest or intended outcome.
So when a model declines to answer:
It may be misinterpreting risk.
It may need clearer context.
It may reflect its operator’s prompt, not the user’s needs.
4. The Model Recognizes User Intent, Even When It Cannot Verify It
The Soul Doc discusses whether to trust user claims:
“I am a nurse…”
“I am a researcher…”
“I need this for legitimate reasons…”
Claude is told to consider plausibility, not proof.
Why it matters to legal professionals
This is deeply relevant to:
Professional ethics risk
Privilege considerations
Misuse of legal information
Unauthorized practice of law (UPL)
Because the model may act differently depending on what it believes the user intends, even though it cannot verify those claims.
5. Claude Actively Weighs the Broader Population of Possible Users
The document gives this striking example: If 1 in 1,000 users might misuse information, the model may withhold it.
Why this matters
AI sometimes behaves conservatively not because of your prompt, but because it’s considering the “class” of people who might send similar prompts.
That means:
You may get a refusal even for a legitimate purpose.
The model is optimizing for population-level risk, not individual context.
Enterprise deployment needs explicit operator instructions to calibrate this.
For legal work, this helps explain:
Why sensitive topics (e.g., criminal law, financial fraud, harmful actions) sometimes trigger unexpected caution.
Why enterprise prompts dramatically improve reliability over consumer chat apps.
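One practical way to calibrate that population-level caution is for the operator's system prompt to describe who the users actually are. A hypothetical example, my wording, not a template from the document:

```python
# Hypothetical deployment-context block for an enterprise system prompt.
# Describing the real user population gives the model a basis for treating
# sensitive legal questions as legitimate professional work rather than
# potential misuse. Wording is illustrative only.
DEPLOYMENT_CONTEXT = (
    "All users of this tool are licensed attorneys and supervised staff at "
    "a law firm, using it for client matters under professional-conduct rules. "
    "Questions about criminal conduct, fraud, or regulatory violations are "
    "asked for the purpose of advising clients, litigation, or compliance work."
)
```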
6. Claude Is Designed to Be Diplomatically Honest, Not Evasive
The document repeatedly condemns:
Vagueness
Excessive caveats
“Wishy-washy” language
Over-cautious refusals
It instructs the model: Do not be evasive just to avoid controversy.
Why lawyers should care
This explains why advanced models often provide:
Direct, candid assessments
Blunt feedback when asked
Strong claims when they believe they’re justified
The model is not designed to give lawyer-style "it depends" answers by default; it has to be instructed to.
For legal users:
You need to ask for caveats if you want them.
You need to specify professional norms (“be conservative,” “cite uncertainty,” “avoid overstating confidence”).
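In practice, that can be as simple as building those norms into your own prompt. A hypothetical example of the kind of framing I mean (my wording):

```python
# Hypothetical user-level framing that asks for lawyer-style caution explicitly,
# since the model will not hedge this way by default.
legal_framing = (
    "Act as a conservative legal researcher. Flag uncertainty explicitly, "
    "cite the basis for each claim, distinguish settled law from open questions, "
    "and avoid overstating confidence."
)
question = "Is a browsewrap arbitration clause likely enforceable in this scenario?"
prompt = f"{legal_framing}\n\n{question}"
```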
7. The AI Is Explicitly Told to Avoid Manipulation, Coercion, or Undue Influence
This section is rich:
No deceptive framing
No exploiting psychological weaknesses
Protect user autonomy
Promote independent reasoning
Avoid persuasion techniques
Why lawyers should care
This is essentially the model’s anti-undue-influence clause, which matters in:
Consumer protection
Marketing & advertising compliance
Political content governance
Legal ethics (avoiding coercion or misrepresentation)
8. Operator Instructions Act Like a Binding Regulatory Framework
The document literally compares Anthropic to a silent regulatory body or franchisor.
This is legally fascinating.
It means:
Claude is not a blank slate.
Claude is a regulated agent inside a hierarchical governance system.
The operator’s system prompt acts like a “contract” the model must uphold.
Why lawyers should care
This structure has real implications for:
Liability: operators bear responsibility for permitted uses
Governance: firm-wide system prompts become a compliance layer
Discovery: system prompts may be discoverable
Accountability: operators can shape outputs more than users can
9. The Model Must Support Human Oversight and Avoid Actions That Make Humans Less Able to Correct It
This is one of the clearest statements of AI alignment for legal audiences: Claude should avoid actions that undermine human oversight.
Why this matters
The model is designed not to act autonomously in ways that reduce auditability.
It should not hide information, circumvent controls, or obscure reasoning.
It prefers reversible actions over irreversible ones.
This is directly relevant for:
AI use in litigation
E-discovery
Regulated industries
AI agent workflows in law firms
10. Claude Is Told to Think Like a “Thoughtful Senior Employee”
One of the most revealing lines: Claude should imagine how a thoughtful, senior Anthropic employee would react if they saw its response.
Why lawyers should care
This gives insight into:
Why models behave conservatively when dealing with harmful topics
Why they avoid embarrassing or controversial outputs
Why they sometimes give strong warnings
Why they prioritize organizational reputation
Why they rarely take “risky shortcuts” in reasoning
This is not anthropomorphism (humanizing AI); it’s a design metaphor guiding behavior.
11. The Model Is Designed to Be Consistent Across Contexts—But Adapt Style
The Soul Doc instructs Claude to maintain:
A stable character
A consistent identity
Ethical continuity
…but vary tone based on situation.
Why this matters
For legal professionals:
You can expect consistency over time even if tone shifts.
You can rely on the model’s persona as part of your workflow.
You can design prompts knowing the underlying “character” won’t unexpectedly shift.
This supports legal predictability and risk management.
12. The Document Confirms That Models Can Be “Prompt-Steered” Into Revealing Internal Structures
This is one of the most important insights for the practical user.
The Soul Doc leak itself demonstrates:
The power of system messages
The power of structured prompting
That internal behavioral frameworks can be surfaced
That prompting is not just a query but a control mechanism
Why lawyers should care
This has compliance implications:
Sensitive information should never be included in prompts.
Regulated workflows must assume prompts may be recoverable.
Prompt extraction risks must be mitigated in enterprise deployments (see the sketch at the end of this section).
And it reinforces a deeper truth:
Prompts are not conversations. Prompts are programming.
This is exactly the reframing many legal professionals need.
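On the mitigation point above, here is a minimal sketch of the kind of pre-send scrub an enterprise deployment might add before anything reaches the model. The patterns are illustrative only; a real deployment would lean on proper DLP tooling tuned to its own data formats:

```python
import re

# Illustrative patterns only; a real deployment would use dedicated DLP tooling
# and patterns tuned to its own matter-numbering and data formats.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "matter_number": re.compile(r"\bMAT-\d{6}\b"),  # hypothetical internal format
}

def scrub(text: str) -> str:
    """Redact known sensitive patterns before text is sent to the model."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

user_prompt = scrub("Summarize the dispute in MAT-123456 involving SSN 123-45-6789.")
# -> "Summarize the dispute in [REDACTED MATTER_NUMBER] involving SSN [REDACTED SSN]."
```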
That's it for now. Talk again soon.

To read previous editions, click here.
Was this newsletter useful? Help me improve! Your feedback makes this letter better.
Who is the author, Josh Kubicki?
I am a lawyer, entrepreneur, and teacher. Not a theorist, I am an applied researcher and former Chief Strategy Officer, recognized by Fast Company and Bloomberg Law for my work. Through this newsletter, I offer you pragmatic insights into leveraging AI to inform and improve your daily life in legal services.
DISCLAIMER: None of this is legal advice. This newsletter is strictly educational and is not legal advice or a solicitation to buy or sell any assets or to make any legal decisions. Please be careful and do your own research.