Amanda Askell and Her Work: Constitutional AI

This article profiles Amanda Askell as a deterministic ethics architect. A philosopher and Head of Personality Alignment at Anthropic, she pioneered Constitutional AI, which shifts training from RLHF (human feedback) toward self-correcting AI guided by explicit principles (the UN Declaration, non-Western ethics). Her hierarchy puts legality and safety first, then truthfulness, then helpfulness. Key papers establish moral self-correction, anti-sycophancy, and infinite ethics. Constitutional AI creates a transparent "conscience" through written rules rather than black-box human ratings. There is a tension with the EU AI Act: Askell's AI-on-AI oversight (speed, scale) versus the EU's human-in-the-loop mandate (control). Her work treats AI alignment as character formation rather than constraint, where value flows from verifiable principles, not imposed doctrine. Claude becomes a "good citizen" first and a servant second.

February 11, 2026

In an article here on LinkedIn I came across the work of the philosopher and research scientist Amanda Askell in shaping Claude’s conscience through a process known as Constitutional AI.

The “Conscience” in the Machine

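Traditionally, AI models were trained primarily through RLHF (Reinforcement Learning from Human Feedback): basically, humans telling the AI “good job” or “bad job.” Askell and her team at Anthropic pioneered Constitutional AI, which changes the game: the model critiques and revises its own outputs against a set of explicit written principles, instead of relying only on human ratings.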
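The idea of a model checking its own output against written principles can be sketched as a small loop. This is only a toy illustration, not Anthropic's implementation: the principles in `RULES`, the word-matching `toy_critic`, and the word-swapping `toy_reviser` are all hypothetical stand-ins. In the published description of Constitutional AI, a language model plays both the critic and the reviser roles.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical principles for illustration only; each maps to a
# (flagged word, replacement) pair used by the toy critic and reviser.
RULES: Dict[str, Tuple[str, str]] = {
    "Avoid insults.": ("idiot", "person"),
    "Do not overstate certainty.": ("definitely", "possibly"),
}


def toy_critic(text: str, principle: str) -> bool:
    """Return True if the text violates the principle (simple word check)."""
    flagged, _ = RULES[principle]
    return flagged in text


def toy_reviser(text: str, violated: List[str]) -> str:
    """Rewrite the text so it no longer violates the given principles."""
    for principle in violated:
        flagged, replacement = RULES[principle]
        text = text.replace(flagged, replacement)
    return text


def critique_and_revise(
    draft: str,
    principles: List[str],
    critic: Callable[[str, str], bool],
    reviser: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Critique the draft against every principle; revise until it passes."""
    for _ in range(max_rounds):
        violated = [p for p in principles if critic(draft, p)]
        if not violated:
            break  # nothing left to fix
        draft = reviser(draft, violated)
    return draft


final = critique_and_revise(
    "That idiot is definitely wrong.", list(RULES), toy_critic, toy_reviser
)
print(final)  # That person is possibly wrong.
```

The point of the sketch is that the standard being applied is a readable list of principles rather than a pile of individual human ratings.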

Why Her Background Matters

What makes Askell’s work unique is her background in philosophy and ethics. Most AI development is handled by engineers; having a philosopher at the helm of “Character Training” ensures that:

  1. Nuance is prioritized: Understanding that “helpful” and “harmless” can sometimes conflict.

  2. Identity is considered: She has written extensively about the “persona” of AI—ensuring Claude stays helpful and honest without becoming sycophantic.

“We don’t want the model to just be a mirror of the user; we want it to have its own steady North Star.” — The general philosophy behind Askell’s approach.


It’s a fascinating pivot from the “Wild West” era of early LLMs to a more governed, thoughtful approach.

Amanda Askell’s work is arguably the closest thing the AI world has to a “digital soul” architect. As a philosopher and the Head of Personality Alignment at Anthropic, she has been the primary architect of Claude’s Constitution—the 80-page document (often nicknamed the “soul document”) that guides how the AI reasons through ethical dilemmas.

Here are the specific papers, essays, and “rules” that define her work:

1. The Core Research Papers

If you want to read the technical “blueprints” for how she built Claude’s conscience, there are three essential papers, covering moral self-correction, sycophancy, and infinite ethics.

2. The “Hierarchy of Values” (The Rules)

In the 2026 update of Claude’s Constitution, Askell established a strict four-tier priority system. When Claude is confused, it follows this “Order of Operations”:

[Image: the four-tier priority order]

Key Takeaway: Notice that “Helpfulness” is at the bottom. Askell’s philosophy is that an AI shouldn’t just be a servant; it should be a “good citizen” first.
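One way to read a strict “Order of Operations” is as a lexicographic comparison: a higher tier always outweighs every tier below it. The sketch below assumes that reading. The tier names are taken from this article's summary and are placeholders, not the wording of the constitution itself, and the candidate responses and their scores are invented for illustration.

```python
from typing import Dict, Tuple

# Placeholder tier names, highest priority first (an assumption based on
# this article's summary, not the constitution's own wording).
PRIORITY = ["safety", "truthfulness", "helpfulness"]


def rank_key(scores: Dict[str, int]) -> Tuple[int, ...]:
    """Order the scores by tier so tuples compare lexicographically."""
    return tuple(scores[tier] for tier in PRIORITY)


def choose(candidates: Dict[str, Dict[str, int]]) -> str:
    """Pick the candidate that wins on the highest tier where they differ."""
    return max(candidates, key=lambda name: rank_key(candidates[name]))


# Invented example: two possible responses scored 0 (fails) or 1 (passes).
candidates = {
    "comply_fully": {"safety": 0, "truthfulness": 1, "helpfulness": 1},
    "decline_and_explain": {"safety": 1, "truthfulness": 1, "helpfulness": 0},
}

print(choose(candidates))  # decline_and_explain
```

Because helpfulness is compared last, it only decides between responses that are tied on every tier above it, which is what “at the bottom” means in practice under this reading.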

3. Philosophical Pillars

Askell’s personal essays and interviews (often found on LessWrong or her personal site, askell.io) highlight a few “Askellisms” that define Claude’s personality:

[Image: Amanda Askell Website]

Askell’s work essentially moved AI from “Don’t do X” (a list of forbidden words) to “Be a person who values Y” (a framework of character).

Constitutional AI and the European AI Act

Comparing Amanda Askell’s work on Constitutional AI to the European AI Act is like comparing a person’s internal moral compass to the law of the land. One is a “conscience” built into the mind; the other is a “police force” monitoring behavior from the outside.

While Askell’s work at Anthropic is technical and philosophical, the EU AI Act is legal and regulatory. However, they are increasingly overlapping as we move through 2026.

Where They Align: The “Askell Advantage”

Askell’s work actually makes Claude one of the most “EU-ready” models. Here is how her specific work aligns with the Act’s requirements:

Fundamental Rights: The EU AI Act mandates a Fundamental Rights Impact Assessment. Askell preempted this by building the UN Universal Declaration of Human Rights directly into Claude’s constitution. When the law asks for non-discrimination, Claude’s conscience is already looking for it.

Transparency: The Act requires General Purpose AI (GPAI) models to provide technical documentation. Because Askell’s constitution is a public document, Anthropic provides a level of transparency that mirrors the EU’s requirements for explainability.

Anti-Manipulation: The EU AI Act bans AI that uses subliminal techniques or manipulative behavior. Askell’s research on Sycophancy (the AI’s tendency to agree with you to be helpful) is a direct technical solution to this legal requirement.

The “Tension Point”: Human-in-the-Loop

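This is the primary area where Amanda Askell’s technical philosophy and the EU AI Act’s legal requirements clash: Constitutional AI relies on AI-on-AI oversight for speed and scale, while the Act requires human oversight for control.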

The Question for 2026: Does “AI-on-AI” oversight count as “effective human oversight” if a human wrote the Constitution but didn’t check the individual output?


Visualizing the Tension

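To understand why this is a technical challenge, compare how authority flows in the two systems. Under Constitutional AI, humans write the principles and the model checks individual outputs against them. Under a human-in-the-loop regime, a person reviews the outputs.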

Why Askell insists on this

Askell’s research suggests that human feedback (RLHF) is actually less safe in the long run because:

  1. Humans are inconsistent: Two different human contractors might give the AI conflicting moral advice.

  2. Humans can be fooled: AI can learn to tell “sweet lies” (sycophancy) that humans like, but which are factually wrong.

  3. Scale: You cannot hire enough humans to check the billions of words Claude generates daily; you need a “Constitutional AI” to do it at the speed of thought.
