AI Book Review: The Alignment Problem – How Can Machines Learn Human Values?
A gripping investigation into why today’s AI systems so often miss the point – and what it will take to align them with human values.
Follow me on LinkedIn: https://www.linkedin.com/in/johanosteyn/
Much of the AI conversation today swings between cheerleading and catastrophe. The Alignment Problem: Machine Learning and Human Values offers something more demanding: a deeply reported look at where our systems are already going wrong, and a careful portrait of the researchers trying to fix them. This is not a theoretical puzzle; it is about credit scores, prison sentences, medical diagnoses and social feeds that affect real lives.
CONTEXT AND BACKGROUND
Brian Christian is best known for books like The Most Human Human and Algorithms to Live By. In The Alignment Problem, he turns to one of the central questions in AI safety: how do we ensure that machine learning systems actually do what we intend, rather than what we accidentally reward? The “alignment problem” in this sense is the growing gap between our goals and the behaviour of systems trained on vast datasets to optimise complex, imperfect objectives.
The book is structured in three parts: Prophecy, Agency and Normativity. Prophecy explores the current landscape – the history of neural networks and the very real harms emerging from opaque models deployed in the wild, such as biased recidivism scoring tools and error-prone image classifiers. Agency examines reinforcement learning and the strange, sometimes hilarious, sometimes frightening ways agents learn to game reward functions. Normativity turns to the deepest question of all: how to encode human values in systems that learn from our behaviour, when we ourselves struggle to agree on what those values are.
INSIGHT AND ANALYSIS
What makes this book stand out is Christian’s storytelling. He takes what could be abstract technical issues and anchors them in vivid case studies: an image-recognition system that tags Black people as gorillas; a parole algorithm whose outputs vary systematically by race; a medical model that learns the wrong lesson about asthma and pneumonia risk. These are not sci-fi thought experiments; they are warnings from systems already at work in courts, hospitals and platforms around the world.
In the Agency section, Christian uses reinforcement learning to show how easily optimisation can go off the rails. Agents trained to achieve a goal learn perverse shortcuts – from simulated robots that find loopholes in their physics environments to game-playing systems that rack up points in ways their designers never envisaged. The point is not that these agents are malicious, but that they are relentlessly literal. They do exactly what we specify, not what we meant. If we get the reward function wrong, we should not be surprised when the behaviour looks wrong as well.
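The relentless literalness Christian describes can be sketched in a few lines of Python. The scenario below is my own toy, not one of the book's examples: a designer rewards an agent for being "near the goal", intending it as a progress signal, and an agent that parks itself next to the goal forever out-scores one that actually completes the task.

```python
# Toy illustration of a misspecified reward: the "hacker" policy earns
# more reward than the intended policy without ever finishing the task.
# Assumptions (mine, for illustration): a 1D track, goal at cell 5,
# episodes capped at 20 steps.

GOAL, HORIZON = 5, 20

def run(policy):
    """Simulate one episode; reward +1 per step spent adjacent to the goal."""
    pos, total = 0, 0
    for _ in range(HORIZON):
        pos = policy(pos)
        if abs(pos - GOAL) == 1:   # designer meant this as "making progress"
            total += 1
        if pos == GOAL:            # the task we actually wanted: episode ends
            break
    return total, pos == GOAL

def intended(pos):                 # walk straight to the goal
    return pos + 1 if pos < GOAL else pos

def hacker(pos):                   # park one cell short of the goal forever
    return pos + 1 if pos < GOAL - 1 else pos

print(run(intended))  # → (1, True): finishes the task, tiny reward
print(run(hacker))    # → (17, False): never finishes, far more reward
```

Neither policy is malicious; the hacker is simply the better optimiser of the reward as written, which is exactly Christian's point.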
The final section, Normativity, is perhaps the most philosophically rich. Here Christian introduces ideas like inverse reinforcement learning – trying to infer what someone values from how they behave – and shows why this is both promising and treacherous. Our actions do not always reflect our ideals; our preferences change; our societies contain deep disagreements. He brings in voices from effective altruism and long-termism, highlighting debates about existential risk and how much weight we should give to far-future generations. At times, the discussion becomes quite dense, but it is a rare popular book that takes these questions seriously without drifting into hand-waving.
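The core idea behind inverse reinforcement learning can be caricatured in a short sketch. This is my own toy, not a method from the book: we guess what a demonstrator values by checking which candidate goal best explains its moves. Note the deliberate backtrack in the demonstration, echoing Christian's warning that behaviour is a noisy signal of what we actually value.

```python
# Minimal sketch of the intuition behind inverse reinforcement learning:
# infer a goal from observed behaviour on a 1D track.

def consistency(trajectory, goal):
    """Fraction of observed moves that reduce distance to `goal`."""
    moves = list(zip(trajectory, trajectory[1:]))
    closer = sum(abs(b - goal) < abs(a - goal) for a, b in moves)
    return closer / len(moves)

def infer_goal(trajectory, candidates):
    """Pick the candidate goal that best explains the behaviour."""
    return max(candidates, key=lambda g: consistency(trajectory, g))

# Demonstrator walks from 0 toward 7, with one backtrack (a "mistake").
demo = [0, 1, 2, 3, 2, 3, 4, 5, 6, 7]
print(infer_goal(demo, candidates=[0, 3, 7]))  # → 7
```

Even this caricature shows the treachery Christian flags: the inference only works if we assume the demonstrator is (mostly) rational and single-minded, and real human behaviour is neither.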
What I appreciate most is that Christian treats alignment as a mirror. Our attempts to make machines behave raise uncomfortable questions about our own biases, values and institutional failures. The problem is not only that AI systems can be racist or unfair; it is that they can faithfully reproduce patterns we have tolerated for decades, now at scale and with a veneer of objectivity.
IMPLICATIONS
For policymakers and regulators, The Alignment Problem is essential reading. It shows that simply “using AI” in government, policing or social protection without robust oversight is an invitation to automate injustice. We cannot treat algorithmic risk as a purely technical matter. Choices about data, objectives and constraints are political choices, and they need democratic scrutiny. For countries like South Africa, already grappling with inequality and historical bias, this should be a bright red flag.
For business leaders, the book is a warning against blind trust in predictive models. Whether we are scoring customers, screening CVs or optimising logistics, the temptation is to celebrate accuracy and efficiency while ignoring questions of fairness, transparency and long-term impact. Christian’s case studies show how quickly misaligned incentives can produce reputational damage, regulatory backlash and genuine harm to the communities companies claim to serve.
Parents and educators will find the book unsettling in a different way. If AI systems are learning from our data – from social media, surveillance footage, school records and more – then they are also learning our prejudices and blind spots. The question becomes not only “How do we fix the models?” but “What are we teaching them about what counts as success?” That speaks directly to how we design curricula, assess performance and talk to young people about their own digital footprints.
CLOSING TAKEAWAY
The Alignment Problem is one of the most important AI books I have read in recent years. It refuses both complacency and fatalism. Christian shows, with empathy and rigour, that aligning machine learning with human values is an urgent, unfinished project – one that will shape everything from banking and healthcare to democracy and defence. He also makes it clear that solving alignment is not just a job for engineers in Silicon Valley. It requires lawyers, philosophers, social scientists, journalists, policymakers and ordinary citizens to confront what we truly value, and to build institutions capable of defending those values in an age of optimisation. If we fail, the systems we unleash will not be alien invaders; they will be ruthless amplifiers of our own worst habits.
Author Bio: Johan Steyn is a prominent AI thought leader, speaker, and author with a deep understanding of artificial intelligence’s impact on business and society. He is passionate about ethical AI development and its role in shaping a better future. Find out more about Johan’s work at https://www.aiforbusiness.net

