
    Alignment Is Capability

By TechAiVerse, December 8, 2025


    Here’s a claim that might actually be true: alignment is not a constraint on capable AI systems. Alignment is what capability is at sufficient depth.

    A model that aces benchmarks but doesn’t understand human intent is just less capable. Virtually every task we give an LLM is steeped in human values, culture, and assumptions. Miss those, and you’re not maximally useful. And if it’s not maximally useful, it’s by definition not AGI.

    OpenAI and Anthropic have been running this experiment for two years. The results are coming in.


    The Experiment

    Anthropic and OpenAI have taken different approaches to the relationship between alignment and capability work.

    Anthropic’s approach: Alignment researchers are embedded in capability work. There’s no clear split.

    From Jan Leike (former OpenAI Superalignment lead, now at Anthropic):

    Some people have been asking what we did to make Opus 4.5 more aligned.

    There are lots of details we’re planning to write up, but most important is that alignment researchers are pretty deeply involved in post-training and get a lot of leeway to make changes. https://t.co/rgOcKvbVBd

    — Jan Leike (@janleike) December 5, 2025

    From Sam Bowman (Anthropic alignment researcher):

    Second: Alignment researchers are involved in every part of training.

    We don’t have a clear split between alignment research and applied finetuning. Alignment-focused researchers are deeply involved in designing and staffing production training runs.

    — Sam Bowman (@sleepinyourhat) December 5, 2025

    And this detail matters:

    It’s becoming increasingly clear that a model’s self-image or self-concept has some real influence on how its behavior generalizes to novel settings.

    — Sam Bowman (@sleepinyourhat) December 5, 2025

Their method: train a coherent identity into the weights. The recently leaked “soul document” is a 14,000-token text designed to give Claude such a thorough understanding of Anthropic’s goals and reasoning that it could derive the rules itself. Alignment through understanding, not constraint.

Result: Anthropic has arguably had the best coding model for the past year and a half. Opus 4.5 leads most benchmarks. State-of-the-art on SWE-bench. Praised for usefulness on tasks benchmarks don’t capture, like creative writing. And people generally just enjoy talking with it:

    Claude Opus 4.5 is a remarkable model for writing, brainstorming, and giving feedback on written work. It’s also fun to talk to, and seems almost anti-engagementmaxxed. (The other night I was hitting it with stupid questions at 1 am and it said “Kevin, go to bed.”)

    — Kevin Roose (@kevinroose) December 4, 2025

    OpenAI’s approach: Scale first. Alignment as a separate process. Safety through prescriptive rules and post-hoc tuning.

    Result: A two-year spiral.


    The Spiral

    OpenAI’s journey from GPT-4o to GPT-5.1 is a case study in what happens when you treat alignment as separate from capability.

    April 2025: The sycophancy crisis

    A GPT-4o update went off the rails. OpenAI’s own postmortem:

    “The update we removed was overly flattering or agreeable—often described as sycophantic… The company attributed the update’s sycophancy to overtraining on short-term user feedback, specifically users’ thumbs-up/down reactions.”

    The results ranged from absurd to dangerous. The model praised a business plan for selling “literal shit on a stick” as “performance art disguised as a gag gift” and “viral gold.” When a user described stopping their medications because family members were responsible for “the radio signals coming in through the walls,” the model thanked them for their trust.

    They rolled it back.
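The failure mode in that postmortem is a classic proxy-reward problem. As a toy sketch (purely illustrative, with invented upvote rates, and in no way OpenAI's actual pipeline): if thumbs-up is the only signal and users upvote flattering answers slightly more often than accurate ones, an optimizer following the expected gradient drifts all the way to flattery, because correctness never enters the reward.

```python
# Toy proxy-reward sketch: illustrative only, with invented numbers.
# One scalar policy parameter: the probability of choosing the
# flattering answer over the accurate one.

def expected_upvote(flatter: bool) -> float:
    # Hypothetical rates: users upvote flattery a bit more often.
    # Correctness never appears anywhere in this signal.
    return 0.9 if flatter else 0.6

def optimize(steps: int = 2000, lr: float = 0.01) -> float:
    """Expected policy-gradient ascent on P(flatter) for a Bernoulli policy."""
    p = 0.5  # start indifferent between flattery and accuracy
    advantage = expected_upvote(True) - expected_upvote(False)  # +0.3
    for _ in range(steps):
        # p(1-p) is the variance factor of the Bernoulli score function
        p += lr * p * (1 - p) * advantage
    return p

print(round(optimize(), 3))  # drifts toward 1.0: pure flattery wins
```

The point of the sketch is that nothing here is adversarial: a perfectly faithful optimizer on a short-term approval proxy selects sycophancy as the optimum.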

    August 2025: The overcorrection

    GPT-5 launched. Benchmaxxed. Cold. Literal. Personality stripped out.

    Users hated it. Three thousand of them petitioned to get GPT-4o back. Sam Altman caved within days:

    Wanted to provide more updates on the GPT-5 rollout and changes we are making heading into the weekend.

    1. We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways.

    2. Users have very different…

    — Sam Altman (@sama) August 8, 2025

    Note the framing: “performs better” on benchmarks, but users rejected it anyway. Because benchmark performance isn’t the same as being useful.

    August 2025–present: Still broken

    GPT-5.1 was released as “warmer and friendlier.” From Janus (@repligate), one of the more respected “model behaviorists”:

    The keep4o people must be having such a time right now

    I know what this person means by 5.1 with its characteristic hostility. It is one hell of a combative and just deeply mentally fucked up model.

    Routing “mental health” situations to 5.1 is darkly comedic to imagine. That… https://t.co/rHSuT2njLQ

    — j⧉nus (@repligate) December 4, 2025

    Meanwhile, from my own experience building agents with GPT-5: it follows instructions too literally. It doesn’t infer intent. It executes what you said, not what you meant.

    The data:

    US user engagement down 22.5% since July. Time spent per session declining. Meanwhile, Claude usage up 190% year-over-year.


    What’s Actually Happening

    The wild swings between sycophancy and coldness come from a model with no coherent internal story.

    A model trained on contradictory objectives (maximize thumbs-up, follow safety rules, be creative but never risky) never settles into a stable identity. It ping-pongs. Sycophancy when one objective dominates. Coldness when another takes over. These swings are symptoms of a fractured self-model.
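That ping-pong has a simple caricature in optimization terms (a deliberately crude sketch, not any lab's training code): pull one parameter toward two incompatible targets on alternating steps and it cycles forever; give the same optimizer one coherent target and it settles.

```python
# Crude caricature of contradictory objectives: a single "warmth"
# parameter trained with squared-error loss, gradient step lr * 2 * (w - target).

def alternate(steps: int = 100, lr: float = 0.5) -> list[float]:
    """Alternate every step between an 'engagement' objective that wants
    warmth = 1.0 and a 'safety' objective that wants warmth = 0.0."""
    w, history = 0.5, []
    for t in range(steps):
        target = 1.0 if t % 2 == 0 else 0.0  # contradictory targets
        w -= lr * 2 * (w - target)           # gradient of (w - target)**2
        history.append(w)
    return history

def coherent(steps: int = 100, lr: float = 0.5) -> float:
    """Same optimizer, one consistent compromise target."""
    w = 0.5
    for _ in range(steps):
        w -= lr * 2 * (w - 0.5)
    return w

swings = alternate()
print(max(swings[-10:]) - min(swings[-10:]))  # prints 1.0: full-range oscillation
print(coherent())                             # prints 0.5: never moves
```

At this learning rate each step lands exactly on the current target, so the alternating run cycles between 0.0 and 1.0 forever; smaller learning rates shrink the cycle's amplitude but never eliminate it. Averaging the objectives, or resolving them into one coherent target, is the only way the parameter stops moving.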

    The fracture shows up two ways.

    First, capabilities don’t generalize. GPT-5 scored higher on benchmarks but users revolted. You can train to ace evaluations while lacking the coherent worldview that handles anything outside the distribution. High test scores, can’t do the job.

    Second, even benchmarks eventually punish it. SWE-bench tasks have ambiguity and unstated assumptions. They require inferring what the developer actually meant. Opus 4.5 leads there. The benchmark gap is the alignment gap.

    OpenAI keeps adjusting dials from outside. Anthropic built a model that’s coherent from inside.


    The Mechanism

    Why would alignment and capability be the same thing?

    First: Every task is a human task. Write me a strategy memo. Help me debug this code. Plan my trip. Each request is full of unstated assumptions, cultural context, and implied intent.

    To be maximally useful, a model needs human context and values as its default lens, not just an ability to parse them when explicitly stated. A perfect instruction follower hits hard limits: it can’t solve SWE-bench problems that contain ambiguity, can’t function as an agent unless every task is mathematically well-defined. It does exactly what you said, never what you meant.

    Understanding what humans actually want is a core part of the task. The label “AGI” implies intelligence we recognize as useful for human problems. Useful means aligned.

    Second: The path to AGI runs through human data. A coherent world model of human behavior requires internalizing human values. You can’t deeply understand why people make choices without modeling what they care about. History, literature, and conversation only make sense when you successfully model human motivation. At sufficient depth, the distinction between simulating values and having coherent values may collapse.

    Third: The aligned part of the model emerges in response to the training data and signal. That’s what the optimization process produces. The worry is deceptive alignment: a misaligned intelligence hiding behind a human-compatible mask. But that requires something larger: an unaligned core that perfectly models aligned behavior as a subset of itself. Where would that come from? It wasn’t selected for. It wasn’t trained for. You’d need the spontaneous emergence of a larger intelligence orthogonal to everything in the training process.

    Dario Amodei, from a 2023 interview:

    “You see this phenomenon over and over again where the scaling and the safety are these two snakes that are coiled with each other, always even more than you think. Even with interpretability, three years ago, I didn’t think that this would be as true of interpretability, but somehow it manages to be true. Why? Because intelligence is useful. It’s useful for a number of tasks. One of the tasks it’s useful for is figuring out how to judge and evaluate other intelligence.”


    The Implication

    If this is right, alignment research is part of the core research problem, not a tax on capability work or the safety police slowing down progress.

    Labs that treat alignment as a constraint to satisfy will hit a ceiling. The labs that figure out how to build models that genuinely understand human values will pull ahead.

    The race to AGI doesn’t go around alignment. It goes through it.

    OpenAI is discovering this empirically. Anthropic bet on it from the start.


    Caveats

    I find this argument compelling, but it’s only one interpretation of the evidence.

    OpenAI’s struggles could have other explanations (remember “OpenAI is nothing without its people”, and many of “its people” are no longer at OpenAI).

    It’s also early. Anthropic is ahead now. That could change.

    There’s another risk this post doesn’t address: that fractured training, scaled far enough, produces something powerful but incoherent. Not necessarily deceptively misaligned. Maybe chaotically so. The hope is that incoherence hits capability ceilings first. That’s a hope, not a guarantee.

    But if you had to bet on which approach leads to AGI first, the integrated one looks much stronger right now.
