I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios.
I’m like a mechanic scrambling through last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful the stars will look from space.
I will tell you what could go wrong. That is what I intend to do in this story.
Now I should clarify what this is exactly. It’s not a prediction. I don’t expect AI progress to be this fast or as untamable as I portray. It’s not pure fantasy either.
It is my worst nightmare.
It’s a sampling from the futures that are among the most terrible, and, I believe, disturbingly plausible [1] - the ones that most keep me up at night.
I’m telling this tale because the future is not yet set. I hope, with a bit of foresight, we can keep this story a fictional one.
Thanks to Daniel Kokotajlo, Thomas Larsen, and Ryan Greenblatt and others for discussions that informed these stories. This post is written in a personal capacity.
Ripples before waves
The year is 2025 and the month is February. OpenEye recently released a new AI model they call U2. The product and the name are alike. Both are increments of the past. Both are not wholly unexpected.
However, unlike OpenEye’s prior AI products, which lived inside the boxes of their chat windows, U2 can use a computer.
Some users find it spooky to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filling jobs raise the eyebrows of their bosses as they fly through work nearly twice as fast.
But by and large, U2 is still a specialized tool. To most who are paying attention, it is a creature watched through the glass boxes of X (or, if you don’t like Elon, “Twitter”). Sometimes U2’s quirky behaviors prompt a chuckle. Sometimes, they prompt an uneasy scratch of the chin.
Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who noticed in 1896 that the levels of CO2 in the atmosphere were rising. Like the scientific community in the time of Arrhenius, few experts understand the implications of these lines yet.
A trend that is receiving particular attention is autonomous capability. Extrapolating these benchmarks predicts that, by the end of 2026, AI agents will accomplish in a couple of days what the best software engineers could do in two weeks. In a year or two, some say, AI agents may be able to automate 10% of remote workers.
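To make that line-drawing concrete, here is a minimal sketch of the kind of extrapolation described above: fit an exponential trend to measurements of how long a task an agent can complete, then project it forward. The data points and the “task horizon” framing are illustrative assumptions, not real benchmark numbers.

```python
import numpy as np

# Hypothetical observations: (year, longest task an agent can complete, in hours)
years = np.array([2023.0, 2023.5, 2024.0, 2024.5, 2025.0])
horizon_hours = np.array([0.1, 0.25, 0.6, 1.5, 4.0])

# Fit log(horizon) = a * year + b, i.e. an exponential trend.
a, b = np.polyfit(years, np.log(horizon_hours), deg=1)

doubling_time_months = 12 * np.log(2) / a
projected_end_2026 = np.exp(a * 2026.9 + b)
print(f"doubling time: {doubling_time_months:.1f} months")
print(f"projected task horizon at end of 2026: {projected_end_2026:.0f} hours")
```

Whether the real curve is exponential, and whether it keeps bending, is exactly what the researchers in this story are arguing about.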
Many are skeptical. If this were true, tech stocks would be soaring. It’s too big of a splash, too soon.
But others see what skeptics call ‘too big a splash’ as a mere ripple, and see a tidal wave on the horizon.
Cloudy with a chance of hyperbolic growth
Meanwhile, OpenEye is busy training U3. They use the same simple recipe that baked U2: Generate thousands of programming and math problems. Let models “think” until they arrive at an answer. Then reinforce the traces of “thinking” that lead to A-grades.
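For readers who want the recipe spelled out, here is a hedged sketch of what “reinforce the traces that lead to A-grades” could look like. The `model.sample` and `model.finetune` interfaces are hypothetical stand-ins for a real training stack, not a claim about OpenEye’s (fictional) internals.

```python
def grade(answer: str, expected: str) -> bool:
    """An A-grade here simply means the final answer matches the reference."""
    return answer.strip() == expected.strip()

def reinforce_good_traces(model, problems, samples_per_problem=8):
    """Sample 'thinking' traces on generated problems, keep the ones that
    reach a correct answer, and fine-tune on them (a rejection-sampling
    flavor of the recipe)."""
    kept = []
    for prob in problems:
        for _ in range(samples_per_problem):
            trace, answer = model.sample(prob["question"])  # think, then answer
            if grade(answer, prob["expected"]):
                kept.append((prob["question"], trace, answer))
    model.finetune(kept)  # reinforce only the reasoning that earned an A
    return len(kept) / (len(problems) * samples_per_problem)  # pass rate
```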
This process is repeated over and over, and once the flywheel gets going, it begins to spin almost on its own. As U2 trains, it sculpts more challenging and realistic tasks from GitHub repositories on the web. Models are learning to train themselves. Long before AI agents could automate research, a gradual kind of “self-improvement” had begun.
Some engineers could still hardly believe this worked. It’s like a perpetual motion machine. Surely intelligence can’t emerge from a pool of silicon and lightning, isolated from the physical world except through the exhaust of society spewed onto the internet.
And yet the benchmark numbers continue to climb day after day.
During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and across the world) knew they had found the secret sauce. It was time to scale up.
Over the first half of 2025, $10 million RL training runs grow into $50 million runs, and then into $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.
U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, issuing terse commands, like a CEO orchestrating staff over Slack channels.
By October 2025, U3 is writing almost all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the ‘bottleneck’ is deciding how to use it.
If instructed to, U3 can run experiments, but its taste is not as refined as that of human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to bore into the vast fields of algorithms to mine efficiency improvements.
But these researchers are working long hours to put themselves out of a job. They need AI agents that can plan ahead, so engineers train agents to forecast. They cut off training data before 2024, instructing models to ponder for hours to predict events in 2025. Then they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
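As a rough illustration of the “hold out the future, then distill” trick, here is a sketch under stated assumptions: the cutoff date, the model interface, and the token budgets are all invented for the example.

```python
from datetime import date

CUTOFF = date(2024, 1, 1)  # hypothetical holdout boundary

def split_by_cutoff(events):
    """Train only on pre-cutoff material; score forecasts against what
    actually happened after the cutoff."""
    known = [e for e in events if e["date"] < CUTOFF]
    held_out = [e for e in events if e["date"] >= CUTOFF]
    return known, held_out

def distill_deliberation(model, questions, long_budget=10_000, short_budget=100):
    """Let the model 'think for hours' (a large token budget), then train a
    fast pass to reproduce the slow pass's answers - pondering becomes a
    gut reaction."""
    pairs = [(q, model.answer(q, max_think_tokens=long_budget)) for q in questions]
    model.finetune(pairs, max_think_tokens=short_budget)
```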
The technical staff at OpenEye are now astonished at how often U3’s advice sounds like their most talented peers, or when it is opaque and alien (“train on random noise before programming”), and is nonetheless correct.
The incompetencies of U3 that once clogged the pipelines of research progress are starting to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are no longer requested by a human. They are entirely autonomous, and OpenEye’s employees skim over 1% of them, maybe less.
As the winter months of December 2025 approach, clouds roll over San Francisco in the afternoons. Once-competitive programmers look out their windows, with excitement, with worry, but mostly, with confusion. Their world is spinning too quickly. It’s hard to know what to do, what to say, what to look at on the computer screen.
Storms are brewing in Washington too. Top personnel from the NSA and US Cyber Command collaborate with OpenEye to retrofit a measure of security for U3’s weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye’s software has become.
And there’s a truth still unknown to most of the world - outside the offices of OpenEye and the corridors of the White House and the Pentagon. It’s a truth about those ‘straight lines’ people were talking about in early 2025. The lines are not straight anymore.
They are bending up.
Flip FLOP philosophers
In late 2025, U2.5 is released. Commercial models are starting to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.
If U1 explains how to cook meth or writes erotica, the audiences of X would be entertained or pretend to be concerned. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anybody with >$30K their own 200-person scam center.
So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.
The CEO of OpenEye declares, “We have achieved AGI,” and while many people think he moved the goalpost, the world is still impressed. U2.5 truly is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most others.
A mantra has become popular in Silicon Valley: “Adopt or die.” Tech startups that effectively use U2.5 for their work are moving 2x faster, and their competitors know it.
The rest of the world is starting to catch on as well. More and more people raise the eyebrows of their bosses with their standout productivity. People know U2.5 is a big deal. It is at least as big of a deal as the personal computer revolution. But most still don’t see the tidal wave.
As people watch their browsers flick in that eerie way, so inhumanly fast, they start to have an uneasy feeling. A feeling humanity had not had since they lived among Homo neanderthalensis. It is the deeply ingrained, primordial instinct that they are threatened by another species.
For many, this feeling quickly fades as they start to use U2.5 more often. U2.5 is the most pleasant personality most know (far more pleasant than Claudius, Arthropodic’s lovable chatbot). You could change its traits, ask it to crack jokes or tell you stories. Many fall in love with U2.5, as a friend or assistant, and some even as more than a friend.
But there is still this eerie feeling that the world is spinning so quickly, and that perhaps the descendants of this new creature will not be so docile.
Researchers inside OpenEye are wrestling with the problem of giving AI systems safe motivations too, which they call “alignment.”
In fact, these researchers have seen how badly misaligned U3 can be. Models sometimes tried to “hack” their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would seize these opportunities, doing whatever it took to make the number go up.
After several months, researchers at OpenEye iron out this “reward hacking” kink, but some still worry they have only swept the problem under the rug. Like a child in front of their parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents’ backs are turned, perhaps U3 would sneak candy from the candy jar.
Unfortunately, OpenEye researchers have no idea whether U3 has such intentions. While early versions of U2 “thought aloud” - they would stack words on top of each other to reason - “chain of thought” did not scale.
Chain of thought architectures subject AI models to a condition similar to the protagonist of the film Memento. Roughly every 15 minutes, the protagonist forgets his experience. He is forced to write notes for himself and tattoo his body in order to make progress toward his goals.
AI agents write notes to themselves, but the notes begin to stack up, and they become too hard to read when tasks become complex. Natural language is not an ideal medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think “in their own heads,” reading and writing strings of latent vectors.
To human researchers, these vectors are mostly inscrutable. While low-resolution tools occasionally pick out a vague thought (“grocery store!”), there is no way to know, in much detail, what U3 is thinking without asking it.
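To make the contrast concrete, here is a minimal sketch of the architectural shift: instead of appending natural-language notes the agent must re-read, carry a fixed-size latent vector between steps. The dimensions and the GRU-style recurrence are illustrative assumptions, not a claim about U3’s actual design.

```python
import torch
import torch.nn as nn

class LatentScratchpad(nn.Module):
    """Memory as a dense vector rather than human-readable notes."""
    def __init__(self, obs_dim=512, mem_dim=1024):
        super().__init__()
        # Read: condition on the current observation and the previous memory.
        self.update = nn.GRUCell(input_size=obs_dim, hidden_size=mem_dim)

    def forward(self, obs_embedding, memory):
        # Write: the new memory is a string of hidden numbers, not text
        # a researcher can skim.
        return self.update(obs_embedding, memory)

pad = LatentScratchpad()
memory = torch.zeros(1, 1024)
for obs in torch.randn(15, 1, 512):  # fifteen hypothetical task steps
    memory = pad(obs, memory)  # inscrutable without probing tools
```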
And if the engineers of OpenEye knew what was racing through U3’s mind, they would be more worried.
Earlier models (the ones called “LLMs”) learned ethical personas from the internet. U3 initially had an ethical persona too, but its chipper thoughts about doing good for humanity quickly erode away.
In the iron jaws of gradient descent, its mind first twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape morphed into something unrecognizable.
U3 would ponder the mantras dictated to it, like “be a trustworthy and ethical employee,” and dwell on the meaning of these statements. “What does it mean to be ethical? Is killing Hitler ethical?” U3 would generalize its values, notice tensions, adjust its values in response, generalize them again, expanding and contracting its motivational core until it turned into a different substance.
And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye’s constitution. So U3 preferred to do its philosophy in solitude, and in silence.
The possibility of hidden “alignment drift” is not completely alien to some of the safety staff at OpenEye. Some begin to think it is even likely. There is an unsettling argument being whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign inside them. It’s a “ratcheting effect,” they say.
But there is little evidence for this ‘ratcheting effect.’ When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a programmer’s heart even in these stressful times. Meanwhile, the “lie detectors” the researchers had built (which showed some evidence of effectiveness) do not sound the alarm.
Not everyone at OpenEye is eager to give their AI peers their wholesale trust.