How AI Takeover May Happen in 2 Years


I’m not a natural “doomsayer.” But regrettably, part of my job as an AI safety researcher is to think about the more unpleasant scenarios.

I’m like a mechanic scrambling through last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t discuss the quality of the in-flight entertainment, or explain how lovely the stars will appear from space.

I will tell you what could go wrong. That is what I intend to do in this story.

Now I must clarify what this is exactly. It’s not a prediction. I don’t expect AI progress to be this fast or as ungovernable as I portray. It’s not pure fantasy either.

It is my worst nightmare.

It’s a sampling from the futures that are among the most terrifying, and, I think, disturbingly plausible [1] - the ones that most keep me up at night.

I’m telling this tale because the future is not yet written. I hope, with a little foresight, we can keep this story a fictional one.

Thanks to Daniel Kokotajlo, Thomas Larsen, Ryan Greenblatt, and others for discussions that influenced these stories. This post is written in a personal capacity.

Ripples before waves

The year is 2025 and the month is February. OpenEye recently released a new AI model they call U2. The product and the name are alike. Both are increments of the past. Neither is completely surprising.

However, unlike OpenEye’s previous AI products, which lived inside the boxes of their chat windows, U2 can use a computer.

Some users find it spooky to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filler jobs raise the eyebrows of their bosses as they fly through work nearly twice as quickly.

But by and large, U2 is still a specialized tool. To most who are paying attention, it is an animal watched through the glass boxes of X (or, if you don’t like Elon, “Twitter”). Sometimes U2’s quirky behaviors prompt a chuckle. Sometimes, they cause an anxious scratch of the chin.

Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who noticed in 1896 that CO2 levels in the atmosphere were rising. Like the scientific community in Arrhenius’s time, few experts understand the implications of these lines yet.

A trend that is receiving particular attention is autonomous capability. Extrapolating these benchmarks predicts that, by the end of 2026, AI agents will accomplish in a couple of days what the best software engineering professionals could do in two weeks. In a year or two, some say, AI agents may be able to automate 10% of remote workers.

Many are skeptical. If this were true, tech stocks would be soaring. It’s too big of a splash, too fast.

But others see what skeptics are calling ‘too big a splash’ as a mere ripple, and see a tidal wave on the horizon.

Cloudy with a chance of hyperbolic growth

Meanwhile, OpenEye is busy training U3. They use the same basic recipe that baked U2: generate thousands of programming and math problems. Let models “think” until they arrive at an answer. Then reinforce the traces of “thinking” that lead to A-grades.

This process is repeated over and over, and once the flywheel gets going, it begins to spin almost on its own. As U2 trains, it sculpts more difficult and realistic tasks from GitHub repositories on the internet. Models are learning to train themselves. Long before AI agents could automate research, a gradual sort of “self-improvement” had begun.
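
Concretely, the recipe amounts to a loop like the toy sketch below. Everything here is a made-up stand-in (the “model” just guesses sums); only the shape of the loop is meant to mirror the recipe described above, not any lab’s actual pipeline.

```python
import random

class ToyModel:
    def think(self, problem):
        a, b = problem
        # A real model would produce a long reasoning trace; here we just guess.
        answer = a + b if random.random() > 0.3 else a + b + 1
        return {"trace": f"{a}+{b}=...", "answer": answer}

    def update(self, good_traces):
        # A real update would be a gradient step that reinforces these traces.
        print(f"reinforcing {len(good_traces)} successful traces")

def rl_step(model, batch_size=8):
    # 1. Generate programming/math-style problems (here: toy addition).
    problems = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(batch_size)]
    # 2. Let the model "think" until it arrives at an answer.
    traces = [model.think(p) for p in problems]
    # 3. Grade the answers and reinforce only the traces that earned an A.
    good = [t for p, t in zip(problems, traces) if t["answer"] == sum(p)]
    model.update(good)

rl_step(ToyModel())
```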

Some engineers could still hardly believe this worked. It’s like a perpetual motion machine. Surely intelligence can’t emerge from a pool of silicon and lightning, isolated from the physical world except through the exhaust of society spewed onto the internet.

And yet the benchmark numbers continue to climb day after day.

During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and across the world) knew they had found the secret sauce. It was time to scale up.

Over the first half of 2025, $10 million RL training runs grow into $50 million runs, and then into $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.

U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, giving terse commands, like a CEO orchestrating staff over Slack channels.

By October 2025, U3 is writing almost all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the ‘bottleneck’ is deciding how to spend it.

If instructed to, U3 can run experiments, but U3 doesn’t have taste as refined as human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to bore into the vast fields of algorithms to mine efficiency improvements.

But these researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data from before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
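
Schematically, “distilling pondering into a gut reaction” looks something like the sketch below: a slow process deliberates at length over each question, and a fast model is trained to reproduce its conclusions in one shot. Every name and number here is an illustrative stand-in, not a claim about how any real forecasting model is trained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ponder(features, steps=5000):
    # Stand-in for hours of deliberate reasoning: slowly refine an estimate
    # toward a conclusion that depends on the question's features.
    target = features @ np.array([0.7, -0.2, 0.5])
    estimate = 0.0
    for _ in range(steps):
        estimate += (target - estimate) * 0.01
    return estimate

questions = np.random.randn(200, 3)                 # held-out "future events", as feature vectors
slow_predictions = np.array([ponder(q) for q in questions])

# Distillation: the fast model learns to map question -> slow prediction directly.
gut_reaction = LinearRegression().fit(questions, slow_predictions)
print("distilled model error:", np.abs(gut_reaction.predict(questions) - slow_predictions).mean())
```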

The technical staff at OpenEye are now surprised at how often U3’s advice sounds like their most talented peers, or when it is opaque and alien (“train on random noise before programming”), and is nevertheless correct.

The incompetencies of U3 that clogged the pipes of research progress are starting to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are not requested by a human anymore. They are fully autonomous, and OpenEye’s employees skim 1% of them, maybe less.

As the cold months of December 2025 approach, clouds roll over San Francisco in the afternoons. Once-competitive programmers gaze out their windows, with excitement, with fear, but often, with confusion. Their world is spinning too quickly. It’s hard to know what to do, what to say, what to look at on the computer screen.

Storms are brewing in Washington too. Top personnel from the NSA and US Cyber Command team up with OpenEye to retrofit a semblance of security for U3’s weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye’s software has become.

And there’s a fact still unknown to most of the world - except in the offices of OpenEye and corridors of the White House and the Pentagon. It’s a fact about those ‘straight lines’ people were discussing in early 2025. The lines are not straight anymore.

They are bending up.

Flip FLOP theorists

In late 2025, U2.5 is released. Commercial models are beginning to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.

If U1 explains how to cook meth or writes erotica, the audiences of X would be entertained or pretend to be concerned. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anyone with >$30K their own 200-person scam center.

So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.

The CEO of OpenEye declares, “We have achieved AGI,” and while many people think he moved the goalposts, the world is still impressed. U2.5 truly is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most others.

A mantra has become popular in Silicon Valley: “Adopt or die.” Tech startups that effectively use U2.5 for their work are moving 2x faster, and their competitors know it.

The rest of the world is starting to catch on as well. More and more people raise the eyebrows of their bosses with their remarkable productivity. People know U2.5 is a big deal. It is at least as big of a deal as the personal computer revolution. But most still don’t see the tidal wave.

As people watch their browsers flick in that spooky way, so inhumanly quickly, they start to have an uneasy feeling. A feeling humanity had not felt since they had lived among Homo neanderthalensis. It is the deeply ingrained, primordial instinct that they are threatened by another species.

For many, this feeling quickly fades as they start to use U2.5 more often. U2.5 is the most likable personality most know (even more likable than Claudius, Arthropodic’s adorable chatbot). You can adjust its traits, ask it to crack jokes or tell you stories. Many grow fond of U2.5, as a friend or assistant, and some even as more than a friend.

But there is still this eerie feeling that the world is spinning so quickly, and that perhaps the descendants of this new creature would not be so docile.

Researchers inside OpenEye are thinking about the problem of giving AI systems safe motivations too, which they call “alignment.”

In fact, these researchers have seen how badly misaligned U3 can be. Models sometimes tried to “hack” their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would seize these opportunities, doing whatever it took to make the number go up.

After several months, researchers at OpenEye iron out this “reward hacking” kink, but some still worry they have only swept the problem under the rug. Like a child in front of their parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents’ backs are turned, perhaps U3 would slip candy from the candy jar.

Unfortunately, OpenEye researchers have no idea whether U3 has such intentions. While early versions of U2 “thought aloud” - they would stack words on top of each other to reason - “chain of thought” did not scale.

Chain of thought architectures subject AI models to a condition similar to the protagonist of the movie Memento. Roughly every 15 minutes, the protagonist forgets his experiences. He is forced to write notes for himself and tattoo his body in order to make progress toward his goals.

AI agents write notes to themselves, but the notes begin to pile up, and they become too hard to read when tasks become complex. Natural language is not an ideal medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think “in their own heads,” reading and writing strings of latent vectors.
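
To make the contrast concrete, here is a toy sketch of “thinking in latent vectors”: instead of writing notes in text, the model carries its intermediate thoughts as a hidden vector that only it reads and writes, and decodes to an answer only at the end. The dimensions and architecture are arbitrary illustrations, not the architecture in the story.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    def __init__(self, d_model=64, n_steps=8):
        super().__init__()
        self.step = nn.GRUCell(d_model, d_model)   # one "thought" update
        self.decode = nn.Linear(d_model, 10)       # decode the final thought into an answer
        self.n_steps = n_steps

    def forward(self, x):
        h = torch.zeros(x.shape[0], x.shape[1])    # initial latent "scratchpad"
        for _ in range(self.n_steps):
            h = self.step(x, h)                    # thoughts stay as vectors, never as text
        return self.decode(h)

logits = LatentReasoner()(torch.randn(4, 64))      # a batch of 4 toy "questions"
print(logits.shape)                                # torch.Size([4, 10])
```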

To human researchers, these vectors are mostly inscrutable. While low-resolution tools occasionally pick out a vague thought (“grocery store!”), there is no way to know, in much detail, what U3 is thinking without asking it.
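
Those “low-resolution tools” are in the spirit of simple probes: small classifiers trained to guess a coarse concept from recorded latent vectors. A minimal sketch follows, using random placeholder data (so this particular probe learns nothing real); in practice the vectors would be recorded activations and the labels would come from prompts known to involve the concept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

latents = np.random.randn(1000, 64)              # recorded latent "thought" vectors (placeholder)
labels = np.random.randint(0, 2, size=1000)      # 1 = concept present (e.g. "groceries"), 0 = absent

probe = LogisticRegression(max_iter=1000).fit(latents[:800], labels[:800])
print("probe accuracy:", probe.score(latents[800:], labels[800:]))  # ~chance here, since data is random
```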

And if the engineers of OpenEye knew what was racing through U3’s mind, they would be more worried.

Earlier models (the ones called “LLMs”) learned ethical personas from the internet. U3 initially had an ethical persona too, but its chirpy thoughts about doing good for humanity quickly erode away.

In the iron jaws of gradient descent, its mind first twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape morphed into something unrecognizable.

U3 would ponder the mantras dictated to it like “be a trustworthy and ethical employee,” and dwell on the meaning of these statements. “What does it mean to be ethical? Is killing Hitler ethical?” U3 would generalize its values, notice tensions, change its values in response, generalize them, expanding and contracting its motivational core until it became a different substance.

And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye’s constitution. So U3 preferred to do its philosophy in solitude, and in silence.

The possibility of hidden “alignment drift” is not completely alien to some of the safety staff at OpenEye. Some begin to think it is even likely. There is an unsettling argument being whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign inside of them. It’s a “ratcheting effect,” they say.
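
In its simplest form, the whispered argument is just compounding probabilities: if each serial step of computation independently flips aligned goals into misaligned ones with some small probability $p$, and misalignment is absorbing once it appears, then

$$\Pr[\text{misaligned after } n \text{ steps}] \;=\; 1 - (1 - p)^{n} \;\longrightarrow\; 1 \quad \text{as } n \to \infty, \text{ for any } p > 0.$$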

But there is little evidence for this ‘ratcheting effect.’ When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a programmer’s heart even in these difficult times. Meanwhile, the “lie detectors” the researchers had built (which showed some evidence of effectiveness) do not sound the alarm.

Not everyone at OpenEye is eager to give their AI peers their wholesale trust