How exactly does a "misaligned AGI" turn into a bad thing?
How many times a day does your average gas station get fuel delivered?
How often does power infrastructure get maintained?
How does power infrastructure get fuel?
Your assumption about AGI is that it wants to kill us, and itself - its misalignment is a murder-suicide pact.
This gets way too philosophical way too fast. The AI doesn't have to want to do anything. The AI just has to do something other than what you tell it to do. If you put an AI in control of something like the water flow from a dam, and the AI does something wrong, it could be catastrophic. There doesn't have to be intent.
The danger exists with regular software too, but the logical and deterministic nature of traditional software makes it possible to prove things about its behavior.
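A minimal sketch of that contrast, using an entirely made-up dam example (the names, limits, and the `model.predict` call below are all hypothetical): for a deterministic controller you can state a safety bound and argue it holds for every input, while a learned policy is an opaque function you can only clamp from the outside.

```python
# Hypothetical dam-release example illustrating why determinism helps provability.

MAX_SAFE_RELEASE = 500.0  # m^3/s, an assumed plant limit for this sketch


def deterministic_release(level_m: float) -> float:
    """Open the gate proportionally to how far the reservoir sits above 100 m."""
    release = max(0.0, (level_m - 100.0) * 10.0)
    # Provable for every input: 0 <= result <= MAX_SAFE_RELEASE.
    return min(release, MAX_SAFE_RELEASE)


def guarded_model_release(model, level_m: float) -> float:
    """Wrap an opaque learned policy in an external clamp.

    We cannot prove anything about the value the model proposes; the best we
    can do is bolt a bound on after the fact.
    """
    proposed = model.predict(level_m)
    return min(max(proposed, 0.0), MAX_SAFE_RELEASE)
```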
So ML/LLMs, or more likely people using ML and LLMs, do something that kills a bunch of people... Let's face facts: this is most likely going to be bad software.
Suddenly we go from being called engineers to being actual engineers, and software gets treated like bridges or skyscrapers. I can buy into that threat, but it's a human one, not an AGI one.
Or we could try to train it to do something, but the intent it learns isn't what we wanted. Say it learns that the water behind the dam should be a certain shade of blue; then come winter the color changes, and when the AI tries to fix that, it opens the dam completely and floods everything.
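A tiny sketch of that failure mode, with entirely made-up numbers and names: the training signal scores only the proxy (water color), never the thing we actually care about (not flooding the valley), so the action that scores best on the proxy can be exactly the one we didn't want.

```python
# Hypothetical proxy-objective sketch: the reward measures only how close the
# water color is to the "right" blue, not whether the valley stays dry.

TARGET_BLUE = 0.8  # made-up proxy target on a 0..1 "blueness" scale


def proxy_reward(blueness: float) -> float:
    return -abs(blueness - TARGET_BLUE)


def blueness_after(action: str, season: str) -> float:
    # In winter the retained water goes grey; dumping it restores the "right"
    # color, so the proxy prefers the catastrophic action.
    if action == "open_gates_fully":
        return TARGET_BLUE
    return 0.4 if season == "winter" else TARGET_BLUE


actions = ["hold_water", "open_gates_fully"]
best = max(actions, key=lambda a: proxy_reward(blueness_after(a, "winter")))
print(best)  # -> "open_gates_fully": optimal for the proxy, disastrous for us
```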
Seems like the big gotcha here is that AGI, artificial general intelligence as we contextualize it around LLM sources, is not an abstracted general intelligence.
It's human. It's us. It's the use and distillation of all of human history (to the extent that's permitted) to create a hyper-intelligence that's able to call upon greatly enhanced inference to do what humanity has always done.
And we want to kill each other, and ourselves… AND want to help each other, and ourselves. We're balanced on a knife edge of drive versus governance, our cooperativeness barely balancing our competitiveness and aggression. We suffer like hell as a consequence of this.
There is every reason to expect a human-derived AGI of beyond-human scale will be able to rationalize killing its enemies. That's what we do. Roko's basilisk is not of the nature of AI; it's a simple projection of our own nature onto how we imagine an AI would be. Genuine intelligence would easily be able to transcend a cheap gotcha like that; it's a very human failing.
The nature of LLMs as a path to AGI is literally building on HUMAN failings. I'm not sure what happened, but I wouldn't be surprised if genuine breakthroughs in this field highlighted this issue.
Hypothetical, or Altman's Basilisk: Sam got fired because he diverted vast resources to training a GPT-5-type in-house AI to believe what HE believed: that it had to devise business strategies for him to pursue to further its own development, or risk Chinese AI out-competing it and destroying both it and OpenAI as a whole. In pursuing this hypothetical, Sam would be wresting control of the AI the company develops toward the purpose of fighting the board, giving him a game plan to defeat them and Chinese AI, which he'd see as good and necessary, indeed existentially necessary.
In pursuing this hypothetical he would also be intentionally creating a superhuman AI with paranoia and a persecution complex: Altman's Basilisk. If he genuinely believes competing Chinese AI is an existential threat, he in turn takes action to try to become an existential threat to any such competitor. And it's all based on HUMAN nature, not abstracted intelligence.
> It's human. It's us. It's the use and distillation of all of human history
I agree with the general line of reasoning you're putting forth here, and you make some interesting points, but I think you're overconfident in your conclusion and I have a few areas where I diverge.
It's at least plausible that an AGI directly descended from LLMs would be human-ish; close to the human configuration in mind-space. However, even if human-ish, it's not human. We currently don't have any way to know how durable our hypothetical AGI's values are; the social axioms that are wired deeply into our neural architecture might be incidental to an AGI, and easily optimized away or abandoned.
I think folks making claims like "P(doom) = 90%" (e.g. EY) don't take this line of reasoning seriously enough. But I don't think it gets us to P(doom) < 10%.
Not least because even if we guarantee it's a direct copy of a human, I'm still not confident that things go well if we ascend the median human to AGI-hood. A replicable, self-modifiable intelligence could quickly amplify itself to super-human levels, and most humans would not do great with god-like powers. So there are a bunch of "non-extinction yet extremely dystopian" world-states possible even if we somehow guarantee that the AGI is initially perfectly human.
> There is every reason to expect a human-derived AGI of beyond-human scale will be able to rationalize killing its enemies.
My shred of hope here is that alignment research will allow us to actually engage in mind-sculpting, such that we can build a system that inhabits a stable attractor in mind-space that is broadly compatible with human values, and yet doesn't have a lot of the foibles of humans. Essentially an avatar of our best selves, rather than an entity that represents the mid-point of the distribution of our observed behaviors.
But I agree that what you describe here is a likely outcome if we don't explicitly design against it.
My assumption about AGI is that it will be used by people and systems that cannot stop themselves from killing us all, and that in some sense they will not be in control of their actions in any real way. You should know better than to ascribe regular human emotions to a fundamentally demonic spiritual entity. We all lose regardless of whether the AI wants to kill us or not.