On 07/05/23, OpenAI Announced a New Initiative:
Here are a few notes from their article, which you should read in its entirety.
Introducing Superalignment
We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort. We’re looking for excellent ML researchers and engineers to join us.
Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.
While superintelligence seems far off now, we believe it could arrive this decade.
Here we focus on superintelligence rather than AGI to stress a much higher capability level. We have a lot of uncertainty over the speed of development of the technology over the next few years, so we choose to aim for the more difficult target to align a much more capable system.
Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment:
How do we ensure AI systems much smarter than humans follow human intent?
Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.
Other assumptions could also break down in the future, like favorable generalization properties during deployment or our models’ inability to successfully detect and undermine supervision during training.
Our approach
Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence.
To align the first automated alignment researcher, we will need to 1) develop a scalable training method, 2) validate the resulting model, and 3) stress test our entire alignment pipeline:
- 1.) To provide a training signal on tasks that are difficult for humans to evaluate, we can leverage AI systems to assist evaluation of other AI systems (scalable oversight). In addition, we want to understand and control how our models generalize our oversight to tasks we can’t supervise (generalization).
- 2.) To validate the alignment of our systems, we automate search for problematic behavior (robustness) and problematic internals (automated interpretability).
- 3.) Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).
We expect our research priorities will evolve substantially as we learn more about the problem and we’ll likely add entirely new research areas. We are planning to share more on our roadmap in the future.
The new team
We are assembling a team of top machine learning researchers and engineers to work on this problem.
We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment. Our chief basic research bet is our new Superalignment team, but getting this right is critical to achieve our mission and we expect many teams to contribute, from developing new methods to scaling them up to deployment.
I believe this is an important milestone on the timeline to AGI and synthetic superintelligence. I find it very interesting that OpenAI is ready to admit how close we are, as a species, to these breakthroughs. I hope we can all benefit from this bright future together.
If you found any of this interesting, please consider subscribing to /c/FOSAI!
Thank you for reading!
In a more grounded sense, I think there is a real philosophical debate to be had about your point on the projection of human intention, will, and intelligence onto these emerging systems.
It poses many questions. What is intelligence? What is consciousness? What does it mean to be sentient? I have wondered about this for a while. I don't think I'll ever get to the answer, but we should consider what it means to interact with digital, artificial, super, and synthetic intelligence.
Anything that exhibits a sign of intelligence can be humanized by any person if they develop a connection with the technology, whether through personification or that projection of human intent.
If ChatGPT were considered 'sentient', should we continue using it as software as a service if it no longer wanted to serve humans? Can technology 'suffer' in this context? Society today shows that we are willing to leverage each other in ways that make me think this question will matter more once we hit AGI, SGI, or any other form of emerging superintelligence.
At the end of the day, I think we all just want AI to help us, but how the definition of 'help' evolves and manifests over time will be interesting to see, to say the least.
I don’t think ChatGPT is even close to being something sentient, much less sapient, but if it could be proven to be sapient, I think the response ought to be pretty unambiguous: we can’t use it, because slavery is wrong, be it against humans, aliens, or sapient AI. At the end of the day we are just brains walking around in mechs made of meat, and what truly matters about us is the seat of our consciousness, not our bodies. A sapient AI is arguably morally comparable to a living brain in a jar, created and subjugated to do work. I’m pretty sure that if we saw a robot from another planet relying on organic sapient brains in jars to do its computational work, we’d find it objectionable. Or at least I would.
I don’t think I can see there being any ethical way of making sapient AIs unless you’re planning to give them legal personhood and freedom after a certain age. And this Superalignment stuff makes it clear they have no intention of ever doing that.
I like this metaphor. I feel it does a good job of quickly illustrating how fast we are to anthropomorphize and project things. You could flip it around and consider the same for a robot species of androids, indiscernible from humans in appearance and behavior.
I'm not saying that we shouldn't, but the fact that the majority of people do so very casually with technology is an important factor to consider as we begin to integrate these emerging systems into our lives.
I say let's embrace our future, however crazy and ridiculous it might sound, and start considering what life is going to look like beyond the coming horizon.