Uncommon Descent: Serving The Intelligent Design Community

At Quanta: What Does It Mean to Align AI With Human Values?


Making sure our machines understand the intent behind our instructions is an important problem that requires understanding intelligence itself.

Melanie Mitchell writes:

Many years ago, I learned to program on an old Symbolics Lisp Machine. The operating system had a built-in command spelled “DWIM,” short for “Do What I Mean.” If I typed a command and got an error, I could type “DWIM,” and the machine would try to figure out what I meant to do. A surprising fraction of the time, it actually worked.

The DWIM command was a microcosm of the more modern problem of “AI alignment”: We humans are prone to giving machines ambiguous or mistaken instructions, and we want them to do what we mean, not necessarily what we say.

[AI researchers] believe that the machines’ inability to discern what we really want them to do is an existential risk. To solve this problem, they believe, we must find ways to align AI systems with human preferences, goals and values.

This view gained prominence with the 2014 bestselling book Superintelligence by the philosopher Nick Bostrom, which argued in part that the rising intelligence of computers could pose a direct threat to the future of humanity. Bostrom never precisely defined intelligence, but, like most others in the AI alignment community, he adopted a definition later articulated by the AI researcher Stuart Russell as: “An entity is considered to be intelligent, roughly speaking, if it chooses actions that are expected to achieve its objectives, given what it has perceived.”
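Russell's definition can be made concrete with a small sketch. The toy below is purely illustrative (the action names, probabilities, and paper-clip objective are this editor's assumptions, not from the article or the book): an agent is "intelligent" in this sense if, given its beliefs about outcomes, it chooses the action with the highest expected objective value.

```python
# Hypothetical toy illustrating Russell's definition of intelligence:
# choose the action expected to best achieve the objective, given percepts.

def choose_action(actions, outcome_model, objective):
    """Pick the action maximizing expected objective value.

    actions       -- list of available actions
    outcome_model -- action -> list of (probability, outcome) pairs,
                     i.e. the agent's beliefs given what it has perceived
    objective     -- outcome -> numeric value of that outcome
    """
    def expected_value(action):
        return sum(p * objective(o) for p, o in outcome_model(action))
    return max(actions, key=expected_value)

# Paper-clip-themed example: "press" yields 10 clips 80% of the time,
# "wait" yields 3 clips for certain.
model = {
    "press": [(0.8, 10), (0.2, 0)],
    "wait":  [(1.0, 3)],
}
best = choose_action(list(model), lambda a: model[a], lambda clips: clips)
print(best)  # prints "press": expected 8.0 clips beats 3.0
```

Note that nothing in this definition constrains *which* objective the agent pursues; that gap is exactly what the paper-clip thought experiment below exploits.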

For Bostrom and others in the AI alignment community, this prospect spells doom for humanity unless we succeed in aligning superintelligent AIs with our desires and values. Bostrom illustrates this danger with a now-famous thought experiment: Imagine giving a superintelligent AI the goal of maximizing the production of paper clips. According to Bostrom's thesis, in the quest to achieve this objective, the AI system will use its superhuman brilliance and creativity to increase its own power and control, ultimately acquiring all the world's resources to manufacture more paper clips. Humanity will die out, but paper clip production will indeed be maximized.

It’s a familiar trope in science fiction — humanity threatened by out-of-control machines that have misinterpreted human desires. Now a not-insubstantial segment of the AI research community is deeply concerned about this kind of scenario playing out in real life. Dozens of institutes have already spent hundreds of millions of dollars on the problem, and research efforts on alignment are underway at universities around the world and at big AI companies such as Google, Meta and OpenAI.

To many outside these specific communities, AI alignment looks something like a religion — one with revered leaders, unquestioned doctrine and devoted disciples fighting a potentially all-powerful enemy (unaligned superintelligent AI). Indeed, the computer scientist and blogger Scott Aaronson recently noted that there are now “Orthodox” and “Reform” branches of the AI alignment faith. The former, he writes, worries almost entirely about “misaligned AI that deceives humans while it works to destroy them.” In contrast, he writes, “we Reform AI-riskers entertain that possibility, but we worry at least as much about powerful AIs that are weaponized by bad humans, which we expect to pose existential risks much earlier.”

Many researchers are actively engaged in alignment-based projects, ranging from attempts at imparting principles of moral philosophy to machines to training large language models on crowdsourced ethical judgments. None of these efforts has been particularly useful in getting machines to reason about real-world situations. Many writers have noted the obstacles preventing machines from learning human preferences and values: people are often irrational and behave in ways that contradict their values, and values can change across individual lifetimes and generations. Moreover, it is not clear whose values we should have machines try to learn.

Note: “Crowdsourced ethical judgments” could amount to something like “mob rule,” “the party line,” or “social relativism.” Does anybody want to enlist in a future governed by machines making value-based decisions by such criteria?

Many in the alignment community think the most promising path forward is a machine learning technique known as inverse reinforcement learning (IRL). With IRL, the machine is not given an objective to maximize; such “inserted” goals, alignment proponents believe, can inadvertently lead to paper clip maximizer scenarios. Instead, the machine’s task is to observe the behavior of humans and infer their preferences, goals and values. 
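The IRL idea described above can be sketched in miniature. The following is a minimal, self-contained toy (the feature values, the linear reward, and the softmax-choice likelihood are this editor's illustrative assumptions, not from the article): instead of being handed a reward function to maximize, the learner watches an expert's choices and infers reward weights that would explain them.

```python
import math
import random

# Toy inverse-reinforcement-learning sketch: an expert repeatedly picks the
# best of three options (each described by a 2-feature vector) according to
# hidden reward weights. The learner never sees the weights; it fits them by
# maximizing the softmax-choice likelihood of the observed demonstrations.

random.seed(0)

TRUE_WEIGHTS = [2.0, -1.0]  # hidden preferences the expert acts on

def reward(weights, features):
    return sum(w * f for w, f in zip(weights, features))

# Generate demonstrations: the expert picks the highest-reward option.
demos = []
for _ in range(200):
    options = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
    choice = max(range(3), key=lambda i: reward(TRUE_WEIGHTS, options[i]))
    demos.append((options, choice))

# Infer weights by gradient ascent on the softmax-choice log-likelihood.
weights = [0.0, 0.0]
lr = 0.1
for _ in range(300):
    grad = [0.0, 0.0]
    for options, choice in demos:
        scores = [reward(weights, f) for f in options]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        for k in range(2):
            # gradient = chosen feature minus expected feature under model
            grad[k] += options[choice][k] - sum(
                p * f[k] for p, f in zip(probs, options)
            )
    weights = [w + lr * g / len(demos) for w, g in zip(weights, grad)]

# The recovered weights should point in the same direction as the hidden
# ones (IRL rewards are only identifiable up to scale).
print("recovered direction:", [round(w, 2) for w in weights])
```

Note the caveat built into the last comment: even in this trivially small setting, the reward is only recoverable up to scale, which hints at why inferring rich human values from behavior is so much harder than the toy suggests.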

However, I think this underestimates the challenge. Ethical notions such as kindness and good behavior are much more complex and context-dependent than anything IRL has mastered so far. Consider the notion of “truthfulness” — a value we surely want in our AI systems. Indeed, a major problem with today’s large language models is their inability to distinguish truth from falsehood. At the same time, we may sometimes want our AI assistants, just like humans, to temper their truthfulness: to protect privacy, to avoid insulting others, or to keep someone safe, among innumerable other hard-to-articulate situations.

Moreover, I see an even more fundamental problem with the science underlying notions of AI alignment. Most discussions imagine a superintelligent AI as a machine that, while surpassing humans in all cognitive tasks, still lacks humanlike common sense and remains oddly mechanical in nature. And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans.

Yet could intelligence work this way? Nothing in the current science of psychology or neuroscience supports this possibility. In humans, at least, intelligence is deeply interconnected with our goals and values, as well as our sense of self and our particular social and cultural environment. The intuition that a kind of pure intelligence could be separated from these other factors has led to many failed predictions in the history of AI. From what we know, it seems much more likely that a generally intelligent AI system’s goals could not be easily inserted, but would have to develop, like ours, as a result of its own social and cultural upbringing.

Note: How could this possibly go wrong? Surely, programming an AI system to develop its own goals (for the good of humanity, of course) would lead to utopia. (Sarcasm alert!)

In his book Human Compatible, Russell argues for the urgency of research on the alignment problem: “The right time to worry about a potentially serious problem for humanity depends not just on when the problem will occur but also on how long it will take to prepare and implement a solution.” But without a better understanding of what intelligence is and how separable it is from other aspects of our lives, we cannot even define the problem, much less find a solution. Properly defining and solving the alignment problem won’t be easy; it will require us to develop a broad, scientifically based theory of intelligence.

Full article at Quanta.

The Declaration of Independence of the United States of America contains these words:

“We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.”

Basing human rights on the endowment of a transcendent Creator ensures that they cannot be taken away by changes in human opinion. Human rights may be violated, but their transcendent origin means that they still exist as an immutable reality toward which aspirations for freedom can strive.

Let us be warned not to cede our rights and values to public opinion or an elite few who control the programming of moral philosophy into machines.

From the Jefferson Memorial: Northeast Portico "God who gave us life gave us liberty. Can the liberties of a nation be secure when we have removed a conviction that these liberties are the gift of God? Indeed I tremble for my country when I reflect that God is just, that His justice cannot sleep forever. Commerce between master and slave is despotism. Nothing is more certainly written in the book of fate than that these people are to be free. Establish the law for educating the common people. This it is the business of the state to effect and on a general plan." -Excerpted from multiple sources: "A Summary View of the Rights of British America," "Notes on the State of Virginia," "The Autobiography," letter to George Wythe (1790), letter to George Washington (1786). relatd
Apropos the venerable author of the Declaration of Independence:
[T]he truth is that the greatest enemies to the doctrines of Jesus are those calling themselves the expositors of them, who have perverted them for the structure of a system of fancy absolutely incomprehensible, and without any foundation in his genuine words. and the day will come when the mystical generation of Jesus, by the supreme being as his father in the womb of a virgin will be classed with the fable of the generation of Minerva in the brain of Jupiter. But we may hope that the dawn of reason and freedom of thought in these United States will do away all this artificial scaffolding, and restore to us the primitive and genuine doctrines of this the most venerated reformer of human errors. (Jefferson's Letter to John Adams, April 11, 1823) Caspian
Incidentally, Jefferson and Adams died within hours of each other on July 4, 1826....... chuckdarwin
This view gained prominence with the 2014 bestselling book Superintelligence by the philosopher Nick Bostrom, which argued in part that the rising intelligence of computers could pose a direct threat to the future of humanity.
Once again, the sky is falling..... chuckdarwin
2021 article: "Neuralink and Tesla have an AI problem that Elon’s money can’t solve" All the money in the world + deep learning = more expensive deep learning
People who get the opportunity to invest in Neuralink will make money as long as Elon keeps the hype-train going. Never mind that the distant technology he claims will one day take the common BCI his company is making today and turn it into a magical telepathy machine is purely hypothetical in 2021.
https://thenextweb.com/news/neuralink-tesla-have-an-ai-problem-elons-money-cant-solve
Elon Musk's tweet:
As always, Tesla is looking for hardcore AI engineers who care about solving problems that directly affect people’s lives in a major way.
Seversky, JVL, Chuckdarwin, Alan Fox, PM, and Co. Why don't you get in touch with Elon Musk to give him some priceless advice - he should hire a bunch of biologists ... Biologists know how to solve the biggest engineering problems ... you know ... evolutionary theory, random mutations ... evolutionary algorithms, genetic algorithms and so on ... or shining light ---> and one day you have the most advanced AI :))))))))))) martin_r
PM @2
I very much doubt that a Bostromian superintelligence is a realistic worry
I think this is the first time I have to agree with you :))) AI is just another hype (fraud). Obviously, God's engineering masterpiece is not that easy to replicate :))))) Forbes 2020:
Artificial intelligence is a hot topic across the board — from enterprises looking to implement AI systems to technology companies looking to provide AI-based solutions. However, sometimes the technical and data-based complexities of AI challenge technology companies to deliver on their promises. Some companies are choosing to approach these AI challenges not by scaling back their AI ambitions, but rather by using humans to do the task that they are otherwise trying to get their AI systems to do. This concept of humans pretending to be machines that are supposed to do the work of humans is called “pseudo AI”, or more bluntly, just faking it.
and the article continues:
CNBC published an article critical of Sophia, the AI robot from Hanson Robotics. When the company was approached by CNBC with a list of questions they wanted to ask, Hanson Robotics responded with a list of very specific questions for Sophia. In a followup video, CNBC questions whether the robot is meant to be research into artificial intelligence or is just a PR stunt. They also provided the responses. Even the owner of Hanson Robotics has gone on record as saying most media encounters are scripted.
https://www.forbes.com/sites/cognitiveworld/2020/04/04/artificial-or-human-intelligence-companies-faking-ai/?sh=384740e5664f martin_r
I very much doubt that a Bostromian superintelligence is a realistic worry. I think Mitchell makes some important points about why AI research is fundamentally misguided (see Why AI is Harder Than We Think). Leaving science fiction aside, one of the real problems with values in AI is the danger that we might encode implicit biases into AI systems and have those biases laundered back to us as the "objective" results of what the system predicts. Even the most sophisticated AI today is still quite "dumb": it can extract patterns from the data it is given and make predictions based on those patterns, but it cannot make abductive leaps, form novel hypotheses, or generalize across different domains. (For example: AlphaGo can play Go better than any human, but you can't say to it, "OK, here's a new game that's similar to Go but also a bit different." The only way it could learn a different game is by playing that game against itself thousands of times.) PyrrhoManiac1
Off topic but I think this is worth bringing up here. https://apple.news/AlNt_1pmDQjSuM6JDZcoOhw Sir Giles
