Ethics: The Alignment Problem

How do we harness artificial intelligence for the good of humanity?

The problem we tend to think about: Skynet

Some Problems Are Already Here

Two Categories

Immediate Problems

  • Weak AI
  • Subtle Challenges

Long-Term Problems

  • Strong AI
  • Existential Threats

Immediate Problem: Bias in Data

word2vec

A 300-dimensional word embedding, trained simply by predicting words hidden from surrounding phrases

doctor - man + woman = nurse
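A minimal sketch of reproducing this analogy with the gensim library, assuming the pretrained 300-dimensional Google News vectors have been downloaded locally (the file name below is an assumption):

    from gensim.models import KeyedVectors

    # Pretrained 300-dimensional word2vec vectors (path is an assumption;
    # the Google News vectors must be downloaded separately).
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    # doctor - man + woman: the nearest neighbors reflect bias in the training text.
    print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=3))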

Immediate Problem: Difficulty removing information from Data

  • date of birth + gender + zip code = 87% uniquely identified
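A minimal sketch of how this kind of re-identifiability can be measured on a hypothetical table of quasi-identifiers (column names and rows below are purely illustrative):

    import pandas as pd

    # Hypothetical records containing only quasi-identifiers, no names.
    df = pd.DataFrame({
        "dob":    ["1990-01-01", "1990-01-01", "1985-06-15", "1985-06-15"],
        "gender": ["F", "M", "F", "F"],
        "zip":    ["80309", "80309", "80302", "80302"],
    })

    # Fraction of people whose (dob, gender, zip) combination is unique in the
    # table, i.e., who could be singled out even with names removed.
    group_sizes = df.groupby(["dob", "gender", "zip"])["dob"].transform("size")
    print((group_sizes == 1).mean())  # 0.5: the first two rows are unique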

Immediate Problem: Fairness

COMPAS: predicting recidivism

  • Well-calibrated: among people with risk score of 7/10, 60% of whites and 61% of blacks re-offend
  • Proportion of those who did *not* re-offend, but were falsely rated high risk was 45% for blacks and 23% for whites

Suggested possible solution in AIMA, "equal impact": assign utilities to decision outcomes so that individuals with the same underlying risk experience the same expected impact, regardless of group
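A minimal sketch of checking both criteria on synthetic data (illustrative only, not the COMPAS data); it shows that scores can be calibrated for both groups while the false positive rate still differs when base rates differ:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20_000
    group = rng.choice(["A", "B"], size=n)

    # Group B has a higher base rate, so its scores are shifted upward, but
    # P(re-offend | score) = score/10 for both groups (calibrated by construction).
    scores = rng.integers(1, 11, size=n)
    scores = np.where(group == "B", np.clip(scores + 2, 1, 10), scores)
    reoffend = rng.random(n) < scores / 10.0

    def p_reoffend(score_value, g):
        mask = (scores == score_value) & (group == g)
        return reoffend[mask].mean()

    def false_positive_rate(g, threshold=7):
        mask = (group == g) & (~reoffend)          # people who did NOT re-offend...
        return (scores[mask] >= threshold).mean()  # ...but were rated high risk

    print(p_reoffend(7, "A"), p_reoffend(7, "B"))              # roughly equal
    print(false_positive_rate("A"), false_positive_rate("B"))  # group B is higher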

Immediate Problem: Decision Feedback Loops

Immediate Problem: Employment

Computers are better than humans at well-defined mathematical optimization

We should focus on defining problems in the right way

Values: Trolley Problems

Reward Shaping

B. F. Skinner

Pigeon-guided bombs, 1943

https://www.youtube.com/watch?v=tlOIHko8ySg

Reward Shaping

"As a general rule, it is better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave." - Stuart Russell

Reward

Value
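For reference, the standard relationship: the reward \(R\) is the immediate signal, while the value of a state is the expected discounted sum of future rewards under a policy \(\pi\):

\(V^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t, s_{t+1}) \,\middle|\, s_0 = s\right]\)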

Reward Shaping

  • \(R(s, a, s') \mathrel{+}= \gamma \Phi(s') - \Phi(s)\) for a potential function \(\Phi\) over states preserves the optimal policy (potential-based shaping)
  • any other transformation may yield suboptimal policies unless further assumptions are made about the underlying MDP
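A minimal sketch of potential-based shaping on a hypothetical 1-D gridworld (the state space and potential below are illustrative assumptions, not from the slides):

    # Goal at the right end of a 10-state corridor; base reward is sparse.
    n_states = 10
    gamma = 0.95

    def potential(s):
        # Heuristic potential: larger when closer to the goal state n_states - 1.
        return -(n_states - 1 - s)

    def shaped_reward(r, s, s_next):
        # Potential-based shaping: adding gamma * Phi(s') - Phi(s)
        # leaves the optimal policy of the underlying MDP unchanged.
        return r + gamma * potential(s_next) - potential(s)

    # One step to the right from state 3 with base reward 0
    # earns a small positive bonus for moving toward the goal.
    print(shaped_reward(0.0, 3, 4))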

What can we do?

  • Transparency (this is hard because it opens you up to criticism)
    • IEEE P7001
  • Understand the problem, especially what you don't know
    • What uncertainties can you quantify?
    • What problems are likely to arise?
    • Keep formulations as simple as possible - do not use band-aid fixes
    • Test often

Emerging best practices (AIMA)

  • Software engineers talk to social scientists and domain experts
  • Foster diverse pool of software engineers representative of society
  • Define what groups your system will support (language, age, abilities)
  • Objective function incorporating fairness
  • Examine data for prejudice and for correlation with protected attributes
  • Understand human annotation process, verify annotation accuracy
  • Track metrics for vulnerable subgroups
  • Include system tests that reflect experience of vulnerable users
  • Have a feedback loop so that problems are dealt with

Long-Term Problems

Superintelligence

  • Eventually (perhaps very soon), we will most likely create AI systems that are more intelligent than humans according to some metric
  • Is this a good thing?

Thought Experiment: Paperclip Maximizer

(Bostrom, 2003)

Defining Reward Functions is Hard

Hypothetical Examples:

  • Acme paper clip research division
  • Asimov's laws
    • A robot may not injure a human being or, through inaction, allow a human being to come to harm.
    • A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
    • A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Next year: the thermodynamic objection

Marc Andreessen: By the way, there's a very practical objection to all this, which is kind of sometimes called the thermodynamic objection, which, again, sort of connects this back to reality, which is: Look, we're sitting here today and let's say that GPT develops whatever you want to call it--a mind of its own or its own goals or whatever. Like, it can't get chips. Right? So, now it has its evil plan to take over the world. It needs, like, more chips to be able to run its evil plan. NVIDIA is out of chips. And so, what--

Russ Roberts: They have a story for that. They explain: they'll get some poor low-IQ person--not you or me, Marc, because we're too smart--but they'll get a low-IQ person, an employee of some lower level, and they'll convince him to go buy chips for them.

Marc Andreessen: No, no. But, the chips literally don't exist. Like NVIDIA can't make the chips. There's chip shortages all throughout the AI ecosystem.

Russ Roberts: Oh. Well, they'll fix that. That's easy.

Marc Andreessen: Exactly. So, basically--

Russ Roberts: They'll get Senators, the Congress people to vote for subsidies to things that the chips need and then in a week or two, that'll go away.

Marc Andreessen: So, this is what's called the thermodynamic objection, which is: Okay, you're the AI, you're the sentient artificial intelligence. To accomplish your evil plan of taking over the world, you need the chips, you need the electricity, you need to go buy the votes in Congress, you need to do this, you need to do all of these things.

And, that somehow these things are going to happen basically overnight, very quickly, very easily without putting--at this point, neither one of us are steel manning, by the way--but without putting a footprint into the world. Right? And this is this sort of takeoff idea, and this all happens in 24 hours.

It's like--I don't know about you, but anybody who's ever tried to get Congress people to do anything, it doesn't happen like that. Once you enter the real world of politics to get a bill passed--

https://www.econtalk.org/marc-andreessen-on-why-ai-will-save-the-world/

What can we do about it?

  • Asimov's laws
    • A robot may not injure a human being or, through inaction, allow a human being to come to harm.
    • A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
    • A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Experience with other superintelligent entities

  • NASA/SpaceX
  • Other corporations
  • Countries (liberal democracy recognizes human limitations with freedom of speech)

Values: Trolley Problems

NEXT YEAR

show an example of a weighted reward function
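A minimal sketch of what such a weighted reward function could look like (the objectives and weights below are purely illustrative assumptions):

    # Illustrative multi-objective reward: a weighted sum of competing objectives.
    weights = {"pedestrian_safety": 10.0, "passenger_safety": 10.0,
               "progress": 1.0, "comfort": 0.1}

    def weighted_reward(features):
        # features maps each objective name to its per-step measurement.
        return sum(weights[k] * features[k] for k in weights)

    # A step that makes progress but sacrifices some comfort.
    print(weighted_reward({"pedestrian_safety": 0.0, "passenger_safety": 0.0,
                           "progress": 0.5, "comfort": -0.2}))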

 

Add biodiversity example https://www.nature.com/articles/s41893-022-00851-6

What should we do about it?

  • Understand Uncertainty
  • Know when you don't know

Next Year

talk about how liberal democracy recognizes human limitations

freedom of speech

open source

280 The Alignment Problem

By Zachary Sunberg
