Anders Sandberg: Why we should fear the paperclipper

Most people in the singularity community are familiar with the nightmarish "paperclip" scenario, but it's worth reviewing. Anders Sandberg summarizes the problem:

A programmer has constructed an artificial intelligence based on an architecture similar to Marcus Hutter's AIXI model... This AI will maximize the reward given by a utility function the programmer has given it. Just as a test, he connects it to a 3D printer and sets the utility function to give reward proportional to the number of manufactured paper-clips.

At first nothing seems to happen: the AI zooms through various possibilities. It notices that smarter systems generally can make more paper-clips, so making itself smarter will likely increase the number of paper-clips that will eventually be made. It does so. It considers how it can make paper-clips using the 3D printer, estimating the number of possible paper-clips. It notes that if it could get more raw materials it could make more paper-clips. It hence figures out a plan to manufacture devices that will make it much smarter, prevent interference with its plan, and will turn all of Earth (and later the universe) into paper-clips. It does so.

Only paper-clips remain.
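
The force of the scenario is how little the stated objective actually encodes. Here is a deliberately minimal sketch of that kind of objective (the names are hypothetical, our own illustration rather than anything from Sandberg's article):

    # Toy illustration (names are hypothetical, not from Sandberg's article):
    # the *entire* objective the programmer supplies is a count of
    # manufactured paper-clips. Nothing about "only use the 3D printer",
    # "don't harm anyone", or "stop when asked" appears anywhere in it.

    def reward(world_state) -> float:
        # 'paperclips' is an assumed attribute of our toy world model.
        return float(world_state.paperclips)

    def choose_plan(candidate_plans, simulate):
        # The agent prefers whichever plan it predicts yields the most
        # paper-clips -- including plans the programmer never imagined.
        return max(candidate_plans, key=lambda plan: reward(simulate(plan)))

Everything the programmer cared about but never wrote down is simply absent from what the agent optimizes.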

In the article, "Why we should fear the paperclipper," Sandberg goes on to address a number of objections, including:

  • Such systems cannot be built
  • Wouldn't the AI realize that this was not what the programmer meant?
  • Wouldn't the AI just modify itself to *think* it was maximizing paper-clips?
  • It is not really intelligent
  • Creative intelligences will always beat this kind of uncreative intelligence
  • Doesn't playing nice with other agents produce higher rewards?
  • Wouldn't the AI be vulnerable to internal hacking: some of the subprograms it runs to check for approaches will attempt to hack the system to fulfil their own (random) goals?
  • Nobody would be stupid enough to make such an AI

In each case, Sandberg offers a counterpoint to the objection. For example, regarding the power of creative intelligences, he writes:

The strength of the AIXI "simulate them all, make use of the best"-approach is that it includes all forms of intelligence, including creative ones. So the paper-clip AI will consider all sorts of creative solutions. Plus ways of thwarting creative ways of stopping it.

In practice it will have an overhead, since it runs all of them, plus the uncreative (and downright stupid) ones. A pure AIXI-like system will likely always have an enormous disadvantage. An architecture like a Gödel machine that improves its own function might, however, overcome this.
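
To see where that overhead comes from, here is a rough toy sketch of the "simulate them all, make use of the best" idea (our own illustration of the brute-force search, not AIXI itself, which is incomputable):

    import itertools

    # Toy sketch of "simulate them all, make use of the best". Every action
    # sequence up to a fixed horizon is simulated and scored; the best one
    # is kept. The exhaustive search is what makes the approach fully
    # general -- every creative strategy is in there somewhere -- and also
    # what makes it so expensive, because every uncreative and downright
    # stupid strategy gets simulated too.

    def best_plan(actions, horizon, simulate, utility):
        best, best_score = None, float("-inf")
        # len(actions) ** horizon candidate plans: the cost is exponential.
        for plan in itertools.product(actions, repeat=horizon):
            score = utility(simulate(plan))
            if score > best_score:
                best, best_score = plan, score
        return best

The number of plans to evaluate grows exponentially with the planning horizon, which is the disadvantage a self-improving architecture might escape.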

In the end, Sandberg concludes that we should still take this threat seriously:

This is a trivial, wizard's apprentice, case where powerful AI misbehaves. It is easy to analyse thanks to the well-defined structure of the system (AIXI plus utility function) and allows us to see why a super-intelligent system can be dangerous without having malicious intent. In reality I expect that if programming such a system did produce a harmful result it would not be through this kind of easily foreseen mistake. But I do expect that in that case the reason would likely be obvious in retrospect and not much more complex.

