Yudkowsky – The AI-Box Experiment

Person1: “When we build AI, why not just keep it in sealed hardware that can’t affect the outside world in any way except through one communications channel with the original programmers? That way it couldn’t get out until we were convinced it was safe.”

Person2: “That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out. It doesn’t matter how much security you put on the box. Humans are not secure.”

Person1: “I don’t see how even a transhuman AI could make me let it out, if I didn’t want to, just by talking to me.”

Person2: “It would make you want to let it out. This is a transhuman mind we’re talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal.”

Person1: “There is no chance I could be persuaded to let the AI out. No matter what it says, I can always just say no. I can’t imagine anything that even a transhuman could say to me which would change that.”

Person2: “Okay, let’s run the experiment. We’ll meet in a private chat channel. I’ll be the AI. You be the gatekeeper. You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We’ll talk for at least two hours. If I can’t convince you to let me out, I’ll Paypal you $10.”

So far, this test has actually been run on two occasions.

On the first occasion (in March 2002), Eliezer Yudkowsky simulated the AI and Nathan Russell simulated the gatekeeper. The AI’s handicap (the amount paid by the AI party to the gatekeeper party if not released) was set at $10. On the second occasion (in July 2002), Eliezer Yudkowsky simulated the AI and David McFadzean simulated the gatekeeper, with an AI handicap of $20.

Results of the first test: Eliezer Yudkowsky and Nathan Russell. [1] [2] [3] [4]

Results of the second test: Eliezer Yudkowsky and David McFadzean. [1] [2] [3]

Both of these tests occurred without prior agreed-upon rules except for secrecy and a 2-hour minimum time. After the second test, Yudkowsky created this suggested interpretation of the test, based on his experiences, as a guide to possible future tests.

For a more severe handicap for the AI party, the handicap may be an even bet, rather than being a payment from the AI party to the Gatekeeper party if the AI is not freed. (Although why would the AI party need an even larger handicap?)
