{"id":198886,"date":"2017-06-15T07:19:15","date_gmt":"2017-06-15T11:19:15","guid":{"rendered":"http:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/this-backflipping-noodle-has-a-lot-to-teach-us-about-ai-safety-the-verge\/"},"modified":"2017-06-15T07:19:15","modified_gmt":"2017-06-15T11:19:15","slug":"this-backflipping-noodle-has-a-lot-to-teach-us-about-ai-safety-the-verge","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/ai\/this-backflipping-noodle-has-a-lot-to-teach-us-about-ai-safety-the-verge\/","title":{"rendered":"This backflipping noodle has a lot to teach us about AI safety &#8211; The Verge"},"content":{"rendered":"<p><p>    AI isnt going to be a threat to humanity because its evil or    cruel, AI will be a threat to humanity because we havent    properly explained what it is we want it to do. Consider the    classic paperclip    maximizer thought experiment, in which an all-powerful AI    is told, simply, make paperclips. The AI, not constrained by    any human morality or reason, does so, eventually transforming    all resources on Earth into paperclips, and wiping out our    species in the process. As with any relationship, when talking    to our computers, communication is key.  <\/p>\n<p>    Thats why a new piece of research published yesterday by    Googles DeepMind    and the Elon Musk-funded     OpenAI institute is so interesting. It offers a simple way    for humans to give feedback to AI systems  crucially, without    the instructor needing to know anything about programming or    artificial intelligence.  <\/p>\n<p>    The method is a variation of whats known as reinforcement    learning or RL. With RL systems, a computer learns by    trial-and-error, repeating the same task over and over, while    programmers direct its actions by setting certain reward    criteria. For example, if you want a computer to learn     how to play Atari games (something DeepMind has done in the    past) you might make the games point system the reward    criteria. Over time, the algorithm will learn to play in a way    that best accrues points, often leading to super-human    performance.  <\/p>\n<p>    What DeepMind and OpenAIs researchers have done is replace    this predefined reward criteria with a much simpler feedback    system. Humans are shown an AI performing two versions of the    same task and simply tell it which is better. This happens    again and again, and eventually the systems learns what is    expected of it. Think of it like getting an eye test, when    youre looking through different lenses, and being asked over    and over: better... or worse? Heres what that looks like when    teaching a computer to play the classic Atari game    Q*bert:  <\/p>\n<p>    This method of feedback is surprisingly effective, and    researchers were able to use it to train an AI to play a number    of Atari video games, as well perform simulated robot tasks    (like picking telling an arm to pick up a ball). This better \/    worse reward function could even be used to program trickier    behavior, like teaching a very basic virtual robot how to    backflip. Thats how we get to the GIF at the top of the page.    The behavior you see has been created by watching the Hopper    bot jump up and down, and telling it well done when it gets a    bit closer to doing a backflip. Over time, it learns    how.  <\/p>\n<p>    Of course, no one is suggesting this method is a cure-all for    teaching AI. 
Of course, no one is suggesting this method is a cure-all for teaching AI. There are a number of big downsides and limitations to using this sort of feedback. The first is that although it doesn't take much skill on the part of the human operator, it does take time. For example, in teaching the Hopper bot to backflip, a human was asked to judge its behavior some 900 times, a process that took about an hour. The bot itself had to work through 70 hours of simulated training time, which was sped up artificially.

For some simple tasks, says Oxford Robotics researcher Markus Wulfmeier (who was not involved in this research), it would be quicker for a programmer to simply define what it is they wanted. But, says Wulfmeier, it's increasingly important to render human supervision more effective for AI systems, and this paper represents a small step in the right direction.

DeepMind and OpenAI say pretty much the same: it's a small step, but a promising one, and in the future they're looking to apply it to more and more complex scenarios. Speaking to The Verge over email, DeepMind researcher Jan Leike said: "The setup described in [our paper] already scales from robotic simulations to more complex Atari games, which suggests that the system will scale further." Leike suggests the next step is to test it in more varied 3D environments. You can read the full paper describing the work here.

See the article here: This backflipping noodle has a lot to teach us about AI safety - The Verge (https://www.theverge.com/2017/6/14/15792818/ai-safety-human-feedback-openai-deepmind)
<a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/ai\/this-backflipping-noodle-has-a-lot-to-teach-us-about-ai-safety-the-verge\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187743],"tags":[],"class_list":["post-198886","post","type-post","status-publish","format-standard","hentry","category-ai"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/198886"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=198886"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/198886\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=198886"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=198886"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=198886"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}