{"id":219591,"date":"2017-06-14T17:27:29","date_gmt":"2017-06-14T21:27:29","guid":{"rendered":"http:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/the-secret-to-training-ai-might-be-knowing-how-to-say-good-job-quartz.php"},"modified":"2022-02-25T17:05:32","modified_gmt":"2022-02-25T22:05:32","slug":"the-secret-to-training-ai-might-be-knowing-how-to-say-good-job-quartz","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/artificial-intelligence\/the-secret-to-training-ai-might-be-knowing-how-to-say-good-job-quartz.php","title":{"rendered":"The secret to training AI might be knowing how to say good job &#8211; Quartz"},"content":{"rendered":"<p><p>    It's tough to appreciate how efficient at learning humans    really are. From just a few experiences, we can figure out    complex tasks like learning to walk or becoming pros at the    office coffee machine (roughly of equal importance).  <\/p>\n<p>    But we haven't been able to give machines that same gift.    Reinforcement learning, a promising sector of AI research where    algorithms test different ways to accomplish a task until they    can reliably get it right, is one method used to get machines    to learn by doing.  <\/p>\n<p>    The field's biggest problem: What's the best way to tell AI it    has done something right?  <\/p>\n<p>    This week, research trying to answer that question was    published by major outfits in Silicon Valley: A joint venture    between Alphabet's DeepMind and Elon Musk-funded OpenAI, as well    as separate work from Microsoft-owned Maluuba.  <\/p>\n<p>    The two papers represent different perspectives on how machines    of the future might learn. OpenAI and DeepMind's work suggests    that humans may be the best shepherds for fledgling AI, guiding    the way it learns to ensure its safety. 
Maluuba takes a new    look at an idea AI researchers have hammered at for years,    trying to find a way for its algorithm to better understand its    failures and successes without human intervention.  <\/p>\n<p>    The DeepMind and OpenAI     research, posted June 13, has humans watch two videos of a    3D object trying to do a front flip. The human chooses the    video where the algorithm made the better attempt, but there's a    secret! The algorithm has already tried to predict which    attempt was better, so the human not only shows a better way to    do the task, but gives a nod to how humans perceive the better    attempt.  <\/p>\n<p>    Much reinforcement learning research from DeepMind and OpenAI    in the past has focused on video games, where there's a clear    goal: Get more points. This new research has an objective goal    (do a front flip), but the human judgement can be subjective.    OpenAI researchers say this idea could improve AI safety,    because future algorithms would be able to align themselves    with what humans think are correct and safe behaviors.  <\/p>\n<p>    Microsoft's Maluuba takes a different approach to reinforcement    learning, and used it to beat the game Ms. Pac-Man, according    to research published June 14. The team quadrupled the previous    high score on the game (by human or machine), achieving the    maximum number of points possible.  <\/p>\n<p>    When the agent (Ms. Pac-Man) starts to learn, it moves    randomly; it knows nothing about the game board. As it    discovers new rewards (the little pellets and fruit Ms. Pac-Man    eats) it begins placing little algorithms in those spots, which    continuously learn how best to avoid ghosts and get more points    based on Ms. Pac-Man's interactions, according to the     Maluuba research paper.  
<\/p>\n<p>    As the 163 potential algorithms are mapped, they continually    send which movement they think would generate the highest    reward to the agent, which averages the inputs and moves Ms.    Pac-Man. Each time the agent dies, all the algorithms process    what generated rewards. These helper algorithms were carefully    crafted by humans to understand how to learn, however.  <\/p>\n<p>    Instead of having one algorithm learn one complex problem, the    AI distributes learning over many smaller algorithms, each    tackling simpler problems, Maluuba says     in a video. This research could be applied to other highly    complex problems, like financial trading, according to the    company.  <\/p>\n<p>    But it's worth noting that since more than 100 algorithms are    being used to tell Ms. Pac-Man where to move and win the game,    this technique is likely to be extremely computationally    intensive, so it's probably not ready for the Microsoft    production line any time soon.  <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>Go here to see the original:<\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/qz.com\/1005915\/the-secret-to-training-ai-might-be-a-well-timed-pat-on-the-back\/\" title=\"The secret to training AI might be knowing how to say good job - Quartz\">The secret to training AI might be knowing how to say good job - Quartz<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> It's tough to appreciate how efficient at learning humans really are.  
<a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/artificial-intelligence\/the-secret-to-training-ai-might-be-knowing-how-to-say-good-job-quartz.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[13],"tags":[],"class_list":["post-219591","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence"],"modified_by":"Danzig","_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/219591"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=219591"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/219591\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=219591"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=219591"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=219591"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}