{"id":1027172,"date":"2023-08-02T15:17:36","date_gmt":"2023-08-02T19:17:36","guid":{"rendered":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/a-new-attack-impacts-chatgptand-no-one-knows-how-to-stop-it-wired.php"},"modified":"2023-08-02T15:17:36","modified_gmt":"2023-08-02T19:17:36","slug":"a-new-attack-impacts-chatgptand-no-one-knows-how-to-stop-it-wired","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/neural-network\/a-new-attack-impacts-chatgptand-no-one-knows-how-to-stop-it-wired.php","title":{"rendered":"A New Attack Impacts ChatGPTand No One Knows How to Stop It &#8211; WIRED"},"content":{"rendered":"<p><p>    Making models more resistant to prompt injection and other    adversarial jailbreaking measures is an area of active    research, says Michael Sellitto, interim head of policy and    societal impacts at Anthropic. We are experimenting with ways    to strengthen base model guardrails to make them more    harmless, while also investigating additional layers of    defense.  <\/p>\n<p>    ChatGPT and its brethren are built atop large language models,    enormously large neural network algorithms geared toward using    language that has been fed vast amounts of human text, and    which predict the characters that should follow a given input    string.  <\/p>\n<p>    These algorithms are very good at making such predictions,    which makes them adept at generating output that seems to tap    into real intelligence and knowledge. But these language models    are also prone to fabricating information, repeating social    biases, and producing strange responses as answers prove more    difficult to predict.  <\/p>\n<p>    Adversarial attacks exploit the way that machine learning picks    up on patterns in data to     produce aberrant behaviors. Imperceptible changes to images    can, for instance, cause image classifiers to misidentify an    object, or make speech recognition systems respond to    inaudible messages.  <\/p>\n<p>    Developing such an attack typically involves looking at how a    model responds to a given input and then tweaking it until a    problematic prompt is discovered. In one well-known experiment,    from 2018, researchers added stickers to stop signs    to bamboozle a computer vision system similar to the ones used    in many vehicle safety systems. There are ways to protect    machine learning algorithms from such attacks, by giving the    models additional training, but these methods do not eliminate    the possibility of further attacks.  <\/p>\n<p>    Armando    Solar-Lezama, a professor in MITs college of computing,    says it makes sense that adversarial attacks exist in language    models, given that they affect many other machine learning    models. But he says it is extremely surprising that an attack    developed on a generic open source model should work so well on    several different proprietary systems.  <\/p>\n<p>    Solar-Lezama says the issue may be that all large language    models are trained on similar corpora of text data, much of it    downloaded from the same websites. I think a lot of it has to    do with the fact that there's only so much data out there in    the world, he says. He adds that the main method used to    fine-tune models to get them to behave, which involves having    human testers provide feedback, may not, in fact, adjust their    behavior that much.  
Solar-Lezama adds that the CMU study highlights the importance of open source models to the open study of AI systems and their weaknesses. In May, a powerful language model developed by Meta was leaked, and the model has since been put to many uses by outside researchers.

The outputs produced by the CMU researchers are fairly generic and do not seem harmful. But companies are rushing to use large models and chatbots in many ways. Matt Fredrikson, an associate professor at CMU involved with the study, says that a bot capable of taking actions on the web, like booking a flight or communicating with a contact, could perhaps be goaded into doing something harmful in the future with an adversarial attack.

To some AI researchers, the attack primarily points to the importance of accepting that language models and chatbots will be misused. "Keeping AI capabilities out of the hands of bad actors is a horse that's already fled the barn," says Arvind Narayanan, a computer science professor at Princeton University.

Narayanan says he hopes that the CMU work will nudge those who work on AI safety to focus less on trying to align the models themselves and more on trying to protect systems that are likely to come under attack, such as social networks that are likely to experience a rise in AI-generated disinformation.

Solar-Lezama of MIT says the work is also a reminder to those who are giddy with the potential of ChatGPT and similar AI programs. "Any decision that is important should not be made by a [language] model on its own," he says. "In a way, it's just common sense."

See more here: A New Attack Impacts ChatGPT, and No One Knows How to Stop It – WIRED (https://www.wired.com/story/ai-adversarial-attacks/)
<a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/neural-network\/a-new-attack-impacts-chatgptand-no-one-knows-how-to-stop-it-wired.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[1237600],"tags":[],"class_list":["post-1027172","post","type-post","status-publish","format-standard","hentry","category-neural-network"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1027172"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=1027172"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1027172\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=1027172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=1027172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=1027172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}