{"id":187952,"date":"2017-04-15T17:31:43","date_gmt":"2017-04-15T21:31:43","guid":{"rendered":"http:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/the-7-worst-automation-failures-cso-online\/"},"modified":"2017-04-15T17:31:43","modified_gmt":"2017-04-15T21:31:43","slug":"the-7-worst-automation-failures-cso-online","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/automation\/the-7-worst-automation-failures-cso-online\/","title":{"rendered":"The 7 worst automation failures &#8211; CSO Online"},"content":{"rendered":"<p>    There are IT jobs that you just know are built for failure. They are so big and cumbersome, and in some cases plow through so much new ground, that unforeseen outcomes are likely. Then there are other situations where an IT pro might just say \"whoops\" when that unforeseen result should have been, well, foreseen.  <\/p>\n<p>    UpGuard has pulled together a group of the biggest instances in the past few years in which the well-intentioned automation of a company's IT systems facilitated a major failure instead.  <\/p>\n<p>    Healthcare.gov: How an oversight broke the U.S. government's healthcare website  <\/p>\n<p>    When the U.S. government rolled out the Affordable Care Act's web enrollment tool, Healthcare.gov, in October 2013, it was expected to be a monumental undertaking; and with the delivery of millions of citizens' health insurance on the line, the stakes were high. So, when a major software failure crashed the website a mere two hours following its launch, the White House administration suffered a sizeable backlash. Due to a lack of integration, visibility, and testing, the project had significant problems from the start, beginning with over 100 defects in Healthcare.gov's account creation feature, dubbed Account Lite.  
<\/p>\n<p>    Given its function, Account Lite was a crucial piece of the Healthcare.gov site, serving as the mechanism by which people would create their accounts and gain access to their healthcare options. This particular module had so many problems that it was a disaster waiting to happen. Nevertheless, contractors moved forward with it as it stood.  <\/p>\n<p>    The software release failed, preventing millions from securing healthcare coverage. What's more, the outage had political ramifications, as critics of the Affordable Care Act began citing it as evidence of the administration's inability to develop a successful healthcare program. The site was eventually stabilized, but the integration work that should have been done before the release was completed only after the crash occurred.  <\/p>\n<p>    Dropbox: The buggy outage that dropped Dropbox from the web  <\/p>\n<p>    No IT team enjoys the experience of an outage, especially when it kicks off a race to implement emergency procedures. In January 2014, Dropbox found itself scrambling in this very scenario, when a planned product upgrade took down its site for three hours.  <\/p>\n<p>    When a subtle bug in a Dropbox script automatically applied updates to a small number of active machines, it affected Dropbox's thousands of production servers and caused the company's live services to fail. Fortunately for Dropbox, its emergency procedures were well designed and largely effective. With its backup and recovery strategy, the IT team was able to restore most services within three hours. For some of the larger databases, however, recovery was slower, and it took several days for all of the company's core services to fully return.  
<\/p>\n<p>    Amazon\/DynamoDB: When the DynamoDB database disrupted all of Amazon's infrastructure  <\/p>\n<p>    Just as physical services like freight haulage require physical infrastructure like roads and highways, companies' digital services depend on underlying digital infrastructure. When some of Amazon's automated infrastructure processes timed out in September 2015, its Amazon Web Services cloud platform suffered an outage. As a simple network disruption cascaded into broad service failure, Amazon experienced an outage like those that traditional on-premise data centers experience, despite its very advanced and integrated cloud platform.  <\/p>\n<p>    Amazon had a network disruption that impacted a portion of its DynamoDB cloud database's storage servers. When this happened, a number of storage servers simultaneously requested their membership data and exceeded their allowed retrieval and transmission time. As a result, the servers were unable to obtain their membership data and subsequently removed themselves from taking requests.  <\/p>\n<p>    When the servers that had become unavailable began retrying their requests, the DynamoDB timeout issue manifested itself in a broader outage. Just like that, a network disruption started a vicious cycle, affecting Amazon's customers as it took down AWS for 5 hours.  <\/p>\n<p>    Opsmatic: recipe for disaster  <\/p>\n<p>    When managed under traditional server administration, automation often faces the same set of age-old IT problems. One of those classic, faulty assumptions is \"if it ain't broke, don't fix it\", which assumes that all systems are operating the way they should be. When Opsmatic's routine server maintenance shut down its whole operation, it was because things weren't exactly as the team had thought.  
<\/p>\n<p>    In Opsmatic's case, a Chef recipe called remove_default_users had been created during the early stages of the company's Amazon Web Services experimentation. Now, long after the test, that recipe was somehow still running against the production servers, unbeknownst to the staff maintaining them.  <\/p>\n<p>    Like many major outages, this incident was the result of a long causal sequence of mistakes, none of which were caught until they added up to a giant problem.  <\/p>\n<p>    Knight Capital: How one tiny mistype cost Knight Capital $1 billion  <\/p>\n<p>    Knight Capital automated not only its administrative IT processes, but also its algorithmic trading. Unfortunately, this meant that changes and unplanned errors in handling real money could happen very quickly. This is the story of how a single error caused Knight Capital to lose $172,222 per second for 45 minutes straight in 2012.  <\/p>\n<p>    When operating a data center at scale, clusters of servers often run a single function. This distributes the load across more computing resources and provides better performance for high-traffic applications. This model requires all the servers in a cluster to use the same configuration, so that the application behaves the same way no matter which server in the cluster handles a request. However, configurations, even if identical at provisioning, tend to drift apart.  <\/p>\n<p>    Despite all of its automation, Knight Capital was still manually deploying code across server banks, and a human error caused one of its eight servers to have a different configuration from all the others. When one of Knight's technicians made this mistake during the deployment of the new server code, no one noticed. Thus, from that point forward, the IT staff were operating under the misconception that these servers were identical.  
<\/p>\n<p>    At the same time, decommissioned code remained available on the misconfigured server. As a result, this server began sending orders to certain trading centers for execution, and the error triggered a domino effect across algorithmic stock trading, costing Knight Capital $465 million in trading losses.  <\/p>\n<p>    Delta Airlines: automated fleet of flightless birds  <\/p>\n<p>    Large logistics operations rely on automated systems to achieve the speed necessary to perform at scale. Some airlines struggle to keep those systems functional. Just like traditional, manual methods of systems administration, automated systems suffer from misconfigurations. In the worst-case scenarios from recent years, failure of these systems has cost airlines hundreds of millions of dollars and more in their customers' goodwill.  <\/p>\n<p>    When misconfigurations occur, they are pushed out quickly through automated mechanisms and can bring entire systems down. For airlines, this means flight operations are interrupted, planes are delayed, and money is siphoned out of the business. In one such case in January 2017, Delta told investors that one glitch in its automated system caused an expansive outage, costing the airline more than $150 million.  <\/p>\n<p>    Google Gmail: You've got mail?: Gmail's 2014 bug-induced failure  <\/p>\n<p>    When technology giants experience the occasional automation-related outage, an hour of downtime can mean a lot more. For these huge organizations to make any sort of change, they have to do so across thousands of servers. Because Google has always been on the bleeding edge of technology, it's no surprise that it has automated its configuration management. Automation is employed to make operations easier, but when the wrong change is executed in an automated system, it can propagate far and wide within a matter of seconds.  
<\/p>\n<p>    In 2014, a bug in Google's internal automated configuration system caused Gmail to crash for around half an hour. The incorrect configuration was sent to live services, causing users' requests for their data to be ignored and those services, in turn, to generate errors.  <\/p>\n<p>    The lesson is that configuration automation is not the same as configuration management. Automation ensures that changes get pushed out across all systems.  <\/p>\n<p>Read the original here: <\/p>\n<p><a target=\"_blank\" rel=\"nofollow\" href=\"http:\/\/www.csoonline.com\/article\/3188426\/security\/the-7-worst-automation-failures.html\" title=\"The 7 worst automation failures - CSO Online\">The 7 worst automation failures - CSO Online<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> There are IT jobs that you just know are built for failure. They are so big and cumbersome and in some cases are plowing through new ground that unforeseen outcomes are likely <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/automation\/the-7-worst-automation-failures-cso-online\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187732],"tags":[],"class_list":["post-187952","post","type-post","status-publish","format-standard","hentry","category-automation"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/187952"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=187952"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/187952\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=187952"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=187952"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=187952"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}