{"id":233439,"date":"2017-08-09T03:00:44","date_gmt":"2017-08-09T07:00:44","guid":{"rendered":"http:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/ibm-pushes-envelope-in-deep-learning-scalability-top500-news.php"},"modified":"2017-08-09T03:00:44","modified_gmt":"2017-08-09T07:00:44","slug":"ibm-pushes-envelope-in-deep-learning-scalability-top500-news","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/ibm-pushes-envelope-in-deep-learning-scalability-top500-news.php","title":{"rendered":"IBM Pushes Envelope in Deep Learning Scalability &#8211; TOP500 News"},"content":{"rendered":"<p>This week IBM demonstrated software that was able to significantly boost the speed of training deep neural networks while improving the accuracy of those networks. The software achieved this by dramatically increasing the scalability of these training applications across large numbers of GPUs.<\/p>\n<p>Source: IBM Research<\/p>\n<p>In a blog post, IBM Fellow Hillery Hunter, director of the Accelerated Cognitive Infrastructure group at IBM Research, outlined the motivation for the work:<\/p>\n<p>For our part, my team in IBM Research has been focused on reducing these training times for large models with large data sets. Our objective is to reduce the wait-time associated with deep learning training from days or hours to minutes or seconds, and enable improved accuracy of these AI models. To achieve this, we are tackling grand-challenge scale issues in distributing deep learning across large numbers of servers and GPUs.<\/p>\n<p>The technology they developed to accomplish this, encapsulated in their Distributed Deep Learning (DDL) software, delivered a record 95 percent scaling efficiency across 256 NVIDIA Tesla P100 GPUs using the Caffe deep learning framework for an image recognition application.
That exceeds the previous high-water mark of 89 percent efficiency, achieved by Facebook for training a similar network with those same GPUs on Caffe2.<\/p>\n<p>Source: IBM Research<\/p>\n<p>The quality of the training was also improved by the DDL software, which delivered an image recognition accuracy of 33.8 percent for a network trained with a ResNet-101 model on a 7.5-million-image dataset (ImageNet-22k). The previous best result of 29.8 percent accuracy was achieved by Microsoft in 2014. But in the case of the IBM training, that level of accuracy was achieved in just 7 hours, while the Microsoft run took 10 days.<\/p>\n<p>It should be noted that the Microsoft training was executed on a 120-node HP ProLiant cluster powered by 240 Intel Xeon E5-2450L CPUs, while the IBM training was executed on a 64-node Power8 cluster (Power Systems S822LC for HPC) equipped with 256 NVIDIA P100 GPUs. Inasmuch as those GPUs represent more than two petaflops of single precision floating point performance, the IBM system is about two orders of magnitude more powerful than the commodity cluster used by Microsoft.<\/p>\n<p>That doesn't negate the importance of the IBM achievement. As Hunter pointed out in her blog, scaling a deep learning problem across more GPUs becomes much more difficult as these processors get faster, since communication between them and the rest of the system struggles to keep pace as the computational power of the graphics chips increases. She describes the problem as follows:<\/p>\n<p>[A]s GPUs get much faster, they learn much faster, and they have to share their learning with all of the other GPUs at a rate that isn't possible with conventional software. This puts stress on the system network and is a tough technical problem.
Basically, smarter and faster learners (the GPUs) need a better means of communicating, or they get out of sync and spend the majority of time waiting for each other's results. So, you get no speedup (and potentially even degraded performance) from using more, faster-learning GPUs.<\/p>\n<p>IBM Fellow Hillery Hunter. Source: IBM<\/p>\n<p>At about 10 single precision teraflops per GPU, the NVIDIA P100 is one of the fastest GPUs available today. The NVIDIA V100 GPUs, which are just entering the market now, will offer 120 teraflops of mixed single\/half precision performance, further challenging the ability of these deep learning applications to scale efficiently.<\/p>\n<p>The IBM software is able to overcome the compute\/communication imbalance to a great extent by employing a multi-dimensional ring algorithm. This allows communication to be optimized based on the bandwidth of each network link, the network topology, and the latency for each phase. This is accomplished by adjusting the number of dimensions and the size of each one. For server hardware with different types of communication links, the software is able to adjust its behavior to take advantage of the fastest links in order to avoid bottlenecks in the slower ones.<\/p>\n<p>Even though this is still a research effort, the DDL software is going to be available to customers on a trial basis as part of IBM's PowerAI, the company's deep learning software suite aimed at enterprise users. DDL is available today in version 4 of PowerAI, and according to IBM, it contains implementations at various stages of development for Caffe, TensorFlow, and Torch.<\/p>\n<p>An API has been provided for developers to tap into DDL's core functions. The current implementation is based on MPI (IBM's own Spectrum MPI, to be specific) which provides optimizations for the company's Power\/InfiniBand-based clusters.
IBM says you can also use DDL without MPI underneath if desired, but presumably your performance will vary accordingly. IBM is hoping that third-party developers will start using this new capability and demonstrate its advantages across a wider array of deep learning applications.<\/p>\n<p>View post:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/www.top500.org\/news\/ibm-pushes-envelope-in-deep-learning-scalability\/\" title=\"IBM Pushes Envelope in Deep Learning Scalability - TOP500 News\">IBM Pushes Envelope in Deep Learning Scalability - TOP500 News<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This week IBM demonstrated software that was able to significantly boost the speed of training deep neural networks, while improving the accuracy of those networks. The software achieved this by dramatically increasing the scalability of these training applications across large numbers of GPUs. Source: IBM Research In a blog post, IBM Fellow Hillery Hunter, director of the Accelerated Cognitive Infrastructure group at IBM Research, outlined the motivation for the work: For our part, my team in IBM Research has been focused on reducing these training times for large models with large data sets.
<a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/ibm-pushes-envelope-in-deep-learning-scalability-top500-news.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[41],"tags":[],"class_list":["post-233439","post","type-post","status-publish","format-standard","hentry","category-super-computer"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/233439"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=233439"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/233439\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=233439"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=233439"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=233439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}