{"id":207867,"date":"2017-02-14T10:07:54","date_gmt":"2017-02-14T15:07:54","guid":{"rendered":"http:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/exabyte-measures-linpack-performance-across-major-cloud-vendors-top500-news.php"},"modified":"2017-02-14T10:07:54","modified_gmt":"2017-02-14T15:07:54","slug":"exabyte-measures-linpack-performance-across-major-cloud-vendors-top500-news","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/exabyte-measures-linpack-performance-across-major-cloud-vendors-top500-news.php","title":{"rendered":"Exabyte Measures Linpack Performance Across Major Cloud Vendors &#8211; TOP500 News"},"content":{"rendered":"<p>Exabyte, a materials discovery cloud specialist, has published a study that compares Linpack performance on four of the largest public cloud providers. Although the study's methodology had some drawbacks, the results suggested that with the right hardware, HPC applications could not only scale well in cloud environments, but could also deliver performance on par with that of conventional supercomputers.<\/p>\n<p>Overall, HPC practitioners have resisted using cloud computing for a variety of reasons, one of the more significant being the lack of performant hardware available in cloud infrastructure. Cluster network performance, in particular, has been found wanting in generic clouds, since conventional Ethernet, both GigE and 10GigE, does not generally have the bandwidth and latency characteristics to keep up with MPI applications running on high core-count nodes. As we'll see in a moment, it was the network that seemed to matter most in terms of scalability for these cloud environments.
<\/p>\n<p>The Exabyte study used High Performance Linpack (HPL) as the benchmark metric, measuring its performance on four of the most widely used public clouds in the industry: Amazon Web Services (AWS), Microsoft Azure, IBM SoftLayer, and Rackspace. (Not coincidentally, Exabyte, a cloud service provider for materials design, device simulations, and computational chemistry, employs AWS, Azure, SoftLayer, and Rackspace as the infrastructure of choice for its customers.) Linpack was measured on specific instances of these clouds to determine benchmark performance across different cluster sizes and its efficiency in scaling from 1 to 32 nodes. The results were compared to those on Edison, a 2.5-petaflop (peak) NERSC supercomputer built by Cray. It currently occupies the number 60 spot on the TOP500 rankings.<\/p>\n<p>To keep the benchmark results on as level a playing field as possible, it looks like the Exabyte team tried to use the same processor technology across systems, in this case, Intel Xeon processors of the Haswell or Ivy Bridge generation. However, the specific hardware profile -- clock speed, core count, and RAM capacity -- varied quite a bit across the different environments. As it turns out, though, the system network was the largest variable across the platforms. The table below shows the node specification for each environment.<\/p>\n<p>Source: Exabyte Inc.<\/p>\n<p>As might be expected, Edison was able to deliver very good results, with a respectable 27-fold speedup as the Linpack run progressed from 1 to 32 nodes, at which point 10.44 teraflops of performance was achieved. That represents decent scalability and is probably typical of a system with a high-performance interconnect, in this case Cray's Aries network.
Note that Edison had the highest core count per node (48), but one of the slower processor clocks (2.4 GHz) of the environments tested.<\/p>\n<p>The AWS cloud test used the c4.8xlarge instance, but was measured in three different ways: one with hyperthreading enabled, one with hyperthreading disabled, and one with hyperthreading disabled and with the node placements optimized to minimize network latency and maximize bandwidth. The results didn't vary all that much among the three, with a maximum Linpack performance of 10.74 teraflops recorded for the 32-node setup with hyperthreading disabled and optimal node placement. However, the speedup achieved for 32 nodes was just a little over 17 times that of a single node.<\/p>\n<p>The Rackspace cloud instance didn't do nearly as well in the performance department, achieving only 3.04 teraflops on the 32-node setup. Even with just a single node, its performance was much worse than that of the AWS case, despite having more cores, a similar clock frequency, and an identical memory capacity per node. Rackspace did, however, deliver a better than 18-fold speedup as it progressed from 1 to 32 nodes -- slightly better than that of Amazon. That superior speedup is not immediately explainable, since the AWS instance provides more than twice the bandwidth of the Rackspace setup. It's conceivable the latter's network latency is somewhat lower than that of AWS.<\/p>\n<p>IBM SoftLayer fared even worse, delivering just 2.46 Linpack teraflops at 32 nodes and a speedup of just over 4 times that of a single node. No doubt the relatively slow processor clock (2.0 GHz) and slow network speed (1 gigabit\/sec) had a lot to do with its poor performance.<\/p>\n<p>Microsoft's Azure cloud offered the most interesting results. Here the Exabyte team decided to test three instances: F16s, A9, and H16.
The latter two instances were equipped with InfiniBand, the only platforms in the study where this was the case. The A9 instance provided 32 gigabits\/sec and the H16 instance provided 54 gigabits\/sec -- nearly as fast as the 64 gigabits\/sec of the Aries interconnect on Edison.<\/p>\n<p>Not surprisingly, the A9 and H16 exhibited superior scalability for Linpack, specifically, more than a 28-fold speedup on 32 nodes compared to a single node. That's slightly better than the 27x speedup Edison achieved. In the performance area, the H16 instance really shone, delivering 17.26 Linpack teraflops in the 32-node configuration. That's much higher than any of the other environments tested, including the Edison supercomputer. It's probably no coincidence that the H16, which is specifically designed for HPC work, was equipped with the fastest processor of the bunch at 3.2 GHz. Both the A9 and H16 instances also had significantly more memory per node than the other environments.<\/p>\n<p>One of the unfortunate aspects of the Edison measurement is that hyperthreading was enabled for the Linpack runs, something Intel explicitly says not to do if you want to maximize performance on this benchmark. With the exception of one of the AWS tests, none of the others ran the benchmark with hyperthreading enabled.<\/p>\n<p>In fact, the poor Linpack yield on Edison, at just 36 percent of peak in the 32-node test run, suggests the benchmark was not tuned very well for that system. The actual TOP500 run across the entire machine achieved a Linpack yield of more than 64 percent of peak, which is fairly typical of an HPC cluster with a high-performance network. The Azure H16 in this test had a 67 percent Linpack yield.<\/p>\n<p>There's also no way to tell if other hardware variations -- things like cache size, memory performance, etc.
-- could have affected the results across the different cloud instances. In addition, it was unclear if the benchmark implementations were optimized for the particular environments tested. Teams working on a TOP500 submission will often devote weeks to tweaking a Linpack implementation to maximize performance on a particular system.<\/p>\n<p>It would have been interesting to see Linpack results on other InfiniBand-equipped clouds. In 2014, an InfiniBand option was added to SoftLayer, but the current website doesn't make any mention of such a capability. However, Penguin Computing On Demand, Nimbix, and ProfitBricks all have InfiniBand networks for their purpose-built HPC clouds. Comparing these to the Azure H16 instance could have been instructive. Even more interesting would be to see other HPC benchmarks, like the HPCG metric or specific application kernels, tested across these platforms.<\/p>\n<p>Of course, what would be ideal would be some sort of cloud computing tracker that could determine the speed and cost of executing your code on a given cloud platform at any particular time. That's apt to require a fair amount of AI, not to mention a lot more transparency from cloud providers on how their hardware and software operate underneath the covers. Well, maybe someday...<\/p>\n<p>Go here to read the rest: <\/p>\n<p><a target=\"_blank\" href=\"https:\/\/www.top500.org\/news\/exabyte-measures-linpack-performance-across-major-cloud-vendors\/\" title=\"Exabyte Measures Linpack Performance Across Major Cloud Vendors - TOP500 News\">Exabyte Measures Linpack Performance Across Major Cloud Vendors - TOP500 News<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Exabyte, a materials discovery cloud specialist, has published a study that compares Linpack performance on four of the largest public cloud providers.
Although the study's methodology had some drawbacks, the results suggested that with the right hardware, HPC applications could not only scale well in cloud environments, but could also deliver performance on par with that of conventional supercomputers. Overall, HPC practitioners have resisted using cloud computing for a variety of reasons, one of the more significant being the lack of performant hardware available in cloud infrastructure <a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/exabyte-measures-linpack-performance-across-major-cloud-vendors-top500-news.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[41],"tags":[],"class_list":["post-207867","post","type-post","status-publish","format-standard","hentry","category-super-computer"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/207867"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=207867"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/207867\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=207867"}],"wp:term":
[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=207867"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=207867"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}