{"id":219132,"date":"2017-06-13T05:00:53","date_gmt":"2017-06-13T09:00:53","guid":{"rendered":"http:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/early-benchmarks-on-argonnes-new-knights-landing-supercomputer-the-next-platform.php"},"modified":"2017-06-13T05:00:53","modified_gmt":"2017-06-13T09:00:53","slug":"early-benchmarks-on-argonnes-new-knights-landing-supercomputer-the-next-platform","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/early-benchmarks-on-argonnes-new-knights-landing-supercomputer-the-next-platform.php","title":{"rendered":"Early Benchmarks on Argonne&#8217;s New Knights Landing Supercomputer &#8211; The Next Platform"},"content":{"rendered":"<p><p>    June 12, 2017 Nicole    Hemsoth  <\/p>\n<p>    We are heading into International Supercomputing Conference    (ISC) week and, as such, there are several new items of interest    from the HPC side of the house.  <\/p>\n<p>    As far as supercomputer architectures go for mid-2017, we can    expect to see a lot of new machines with Intel&#8217;s Knights    Landing architecture, perhaps a scattered few finally adding    Nvidia K80 GPUs as an upgrade from older generation    accelerators (for those who are not holding out for Volta with    NVLink a la the Summit supercomputer), and of course, it all    remains to be seen what happens with the Tianhe-2 and Sunway    machines in China in terms of new development.  <\/p>\n<p>    While we are not expecting any major architectural    shakeups on the Top 500 list when it is announced next Monday,    there is progress for some of the pre-exascale machines being    installed and put into early production, including the     Cori supercomputer at NERSC (more on that later today) and    the Theta system at Argonne National Lab. 
Both of these    machines sport an Intel     Knights Landing (and Haswell in Cori&#8217;s case) base with the    Cray Aries interconnect via the XC40 supercomputer    architecture, and both are reporting early results with key    applications and how they might help centers adapt to the much    larger systems.  <\/p>\n<p>    As we pointed out last week, there are     some questions about the future of the Aurora supercomputer    at Argonne, an Intel and Cray based system that has been    expected to arrive at the lab in 2018 sporting Intel&#8217;s     Knights Hill and Cray architecture. However, work has been    progressing on one of the systems designed to prepare users and    codes for such a scale shift: the stepping-stone Theta    supercomputer, which will introduce the lab to the Cray XC40    architecture and help users make the shift from an IBM systems    focus to a completely new approach altogether, something we        talked about with one of the lab&#8217;s leads when     Aurora was announced.  <\/p>\n<p>    Even though part of its purpose is to provide an on-ramp to the    next-generation Intel architecture (Knights Hill) and Cray    architecture after so many years as a BlueGene-centric lab,    Theta is still very powerful. In terms of capability, it is    very similar to Argonne&#8217;s current leadership-class    supercomputer, the 10 peak petaflop Mira supercomputer, an IBM    BlueGene machine that still holds steady at #9 on the Top 500    list alongside a few other IBM BlueGene systems that will be    retired in the next couple of years, bringing an end to that    architectural era. The Knights Landing and XC40 (Cray    Aries network-based) combination will deliver (along peak    Linpack benchmark performance lines) in 3,624 nodes what takes    Mira 49,152 nodes (although the architecture differences don&#8217;t    allow for a true apples-to-apples comparison).  
<\/p>\n<p>        With Theta up and    running now, presumably in time for the upcoming Top 500    ranking (although some labs eschew this benchmark because of    its     lack of relevance to real-world HPC applications),    researchers are running microbenchmarks to evaluate    performance. On the list for a recent report were DGEMM for    peak floating point performance metrics and, for more    component-centric evaluations, LAMMPS, MILC, and Nekbone.    Whether the team ran the Top 500 Linpack benchmark to    obtain peak theoretical performance, we won&#8217;t know until next    week.  <\/p>\n<p>    For DGEMM and the evaluation of the peak floating point    performance of the KNL core and nodes, the team found that they    were able to achieve 86% of the peak, an impressive number    considering each node was expected to reach 2.25 teraflops    (35.2 Gflops per core).  <\/p>\n<p>    The research team adds that while the KNL core has a    theoretical peak throughput of 2 instructions per clock cycle,    actual throughput can be limited by factors such as instruction    width and power constraints. They explain that power    measurements show better computational efficiency when using    fewer hyperthreads. OS noise and shared L2 cache contention    have been identified as the sources of core-to-core variability    on the node, but the researchers note that Cray&#8217;s core specialization can    target the noise issues that have an impact on the timing of    microkernels.  <\/p>\n<p>      Theta results on the DGEMM matrix multiplication kernel. This      benchmark achieves over 1.9 teraflops on a Theta node, or 86%      of peak, for a relatively small matrix size. The team points      out that on this compute-intensive benchmark, running more      than one thread per core does not improve the performance.      Further, a single hyperthread can already issue the core      limit of two instructions per cycle. 
While this is not the      case with the DGEMM kernel, using more than one hyperthread      can in some cases reduce performance due to threads sharing      resources such as L1 and L2 caches and instruction re-order      buffers.    <\/p>\n<p>    In terms of other trouble spots, it is actually OpenMP that    introduced some of the latencies. The overhead of the Barrier and Reduce    constructs was found to be related to the latency of main    memory access, due to the lack of a shared last-level cache. A    simple performance model was developed to quantify the overhead    of OpenMP pragmas, which scales as the square root of the thread    count, Argonne researchers note.  <\/p>\n<p>    The team also ran the STREAM Triad benchmark to evaluate memory    bandwidth. They found considerable variation in    memory bandwidth between the flat and cache memory mode    configurations.  <\/p>\n<p>      Included is the power consumption and efficiency for STREAM      on one node in flat-quadrant mode. The in-package memory (IPM) and DDR4 are      evaluated separately. For both tests, 15 GB of memory was used      across 100 iterations. The thing to note here is that the IPM      gets a 4.3X gain in memory bandwidth power efficiency and a      1.2X increase in overall power consumption.    <\/p>\n<p>    LAMMPS, MILC, and Nekbone all showed positive scaling    characteristics (for strong and weak scaling) on Theta and were    comparable to what teams were able to achieve on Mira, which is    known for scalability via the BlueGene architecture. In short,    so far, KNL is delivering on its promises in the wild. It will    be interesting to see scaling, performance, and efficiency on    real-world applications as these roll out by SC17 for Gordon    Bell, for instance.  
<\/p>\n<p>    We can expect a number of stories leading into ISC around the    early benchmark results and production tales from other    supercomputers with similar architectures (Trinity, Cori,    Stampede2, etc.) and will write these up as we get them. The    full benchmark results and details from Theta can be     found here.  <\/p>\n<p>Read more here:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/www.nextplatform.com\/2017\/06\/12\/early-benchmarks-argonnes-new-knights-landing-cray-supercomputer\/\" title=\"Early Benchmarks on Argonne's New Knights Landing Supercomputer - The Next Platform\">Early Benchmarks on Argonne's New Knights Landing Supercomputer - The Next Platform<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> June 12, 2017 Nicole Hemsoth We are heading into International Supercomputing Conference (ISC) week and, as such, there are several new items of interest from the HPC side of the house. As far as supercomputer architectures go for mid-2017, we can expect to see a lot of new machines with Intel&#8217;s Knights Landing architecture, perhaps a scattered few finally adding Nvidia K80 GPUs as an upgrade from older generation accelerators (for those who are not holding out for Volta with NVLink a la the Summit supercomputer), and of course, it all remains to be seen what happens with the Tianhe-2 and Sunway machines in China in terms of new development. While we are not expecting any major architectural shakeups on the Top 500 list when it is announced next Monday, there is progress for some of the pre-exascale machines being installed and put into early production, including the Cori supercomputer at NERSC (more on that later today) and the Theta system at Argonne National Lab.  
<a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/early-benchmarks-on-argonnes-new-knights-landing-supercomputer-the-next-platform.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[41],"tags":[],"class_list":["post-219132","post","type-post","status-publish","format-standard","hentry","category-super-computer"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/219132"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=219132"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/219132\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=219132"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=219132"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=219132"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}