{"id":207575,"date":"2017-02-13T18:03:08","date_gmt":"2017-02-13T23:03:08","guid":{"rendered":"http:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/top-chinese-supercomputer-blazes-real-world-application-trail-the-next-platform.php"},"modified":"2017-02-13T18:03:08","modified_gmt":"2017-02-13T23:03:08","slug":"top-chinese-supercomputer-blazes-real-world-application-trail-the-next-platform","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/top-chinese-supercomputer-blazes-real-world-application-trail-the-next-platform.php","title":{"rendered":"Top Chinese Supercomputer Blazes Real-World Application Trail &#8211; The Next Platform"},"content":{"rendered":"<p><p>    February 13, 2017 Jeffrey Burt  <\/p>\n<p>    China's massive Sunway TaihuLight supercomputer sent ripples through the computing world last year when it debuted in the number-one spot on the Top500 list of the world's fastest supercomputers. Delivering 93,000 teraflops of performance, with a peak of more than 125,000 teraflops, the system is nearly three times faster than the second supercomputer on the list (the Tianhe-2, also a Chinese system) and dwarfs the Titan system at Oak Ridge National Laboratory, a Cray-based machine that is the world's third-fastest system and the fastest in the United States.  <\/p>\n<p>    However, it wasn't only the system's performance that garnered a lot of attention. It also was the fact that the supercomputer was powered by Sunway's many-core SW26010 processors, built in China, rather than chips from well-known US players like Intel, AMD or Nvidia. 
As we've talked about before, the TaihuLight system and the SW26010 chips it runs on are part of a larger push by Chinese officials to have more components for Chinese systems made in China rather than by US vendors, an effort that is fueled by a number of factors, from national security issues to national competitive pride. Another part of that push is China's plan to spend $150 billion over 10 years to build out the country's chip-making capabilities.  <\/p>\n<p>    The chip itself is not overly impressive by the numbers. Jack Dongarra of the University of Tennessee and Oak Ridge National Laboratory, in outlining the current state of the high-performance computing space and the challenges it faces, described the SW26010's size (built on 28-nanometer technology) and speed (1.45GHz) as modest compared with what Intel, AMD and other vendors in the United States are coming out with. However, the supercomputer is powered by more than 10.6 million cores. By comparison, Tianhe-2 is running 3.12 million cores across Intel Xeon E5-2692 processors and Xeon Phi coprocessors.  <\/p>\n<p>    The size and performance capabilities of the supercomputer, which is installed at the National Supercomputing Center in Wuxi, China, make it an attractive choice for computationally intensive workloads like computational fluid dynamics (CFD), which is used to simulate phenomena in a broad range of scientific areas, including meteorology, aerodynamics and environmental sciences. A group of scientists from the Center for High Performance Computing at Shanghai Jiao Tong University in China and the Tokyo Institute of Technology in Japan recently released a paper outlining experiments they conducted running a hybrid implementation of the Open Source Field Operation and Manipulation (OpenFOAM) CFD application on the TaihuLight system. 
The researchers wanted to see if they could develop a hybrid implementation of the software to overcome a compiler incompatibility on the SW26010 processor. They called OpenFOAM one of the most popular CFD applications built on C++.  <\/p>\n<p>    In their study, titled &#8220;Hybrid Implementation and Optimization of OpenFOAM on the SW26010 Many-core Processor,&#8221; the researchers laid out the challenge presented by the chip when running C++ programs.  <\/p>\n<p>    &#8220;The processor includes four core groups (CGs), each of which consists of one management processing element (MPE) and sixty-four computing processing elements (CPEs) arranged by an eight by eight grid,&#8221; they wrote. &#8220;The basic compiler components on MPE support C\/C++ programming language, while the compiler components on CPE only support C. The compilation incompatibility problem makes it difficult for C++ programs to exploit the computing power of the SW26010 processor.&#8221;  <\/p>\n<p>    To get high performance from the OpenFOAM program while running on the chip, the researchers (Delong Meng, Minhua Wen, Jianwen Wei and James Lin) not only used a mixed-language design for the application, but also applied several optimizations specific to the features of the SW26010. What they did with the OpenFOAM application can also be used with other complex C++ programs to ensure high performance when running on systems powered by the SW26010 processor.  <\/p>\n<p>    The paper has the full details, but one of the key steps was developing a mixed-language programming model for OpenFOAM, in part by modifying the data storage format and reimplementing the kernel code in C. In addition, on the MPE, they put in a new compilation method for OpenFOAM in which they compile ThirdParty and OpenFOAM with GCC and swg++-4.5.3, respectively, and changed the linking mode of OpenFOAM to use the static library. 
The MPE optimizations covered areas such as vectorization, data presorting and algorithm optimization.  <\/p>\n<p>    They also took steps to run OpenFOAM on the chip's CPE cluster, which only supports the C compiler, such as using the master-slave cooperative algorithm of the preconditioned conjugate gradient (PCG) method and modifying the library file. Optimizations of the CPE were done in such areas as data structure transformation, register communication, direct memory access (DMA), prefetching, double buffering and data reuse.  <\/p>\n<p>    The study's authors then tested the software by running it on both a SW26010 processor and a 2.3GHz Xeon E5-2695 v3 in a test case involving what they described as a lid-driven cavity flow. The top boundary of the cube is a moving wall that moves in the x-direction, whereas the rest are static walls. In the tests comparing the performance of the MPE, the CPE cluster and the Intel chip, they found that the optimized CPE cluster delivered an 8.03-times performance increase over the optimized implementation on the MPE. In addition, the CPE cluster was 1.18 times faster than a single core of the Intel chip. However, while the CPE cluster performance was better than that of the Intel processor, there were issues with efficiency. Those were due to the smaller cache and scratchpad memory (SPM) size of the SW26010, which forces data to be loaded into the SPM repeatedly and hinders memory access. In addition, the DMA latency was high and the automatic optimizations applied by the SW26010 compiler were less effective than those on the Intel chip.  <\/p>\n<p>    However, the researchers said they proved that the work they did with OpenFOAM to enable it to reach high performance on the SW26010 can be used with other C++ workloads.  
<\/p>\n<p>    &#8220;The implementation and results we present demonstrate how complex codes and algorithms can be efficiently implemented on such diverse architectures as hybrid MPE-CPEs systems,&#8221; they wrote. &#8220;We can hide hardware-specific programming models into libraries and make them general purpose. OpenFOAM is now ready to effectively exploit the new supercomputing system based on the SW26010 processor.&#8221;  <\/p>\n<p>    Categories: Code, HPC  <\/p>\n<p>    Tags: China, Sunway, TaihuLight, Top 500  <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>See original here: <\/p>\n<p><a target=\"_blank\" href=\"https:\/\/www.nextplatform.com\/2017\/02\/13\/top-chinese-supercomputer-blazes-real-world-application-trail\/\" title=\"Top Chinese Supercomputer Blazes Real-World Application Trail - The Next Platform\">Top Chinese Supercomputer Blazes Real-World Application Trail - The Next Platform<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> February 13, 2017 Jeffrey Burt China's massive Sunway TaihuLight supercomputer sent ripples through the computing world last year when it debuted in the number-one spot on the Top500 list of the world's fastest supercomputers. Delivering 93,000 teraflops of performance, with a peak of more than 125,000 teraflops, the system is nearly three times faster than the second supercomputer on the list (the Tianhe-2, also a Chinese system) and dwarfs the Titan system at Oak Ridge National Laboratory, a Cray-based machine that is the world's third-fastest system and the fastest in the United States. 
However, it wasn't only the system's performance that garnered a lot of attention <a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/super-computer\/top-chinese-supercomputer-blazes-real-world-application-trail-the-next-platform.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[41],"tags":[],"class_list":["post-207575","post","type-post","status-publish","format-standard","hentry","category-super-computer"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/207575"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=207575"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/207575\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=207575"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=207575"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=207575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}",
"templated":true}]}}