Luxembourg’s ‘Meluxina’ Supercomputer Project to be Overseen by LuxProvide SA – HPCwire

Sept. 26, 2019 - Luxembourg is acquiring a supercomputer called Meluxina, which will be co-financed by European funds and will join the European network of EuroHPC supercomputers. Based on the business plan for the installation of this High Performance Computing (HPC) infrastructure prepared by the Ministry of the Economy and LuxConnect, LuxProvide SA was recently created to handle the acquisition, launch and operation of Meluxina. The company is a subsidiary of LuxConnect and is headquartered in Bissen.

In addition to implementing the 10-petaflops supercomputer, LuxProvide SA will also carry out the various activities related to this high-performance computing capability and provide the related services, in particular in terms of broadband connectivity and mobile applications. Ultimately employing up to 50 people, LuxProvide also aims to facilitate access to Meluxina's capabilities by setting up a skills center to guide and support companies in their high-performance computing projects.

Meluxina will focus on the needs of its users, including companies and players in the Luxembourg economy, with particular emphasis on the use by SMEs and start-ups as well as on applications in the context of research, personalized medicine and eHealth projects.

LuxProvide will install Meluxina in LuxConnect's DC2 data center in Bissen, which is powered by green energy sourced in part from Kiowatt, a cogeneration power plant fueled by waste wood. The computing power of Meluxina will be 10 petaflops, which corresponds to 10,000,000,000,000,000 calculation operations per second.

The Luxembourg supercomputer Meluxina is a key element of the data-driven innovation strategy of the Ministry of the Economy, which aims to develop a sustainable and reliable digital economy and supports the digital transition of the economy by facilitating competitiveness and business innovation in an increasingly digital world.

Source: Ministry of Economy, Luxembourg

Rugby World Cup predictions: Super Computer predicts results for every match in Japan – Express

Eddie Jones is hoping he can lead England to glory in Japan - the country whose team he coached at the last World Cup in England four years ago - as he looks to challenge the likes of New Zealand and South Africa.

England are third favourites with most bookmakers with holders New Zealand heavy favourites to retain their crown.

The action is underway with the hosts beating Russia in the first game to kick the tournament off, and QBE Business Insurance have run the numbers to predict how things will turn out.

And it's good news for the home nations, who should all qualify from their pools.

Japan 40-12 Russia

Australia 36-11 Fiji

France 22-20 Argentina

New Zealand 28-17 South Africa

Italy 36-11 Namibia

Ireland 23-16 Scotland

England 33-11 Tonga

Wales 30-9 Georgia

Russia 16-22 Samoa

Fiji 35-12 Uruguay

Italy 29-12 Canada

England 45-9 USA

Argentina 29-19 Tonga

Japan 21-37 Ireland

South Africa 72-0 Namibia

Russian Nuclear Engineer Fined for Trying to Mine Bitcoin on One of the Country’s Most Powerful Supercomputers – Newsweek

A Russian scientist has been fined the equivalent of $7,000 for using a supercomputer inside a secretive nuclear facility to mine for bitcoin cryptocurrency.

Denis Baykov, an employee of the Federal Nuclear Center in Sarov, was fined 450,000 rubles on September 17 after being found guilty of violating the lab's internal computer policies, RIA Novosti reported via The Moscow Times, citing a ruling published by the city court.

Two additional staff members, Andrei Rybkin and Andrei Shatokhin, are still facing legal action. The employees were charged with unlawful access to computer information and using unauthorized computer software, RIA Novosti reported.

Bitcoin, the most popular type of cryptocurrency, is created using computing power, which requires a lot of energy resources. The process is known as mining.

News of the arrests came to light in February 2018, when the Interfax news agency reported that security at the nuclear facility was alerted to the illicit mining activity. According to the BBC, the scientists raised a red flag by connecting the computer to the internet. "There was an attempt at unauthorized use of office computing power for personal purposes, including for the so-called mining," the institute said in a statement at the time.

Alexei Korolev, the lawyer for one of the defendants, told state media outlet RT that the engineers developed a special program that was supposed to keep their activities undetected. He said they managed to mine some bitcoin, but the exact amount was not immediately clear.

Korolev confirmed the nuclear scientists had pleaded guilty after their arrest. "They regret what they did," he noted. "But I think they went for it out of professional interest, not for the purpose of profit."

According to RT, the hearing date for Rybkin and Shatokhin has not yet been scheduled, but the case was received by the city court on September 11.

RT is a news outlet financed by the Russian government. The Sarov lab, founded in 1946, was responsible for producing the first Soviet nuclear weapon, The Moscow Times reported. The lab houses a supercomputer capable of conducting 1,000 trillion calculations per second.

In August, employees of a power plant in Ukraine exposed secret information after installing cryptocurrency mining rigs into the network, the website SecurityWeek reported at the time.

The Security Service of Ukraine found staffers of the South Ukraine Nuclear Power Station had been using the plant's systems to power their mining devices, but they appeared to have aided the leak of classified data after the equipment was linked up to the internet. Typically, critical computer networks can be isolated from the internet, or "air-gapped," for security purposes.

iPhone 11 Cinematography: The 5 Breakthroughs of the New Camera, Explained – IndieWire

Despite major annual updates, progress can be incremental in the world of iPhone cinematography and photography. And Apple events feature an avalanche of impressive specs and gimmicky features geared toward making consumers feel like the latest and greatest will make them a professional shooter.

To get past the hype, IndieWire spoke with Filmic Pro CTO Chris Cohen. He shared the stage with filmmaker Sean Baker at the big Apple unveiling, and it's Cohen's app that allows every serious filmmaker, from Baker to Steven Soderbergh, to use the iPhone like a professional camera. We also talked to the iPhone experts at Moment, a five-year-old company that creates apps and tools for professional iPhone shooters.

Here are the five actual breakthrough camera advances in the iPhone 11 that should have filmmakers excited.

iPhone 11 Ultra Wide Lens

1. The Ultra Wide Lens

If you've ever shot anything on an iPhone, you'll notice that switching from photo to video mode tightens the image to create a more limited field of view. To widen that view, filmmakers rely on third-party lens attachments: Soderbergh used Moment's 18mm on Unsane, Baker the anamorphic Moondog lens on Tangerine. With the new iPhone 11, Apple's Ultra Wide lens solves this problem.

"It looks to sit right around a 13mm," said Caleb Babcock, chief content creator at Moment. "Which is perfect, because any wider on the iPhone and you start to get that fish-eye look."

Director Rian Johnson (Star Wars: The Last Jedi, Knives Out) experimented with an early iPhone 11 Pro. He shot footage in Paris (shared below). It features some of the first shots we've seen from the new ultra wide lens, which he tweeted was "a real game changer." The optics look solid, while being, as Babcock speculated, right on that edge of being too wide.

iPhone cinematography will likely continue to be most effective when shooting subjects who are relatively close to the iPhone. The camera still lacks the ability to capture detail in images with too much scope, which makes the ability to get wider and see more in intimate situations an incredibly important feature.

2. The Selfie Camera

Until the iPhone 11, the user-facing camera commonly used for FaceTime and selfies has not been a pro tool, lacking the optics and sensor of the back-facing lenses.

"We've always discouraged it to our users," said Cohen. "We've even had internal conversations about whether we should even let users use the front-facing lens, because the quality was just poor."

Apple's user-facing camera is now TrueDepth, and represents one of the most significant upgrades made to its camera system. The camera is now 12 megapixels, has a significantly wider lens, and can capture 4K at up to 60 frames per second. Cohen, who got early access to the camera in order to build the new software used in Baker's demo with jazz musicians, said everyone at Filmic Pro was blown away by the massive upgrade, adding, "It's a worthy addition to the lens kit now."

Here's why this matters:

iPhone 11 Shot-Reverse-Shot using upgraded user facing camera

3. Shot Reverse Shot

Much attention has been placed on the iPhone 11's ability to simultaneously record two video streams from the back-facing cameras, a great feature for photographers but less so for filmmakers. To seamlessly cut together multi-camera coverage and avoid jump cuts, the two shots need both a different image size (which the iPhone can now do) and a change of angle (which the iPhone still can't do).

One of the only ways to make two such shots cut together is a straight-ahead, perfectly centered symmetrical frame - think Stanley Kubrick or Wes Anderson. So while those real-world applications are limited, there's a lot more potential in the new shot-reverse-shot capabilities.

"As a filmmaker, there's some really practical use cases for it," said Babcock. "If someone wanted to record a podcast, you're sitting across the desk from someone, one camera in the middle, and you're getting both angles. That goes for documentary use as well."

In fact, when Apple first invited Filmic Pro in to look at the technology and asked how they could best represent its capabilities to users, Cohen and his team suggested an interview demo.

"That was the first version of the pitch: a news reporter conducting an interview, with shot-reverse-shot, and in the end they wanted something more artsy," said Cohen. "But that's how we envisioned this feature. We wanted to empower storytellers, and those will be our early adopters with this feature."

Director Sean Baker and Filmic Pro CTO Chris Cohen at the Apple Event unveiling the iPhone 11

4. Camera + Super Computer

Smartphone companies love to hype the power of their newest processing chips, and eye rolls from the software engineers usually follow. "We always joke, 'Great, all this power, I wonder how fast this will throttle. 30 seconds? 40 seconds?'" said Cohen. "Because even though there is a lot of peak performance on tap with the processors Apple has been making, they're sandwiched between two pieces of glass, so for a high-performance application like Filmic Pro that has a computational imaging pipeline, we can only really tap into about 30 percent of that maximum potential before the system fails."

However, the new A13 chips in all iPhone 11s are another matter. At one point, while building the demo app using an iPhone 11 prototype, Cohen's Filmic team had six composites showing at once. "This thing wasn't even getting hot to the touch," said Cohen. "It's a breakthrough in terms of sustaining performance, and that's going to have huge implications for what we do."

Phil Pasqual, the head of Moment's App team, agrees. "These phones are extremely powerful and the benchmarks on the chips in them are not far off from a laptop computer," said Pasqual. "You're basically pairing a camera with a super computer."

Pasqual said the camera's ability to take multiple photos simultaneously, combined with an algorithm that can merge them intelligently and in real time, is a paradigm shift. "The next two years are going to be very interesting," said Cohen. "You're going to see things with real-time imaging software that's going to blow you away."
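
As a rough illustration of what such a merge involves - a hypothetical sketch in Python, not Apple's or Filmic's actual pipeline - the simplest form of multi-frame processing is to average several aligned exposures captured in quick succession, which suppresses random sensor noise while keeping detail:

    import numpy as np

    def merge_burst(frames):
        # Naive multi-frame merge: average a burst of already-aligned exposures.
        # Averaging N frames reduces random sensor noise by roughly sqrt(N);
        # real pipelines also align, weight and de-ghost the frames.
        stack = np.stack([f.astype(np.float32) for f in frames])
        return stack.mean(axis=0)

    # Simulate a burst of 8 noisy captures of the same (synthetic) scene.
    rng = np.random.default_rng(0)
    scene = rng.uniform(0.0, 1.0, size=(480, 640))
    burst = [scene + rng.normal(0.0, 0.05, scene.shape) for _ in range(8)]

    merged = merge_burst(burst)
    print("single-frame noise:", float(np.abs(burst[0] - scene).std()))
    print("merged-frame noise:", float(np.abs(merged - scene).std()))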

An important iPhone professional advance of the last two years was Filmic Pro's Log V2. This gave cinematographers the ability to record video images that preserve maximum dynamic-range information, simulating the process of recording in Log or Raw on professional cameras. These images could then be taken into a professional post-production color grade.
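
For readers unfamiliar with log recording, the sketch below shows the general idea with a made-up transfer curve (illustrative only, not Filmic's actual Log V2 math): a logarithmic encode spreads the scene's stops evenly across the recorded code values, so highlights are compressed rather than clipped and can be expanded again in the grade.

    import numpy as np

    def generic_log_encode(linear, stops_range=6.0):
        # Purely illustrative log curve: map linear light (1.0 = mid grey)
        # to a 0..1 code value so that equal ratios of light become equal
        # steps. Real camera log curves add a linear toe and tuned constants.
        stops = np.log2(np.maximum(linear, 1e-6))   # exposure relative to mid grey
        return np.clip((stops + stops_range) / (2 * stops_range), 0.0, 1.0)

    # Scene values from deep shadow to a bright highlight (multiples of mid grey).
    scene = np.array([0.05, 0.5, 1.0, 4.0, 16.0, 60.0])
    print(np.round(generic_log_encode(scene), 3))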

"I would say Log V2 was as far as we could push it in terms of previous versions of software," said Cohen. "Now, our heads are spinning. We have a lot of things we were planning to put on the road map that we weren't planning to put in there for the next two or three years. Now we are seriously considering fast-tracking them, because the sustaining performance is so good."

Apple's iPhone 11

5. It's Not Just the iPhone 11 Pro

For professional cinematographers, the focus has been on the most expensive Pro model. However, most of the camera advances are in all the new iPhone 11 models. The Pro does have a third, telephoto lens on the back, extra battery power, and a matte finish. Most importantly, all iPhone 11s have the A13 chip, the ultra wide lens, the upgraded user-facing camera, and the newest capture sensors, which increase the native dynamic range of the iPhone.

"Apple, to their credit," said Cohen. "They could have arbitrarily made the Pro artificially superior to the other ones, but they did not do that."

The Local Tone Mapping Problem: Soderbergh and others have pleaded with Apple to fix, or at least allow the ability to turn off, the iPhone's local tone mapping, which can adjust the exposure of a portion of the frame in the middle of a shot. It would appear that issue will become more manageable with the iPhone 11.

"I'm not in a position to speak for Apple," said Cohen. "What I am going to say is that issue looks like it - I'm going to use my words carefully here - I don't think it'll be such a problem."

When Can We Expect the New Filmic Pro App?: "We have never been beholden to hard deadlines because of our internal process," said Cohen. "We give early access to filmmakers and educators and, with their feedback, we go to market or we may re-tool. We're just saying the end of the year. That said, we do reserve the right to go beyond that if part of the user experience needs to improve."

And will some of the features shown with Baker at the Apple launch event be accessible, through updates, before then? Cohen declined to answer.

Is a Composite Zoom Through All Three Pro Lenses Possible? "It's possible to zoom through all the focal lengths using a combination of digital zoom and lens switching," said Cohen. "It comes with some caveats. Switching between lenses, you are going to have different effective apertures. You're also going to have different characteristics of lens compression. If you were to do, let's call it a composite, multi-cam zoom, you wouldn't notice it if the zoom was relatively fast, but you would notice it if it was very, very slow."

Home | TOP500 Supercomputer Sites

AMD Posts Transitional First Quarter Ahead of Rome Launch

On the eve of its 50th anniversary, Advanced Micro Devices (AMD) reported sales of $1.27 billion for Q1 2019, down 10 percent quarter-over-quarter and 23 percent year-over-year. Despite the drop, revenue came in above Wall Street's expectations, and AMD is continuing its push to win back datacenter market share ceded to Intel over the last [...]

TAIPEI, Taiwan, May 2, 2019 - Computer and server manufacturer Inventec Enterprise Business Group (Inventec EBG) today announced the release of its P47G4 server solution, optimized for AMD deep learning technologies. The P47G4 server is one of four optimized server solutions and features a 2U, single-socket system equipped with AMD EPYC processors and up to four AMD Radeon Instinct [...]

The post Inventec Collaborates with AMD to Provide Deep Learning Solutions appeared first on HPCwire.

One reason China has a good chance of hitting its ambitious goal to reach exascale computing in 2020 is that the government is funding three separate architectural paths to attain that milestone.

China Fleshes Out Exascale Design for Tianhe-3 Supercomputer was written by Michael Feldman.

Over at the IBM Blog, Rahil Garnavi writes that IBM researchers have developed new techniques in deep learning that could help unlock earlier glaucoma detection. "Earlier detection of glaucoma is critical to slowing its progression in individuals and its rise across our global population. Using deep learning to uncover valuable information in non-invasive, standard retina imaging could lay the groundwork for new and much more rapid glaucoma testing."

The post IBM Research Applies Deep Learning for Detecting Glaucoma appeared first on insideHPC.

Researchers at the University of Pittsburgh are using XSEDE supercomputing resources to develop new materials that can capture carbon dioxide and turn it into commercially useful substances. With global climate change resulting from increasing levels of carbon dioxide in the Earth's atmosphere, the work could have a lasting impact on our environment. "The basic idea here is that we are looking to improve the overall energetics of CO2 capture and conversion to some useful material, as opposed to putting it in the ground and just storing it someplace," said Karl Johnson from the University of Pittsburgh. "But capture and conversion are typically different processes."

The post Pitt Researchers using HPC to turn CO2 into Useful Products appeared first on insideHPC.

Field Programmable Gate Arrays (FPGAs) have notched some noticeable wins as a platform for machine learning, Microsoft's embrace of the technology in Azure being the most notable example.

FPGAs Open Gates in Machine Learning was written by Michael Feldman.

Supercomputer – Simple English Wikipedia, the free …

A supercomputer is a computer with great speed and memory. This kind of computer can do jobs faster than any other computer of its generation. They are usually thousands of times faster than ordinary personal computers made at that time. Supercomputers can do arithmetic jobs very fast, so they are used for weather forecasting, code-breaking, genetic analysis and other jobs that need many calculations. When new computers of all classes become more powerful, new ordinary computers are made with powers that only supercomputers had in the past, while new supercomputers continue to outclass them.

Electrical engineers make supercomputers that link many thousands of microprocessors.

Supercomputer types include shared memory, distributed memory and array. Supercomputers with shared memory are built around parallel computing and pipelining concepts. Supercomputers with distributed memory consist of many (roughly 100 to 10,000) nodes. The CRAY series from Cray Research, the VP 2400/40 and the NEC SX-3 are shared-memory types; the nCube 3, iPSC/860, AP 1000, NCR 3700, Paragon XP/S and CM-5 are distributed-memory types.

An array-type computer named ILLIAC started working in 1972. Later, the CF-11, the CM-2 and the MasPar MP-2 (also array types) were developed. Supercomputers that present physically separated memory as one shared memory include the T3D, the KSR1 and the Tera computer.
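
To make the shared-memory versus distributed-memory distinction concrete, here is a small illustrative sketch in ordinary Python (not actual supercomputer code): several worker processes sum chunks of a single array held in shared memory. On a distributed-memory machine, by contrast, each node would hold only its own slice in private memory, and the partial sums would be combined by passing messages over the interconnect.

    from multiprocessing import Process, Array

    def partial_sum(shared, start, stop, out, slot):
        # Every worker reads the same array directly - the defining
        # property of a shared-memory machine.
        out[slot] = sum(shared[start:stop])

    if __name__ == "__main__":
        data = Array('d', [float(i) for i in range(1_000_000)], lock=False)
        results = Array('d', 4, lock=False)
        chunk = len(data) // 4
        workers = [Process(target=partial_sum,
                           args=(data, i * chunk, (i + 1) * chunk, results, i))
                   for i in range(4)]
        for w in workers: w.start()
        for w in workers: w.join()
        print("total:", sum(results))   # 499999500000.0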

Lawrence Livermore Labs turns on Sierra supercomputer …

Covering 7,000 square feet and with 240 computing racks and 4,320 nodes, a classified government lab holds what looks like a futuristic mini city of black boxes with flashing blue and green lights.

This buzzing machine, called the Sierra supercomputer, is the third most powerful computer in the world. It was unveiled Friday at its home, the Lawrence Livermore National Laboratory (LLNL) in California, after four years in the making.

At its peak, Sierra can do 125 quadrillion calculations in a second. Its simulations are 100,000 times more realistic than anything a normal desktop computer can make. The only two supercomputers that are more powerful are China's Sunway Taihulight in second place and IBM's Summit in first.

"It would take 10 years to do the calculations this machine can do in one second," said Ian Buck, vice president and general manager of accelerated computing at NVIDIA.

Powering such a massive electronic brain takes about 11 to 12 megawatts of energy, roughly the equivalent of what's needed to power 12,000 homes - a relatively energy-efficient level of consumption, according to Sierra's creators.

Right now, Sierra is partnering with medical labs to help develop cancer treatments and study traumatic brain injury before it switches to classified work.

Many of the 4,000 nuclear weapons in the government's stockpile are aging. Once the Sierra switches to classified production in early 2019, it will focus on top secret government activities and it will use simulations to test the safety and reliability of these weapons, without setting off the weapons themselves and endangering people.

Besides assessing nuclear weapons, this supercomputer can create simulations to predict the effects of cancer, earthquakes and more. In other words, it can answer questions in 3D.

The lab and the Department of Energy worked with IBM, NVIDIA and Mellanox on this project. Talks for Sierra began in 2012, and in 2014 the project took off. Now, it's six to ten times more powerful than its predecessor, Sequoia.

What makes Sierra notably different is the NVLink interconnect, which connects Sierra's processing units and gives them faster access to memory.

"What's most fascinating is the scale of what it can do and the nature of the system that opens itself to the next generation workload," said Akhtar Ali, VP of technical computing software at IBM. "Now these systems will do the kind of breakthrough science that's pervasive right now.

The lab also installed another new supercomputer called Lassen, which will focus on unclassified work like speeding cancer drug discovery, research in traumatic brain injury, and studying earthquakes and the climate.

Sierra's not the last supercomputer the lab will build. They're already planning the next one: "El Capitan," which will be able to do more than a quintillion calculations per second -- 10 times more powerful than the colossal Sierra.

The lab expects to flip the switch on El Capitan sometime in the 2021 to 2023 time frame.

In case you're wondering, the supercomputers are all named after natural landmarks in California.

And no, Lawrence Livermore National Laboratories spokesperson Jeremy Thomas says, there are no plans to use the Sierra supercomputer for bitcoin mining.

"While it would probably be great at it, mining bitcoin is definitely not part of our mission" Thomas says.

History of supercomputing – Wikipedia

The history of supercomputing goes back to the early 1920s in the United States with the IBM tabulators at Columbia University and a series of computers at Control Data Corporation (CDC), designed by Seymour Cray to use innovative designs and parallelism to achieve superior computational peak performance.[1] The CDC 6600, released in 1964, is generally considered the first supercomputer.[2][3] However, some earlier computers were considered supercomputers for their day, such as the 1954 IBM NORC[4], the 1960 UNIVAC LARC[5], and the IBM 7030 Stretch[6] and the Atlas, both in 1962.

While the supercomputers of the 1980s used only a few processors, in the 1990s, machines with thousands of processors began to appear both in the United States and in Japan, setting new computational performance records.

By the end of the 20th century, massively parallel supercomputers with thousands of "off-the-shelf" processors similar to those found in personal computers were constructed and broke through the teraflop computational barrier.

Progress in the first decade of the 21st century was dramatic and supercomputers with over 60,000 processors appeared, reaching petaflop performance levels.

The term "Super Computing" was first used in the New York World in 1929 to refer to large custom-built tabulators that IBM had made for Columbia University.

In 1957, a group of engineers left Sperry Corporation to form Control Data Corporation (CDC) in Minneapolis, Minnesota. Seymour Cray left Sperry a year later to join his colleagues at CDC.[1] In 1960, Cray completed the CDC 1604, one of the first solid-state computers, and the fastest computer in the world at a time when vacuum tubes were found in most large computers.[7]

Around 1960, Cray decided to design a computer that would be the fastest in the world by a large margin. After four years of experimentation, together with Jim Thornton, Dean Roush and about 30 other engineers, Cray completed the CDC 6600 in 1964. Cray switched from germanium to silicon transistors, built by Fairchild Semiconductor, that used the planar process. These did not have the drawbacks of the mesa silicon transistors. He ran them very fast, and the speed-of-light restriction forced a very compact design with severe overheating problems, which were solved by introducing refrigeration, designed by Dean Roush.[8] Given that the 6600 outran all computers of the time by about 10 times, it was dubbed a supercomputer and defined the supercomputing market when one hundred computers were sold at $8 million each.[7][9]

The 6600 gained speed by "farming out" work to peripheral computing elements, freeing the CPU (Central Processing Unit) to process actual data. The Minnesota FORTRAN compiler for the machine was developed by Liddiard and Mundstock at the University of Minnesota, and with it the 6600 could sustain 500 kiloflops on standard mathematical operations.[10] In 1968, Cray completed the CDC 7600, again the fastest computer in the world.[7] At 36 MHz, the 7600 had about three and a half times the clock speed of the 6600, but ran significantly faster due to other technical innovations. They sold only about 50 of the 7600s, not quite a failure. Cray left CDC in 1972 to form his own company.[7] Two years after his departure CDC delivered the STAR-100, which at 100 megaflops was three times the speed of the 7600. Along with the Texas Instruments ASC, the STAR-100 was one of the first machines to use vector processing - the idea having been inspired around 1964 by the APL programming language.[11][12]

In 1956, a team at Manchester University in the United Kingdom began development of MUSE - a name derived from "microsecond engine" - with the aim of eventually building a computer that could operate at processing speeds approaching one microsecond per instruction, about one million instructions per second.[13] Mu (or µ) is a prefix in the SI and other systems of units denoting a factor of 10^-6 (one millionth).

At the end of 1958, Ferranti agreed to begin to collaborate with Manchester University on the project, and the computer was shortly afterwards renamed Atlas, with the joint venture under the control of Tom Kilburn. The first Atlas was officially commissioned on 7 December 1962, nearly three years before the CDC 6600 supercomputer was introduced. It was considered the most powerful computer in England and, for a very short time, one of the most powerful computers in the world, equivalent to four IBM 7094s.[14] It was said that whenever England's Atlas went offline half of the United Kingdom's computer capacity was lost.[14] The Atlas pioneered the use of virtual memory and paging as a way to extend its working memory by combining its 16,384 words of primary core memory with an additional 96K words of secondary drum memory.[15] Atlas also pioneered the Atlas Supervisor, "considered by many to be the first recognizable modern operating system".[14]

Four years after leaving CDC, Cray delivered the 80 MHz Cray-1 in 1976, and it became the most successful supercomputer in history.[12][16] The Cray-1 used integrated circuits with two gates per chip and was a vector processor which introduced a number of innovations, such as chaining, in which scalar and vector registers generate interim results that can be used immediately, without the additional memory references that would reduce computational speed.[8][17] The Cray X-MP (designed by Steve Chen) was released in 1982 as a 105 MHz shared-memory parallel vector processor with better chaining support and multiple memory pipelines. All three floating-point pipelines on the X-MP could operate simultaneously.[17]

The Cray-2, released in 1985, was a 4-processor liquid-cooled computer totally immersed in a tank of Fluorinert, which bubbled as it operated.[8] It could perform at up to 1.9 gigaflops and was the world's second-fastest supercomputer after the M-13 (2.4 gigaflops)[18] until 1990, when the ETA-10G from CDC overtook both. The Cray-2 was a totally new design; it did not use chaining and had a high memory latency, but used much pipelining and was ideal for problems that required large amounts of memory.[17] The software costs in developing a supercomputer should not be underestimated, as evidenced by the fact that in the 1980s the cost for software development at Cray came to equal what was spent on hardware.[19] That trend was partly responsible for a move away from the in-house Cray Operating System to UNICOS, based on Unix.[19]

The Cray Y-MP, also designed by Steve Chen, was released in 1988 as an improvement of the X-MP and could have eight vector processors at 167 MHz with a peak performance of 333 megaflops per processor.[17] In the late 1980s, Cray's experiment with gallium arsenide semiconductors in the Cray-3 did not succeed. Seymour Roger Cray began to work on a massively parallel computer in the early 1990s, but died in a car accident in 1996 before it could be completed. Cray Research did, however, produce such computers.[16][8]

The Cray-2, which set the frontiers of supercomputing in the mid-to-late 1980s, had only 8 processors. In the 1990s, supercomputers with thousands of processors began to appear. Another development at the end of the 1980s was the arrival of Japanese supercomputers, some of which were modeled after the Cray-1.

The SX-3/44R was announced by NEC Corporation in 1989, and a year later it earned the fastest-in-the-world title with a 4-processor model.[20] However, Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994. It had a peak speed of 1.7 gigaflops per processor.[21][22] The Hitachi SR2201, on the other hand, obtained a peak performance of 600 gigaflops in 1996 by using 2048 processors connected via a fast three-dimensional crossbar network.[23][24][25]

In the same timeframe the Intel Paragon could have 1000 to 4000 Intel i860 processors in various configurations, and was ranked the fastest in the world in 1993. The Paragon was a MIMD machine which connected processors via a high-speed two-dimensional mesh, allowing processes to execute on separate nodes, communicating via the Message Passing Interface.[26] By 1995 Cray was also shipping massively parallel systems, e.g. the Cray T3E with over 2,000 processors, using a three-dimensional torus interconnect.[27][28]

The Paragon architecture soon led to the Intel ASCI Red supercomputer in the United States, which held the top supercomputing spot through the end of the 20th century as part of the Advanced Simulation and Computing Initiative. This was also a mesh-based MIMD massively parallel system with over 9,000 compute nodes and well over 12 terabytes of disk storage, but it used off-the-shelf Pentium Pro processors that could be found in everyday personal computers. ASCI Red was the first system ever to break through the 1 teraflop barrier on the MP-Linpack benchmark, in 1996, eventually reaching 2 teraflops.[29]

Significant progress was made in the first decade of the 21st century. The efficiency of supercomputers continued to increase, but not dramatically so. The Cray C90 used 500 kilowatts of power in 1991, while by 2003 the ASCI Q used 3,000 kW while being 2,000 times faster, increasing the performance per watt roughly 300-fold.[30]
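
That figure follows directly from the numbers quoted; a quick back-of-the-envelope check using only the ratios above:

    # Back-of-the-envelope check of the performance-per-watt comparison above.
    c90_power_kw = 500       # Cray C90, 1991
    asci_q_power_kw = 3000   # ASCI Q, 2003
    speedup = 2000           # ASCI Q is roughly 2000x faster than the C90

    gain = speedup / (asci_q_power_kw / c90_power_kw)
    print(round(gain))       # ~333, i.e. roughly a 300-fold improvement per watt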

In 2004, the Earth Simulator supercomputer built by NEC at the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) reached 35.9 teraflops, using 640 nodes, each with eight proprietary vector processors.[31]

The IBM Blue Gene supercomputer architecture found widespread use in the early part of the 21st century, and 27 of the computers on the TOP500 list used that architecture. The Blue Gene approach is somewhat different in that it trades processor speed for low power consumption so that a larger number of processors can be used at air cooled temperatures. It can use over 60,000 processors, with 2048 processors "per rack", and connects them via a three-dimensional torus interconnect.[32][33]

Progress in China has been rapid, in that China placed 51st on the TOP500 list in June 2003, then 14th in November 2003, and 10th in June 2004 and then 5th during 2005, before gaining the top spot in 2010 with the 2.5-petaflop Tianhe-I supercomputer.[34][35]

In July 2011, the 8.1-petaflop Japanese K computer became the fastest in the world, using over 60,000 SPARC64 VIIIfx processors housed in over 600 cabinets. The fact that the K computer is over 60 times faster than the Earth Simulator, and that the Earth Simulator ranked as the 68th system in the world seven years after holding the top spot, demonstrates both the rapid increase in top performance and the widespread growth of supercomputing technology worldwide.[36][37][38] By 2014, the Earth Simulator had dropped off the list, and by 2018 the K computer had dropped out of the top 10.

This is a list of the computers which appeared at the top of the Top500 list since 1993.[39] The "Peak speed" is given as the "Rmax" rating.
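
For context, the Rmax rating comes from the High-Performance Linpack (HPL) benchmark: a dense linear system of order n is solved, the run is credited with approximately 2/3*n^3 + 2*n^2 floating-point operations, and Rmax is that operation count divided by the wall-clock time. A minimal sketch of the bookkeeping (illustrative only; official submissions use the HPL reference code and tuned parameters):

    def hpl_rmax(n, seconds):
        # Approximate HPL Rmax in FLOPS for a run that solved an n-by-n
        # dense linear system in `seconds` of wall-clock time.
        flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # standard HPL operation count
        return flops / seconds

    # Hypothetical run: a system of one million unknowns solved in one hour.
    print(f"{hpl_rmax(1_000_000, 3600):.3e} FLOPS")   # ~1.9e+14, i.e. ~185 teraflops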

Chart: combined performance of the 500 largest supercomputers, the fastest supercomputer, and the supercomputer in 500th place, plotted over time.

The CoCom and its later replacement, the Wassenaar Arrangement, legally regulated the export of high-performance computers (HPCs) to certain countries - requiring licensing, approval and record-keeping, or banning exports entirely. Such controls have become harder to justify, leading to a loosening of these regulations. Some have argued these regulations were never justified.[40][41][42][43][44][45]

ORNL Launches Summit Supercomputer | ORNL

OAK RIDGE, Tenn., June 8, 2018 - The U.S. Department of Energy's Oak Ridge National Laboratory today unveiled Summit as the world's most powerful and smartest scientific supercomputer.

With a peak performance of 200,000 trillion calculations per second - or 200 petaflops - Summit will be eight times more powerful than ORNL's previous top-ranked system, Titan. For certain scientific applications, Summit will also be capable of more than three billion billion mixed-precision calculations per second, or 3.3 exaops. Summit will provide unprecedented computing power for research in energy, advanced materials and artificial intelligence (AI), among other domains, enabling scientific discoveries that were previously impractical or impossible.

"Today's launch of the Summit supercomputer demonstrates the strength of American leadership in scientific innovation and technology development. It's going to have a profound impact in energy research, scientific discovery, economic competitiveness and national security," said Secretary of Energy Rick Perry. "I am truly excited by the potential of Summit, as it moves the nation one step closer to the goal of delivering an exascale supercomputing system by 2021. Summit will empower scientists to address a wide range of new challenges, accelerate discovery, spur innovation and above all, benefit the American people."

The IBM AC922 system consists of 4,608 compute servers, each containing two 22-core IBM Power9 processors and six NVIDIA Tesla V100 graphics processing unit accelerators, interconnected with dual-rail Mellanox EDR 100Gb/s InfiniBand. Summit also possesses more than 10 petabytes of memory paired with fast, high-bandwidth pathways for efficient data movement. The combination of cutting-edge hardware and robust data subsystems marks an evolution of the hybrid CPU-GPU architecture successfully pioneered by the 27-petaflops Titan in 2012.
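
Multiplying out the node counts in the release gives a sense of the scale; the totals below are derived purely from the figures quoted above:

    # Totals implied by the configuration described above.
    nodes = 4608
    cpu_cores = nodes * 2 * 22     # two 22-core Power9 processors per node
    gpus = nodes * 6               # six Tesla V100 GPUs per node
    print(cpu_cores, "CPU cores")  # 202752
    print(gpus, "GPUs")            # 27648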

ORNL researchers have figured out how to harness the power and intelligence of Summit's state-of-the-art architecture to successfully run the world's first exascale scientific calculation. A team of scientists led by ORNL's Dan Jacobson and Wayne Joubert has leveraged the intelligence of the machine to run a 1.88 exaops comparative genomics calculation relevant to research in bioenergy and human health. The mixed-precision exaops calculation produced identical results to more time-consuming 64-bit calculations previously run on Titan.

"From its genesis 75 years ago, ORNL has a history and culture of solving large and difficult problems with national scope and impact," ORNL Director Thomas Zacharia said. "ORNL scientists were among the scientific teams that achieved the first gigaflops calculations in 1988, the first teraflops calculations in 1998, the first petaflops calculations in 2008 and now the first exaops calculations in 2018. The pioneering research of ORNL scientists and engineers has played a pivotal role in our nation's history and continues to shape our future. We look forward to welcoming the scientific user community to Summit as we pursue another 75 years of leadership in science."

In addition to scientific modeling and simulation, Summit offers unparalleled opportunities for the integration of AI and scientific discovery, enabling researchers to apply techniques like machine learning and deep learning to problems in human health, high-energy physics, materials discovery and other areas. Summit allows DOE and ORNL to respond to the White House Artificial Intelligence for America initiative.

"Summit takes accelerated computing to the next level, with more computing power, more memory, an enormous high-performance file system and fast data paths to tie it all together. That means researchers will be able to get more accurate results faster," said Jeff Nichols, ORNL associate laboratory director for computing and computational sciences. "Summit's AI-optimized hardware also gives researchers an incredible platform for analyzing massive datasets and creating intelligent software to accelerate the pace of discovery."

Summit moves the nation one step closer to the goal of developing and delivering a fully capable exascale computing ecosystem for broad scientific use by 2021.

Summit will be open to select projects this year while ORNL and IBM work through the acceptance process for the machine. In 2019, the bulk of access to the IBM system will go to research teams selected through DOE's Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, program.

In anticipation of Summit's launch, researchers have been preparing applications for its next-generation architecture, with many ready to make effective use of the system on day one. Among the early science projects slated to run on Summit:

Astrophysics

Exploding stars, known as supernovas, supply researchers with clues related to how heavy elements - including the gold in jewelry and the iron in blood - seeded the universe.

The highly scalable FLASH code models this process at multiple scales - from the nuclear level to the large-scale hydrodynamics of a star's final moments. On Summit, FLASH will go much further than previously possible, simulating supernova scenarios several thousand times longer and tracking about 12 times more elements than past projects.

"It's at least a hundred times more computation than we've been able to do on earlier machines," said ORNL computational astrophysicist Bronson Messer. "The sheer size of Summit will allow us to make very high-resolution models."

Materials

Developing the next generation of materials, including compounds for energy storage, conversion and production, depends on subatomic understanding of material behavior. QMCPACK, a quantum Monte Carlo application, simulates these interactions using first-principles calculations.

Up to now, researchers have only been able to simulate tens of atoms because of QMCPACK's high computational cost. Summit, however, can support materials composed of hundreds of atoms, a jump that aids the search for a more practical superconductor - a material that can transmit electricity with no energy loss.

"Summit's large, on-node memory is very important for increasing the range of complexity in materials and physical phenomena," said ORNL staff scientist Paul Kent. "Additionally, the much more powerful nodes are really going to help us extend the range of our simulations."

Cancer Surveillance

One of the keys to combating cancer is developing tools that can automatically extract, analyze and sort existing health data to reveal previously hidden relationships between disease factors such as genes, biological markers and environment. Paired with unstructured data such as text-based reports and medical images, machine learning algorithms scaled on Summit will help supply medical researchers with a comprehensive view of the U.S. cancer population at a level of detail typically obtained only for clinical trial patients.

This cancer surveillance project is part of the CANcer Distributed Learning Environment, or CANDLE, a joint initiative between DOE and the National Cancer Institute.

"Essentially, we are training computers to read documents and abstract information using large volumes of data," ORNL researcher Gina Tourassi said. "Summit enables us to explore much more complex models in a time-efficient way so we can identify the ones that are most effective."

Systems Biology

Applying machine learning and AI to genetic and biomedical datasets offers the potential to accelerate understanding of human health and disease outcomes.

Using a mix of AI techniques on Summit, researchers will be able to identify patterns in the function, cooperation and evolution of human proteins and cellular systems. These patterns can collectively give rise to clinical phenotypes, observable traits of diseases such as Alzheimer's, heart disease or addiction, and inform the drug discovery process.

Through a strategic partnership project between ORNL and the U.S. Department of Veterans Affairs, researchers are combining clinical and genomic data with machine learning and Summit's advanced architecture to understand the genetic factors that contribute to conditions such as opioid addiction.

"The complexity of humans as a biological system is incredible," said ORNL computational biologist Dan Jacobson. "Summit is enabling a whole new range of science that was simply not possible before it arrived."

Summit is part of the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility located at ORNL. UT-Battelle manages ORNL for the Department of Energy's Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE's Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit http://science.energy.gov.

Image: https://www.ornl.gov/sites/default/files/2018-P01537.jpg

Caption: Oak Ridge National Laboratory launches Summit supercomputer.

Photos, b-roll and additional resources are available at http://olcf.ornl.gov/summit.

Access Summit Flickr Photos at https://flic.kr/s/aHsmmTwKLg.

Videos of Summit available at https://www.dropbox.com/sh/fy76ppz7cvjblia/AAC0m93xBWk4poM-rRwJbiZza?dl=0.

IBM Blue Gene – Wikipedia

Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the PFLOPS (petaFLOPS) range, with low power consumption.

The project created three generations of supercomputers, Blue Gene/L, Blue Gene/P, and Blue Gene/Q. Blue Gene systems have often led the TOP500[1] and Green500[2] rankings of the most powerful and most power efficient supercomputers, respectively. Blue Gene systems have also consistently scored top positions in the Graph500 list.[3] The project was awarded the 2009 National Medal of Technology and Innovation.[4]

As of 2015, IBM seems to have ended the development of the Blue Gene family,[5] though no public announcement has been made. IBM's continuing efforts in the supercomputer scene seem to be concentrated around OpenPOWER, using accelerators such as FPGAs and GPUs to battle the end of Moore's law.[6]

In December 1999, IBM announced a US$100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding.[7] The project had two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included: how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost, through novel machine architectures. The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. The initial research and development work was pursued at IBM T.J. Watson Research Center and led by William R. Pulleyblank.[8]

At IBM, Alan Gara started working on an extension of the QCDOC architecture into a more general-purpose supercomputer: The 4D nearest-neighbor interconnection network was replaced by a network supporting routing of messages from any node to any other; and a parallel I/O subsystem was added. DOE started funding the development of this system and it became known as Blue Gene/L (L for Light); development of the original Blue Gene system continued under the name Blue Gene/C (C for Cyclops) and, later, Cyclops64.

In November 2004 a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list, with a Linpack performance of 70.72 TFLOPS.[1] It thereby overtook NEC's Earth Simulator, which had held the title of the fastest computer in the world since 2002. From 2004 through 2007 the Blue Gene/L installation at LLNL[9] gradually expanded to 104 racks, achieving 478 TFLOPS Linpack and 596 TFLOPS peak. The LLNL Blue Gene/L installation held the first position in the TOP500 list for 3.5 years, until in June 2008 it was overtaken by IBM's Cell-based Roadrunner system at Los Alamos National Laboratory, which was the first system to surpass the 1 PetaFLOPS mark. The system was built at IBM's Rochester, MN plant.

While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. In November 2006, there were 27 computers on the TOP500 list using the Blue Gene/L architecture. All these computers were listed as having an architecture of eServer Blue Gene Solution. For example, three racks of Blue Gene/L were housed at the San Diego Supercomputer Center.

While the TOP500 measures performance on a single benchmark application, Linpack, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer ever to run over 100 TFLOPS sustained on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD), simulating solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This achievement won the 2005 Gordon Bell Prize.

In June 2006, NNSA and IBM announced that Blue Gene/L achieved 207.3 TFLOPS on a quantum chemical application (Qbox).[10] At Supercomputing 2006,[11] Blue Gene/L was awarded the winning prize in all HPC Challenge Classes of awards.[12] In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of a second (the network was run at 1/10 of normal speed for 10 seconds).[13]

The name Blue Gene comes from what it was originally designed to do, help biologists understand the processes of protein folding and gene development.[14] "Blue" is a traditional moniker that IBM uses for many of its products and the company itself. The original Blue Gene design was renamed "Blue Gene/C" and eventually Cyclops64. The "L" in Blue Gene/L comes from "Light" as that design's original name was "Blue Light". The "P" version was designed to be a petascale design. "Q" is just the letter after "P". There is no Blue Gene/R.[15]

The Blue Gene/L supercomputer was unique in several aspects, described below.[16]

The Blue Gene/L architecture was an evolution of the QCDSP and QCDOC architectures. Each Blue Gene/L Compute or I/O node was a single ASIC with associated DRAM memory chips. The ASIC integrated two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline double-precision Floating Point Unit (FPU), a cache sub-system with built-in DRAM controller, and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS (gigaFLOPS). The two CPUs were not cache coherent with one another.

Compute nodes were packaged two per compute card, with 16 compute cards plus up to 2 I/O nodes per node board. There were 32 node boards per cabinet/rack.[17] By the integration of all essential sub-systems on a single chip, and the use of low-power logic, each Compute or I/O node dissipated low power (about 17 watts, including DRAMs). This allowed aggressive packaging of up to 1024 compute nodes, plus additional I/O nodes, in a standard 19-inch rack, within reasonable limits of electrical power supply and air cooling. The performance metrics, in terms of FLOPS per watt, FLOPS per m² of floorspace and FLOPS per unit cost, allowed scaling up to very high performance. With so many nodes, component failures were inevitable. The system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), to allow the machine to continue to run.
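
Those headline figures are mutually consistent. A rough check, assuming the dual double-pipeline FPUs deliver four double-precision operations per core per cycle (stated here as an assumption for illustration, not from an official specification):

    # Rough consistency check of the Blue Gene/L figures quoted above.
    clock_mhz = 700                # PowerPC 440 cores at 700 MHz
    flops_per_core_cycle = 4       # assumed: double-pipeline FPU doing fused multiply-adds
    cores_per_node = 2

    node_peak_gflops = cores_per_node * flops_per_core_cycle * clock_mhz / 1000
    print(node_peak_gflops, "GFLOPS per node")        # 5.6, matching the stated figure

    rack_nodes = 1024
    print(round(node_peak_gflops * rack_nodes / 1000, 2), "TFLOPS peak per rack")  # ~5.73
    print(round(rack_nodes * 17 / 1000, 1), "kW of compute-node power per rack")   # ~17.4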

Each Blue Gene/L node was attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for fast barriers. The I/O nodes, which run the Linux operating system, provided communication to storage and external hosts via an Ethernet network. The I/O nodes handled filesystem operations on behalf of the compute nodes. Finally, a separate and private Ethernet network provided access to any node for configuration, booting and diagnostics. To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive integer power of 2, with at least 2^5 = 32 nodes. To run a program on Blue Gene/L, a partition of the computer was first to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition nodes were released for future programs to use.
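
The partitioning rule described above - an electrically isolated block whose node count is a power of two and at least 32 - is simple enough to state in code; the helper below is purely illustrative and not part of any IBM tooling:

    def is_valid_partition(nodes, machine_nodes=1024):
        # A legal Blue Gene/L partition: a power of two, at least 32 nodes,
        # and no larger than the machine itself.
        power_of_two = nodes > 0 and (nodes & (nodes - 1)) == 0
        return power_of_two and 32 <= nodes <= machine_nodes

    print([n for n in (16, 32, 48, 64, 512, 1024) if is_valid_partition(n)])
    # [32, 64, 512, 1024]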

Blue Gene/L compute nodes used a minimal operating system supporting a single user program. Only a subset of POSIX calls was supported, and only one process could run at a time on a node in co-processor mode - or one process per CPU in virtual mode. Programmers needed to implement green threads in order to simulate local concurrency. Application development was usually performed in C, C++, or Fortran using MPI for communication. However, some scripting languages such as Ruby[18] and Python[19] have been ported to the compute nodes.
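
The same MPI programming style can be sketched with mpi4py on an ordinary cluster or laptop; this is a generic MPI example, not Blue Gene-specific code:

    from mpi4py import MPI   # requires an MPI library and the mpi4py package

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id, 0 .. size-1
    size = comm.Get_size()   # total number of MPI processes

    # Each rank computes a partial sum over its own share of 0..999999 ...
    n = 1_000_000
    local = sum(range(rank, n, size))

    # ... and the partial results are combined with a collective reduction,
    # the kind of operation Blue Gene/L's collective network accelerated.
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("total:", total)   # 499999500000

    # Run with, for example:  mpiexec -n 4 python sum_mpi.py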

In June 2007, IBM unveiled Blue Gene/P, the second generation of the Blue Gene series of supercomputers and designed through a collaboration that included IBM, LLNL, and Argonne National Laboratory's Leadership Computing Facility.[20]

The design of Blue Gene/P is a technology evolution from Blue Gene/L. Each Blue Gene/P Compute chip contains four PowerPC 450 processor cores, running at 850 MHz. The cores are cache coherent and the chip can operate as a 4-way symmetric multiprocessor (SMP). The memory subsystem on the chip consists of small private L2 caches, a central shared 8 MB L3 cache, and dual DDR2 memory controllers. The chip also integrates the logic for node-to-node communication, using the same network topologies as Blue Gene/L, but at more than twice the bandwidth. A compute card contains a Blue Gene/P chip with 2 or 4 GB DRAM, comprising a "compute node". A single compute node has a peak performance of 13.6 GFLOPS. 32 compute cards are plugged into an air-cooled node board. A rack contains 32 node boards (thus 1024 nodes, 4096 processor cores).[21] By using many small, low-power, densely packaged chips, Blue Gene/P exceeded the power efficiency of other supercomputers of its generation, and at 371 MFLOPS/W Blue Gene/P installations ranked at or near the top of the Green500 lists in 2007-2008.[2]

The following is an incomplete list of Blue Gene/P installations. Per November 2009, the TOP500 list contained 15 Blue Gene/P installations of 2-racks (2048 nodes, 8192 processor cores, 23.86 TFLOPS Linpack) and larger.[1]

The third supercomputer design in the Blue Gene series, Blue Gene/Q has a peak performance of 20 Petaflops,[37] reaching LINPACK benchmarks performance of 17 Petaflops. Blue Gene/Q continues to expand and enhance the Blue Gene/L and /P architectures.

The Blue Gene/Q Compute chip is an 18-core chip. The 64-bit A2 processor cores are 4-way simultaneously multithreaded, and run at 1.6 GHz. Each processor core has a SIMD quad-vector double-precision floating-point unit (IBM QPX). 16 processor cores are used for computing, and a 17th core for operating system assist functions such as interrupts, asynchronous I/O, MPI pacing and RAS. The 18th core is used as a redundant spare to increase manufacturing yield. The spared-out core is shut down in functional operation. The processor cores are linked by a crossbar switch to a 32 MB eDRAM L2 cache, operating at half core speed. The L2 cache is multi-versioned, supporting transactional memory and speculative execution, and has hardware support for atomic operations.[38] L2 cache misses are handled by two built-in DDR3 memory controllers running at 1.33 GHz. The chip also integrates logic for chip-to-chip communications in a 5D torus configuration, with 2 GB/s chip-to-chip links. The Blue Gene/Q chip is manufactured on IBM's copper SOI process at 45 nm. It delivers a peak performance of 204.8 GFLOPS at 1.6 GHz, drawing about 55 watts. The chip measures 19×19 mm (359.5 mm²) and comprises 1.47 billion transistors. The chip is mounted on a compute card along with 16 GB DDR3 DRAM (i.e., 1 GB for each user processor core).[39]

A Q32[40] compute drawer contains 32 compute cards, each water cooled.[41] A "midplane" (crate) contains 16 Q32 compute drawers, for a total of 512 compute nodes electrically interconnected in a 5D torus configuration (4x4x4x4x2). Beyond the midplane level, all connections are optical. Racks have two midplanes, thus 32 compute drawers, for a total of 1024 compute nodes, 16,384 user cores and 16 TB RAM.[41]
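
Putting the chip and packaging figures together (derived from the numbers quoted above; the eight floating-point operations per core per cycle reflects the 4-wide QPX unit doing fused multiply-adds, stated here as an assumption for the check):

    # Consistency check of the Blue Gene/Q figures quoted above.
    clock_mhz = 1600
    compute_cores = 16            # 16 of the 18 cores run user code
    flops_per_core_cycle = 8      # assumed: 4-wide QPX SIMD doing fused multiply-adds
    chip_peak_gflops = compute_cores * flops_per_core_cycle * clock_mhz / 1000
    print(chip_peak_gflops, "GFLOPS per chip")   # 204.8, matching the stated figure

    nodes_per_rack = 2 * 16 * 32  # 2 midplanes x 16 drawers x 32 compute cards
    print(nodes_per_rack, "nodes,",
          nodes_per_rack * compute_cores, "user cores,",
          nodes_per_rack * 16, "GB of RAM per rack")   # 1024, 16384, 16384 (16 TB)
    print(round(nodes_per_rack * chip_peak_gflops / 1000, 1), "TFLOPS peak per rack")  # ~209.7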

Separate I/O drawers, placed at the top of a rack or in a separate rack, are air cooled and contain 8 compute cards and 8 PCIe expansion slots for Infiniband or 10 Gigabit Ethernet networking.[41]

At the time of the Blue Gene/Q system announcement in November 2011, an initial 4-rack Blue Gene/Q system (4096 nodes, 65,536 user processor cores) achieved #17 in the TOP500 list[1] with 677.1 TeraFLOPS Linpack, outperforming the original 2007 104-rack Blue Gene/L installation described above. The same 4-rack system achieved the top position in the Graph500 list[3] with over 250 GTEPS (giga traversed edges per second). Blue Gene/Q systems also topped the Green500 list of most energy-efficient supercomputers with up to 2.1 GFLOPS/W.[2]
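The Graph500 metric quoted there is simply edges traversed during a breadth-first search divided by elapsed time; a hedged illustration with placeholder numbers (not measured data) chosen to land near the 250 GTEPS figure:

/* GTEPS = (edges traversed during BFS) / (elapsed seconds) / 1e9.
   The edge count and time below are placeholders, not measured data. */
#include <stdio.h>

int main(void) {
    double edges_traversed = 4.0e12;  /* hypothetical edge count */
    double elapsed_seconds = 16.0;    /* hypothetical BFS time    */
    double gteps = edges_traversed / elapsed_seconds / 1e9;
    printf("%.1f GTEPS\n", gteps);    /* 250.0 for these inputs   */
    return 0;
}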

In June 2012, Blue Gene/Q installations took the top positions in all three lists: TOP500,[1] Graph500 [3] and Green500.[2]

The following is an incomplete list of Blue Gene/Q installations. As of June 2012, the TOP500 list contained 20 Blue Gene/Q installations of half-rack size (512 nodes, 8192 processor cores, 86.35 TFLOPS Linpack) and larger.[1] At a (size-independent) power efficiency of about 2.1 GFLOPS/W, all these systems also populated the top of the June 2012 Green500 list.[2]

Record-breaking science applications have been run on the BG/Q, the first to cross 10 petaflops of sustained performance. The cosmology simulation framework HACC achieved almost 14 petaflops with a 3.6 trillion particle benchmark run,[61] while the Cardioid code,[62][63] which models the electrophysiology of the human heart, achieved nearly 12 petaflops with a near real-time simulation, both on Sequoia. A fully compressible flow solver has also achieved 14.4 PFLOP/s (originally 11 PFLOP/s) on Sequoia, 72% of the machine's nominal peak performance.[64]

See the rest here:

IBM Blue Gene - Wikipedia

SCP-866 – SCP Foundation

Item #: SCP-866

Object Class: Euclid

Special Containment Procedures: SCP-866 is to be contained in situ in the HPC Center of the University in , . Floor containing SCP-866 is to be permanently sealed off to all but authorized SCP personnel. At least two SCP personnel should monitor the diesel backup generators at all times as a complete power failure could lead to unquantifiable loss of personnel and civilian casualties, unquantifiable loss of equipment, complete loss of acquired experimental data and in the worst case [DATA EXPUNGED]. Access to the input terminals is allowed only with permission of Level 4 Staff. At least two guards should be stationed in the room of SCP-866 and prevent any individual from entering SCP-866 beyond the input terminals. Unauthorized attempts of access should be logged, but due to the location of containment extreme measures should be avoided if possible.

Description: SCP-866 is a Series Supercomputer constructed in 20. Its anomalous properties were discovered when the system proved capable of running computation jobs with more processors than physically available. Subsequent attempts to determine the reason for this behavior have failed, but have caused university employees to disappear. See Addendum 1.1a for details. Foundation operatives determined the system has non-Euclidean geometry in the computation node rack topology, possibly a polydimensional n-hypercube structure. This, however, only accounts for the speed of the anomalous computations, not for their occurrence. An attempt to remove SCP-866 from the power supply resulted in an immediate [DATA EXPUNGED], causing displacements and disappearances, including that of the entire recovery team. See [REDACTED] for additional information. In situ containment measures have been devised.

Addendum 1: SCP-866 has been successfully used by Foundation staff for large-scale simulations and computations. At this time, the limit, if any, to SCP-866 computational capacity is not known. Access to the machine can be made remotely by anyone possessing a student or staff account for the University System. Addition of a [REDACTED] prevents non-Foundation access.

Addendum 1.1a: of the university employees have since been discovered. Prof. has been found in the building's basement by janitorial staff. Analysis of the remains has shown that his death occurred roughly at the same time as the attempt to remove SCP-866 from the power supply. He was found embe[REDACTED]oom wall. The position of the body suggests Prof. was initially alive while in the basement; the words "[illegible] [illegible] died to a rounding error" were written in his own blood. Radar scans of the building's concrete walls are ongoing, but have failed to find anything of note. Research assistant Dr. has been found in Lagrangian point L3 through unrelated observation regarding [REDACTED].

Addendum 2: An analysis of currently running jobs shows that less than 5% of tasks are the result of Foundation personnel. This value could not be increased through an increase in jobs submitted, suggesting a non-linear relation between job size and machine resources. Attempts to identify the nature of the other jobs have so far proven unsuccessful. The largest jobs observed to date, still running, are the "TSTWRLD1" to "TSTWRLD4" series, submitted by "ao000002" and taking 20% of total machine resources each. Further analysis required.

Addendum 3: Log recovered after attempt to remove from power supply failed.

Addendum 4: Investigation Log of TSTWRLD2 program

Update: Activity logs have recorded the following output:

Further investigation required. Priority [REDACTED].

See the article here:

SCP-866 - SCP Foundation

TOP500 – Wikipedia

The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL,[1] a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.

China currently dominates the list with 229 supercomputers, leading second-place United States by a record margin of 121. Since June 2018, the American system Summit has been the world's most powerful supercomputer, reaching 143.5 petaFLOPS on the LINPACK benchmark.

The TOP500 list is compiled by Jack Dongarra of the University of Tennessee, Knoxville, Erich Strohmaier and Horst Simon of the National Energy Research Scientific Computing Center (NERSC) and Lawrence Berkeley National Laboratory (LBNL), and from 1993 until his death in 2014, Hans Meuer of the University of Mannheim, Germany.

Chart: Combined performance of the 500 largest supercomputers, the fastest supercomputer, and the supercomputer in 500th place, plotted over time.

In the early 1990s, a new definition of supercomputer was needed to produce meaningful statistics. After experimenting with metrics based on processor count in 1992, the idea arose at the University of Mannheim to use a detailed listing of installed systems as the basis. In early 1993, Jack Dongarra was persuaded to join the project with his LINPACK benchmarks. A first test version was produced in May 1993, partly based on data available on the Internet, including the following sources:[2][3]

The information from those sources was used for the first two lists. Since June 1993, the TOP500 has been produced twice a year based on site and vendor submissions only.

Since 1993, performance of the No. 1 ranked position has grown steadily in accordance with Moore's law, doubling roughly every 14 months. As of June 2018, Summit was fastest, with an Rpeak[6] of 187.6593 PFLOPS. For comparison, this is over 1,432,513 times faster than the Connection Machine CM-5/1024 (1,024 cores), which was the fastest system in November 1993 (twenty-five years prior) with an Rpeak of 131.0 GFLOPS.[7]
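The "roughly every 14 months" doubling claim can be checked from those two endpoints; a small C calculation under the stated Rpeak figures (compile with -lm):

/* Doubling time implied by growth from 131.0 GFLOPS (Nov 1993)
   to 187.6593 PFLOPS (Jun 2018), roughly 24.6 years apart. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double rpeak_1993 = 131.0e9;       /* CM-5/1024, in FLOPS */
    double rpeak_2018 = 187.6593e15;   /* Summit, in FLOPS    */
    double months     = 24.6 * 12.0;   /* Nov 1993 to Jun 2018 */

    double doubling = months * log(2.0) / log(rpeak_2018 / rpeak_1993);
    printf("Doubling time: about %.1f months\n", doubling);  /* ~14.4 */
    return 0;
}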

As of November 2018, all supercomputers on the TOP500 are 64-bit, mostly based on x86-64 CPUs (Intel EM64T and AMD AMD64 instruction set architectures), with few exceptions (all based on reduced instruction set computing (RISC) architectures). Thirteen supercomputers are based on the Power Architecture used by IBM POWER microprocessors and six on Fujitsu-designed SPARC64 chips; one of the latter, the K computer, was 1st in 2011 without any GPUs and is still 3rd on the HPCG list.[8] A further two computers are based on related Chinese designs, ShenWei and the Sunway SW26010, also using Chinese co-processors; the latter ascended to 1st in 2016 (it has since been superseded by an IBM POWER-based system). Further, a few computers use another non-US design, the PEZY-SC (based on ARM[9]), as an accelerator paired with Intel's Xeon.

Two computers which first appeared on the list in 2018 are based on architectures never before seen on the TOP500. One was a new x86-64 microarchitecture from Chinese vendor Sugon, using Hygon Dhyana CPUs (a collaboration between AMD and Chinese partners, based on Zen), ranked 38th,[10] and the other was the first-ever ARM-based computer on the list, using Cavium ThunderX2 CPUs.[11] Before the ascendancy of 32-bit x86 and later 64-bit x86-64 in the early 2000s, a variety of RISC processor families made up most TOP500 supercomputers, including RISC architectures such as SPARC, MIPS, PA-RISC, and Alpha.

In recent years heterogeneous computing, mostly using Nvidia's graphics processing units (GPUs) or Intel's x86-based Xeon Phi as coprocessors, has dominated the TOP500 because of better performance-per-watt ratios and higher absolute performance; it has become almost a requirement for making the top 10, the only recent exception being the aforementioned K computer.

All the fastest supercomputers in the decade since the Earth Simulator supercomputer have used operating systems based on Linux. Since November 2017, all the listed supercomputers use an operating system based on the Linux kernel.[12][13]

Since November 2015, no computer on the list runs Windows. In November 2014, the Windows Azure cloud computer[14] dropped off the list of fastest supercomputers (its best rank was 165 in 2012), leaving the Shanghai Supercomputer Center's Magic Cube as the only Windows-based supercomputer on the list, until it too dropped off. It was ranked 436th in its last appearance, on the list released in June 2015, while its best rank was 11th in 2008.[15]

MIPS systems dropped entirely off the list well over a decade ago,[16] but the Gyoukou supercomputer that jumped to 4th place in November 2017 (after a major upgrade) uses MIPS chips as a small part of its coprocessors. Its 2,048-core coprocessors (each paired with eight 6-core MIPS chips, so that they "no longer require to rely on an external Intel Xeon E5 host processor"[17]) make the supercomputer much more energy efficient than the rest of the top 10 (it is 5th on the Green500, and other ZettaScaler-2.2-based systems take the first three spots).[18] At 19.86 million cores, it is by far the biggest system, with almost double the cores of the next-largest manycore system in the TOP500, the Chinese Sunway TaihuLight, ranked 3rd.

Table: number of computers in the TOP500 located in each of the listed countries, by number of systems as of November 2018.[31]

Note: all TOP500 systems run a Linux-based operating system; "Linux" in the operating-system breakdown refers to generic Linux.

In November 2014, it was announced that the United States was developing two new supercomputers to exceed China's Tianhe-2 and take its place as the world's fastest supercomputer. The two computers, Sierra and Summit, will each exceed Tianhe-2's 55 peak petaflops; Summit, the more powerful of the two, will deliver 150–300 peak petaflops.[32] On 10 April 2015, US government agencies banned the sale of chips from Nvidia to supercomputing centers in China as "acting contrary to the national security ... interests of the United States",[33] and barred Intel Corporation from providing Xeon chips to China due to their use, according to the US, in nuclear weapons research, to which US export control law bans US companies from contributing: "The Department of Commerce refused, saying it was concerned about nuclear research being done with the machine."[34]

On 29 July 2015, President Obama signed an executive order creating a National Strategic Computing Initiative calling for the accelerated development of an exascale (1000 petaflop) system and funding research into post-semiconductor computing.[35]

In June 2016, Japanese firm Fujitsu announced at the International Supercomputing Conference that its future exascale supercomputer will feature processors of its own design that implement the ARMv8 architecture. The Flagship2020 program, by Fujitsu for RIKEN, plans to break the exaflops barrier by 2020 (and "it looks like China and France have a chance to do so and that the United States is content for the moment at least to wait until 2023 to break through the exaflops barrier."[36]) These processors will also implement extensions to the ARMv8 architecture equivalent to HPC-ACE2 that Fujitsu is developing with ARM Holdings.[36]

Inspur, based in Jinan, China, has been one of the largest HPC system manufacturers. As of May 2017, Inspur had become the third manufacturer to build a 64-way system, a feat previously achieved only by IBM and HP. The company has registered over $10B in revenue and has provided a number of HPC systems to countries outside China, such as Sudan, Zimbabwe, Saudi Arabia and Venezuela. Inspur was also a major technology partner behind both Chinese supercomputers Tianhe-2 and TaihuLight, which held the top two positions of the TOP500 list until November 2017. In May 2017, Inspur and Supermicro released several GPU-based platforms aimed at HPC, such as SR-AI and AGX-2.[37]

Some major systems are not listed. The largest example is NCSA's Blue Waters, whose operators publicly announced the decision not to participate in the list[38] because they do not feel it accurately indicates a system's ability to do useful work.[39] Other organizations decide not to list systems for security and/or commercial-competitiveness reasons. Purpose-built machines that do not run, or cannot run, the benchmark are also not included, such as the RIKEN MDGRAPE-3 and MDGRAPE-4.

IBM Roadrunner[40] is no longer on the list (nor is any other using the Cell coprocessor, or PowerXCell).

Although Itanium-based systems reached second rank in 2004,[41][42] none now remain.

Similarly, (non-SIMD-style) vector processors (NEC-based machines such as the Earth Simulator, which was fastest in 2002[43]) have also fallen off the list. The Sun Starfire computers that occupied many spots in the past likewise no longer appear.

The last non-Linux computers on the list, two AIX systems running on POWER7 (ranked 494th and 495th in July 2017,[44] originally 86th and 85th), dropped off the list in November 2017.

See the article here:

TOP500 - Wikipedia

TOP500 – Official Site

Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiBD Project

DK Panda from Ohio State University gave this talk at the 2019 Stanford HPC Conference. "This talk will provide an overview of challenges in designing convergent HPC and BigData software stacks on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented. Enhanced designs for these components to exploit HPC scheduler (SLURM), parallel file systems (Lustre) and NVM-based in-memory technology will also be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project will be shown."

The post Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiBD Project appeared first on insideHPC.

The University of Chicago Center for Research Informatics is seeking an HPC Systems Administrator in our Job of the Week. "This position will work with the Lead HPC Systems Administrator to build and maintain the BSD High Performance Computing environment, assist life-sciences researchers to utilize the HPC resources, work with stakeholders and research partners to successfully troubleshoot computational applications, handle customer requests, and respond to suggestions for improvements and enhancements from end-users."

The post Job of the Week: HPC Systems Administrator at the University of Chicago Center for Research Informatics appeared first on insideHPC.

Cohosts Addison Snell of Intersect360 Research and Tiffany Trader of HPCwire discuss the executive AI initiative, plus highlights from the HPC-AI Advisory Council Stanford conference and IBM Think event. This Week in HPC is produced by Intersect360 Research and distributed in partnership with HPCwire.

The post This Week in HPC: American Leadership in Artificial Intelligence, and What IBM is Doing About It appeared first on HPCwire.

Feb. 15, 2019 The European High Performance Computing Joint Undertaking (EuroHPC JU) has launched its first calls for expressions of interest, to select the sites that will host the Joint Undertaking's first supercomputers (petascale and precursor-to-exascale machines) in 2020. Supercomputing, also known as high performance computing (HPC), involves thousands of processors working in [...]

The post EuroHPC Joint Undertaking Takes First Steps Toward Acquiring World-Class Supercomputers appeared first on HPCwire.

What can you do with 381,392 CPU cores? For Cineca, it means enabling computational scientists to expand a large part of the world's body of knowledge from the nanoscale to the astronomic, from calculating quantum effects in new materials to supporting bioinformatics for advanced healthcare research to screening millions of possible chemical combinations to attack [...]

The post Insights from Optimized Codes on Cineca's Marconi appeared first on HPCwire.

The European High Performance Computing Joint Undertaking (EuroHPC JU) has launched its first calls for expressions of interest, to select the sites that will host the Joint Undertaking's first supercomputers (petascale and precursor-to-exascale machines) in 2020. "Deciding where Europe will host its most powerful petascale and precursor to exascale machines is only the first step in this great European initiative on high performance computing," said Mariya Gabriel, Commissioner for Digital Economy and Society. "Regardless of where users are located in Europe, these supercomputers will be used in more than 800 scientific and industrial application fields for the benefit of European citizens."

The post EuroHPC Takes First Steps Towards Exascale appeared first on insideHPC.

Quantum computing hardware tends to garner the lion's share of the attention from the press, but it's the software toolkits for these devices that will be key to moving this technology out of the research lab.

A Multi-Faceted Toolkit for Quantum Computing was written by Michael Feldman at .

To a certain way of looking at it, Nvidia has always been engaged in the high performance computing business and it has always been subject to the same kinds of cyclical waves that affect makers of supercomputers and enterprise systems.

The Computing Needs Of Earth Are Not Yet Satisfied was written by Timothy Prickett Morgan at .

Earlier in this decade, when the hyperscalers and the academics that run with them were building machine learning frameworks to transpose all kinds of data from one format to another (speech to text, text to speech, image to text, video to text, and so on), they were doing so not just for scientific curiosity.

IBM Mashes Up PowerAI And Watson Machine Learning Stacks was written by Timothy Prickett Morgan at .

More:

TOP500 - Official Site

How do supercomputers work? – Explain that Stuff

by Chris Woodford. Last updated: June 26, 2018.

Roll back time a half-century or so and the smallest computer in the world was a gargantuan machine that filled a room. When transistors and integrated circuits were developed, computers could pack the same power into microchips as big as your fingernail. So what if you build a room-sized computer today and fill it full of those same chips? What you get is a supercomputer: a computer that's millions of times faster than a desktop PC and capable of crunching the world's most complex scientific problems. What makes supercomputers different from the machine you're using right now? Let's take a closer look!

Photo: This is Titan, a supercomputer based at Oak Ridge National Laboratory. At the time of writing in 2018, it's the world's seventh most powerful machine (it was the third most powerful in 2017). The world's current most powerful machine, Summit, is over five times better! Picture courtesy of Oak Ridge National Laboratory, US Department of Energy, published on Flickr in 2012 under a Creative Commons Licence.

Before we make a start on that question, it helps if we understand what a computer is: it's a general-purpose machine that takes in information (data) by a process called input, stores and processes it, and then generates some kind of output (result). A supercomputer is not simply a fast or very large computer: it works in an entirely different way, typically using parallel processing instead of the serial processing that an ordinary computer uses. Instead of doing one thing at a time, it does many things at once.

Chart: Who has the most supercomputers? Almost 90 percent of the world's 500 most powerful machines can be found in just six countries: China, the USA, Japan, Germany, France, and the UK. Drawn in January 2018 using the latest data from TOP500, November 2017.

What's the difference between serial and parallel? An ordinary computer does one thing at a time, so it does things in a distinct series of operations; that's called serial processing. It's a bit like a person sitting at a grocery store checkout, picking up items from the conveyor belt, running them through the scanner, and then passing them on for you to pack in your bags. It doesn't matter how fast you load things onto the belt or how fast you pack them: the speed at which you check out your shopping is entirely determined by how fast the operator can scan and process the items, which is always one at a time. (Since computers first appeared, most have worked by simple, serial processing, inspired by a basic theoretical design called a Turing machine, originally conceived by Alan Turing.)

A typical modern supercomputer works much more quickly by splitting problems into pieces and working on many pieces at once, which is called parallel processing. It's like arriving at the checkout with a giant cart full of items, but then splitting your items up between several different friends. Each friend can go through a separate checkout with a few of the items and pay separately. Once you've all paid, you can get together again, load up the cart, and leave. The more items there are and the more friends you have, the faster it gets to do things by parallel processing, at least in theory. Parallel processing is more like what happens in our brains.

Artwork: Serial and parallel processing. Top: In serial processing, a problem is tackled one step at a time by a single processor. It doesn't matter how fast different parts of the computer are (such as the input/output or memory), the job still gets done at the speed of the central processor in the middle. Bottom: In parallel processing, problems are broken up into components, each of which is handled by a separate processor. Since the processors are working in parallel, the problem is usually tackled more quickly even if the processors work at the same speed as the one in a serial system.

Most of us do quite trivial, everyday things with our computers that don't tax them in any way: looking at web pages, sending emails, and writing documents use very little of the processing power in a typical PC. But if you try to do something more complex, like changing the colors on a very large digital photograph, you'll know that your computer does, occasionally, have to work hard to do things: it can take a minute or so to do really complex operations on very large digital photos. If you play computer games, you'll be aware that you need a computer with a fast processor chip and quite a lot of "working memory" (RAM), or things really slow down. Add a faster processor or double the memory and your computer will speed up dramatically, but there's still a limit to how fast it will go: one processor can generally only do one thing at a time.

Now suppose you're a scientist charged with forecasting the weather, testing a new cancer drug, or modeling how the climate might be in 2050. Problems like that push even the world's best computers to the limit. Just like you can upgrade a desktop PC with a better processor and more memory, so you can do the same with a world-class computer. But there's still a limit to how fast a processor will work and there's only so much difference more memory will make. The best way to make a difference is to use parallel processing: add more processors, split your problem into chunks, and get each processor working on a separate chunk of your problem in parallel.

Once computer scientists had figured out the basic idea of parallel processing, it made sense to add more and more processors: why have a computer with two or three processors when you can have one with hundreds or even thousands? Since the 1990s, supercomputers have routinely used many thousands of processors in what's known as massively parallel processing; at the time I'm updating this, in June 2018, the supercomputer with more processors than any other in the world, the Sunway TaihuLight, has around 40,960 processing modules, each with 260 processor cores, which means 10,649,600 processor cores in total!

Unfortunately, parallel processing comes with a built-in drawback. Let's go back to the supermarket analogy. If you and your friends decide to split up your shopping to go through multiple checkouts at once, the time you save by doing this is obviously reduced by the time it takes you to go your separate ways, figure out who's going to buy what, and come together again at the end. We can guess, intuitively, that the more processors there are in a supercomputer, the harder it will probably be to break up problems and reassemble them to make maximum efficient use of parallel processing. Moreover, there will need to be some sort of centralized management system or coordinator to split the problems, allocate and control the workload between all the different processors, and reassemble the results, which will also carry an overhead.

With a simple problem like paying for a cart of shopping, that's not really an issue. But imagine if your cart contains a billion items and you have 65,000 friends helping you with the checkout. If you have a problem (like forecasting the world's weather for next week) that seems to split neatly into separate sub-problems (making forecasts for each separate country), that's one thing. Computer scientists refer to complex problems like this, which can be split up easily into independent pieces, as embarrassingly parallel computations (EPC), because they are trivially easy to divide.

But most problems don't cleave neatly that way. The weather in one country depends to a great extent on the weather in other places, so making a forecast for one country will need to take account of forecasts elsewhere. Often, the parallel processors in a supercomputer will need to communicate with one another as they solve their own bits of the problem. Or one processor might have to wait for results from another before it can do a particular job. A typical problem worked on by a massively parallel computer will thus fall somewhere between the two extremes of a completely serial problem (where every single step has to be done in an exact sequence) and an embarrassingly parallel one; while some parts can be solved in parallel, other parts will need to be solved in a serial way. A law of computing (known as Amdahl's law, for computer pioneer Gene Amdahl) explains how the part of the problem that remains serial effectively determines the maximum improvement in speed you can get from using a parallel system.
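Amdahl's law makes that limit concrete: if a fraction s of the work is inherently serial, the best possible speedup on N processors is 1 / (s + (1 - s)/N). A minimal C sketch, assuming an illustrative 5 percent serial fraction:

/* Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N),
   where s is the fraction of work that stays serial.
   The 5% serial fraction below is an illustrative assumption. */
#include <stdio.h>

static double amdahl_speedup(double serial_fraction, double n_procs) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs);
}

int main(void) {
    double s = 0.05;
    int counts[] = {1, 16, 1024, 65536};
    for (int i = 0; i < 4; i++) {
        printf("%6d processors -> speedup %.1fx\n",
               counts[i], amdahl_speedup(s, counts[i]));
    }
    /* Even with 65,536 processors the speedup stays below 1/s = 20x. */
    return 0;
}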

You can make a supercomputer by filling a giant box with processors and getting them to cooperate on tackling a complex problem through massively parallel processing. Alternatively, you could just buy a load of off-the-shelf PCs, put them in the same room, and interconnect them using a very fast local area network (LAN) so they work in a broadly similar way. That kind of supercomputer is called a cluster. Google does its web searches for users with clusters of off-the-shelf computers dotted in data centers around the world.

Photo: Supercomputer cluster: NASA's Pleiades ICE Supercomputer is a cluster of 112,896 cores made from 185 racks of Silicon Graphics (SGI) workstations. Picture by Dominic Hart courtesy of NASA Ames Research Center.

A grid is a supercomputer similar to a cluster (in that it's made up of separate computers), but the computers are in different places and connected through the Internet (or other computer networks). This is an example of distributed computing, which means that the power of a computer is spread across multiple locations instead of being located in one, single place (that's sometimes called centralized computing).

Grid supercomputing comes in two main flavors. In one kind, we might have, say, a dozen powerful mainframe computers in universities linked together by a network to form a supercomputer grid. Not all the computers will be actively working in the grid all the time, but generally we know which computers make up the network. The CERN Worldwide LHC Computing Grid, assembled to process data from the LHC (Large Hadron Collider) particle accelerator, is an example of this kind of system. It consists of two tiers of computer systems, with 11 major (tier-1) computer centers linked directly to the CERN laboratory by private networks, which are themselves linked to 160 smaller (tier-2) computer centers around the world (mostly in universities and other research centers), using a combination of the Internet and private networks.

The other kind of grid is much more ad hoc and informal and involves far more individual computers, typically ordinary home computers. Have you ever taken part in an online computing project such as SETI@home, GIMPS, FightAIDS@home, Folding@home, MilkyWay@home, or ClimatePrediction.net? If so, you've allowed your computer to be used as part of an informal, ad-hoc supercomputer grid. This kind of approach is called opportunistic supercomputing, because it takes advantage of whatever computers just happen to be available at the time. Grids like this, which are linked using the Internet, are best for solving embarrassingly parallel problems that easily break up into completely independent chunks.

You might be surprised to discover that most supercomputers run fairly ordinary operating systems much like the ones running on your own PC, although that's less surprising when we remember that a lot of modern supercomputers are actually clusters of off-the-shelf computers or workstations. The most common supercomputer operating system used to be Unix, but it's now been superseded by Linux (an open-source, Unix-like operating system originally developed by Linus Torvalds and thousands of volunteers). Since supercomputers generally work on scientific problems, their application programs are sometimes written in traditional scientific programming languages such as Fortran, as well as popular, more modern languages such as C and C++.
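Since those applications are usually parallelised across nodes with a message-passing library such as MPI, the canonical first program looks something like the C sketch below (illustrative only; compile with an MPI wrapper such as mpicc and launch with mpirun):

/* Minimal MPI program: every process prints its rank.
   On a cluster this would be launched across many nodes,
   e.g. "mpirun -np 64 ./hello". */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}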

Photo: Supercomputers can help us crack the most complex scientific problems, including modeling Earth's climate. Picture courtesy of NASA on the Commons.

As we saw at the start of this article, one essential feature of a computer is that it's a general-purpose machine you can use in all kinds of different ways: you can send emails on a computer, play games, edit photos, or do any number of other things simply by running a different program. If you're using a high-end cellphone, such as an Android phone or an iPhone or an iPod Touch, what you have is a powerful little pocket computer that can run programs by loading different "apps" (applications), which are simply computer programs by another name. Supercomputers are slightly different.

Typically, supercomputers have been used for complex, mathematically intensive scientific problems, including simulating nuclear missile tests, forecasting the weather, simulating the climate, and testing the strength of encryption (computer security codes). In theory, a general-purpose supercomputer can be used for absolutely anything.

While some supercomputers are general-purpose machines that can be used for a wide variety of different scientific problems, some are engineered to do very specific jobs. Two of the most famous supercomputers of recent times were engineered this way. IBM's Deep Blue machine from 1997 was built specifically to play chess (against Russian grand master Garry Kasparov), while its later Watson machine (named for IBM's founder, Thomas Watson, and his son) was engineered to play the game Jeopardy. Specially designed machines like this can be optimized for particular problems; so, for example, Deep Blue would have been designed to search through huge databases of potential chess moves and evaluate which move was best in a particular situation, while Watson was optimized to analyze tricky general-knowledge questions phrased in (natural) human language.

Look through the specifications of ordinary computers and you'll find their performance is usually quoted in MIPS (million instructions per second), which is how many fundamental programming commands (read, write, store, and so on) the processor can manage. It's easy to compare two PCs by comparing the number of MIPS they can handle (or even their processor speed, which is typically rated in gigahertz or GHz).

Supercomputers are rated a different way. Since they're employed in scientific calculations, they're measured according to how many floating point operations per second (FLOPS) they can do, which is a more meaningful measurement based on what they're actually trying to do (unlike MIPS, which is a measurement of how they are trying to do it). Since supercomputers were first developed, their performance has been measured in successively greater numbers of FLOPS, as the table below illustrates:

The example machines listed in the table are described in more detail in the chronology, below.

Read the original:

How do supercomputers work? - Explain that Stuff

Supercomputer – Wikipedia

A supercomputer is a computer with a high level of performance compared to a general-purpose computer. Performance of a supercomputer is measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). As of 2017, there are supercomputers which can perform up to nearly a hundred quadrillion FLOPS.[3] As of November 2017, all of the world's fastest 500 supercomputers run Linux-based operating systems.[4] Additional research is being conducted in China, the United States, the European Union, Taiwan and Japan to build even faster, more powerful and technologically superior exascale supercomputers.[5]

Supercomputers play an important role in the field of computational science, and are used for a wide range of computationally intensive tasks in various fields, including quantum mechanics, weather forecasting, climate research, oil and gas exploration, molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), and physical simulations (such as simulations of the early moments of the universe, airplane and spacecraft aerodynamics, the detonation of nuclear weapons, and nuclear fusion). Throughout their history, they have been essential in the field of cryptanalysis.[6]

Supercomputers were introduced in the 1960s, and for several decades the fastest were made by Seymour Cray at Control Data Corporation (CDC), Cray Research and subsequent companies bearing his name or monogram. The first such machines were highly tuned conventional designs that ran faster than their more general-purpose contemporaries. Through the 1960s, they began to add increasing amounts of parallelism with one to four processors being typical. From the 1970s, the vector computing concept with specialized math units operating on large arrays of data came to dominate. A notable example is the highly successful Cray-1 of 1976. Vector computers remained the dominant design into the 1990s. From then until today, massively parallel supercomputers with tens of thousands of off-the-shelf processors became the norm.[7][8]

The US has long been a leader in the supercomputer field, first through Cray's almost uninterrupted dominance of the field, and later through a variety of technology companies. Japan made major strides in the field in the 1980s and 90s, but since then China has become increasingly active in the field. As of June 2018, the fastest supercomputer on the TOP500 supercomputer list is Summit, in the United States, with a LINPACK benchmark score of 122.3 PFLOPS, exceeding the previous record holder, Sunway TaihuLight, by around 29 PFLOPS.[3][9] Sunway TaihuLight is notable for its use of indigenous chips and is the first Chinese computer to enter the TOP500 list without using hardware from the United States. As of June 2018, China had more computers (206) on the TOP500 list than the United States (124); however, US-built computers held eight of the top 20 positions;[10][11] the U.S. has six of the top 10 and China has two.

The history of supercomputing goes back to the 1960s, with the Atlas at the University of Manchester, the IBM 7030 Stretch and a series of computers at Control Data Corporation (CDC), designed by Seymour Cray. These used innovative designs and parallelism to achieve superior computational peak performance.[12]

The Atlas was a joint venture between Ferranti and Manchester University and was designed to operate at processing speeds approaching one microsecond per instruction, about one million instructions per second.[13] The first Atlas was officially commissioned on 7 December 1962 as one of the world's first supercomputers; it was considered the most powerful computer in the world at that time by a considerable margin, and equivalent to four IBM 7094s.[14]

For the CDC 6600 (which Cray designed), released in 1964, a switch from germanium to silicon transistors was implemented; the silicon transistors could run faster, the overheating problem was solved by introducing refrigeration,[15] and together these helped make it the fastest computer in the world. Given that the 6600 outperformed all the other contemporary computers by about 10 times, it was dubbed a supercomputer and defined the supercomputing market, when one hundred computers were sold at $8 million each.[16][17][18][19]

Cray left CDC in 1972 to form his own company, Cray Research.[17] Four years after leaving CDC, Cray delivered the 80 MHz Cray-1 in 1976, and it became one of the most successful supercomputers in history.[20][21] The Cray-2, released in 1985, was an 8-processor liquid-cooled computer; Fluorinert was pumped through it as it operated. It performed at 1.9 gigaFLOPS and was the world's second fastest, after the M-13 supercomputer in Moscow.[22]

In 1982, Osaka University's LINKS-1 Computer Graphics System used a massively parallel processing architecture, with 514 microprocessors, including 257 Zilog Z8001 control processors and 257 iAPX 86/20 floating-point processors. It was mainly used for rendering realistic 3D computer graphics.[23]

While the supercomputers of the 1980s used only a few processors, in the 1990s, machines with thousands of processors began to appear in Japan and the United States, setting new computational performance records. Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994 with a peak speed of 1.7 gigaFLOPS (GFLOPS) per processor.[24][25] The Hitachi SR2201 obtained a peak performance of 600 GFLOPS in 1996 by using 2048 processors connected via a fast three-dimensional crossbar network.[26][27][28] The Intel Paragon could have 1000 to 4000 Intel i860 processors in various configurations and was ranked the fastest in the world in 1993. The Paragon was a MIMD machine which connected processors via a high speed two dimensional mesh, allowing processes to execute on separate nodes, communicating via the Message Passing Interface.[29]

Approaches to supercomputer architecture have taken dramatic turns since the earliest systems were introduced in the 1960s.[citation needed]

Early supercomputer architectures pioneered by Seymour Cray relied on compact designs and local parallelism to achieve superior computational performance.[12] Cray had noted that increasing processor speeds did little if the rest of the system did not also improve; the CPU would end up waiting longer for data to arrive from the offboard storage units. The CDC 6600, the first mass-produced supercomputer, solved this problem by providing ten simple computers whose only purpose was to read and write data to and from main memory, allowing the CPU to concentrate solely on processing the data. This made both the main CPU and the ten "PPU" units much simpler. As such, they were physically smaller and reduced the amount of wiring between the various parts. This reduced the electrical signaling delays and allowed the system to run at a higher clock speed. The 6600 outperformed all other machines by an average of 10 times when it was introduced.

The CDC 6600's spot as the fastest computer was eventually replaced by its successor, the CDC 7600. This design was very similar to the 6600 in general organization but added instruction pipelining to further improve performance. Generally speaking, every computer instruction required several steps to process; first, the instruction is read from memory, then any required data it refers to is read, the instruction is processed, and the results are written back out to memory. Each of these steps is normally accomplished by separate circuitry. In most early computers, including the 6600, each of these steps runs in turn, and while any one unit is currently active, the hardware handling the other parts of the process is idle. In the 7600, as soon as one instruction cleared a particular unit, that unit began processing the next instruction. Although each instruction takes the same time to complete, there are parts of several instructions being processed at the same time, offering much-improved overall performance. This, combined with further packaging improvements and improvements in the electronics, made the 7600 about four to ten times as fast as the 6600.

The 7600 was intended to be replaced by the CDC 8600, which was essentially four 7600s in a small box. However, this design ran into intractable problems and was eventually canceled in 1974 in favor of another CDC design, the CDC STAR-100. The STAR was essentially a simplified and slower version of the 7600, but it was combined with new circuits that could rapidly process sequences of math instructions. The basic idea was similar to the pipeline in the 7600 but geared entirely toward math, and in theory, much faster. In practice, the STAR proved to have poor real-world performance, and ultimately only two or three were built.

Cray, meanwhile, had left CDC and formed his own company. Considering the problems with the STAR, he designed an improved version of the same basic concept but replaced the STAR's memory-based vectors with ones that ran in large registers. Combining this with his famous packaging improvements produced the Cray-1. This outperformed every computer in the world and would ultimately sell about 80 units, making it one of the most successful supercomputer systems in history. Through the 1970s, 80s, and 90s a series of machines from Cray further improved on these basic concepts.

The basic concept of using a pipeline dedicated to processing large data units became known as vector processing, and came to dominate the supercomputer field. A number of Japanese firms also entered the field, producing similar concepts in much smaller machines. Three main lines were produced by these companies, the Fujitsu VP, Hitachi HITAC and NEC SX series, all announced in the early 1980s and updated continually into the 1990s. CDC attempted to re-enter this market with the ETA10 but this was not very successful. Convex Computer took another route, introducing a series of much smaller vector machines aimed at smaller businesses.

The only computer to seriously challenge the Cray-1's performance in the 1970s was the ILLIAC IV. This machine was the first realized example of a true massively parallel computer, in which many processors worked together to solve different parts of a single larger problem. In contrast with the vector systems, which were designed to run a single stream of data as quickly as possible, in this concept, the computer instead feeds separate parts of the data to entirely different processors and then recombines the results. The ILLIAC's design was finalized in 1966 with 256 processors and offered speeds of up to 1 GFLOPS, compared to the 1970s Cray-1's peak of 250 MFLOPS. However, development problems led to only 64 processors being built, and the system could never operate faster than about 200 MFLOPS while being much larger and more complex than the Cray. Another problem was that writing software for the system was difficult, and getting peak performance from it was a matter of serious effort.

But the partial success of the ILLIAC IV was widely seen as pointing the way to the future of supercomputing. Cray argued against this, famously quipping that "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"[30] But by the early 1980s, several teams were working on parallel designs with thousands of processors, notably the Connection Machine (CM) that developed from research at MIT. The CM-1 used as many as 65,536 simplified custom microprocessors connected together in a network to share data. Several updated versions followed; the CM-5 supercomputer is a massively parallel processing computer capable of many billions of arithmetic operations per second.[31]

Software development remained a problem, but the CM series sparked off considerable research into this issue. Similar designs using custom hardware were made by many companies, including the Evans & Sutherland ES-1, MasPar, nCUBE, Intel iPSC and the Goodyear MPP. But by the mid-1990s, general-purpose CPU performance had improved so much that a supercomputer could be built using them as the individual processing units, instead of using custom chips. By the turn of the 21st century, designs featuring tens of thousands of commodity CPUs were the norm, with later machines adding graphics units to the mix.[7][8]

Throughout the decades, the management of heat density has remained a key issue for most centralized supercomputers.[32][33][34] The large amount of heat generated by a system may also have other effects, e.g. reducing the lifetime of other system components.[35] There have been diverse approaches to heat management, from pumping Fluorinert through the system, to a hybrid liquid-air cooling system or air cooling with normal air conditioning temperatures.[36][37]

Systems with a massive number of processors generally take one of two paths. In the grid computing approach, the processing power of many computers, organised as distributed, diverse administrative domains, is opportunistically used whenever a computer is available.[38] In another approach, a large number of processors are used in proximity to each other, e.g. in a computer cluster. In such a centralized massively parallel system the speed and flexibility of the interconnect becomes very important and modern supercomputers have used various approaches ranging from enhanced Infiniband systems to three-dimensional torus interconnects.[39][40] The use of multi-core processors combined with centralization is an emerging direction, e.g. as in the Cyclops64 system.[41][42]

As the price, performance and energy efficiency of general-purpose graphics processors (GPGPUs) have improved,[43] a number of petaFLOPS supercomputers such as Tianhe-I and Nebulae have started to rely on them.[44] However, other systems such as the K computer continue to use conventional processors such as SPARC-based designs, and the overall applicability of GPGPUs in general-purpose high-performance computing applications has been the subject of debate: while a GPGPU may be tuned to score well on specific benchmarks, its overall applicability to everyday algorithms may be limited unless significant effort is spent to tune the application towards it.[45][46] However, GPUs are gaining ground, and in 2012 the Jaguar supercomputer was transformed into Titan by retrofitting CPUs with GPUs.[47][48][49]

High-performance computers have an expected life cycle of about three years before requiring an upgrade.[50]

A number of "special-purpose" systems have been designed, dedicated to a single problem. This allows the use of specially programmed FPGA chips or even custom ASICs, allowing better price/performance ratios by sacrificing generality. Examples of special-purpose supercomputers include Belle,[51] Deep Blue,[52] and Hydra,[53] for playing chess, Gravity Pipe for astrophysics,[54] MDGRAPE-3 for protein structure computationmolecular dynamics[55] and Deep Crack,[56] for breaking the DES cipher.

A typical supercomputer consumes large amounts of electrical power, almost all of which is converted into heat, requiring cooling. For example, Tianhe-1A consumes 4.04 megawatts (MW) of electricity.[57] The cost to power and cool the system can be significant, e.g. 4 MW at $0.10/kWh is $400 an hour, or about $3.5 million per year.
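That $3.5 million figure follows directly from the quoted consumption and tariff, assuming continuous operation all year; a quick check in C:

/* Operating-cost arithmetic for a 4.04 MW system at $0.10/kWh. */
#include <stdio.h>

int main(void) {
    double megawatts   = 4.04;
    double usd_per_kwh = 0.10;

    double cost_per_hour = megawatts * 1000.0 * usd_per_kwh;  /* ~$404  */
    double cost_per_year = cost_per_hour * 24.0 * 365.0;      /* ~$3.5M */

    printf("Per hour: $%.0f\n", cost_per_hour);
    printf("Per year: $%.2f million\n", cost_per_year / 1e6);
    return 0;
}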

Heat management is a major issue in complex electronic devices and affects powerful computer systems in various ways.[58] The thermal design power and CPU power dissipation issues in supercomputing surpass those of traditional computer cooling technologies. The supercomputing awards for green computing reflect this issue.[59][60][61]

The packing of thousands of processors together inevitably generates significant amounts of heat density that need to be dealt with. The Cray 2 was liquid cooled, and used a Fluorinert "cooling waterfall" which was forced through the modules under pressure.[36] However, the submerged liquid cooling approach was not practical for the multi-cabinet systems based on off-the-shelf processors, and in System X a special cooling system that combined air conditioning with liquid cooling was developed in conjunction with the Liebert company.[37]

In the Blue Gene system, IBM deliberately used low power processors to deal with heat density.[62] The IBM Power 775, released in 2011, has closely packed elements that require water cooling.[63] The IBM Aquasar system uses hot water cooling to achieve energy efficiency, the water being used to heat buildings as well.[64][65]

The energy efficiency of computer systems is generally measured in terms of "FLOPS per watt". In 2008, IBM's Roadrunner operated at 3.76 MFLOPS/W.[66][67] In November 2010, the Blue Gene/Q reached 1,684 MFLOPS/W.[68][69] In June 2011 the top 2 spots on the Green 500 list were occupied by Blue Gene machines in New York (one achieving 2,097 MFLOPS/W) with the DEGIMA cluster in Nagasaki placing third with 1,375 MFLOPS/W.[70]
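The Green500 figures above are simply sustained FLOPS divided by power draw. As a hedged illustration, the calculation below uses the half-rack Blue Gene/Q Linpack figure quoted earlier and an assumed power draw of about 41 kW; the power number is a placeholder chosen only to reproduce roughly 2.1 GFLOPS/W, not a published specification:

/* FLOPS-per-watt arithmetic, as used for the Green500 ranking.
   Rmax is the half-rack Blue Gene/Q Linpack figure quoted above;
   the 41 kW power draw is an assumed value for illustration only. */
#include <stdio.h>

int main(void) {
    double rmax_flops = 86.35e12;   /* 86.35 TFLOPS Linpack          */
    double watts      = 41.0e3;     /* assumption: ~41 kW under load */

    printf("Efficiency: %.2f GFLOPS/W\n", rmax_flops / watts / 1e9);
    /* ~2.1 GFLOPS/W, in line with the Green500 figures quoted here. */
    return 0;
}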

Because copper wires can transfer energy into a supercomputer with much higher power densities than forced air or circulating refrigerants can remove waste heat,[71] the ability of the cooling systems to remove waste heat is a limiting factor.[72][73] As of 2015, many existing supercomputers have more infrastructure capacity than the actual peak demand of the machine; designers generally conservatively design the power and cooling infrastructure to handle more than the theoretical peak electrical power consumed by the supercomputer. Designs for future supercomputers are power-limited: the thermal design power of the supercomputer as a whole, the amount that the power and cooling infrastructure can handle, is somewhat more than the expected normal power consumption, but less than the theoretical peak power consumption of the electronic hardware.[74]

Since the end of the 20th century, supercomputer operating systems have undergone major transformations, based on the changes in supercomputer architecture.[75] While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been to move away from in-house operating systems to the adaptation of generic software such as Linux.[76]

Since modern massively parallel supercomputers typically separate computations from other services by using multiple types of nodes, they usually run different operating systems on different nodes, e.g. using a small and efficient lightweight kernel such as CNK or CNL on compute nodes, but a larger system such as a Linux-derivative on server and I/O nodes.[77][78][79]

While in a traditional multi-user computer system job scheduling is, in effect, a tasking problem for processing and peripheral resources, in a massively parallel system, the job management system needs to manage the allocation of both computational and communication resources, as well as gracefully deal with inevitable hardware failures when tens of thousands of processors are present.[80]

Although most modern supercomputers use the Linux operating system, each manufacturer has its own specific Linux-derivative, and no industry standard exists, partly due to the fact that the differences in hardware architectures require changes to optimize the operating system to each hardware design.[75][81]

The parallel architectures of supercomputers often dictate the use of special programming techniques to exploit their speed. Software tools for distributed processing include standard APIs such as MPI and PVM, VTL, and open source-based software solutions such as Beowulf.

In the most common scenario, environments such as PVM and MPI for loosely connected clusters and OpenMP for tightly coordinated shared memory machines are used. Significant effort is required to optimize an algorithm for the interconnect characteristics of the machine it will be run on; the aim is to prevent any of the CPUs from wasting time waiting on data from other nodes. GPGPUs have hundreds of processor cores and are programmed using programming models such as CUDA or OpenCL.
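As a hedged illustration of the shared-memory side of that picture, an OpenMP reduction across the cores of a single node might look like the C sketch below (assumes a compiler with OpenMP support, e.g. built with -fopenmp):

/* OpenMP sketch: sum a large array using all cores of one node.
   The loop is split across threads and the partial sums are
   combined with a reduction clause. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const long n = 10000000L;
    double *x = malloc(n * sizeof *x);
    for (long i = 0; i < n; i++) x[i] = 1.0;

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += x[i];
    }

    printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
    free(x);
    return 0;
}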

Moreover, it is quite difficult to debug and test parallel programs. Special techniques need to be used for testing and debugging such applications.

Opportunistic Supercomputing is a form of networked grid computing whereby a "super virtual computer" of many loosely coupled volunteer computing machines performs very large computing tasks. Grid computing has been applied to a number of large-scale embarrassingly parallel problems that require supercomputing performance scales. However, basic grid and cloud computing approaches that rely on volunteer computing cannot handle traditional supercomputing tasks such as fluid dynamic simulations.

The fastest grid computing system is the distributed computing project Folding@home (F@h). As of October 2016, F@h reported 101 PFLOPS of x86 processing power. Of this, over 100 PFLOPS are contributed by clients running on various GPUs, and the rest from various CPU systems.[83]

The Berkeley Open Infrastructure for Network Computing (BOINC) platform hosts a number of distributed computing projects. As of February 2017, BOINC recorded a processing power of over 166 petaFLOPS through over 762 thousand active computers (hosts) on the network.[84]

As of October 2016, Great Internet Mersenne Prime Search's (GIMPS) distributed Mersenne prime search achieved about 0.313 PFLOPS through over 1.3 million computers.[85] The Internet PrimeNet Server has supported GIMPS's grid computing approach, one of the earliest and most successful[citation needed] grid computing projects, since 1997.

Quasi-opportunistic supercomputing is a form of distributed computing whereby the super virtual computer of many networked geographically disperse computers performs computing tasks that demand huge processing power.[86] Quasi-opportunistic supercomputing aims to provide a higher quality of service than opportunistic grid computing by achieving more control over the assignment of tasks to distributed resources and the use of intelligence about the availability and reliability of individual systems within the supercomputing network. However, quasi-opportunistic distributed execution of demanding parallel computing software in grids should be achieved through implementation of grid-wise allocation agreements, co-allocation subsystems, communication topology-aware allocation mechanisms, fault tolerant message passing libraries and data pre-conditioning.[86]

Cloud computing, with its recent and rapid expansion and development, has grabbed the attention of HPC users and developers in recent years. Cloud computing attempts to provide HPC-as-a-service exactly like other forms of services currently available in the cloud, such as software-as-a-service, platform-as-a-service, and infrastructure-as-a-service. HPC users may benefit from the cloud in several respects, such as scalability, on-demand resources, speed, and low cost. On the other hand, moving HPC applications to the cloud brings a set of challenges too. Good examples of such challenges are virtualization overhead in the cloud, multi-tenancy of resources, and network latency issues. Much research[87][88][89][90] is currently being done to overcome these challenges and make HPC in the cloud a more realistic possibility.

Supercomputers generally aim for the maximum in capability computing rather than capacity computing. Capability computing is typically thought of as using the maximum computing power to solve a single large problem in the shortest amount of time. Often a capability system is able to solve a problem of a size or complexity that no other computer can, e.g., a very complex weather simulation application.[91]

Capacity computing, in contrast, is typically thought of as using efficient cost-effective computing power to solve a few somewhat large problems or many small problems.[91] Architectures that lend themselves to supporting many users for routine everyday tasks may have a lot of capacity but are not typically considered supercomputers, given that they do not solve a single very complex problem.[91]

In general, the speed of supercomputers is measured and benchmarked in "FLOPS" (floating point operations per second), and not in terms of "MIPS" (million instructions per second), as is the case with general-purpose computers.[92] These measurements are commonly used with an SI prefix such as tera-, combined into the shorthand "TFLOPS" (10^12 FLOPS, pronounced teraflops), or peta-, combined into the shorthand "PFLOPS" (10^15 FLOPS, pronounced petaflops). "Petascale" supercomputers can process one quadrillion (10^15) (1,000 trillion) FLOPS. Exascale is computing performance in the exaFLOPS (EFLOPS) range. An EFLOPS is one quintillion (10^18) FLOPS (one million TFLOPS).

No single number can reflect the overall performance of a computer system, yet the goal of the Linpack benchmark is to approximate how fast the computer solves numerical problems and it is widely used in the industry.[93] The FLOPS measurement is either quoted based on the theoretical floating point performance of a processor (derived from manufacturer's processor specifications and shown as "Rpeak" in the TOP500 lists), which is generally unachievable when running real workloads, or the achievable throughput, derived from the LINPACK benchmarks and shown as "Rmax" in the TOP500 list.[94] The LINPACK benchmark typically performs LU decomposition of a large matrix.[95] The LINPACK performance gives some indication of performance for some real-world problems, but does not necessarily match the processing requirements of many other supercomputer workloads, which for example may require more memory bandwidth, or may require better integer computing performance, or may need a high performance I/O system to achieve high levels of performance.[93]
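
As an aside on how such a theoretical Rpeak figure is typically arrived at, the sketch below multiplies node count, cores per node, clock rate, and floating point operations per cycle; the machine parameters shown are purely hypothetical and not drawn from any TOP500 entry, and the Rmax that LINPACK actually achieves is always some fraction of this number.

/* Illustrative only: estimates a theoretical peak (Rpeak-style) figure from
   processor specifications. All numbers below are hypothetical. */
#include <stdio.h>

int main(void)
{
    double nodes           = 4608;  /* hypothetical node count           */
    double cores_per_node  = 48;    /* hypothetical cores per node       */
    double clock_ghz       = 2.3;   /* hypothetical core clock in GHz    */
    double flops_per_cycle = 16;    /* e.g. wide SIMD fused multiply-add */

    double rpeak_gflops = nodes * cores_per_node * clock_ghz * flops_per_cycle;
    printf("Theoretical peak ~ %.1f TFLOPS (%.2f PFLOPS)\n",
           rpeak_gflops / 1e3, rpeak_gflops / 1e6);
    return 0;
}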

Since 1993, the fastest supercomputers have been ranked on the TOP500 list according to their LINPACK benchmark results. The list does not claim to be unbiased or definitive, but it is a widely cited current definition of the "fastest" supercomputer available at any given time.

This is a recent list of the computers which appeared at the top of the TOP500 list,[96] and the "Peak speed" is given as the "Rmax" rating.

Source: TOP500

In 2018, Lenovo became the world's largest provider of TOP500 supercomputers, with 117 systems on the list.[97]

The IBM Blue Gene/P computer has been used to simulate a number of artificial neurons equivalent to approximately one percent of a human cerebral cortex, containing 1.6 billion neurons with approximately 9 trillion connections. The same research group also succeeded in using a supercomputer to simulate a number of artificial neurons equivalent to the entirety of a rat's brain.[104]

Modern-day weather forecasting also relies on supercomputers. The National Oceanic and Atmospheric Administration uses supercomputers to crunch hundreds of millions of observations to help make weather forecasts more accurate.[105]

In 2011, the challenges and difficulties in pushing the envelope in supercomputing were underscored by IBM's abandonment of the Blue Waters petascale project.[106]

The Advanced Simulation and Computing Program currently uses supercomputers to maintain and simulate the United States nuclear stockpile.[107]

Currently, China, the United States, the European Union, and others are competing to be the first to create a 1 exaFLOPS (10^18, or one quintillion, FLOPS) supercomputer, with estimates of completion ranging from 2019 to 2022.[108]

Erik P. DeBenedictis of Sandia National Laboratories theorizes that a zettaFLOPS (10^21, or one sextillion, FLOPS) computer is required to accomplish full weather modeling, which could cover a two-week time span accurately.[109][110][111] Such systems might be built around 2030.[112]

Many Monte Carlo simulations use the same algorithm to process a randomly generated data set; particularly, integro-differential equations describing physical transport processes, the random paths, collisions, and energy and momentum depositions of neutrons, photons, ions, electrons, etc. The next step for microprocessors may be into the third dimension; and specializing to Monte Carlo, the many layers could be identical, simplifying the design and manufacture process.[113]
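
As a toy illustration of the pattern just described (and only that; it is not a transport code), the short C program below runs the same algorithm over many independently generated random samples, here simply estimating pi. Because every sample is independent, such workloads spread naturally across many cores, nodes, or identical chip layers.

/* Toy Monte Carlo in the spirit described above: every sample runs the same
   algorithm on independently generated random data, so samples can be
   distributed with essentially no communication. The "physics" here is just
   estimating pi by sampling points in the unit square. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const long samples = 10 * 1000 * 1000;
    long hits = 0;
    srand(12345);                        /* fixed seed for reproducibility */

    for (long i = 0; i < samples; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)        /* inside the quarter circle? */
            hits++;
    }

    printf("pi ~ %.5f after %ld samples\n",
           4.0 * (double)hits / (double)samples, samples);
    return 0;
}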

There are several international efforts to understand how supercomputing will develop over the next decade. The ETP4HPC Strategic Research Agenda (SRA) outlines a technology roadmap for exascale in Europe.[114] The Eurolab4HPC Vision provides a long-term roadmap (2023–2030) for academic excellence in HPC.[115]

High-performance supercomputers usually require large amounts of energy as well. However, Iceland may be a benchmark for the future with the world's first zero-emission supercomputer. Located at the Thor Data Center in Reykjavik, Iceland, this supercomputer relies on completely renewable sources for its power rather than fossil fuels. The colder climate also reduces the need for active cooling, making it one of the greenest facilities in the world of computers.[116]

Many science-fiction writers have depicted supercomputers in their works, both before and after the historical construction of such computers. Much of such fiction deals with the relations of humans with the computers they build and with the possibility of conflict eventually developing between them. Some scenarios of this nature appear on the AI-takeover page.

Examples of supercomputers in fiction include HAL-9000, Multivac, The Machine Stops, GLaDOS, The Evitable Conflict and Vulcan's Hammer.

More:

Supercomputer - Wikipedia

Supercomputing: This project plans one of the world’s …

One of the biggest Arm-based supercomputing installations in the world is being built across three clusters at UK universities.

The three supercomputer clusters are located at the Edinburgh Parallel Computing Centre (EPCC) at the University of Edinburgh, the University of Bristol and the University of Leicester, and will run more than 12,000 Arm-based cores, hosted by HPE Apollo 70 HPC (High-Performance Computing) systems.

The HPC clusters at each university will consist of 64 HPE Apollo 70 systems, each equipped with two 32-core Cavium ThunderX2 processors and 128 GB of RAM comprising 16 DDR4 DIMMs, with Mellanox InfiniBand interconnects; that works out to 4,096 cores per cluster, or 12,288 cores across the three sites, consistent with the figure above. The systems will run SUSE Linux Enterprise Server for HPC. Each cluster is expected to occupy two computer racks and consume 30 kW of power.

The installation is due to be completed in summer 2018, and is being built and supported by Hewlett Packard Enterprise (HPE). When completed, HPE says it will be one of the largest Arm-based HPC installations in the world. The cost of the systems is not being disclosed.

Arm is attempting to push further into the high-performance computing market, which is currently dominated by the likes of Intel and Nvidia: late last year Cray said it was building the world's first production-ready Arm-based supercomputer also using the Cavium ThunderX2 processors, based on 64-bit Armv8-A architecture.

Mark Parsons, director of the Edinburgh EPCC said: "We already host two national HPC services using HPE technology and this will be our first large-scale Arm-based supercomputer. If Arm processors are to be successful as a supercomputing technology we need to build a strong software ecosystem and EPCC will port many of the UK's key scientific applications to our HPE Apollo 70 system."

Mark Wilkinson, director of the HPC facility at the University of Leicester, said the cluster will allow it to explore the potential of Arm-based systems to support work such as simulations of gravitational waves and planet formation, earth observation science models and fundamental particle physics calculations.

The project is part of the Catalyst UK programme which aims to drive supercomputer usage in the UK in general, and in the commercial sector in particular. The programme will work with UK industry to jointly develop critical applications and workflows to best exploit the clusters.

PREVIOUS AND RELATED COVERAGE

'Supercomputing for all' with AMD EPYC: AMD announces the availability of new, high-performance EPYC-based petaFLOPS systems.

IBM shares updates on DOE's Summit supercomputer: With expectations that it will be the world's fastest and most powerful supercomputer, Summit is expected to be completed in early 2018.

Microsoft Azure customers now can run workloads on Cray supercomputers: Microsoft and Cray are teaming to give Azure customers with data-intensive HPC and AI workloads access to Cray supercomputers running in select Microsoft datacenters.

READ MORE ABOUT SUPERCOMPUTERS

Continued here:

Supercomputing: This project plans one of the world's ...

What is IBM Watson supercomputer? – Definition from WhatIs.com

Watson is an IBM supercomputer that combines artificial intelligence (AI) and sophisticated analytical software for optimal performance as a question-answering machine. The supercomputer is named for IBM's founder, Thomas J. Watson.

The Watson supercomputer processes at a rate of 80 teraflops (trillion floating-point operations per second). To replicate (or surpass) a high-functioning human's ability to answer questions, Watson accesses 90 servers with a combined data store of over 200 million pages of information, which it processes against six million logic rules. The device and its data are self-contained in a space that could accommodate 10 refrigerators.

Applications for Watson's underlying cognitive computing technology are almost endless. Because the device can perform text mining and complex analytics on huge volumes of unstructured data, it can support a search engine or an expert system with capabilities far superior to any previously existing. In May 2016, BakerHostetler, a century-old Ohio-based law firm, signed a contract for ROSS, a legal expert system based on Watson, to work with its 50-human bankruptcy team. ROSS can mine data from about a billion text documents, analyze the information and provide precise responses to complicated questions in less than three seconds. Natural language processing allows the system to translate legalese in order to respond to the lawyers' questions. ROSS' creators are adding more legal modules; similar expert systems are transforming medical research.

To showcase its abilities, Watson challenged two top-ranked players on Jeopardy! and beat champions Ken Jennings and Brad Rutter in 2011. The Watson avatar sat between the two other contestants, as a human competitor would, while its considerable bulk sat on a different floor of the building. Like the other contestants, Watson had no Internet access.

In the practice round, Watson demonstrated a human-like ability for complex wordplay, correctly responding, for example, to "Classic candy bar that's a female Supreme Court justice" with "What is Baby Ruth Ginsburg?" Rutter noted that although the retrieval of information is trivial for Watson and difficult for a human, the human is still better at the complex task of comprehension. Nevertheless, machine learning allows Watson to examine its mistakes against the correct answers to see where it erred and so inform future responses.

In an interview during the Jeopardy! practice round, an IBM representative evaded the question of whether Watson might be made broadly available through a Web interface. The representative said that the company was currently more interested in vertical applications such as healthcare and decision support.

See also: Turing test, real-time analytics, health IT, Blue Gene, business analytics

See an introductory video on how Watson works:

See more here:

What is IBM Watson supercomputer? - Definition from WhatIs.com

Tianhe-I – Wikipedia

Tianhe-1 and Tianhe-1A
Active: Tianhe-1 operational 29 October 2009; Tianhe-1A operational 28 October 2010
Sponsors: National University of Defense Technology
Operators: National Supercomputing Center
Location: National Supercomputing Center, Tianjin, People's Republic of China
Operating system: Linux[1]
Storage: 96 TB (98,304 GB) for Tianhe-1; 262 TB for Tianhe-1A
Speed: Tianhe-1: 563 teraFLOPS (Rmax), 1,206.2 teraFLOPS (Rpeak); Tianhe-1A: 2,566.0 teraFLOPS (Rmax), 4,701.0 teraFLOPS (Rpeak)
Ranking: TOP500: 2nd, June 2011 (Tianhe-1A)
Purpose: Petroleum exploration, aircraft simulation
Sources: top500.org

Tianhe-I, Tianhe-1, or TH-1 ("Sky River Number One")[2] is a supercomputer capable of an Rmax (maximal achieved LINPACK performance) of 2.5 petaFLOPS. Located at the National Supercomputing Center of Tianjin, China, it was the fastest computer in the world from October 2010 to June 2011 and is one of the few petascale supercomputers in the world.[3][4]

In October 2010, an upgraded version of the machine (Tianhe-1A) overtook ORNL's Jaguar to become the world's fastest supercomputer, with a peak computing rate of 2.57 petaFLOPS.[5][6] In June 2011 the Tianhe-1A was overtaken by the K computer as the world's fastest supercomputer, which was also subsequently superseded.[7]

Both the original Tianhe-1 and Tianhe-1A use a Linux-based operating system.[8][9]

On 12 August 2015, the 186,368-core Tianhe-1 felt the impact of the powerful Tianjin explosions and went offline for some time. Xinhua reported that "the office building of Chinese supercomputer Tianhe-1, one of the world's fastest supercomputers, suffered damage." Sources at Tianhe-1 told Xinhua the computer is not damaged, but they have shut down some of its operations as a precaution.[10] Operation resumed on 17 August 2015.[11]

Tianhe-1 was developed by the Chinese National University of Defense Technology (NUDT) in Changsha, Hunan. It was first revealed to the public on 29 October 2009, and was immediately ranked as the world's fifth fastest supercomputer in the TOP500 list released at the 2009 Supercomputing Conference (SC09) held in Portland, Oregon, on 16 November 2009. Tianhe achieved a speed of 563 teraflops in its first Top 500 test and had a peak performance of 1.2 petaflops. Thus at startup, the system had an efficiency of 46%.[12][13] Originally, Tianhe-1 was powered by 4,096 Intel Xeon E5540 processors and 1,024 Intel Xeon E5450 processors, with 5,120 AMD graphics processing units (GPUs), which were made up of 2,560 dual-GPU ATI Radeon HD 4870 X2 graphics cards.[14][15]

In October 2010, Tianhe-1A, an upgraded supercomputer, was unveiled at HPC 2010 China.[16] It is now equipped with 14,336 Xeon X5670 processors and 7,168 Nvidia Tesla M2050 general purpose GPUs. 2,048 FeiTeng 1000 SPARC-based processors are also installed in the system, but their computing power was not counted into the machine's official Linpack statistics as of October 2010.[17] Tianhe-1A has a theoretical peak performance of 4.701 petaflops.[18] NVIDIA suggests that it would have taken "50,000 CPUs and twice as much floor space to deliver the same performance using CPUs alone." The current heterogeneous system consumes 4.04 megawatts compared to over 12 megawatts had it been built only with CPUs.[19]

The Tianhe-1A system is composed of 112 computer cabinets, 12 storage cabinets, 6 communications cabinets, and 8 I/O cabinets. Each computer cabinet is composed of four frames, with each frame containing eight blades, plus a 16-port switching board. Each blade is composed of two computer nodes, with each computer node containing two Xeon X5670 6-core processors and one Nvidia M2050 GPU processor.[20] The system has 3584 total blades containing 7168 GPUs, and 14,336 CPUs, managed by the SLURM job scheduler.[21] The total disk storage of the systems is 2 Petabytes implemented as a Lustre clustered file system,[2] and the total memory size of the system is 262 Terabytes.[17]
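
As a quick sanity check on the figures above, the arithmetic from cabinets down to processors can be written out as follows; this is only a back-of-the-envelope sketch using the numbers quoted in this section.

/* Back-of-the-envelope check of the Tianhe-1A component counts quoted above. */
#include <stdio.h>

int main(void)
{
    int cabinets         = 112; /* compute cabinets      */
    int frames_per_cab   = 4;
    int blades_per_frame = 8;
    int nodes_per_blade  = 2;
    int cpus_per_node    = 2;   /* Xeon X5670 per node   */
    int gpus_per_node    = 1;   /* Tesla M2050 per node  */

    int blades = cabinets * frames_per_cab * blades_per_frame; /* 3,584 */
    int nodes  = blades * nodes_per_blade;                     /* 7,168 */

    printf("blades=%d nodes=%d cpus=%d gpus=%d\n",
           blades, nodes, nodes * cpus_per_node, nodes * gpus_per_node);
    return 0;
}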

Another significant reason for the increased performance of the upgraded Tianhe-1A system is Arch, a proprietary high-speed interconnect custom designed by NUDT in China, which runs at 160 Gbit/s, twice the bandwidth of InfiniBand.[17]

The system also used the Chinese made FeiTeng-1000 central processing unit.[22] The FeiTeng-1000 processor is used both on service nodes and to enhance the system interconnect.[22][23]

The supercomputer is installed at the National Supercomputing Center, Tianjin, and is used to carry out computations for petroleum exploration and aircraft design.[13] It is an "open access" computer, meaning it provides services for other countries.[24] The supercomputer will be available to international clients.[25]

The computer cost $88 million to build. Approximately $20 million is spent annually for electricity and operating expenses. Approximately 200 workers are employed in its operation.

Tianhe-IA was ranked as the world's fastest supercomputer in the TOP500 list[26][27] until July 2011 when the K computer overtook it.

In June 2011, scientists at the Institute of Process Engineering (IPE) at the Chinese Academy of Sciences (CAS) announced a record-breaking scientific simulation on the Tianhe-1A supercomputer that furthers their research in solar energy. CAS-IPE scientists ran a complex molecular dynamics simulation on all 7,168 NVIDIA Tesla GPUs to achieve a performance of 1.87 petaflops (about the same performance as 130,000 laptops).[28]

The Tianhe-1A supercomputer was shut down after the National Supercomputing Center of Tianjin was damaged by an explosion nearby. The computer was not damaged and still remains operational.[29]

See original here:

Tianhe-I - Wikipedia

Summit Oak Ridge Leadership Computing Facility

Summit is the next leap in leadership-class computing systems for open science. With Summit we will be able to address, with greater complexity and higher fidelity, questions concerning who we are, our place on earth, and in our universe.

Summit will deliver more than five times the computational performance of Titan's 18,688 nodes, using only approximately 4,600 nodes when it arrives in 2018. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected with NVIDIA's high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs, plus 800 GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect.

Upon completion, Summit will allow researchers in all fields of science unprecedented access to solving some of the world's most pressing challenges.

The System User Guide is the definitive source of information about Summit, and details everything from connecting to running complex workflows. Please direct questions about Summit and its usage to the OLCF User Assistance Center by emailing help@olcf.ornl.gov.

Read more here:

Summit Oak Ridge Leadership Computing Facility

Blue Gene – Wikipedia

Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the PFLOPS (petaFLOPS) range, with low power consumption.

The project created three generations of supercomputers, Blue Gene/L, Blue Gene/P, and Blue Gene/Q. Blue Gene systems have often led the TOP500[1] and Green500[2] rankings of the most powerful and most power efficient supercomputers, respectively. Blue Gene systems have also consistently scored top positions in the Graph500 list.[3] The project was awarded the 2009 National Medal of Technology and Innovation.[4]

As of 2015, IBM appears to have ended development of the Blue Gene family,[5] though no public announcement has been made. IBM's continuing efforts in the supercomputer arena seem to be concentrated around OpenPOWER, using accelerators such as FPGAs and GPUs to counter the end of Moore's law.[6]

In December 1999, IBM announced a US$100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding.[7] The project had two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included: how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost, through novel machine architectures. The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. The initial research and development work was pursued at IBM T.J. Watson Research Center and led by William R. Pulleyblank.[8]

At IBM, Alan Gara started working on an extension of the QCDOC architecture into a more general-purpose supercomputer: The 4D nearest-neighbor interconnection network was replaced by a network supporting routing of messages from any node to any other; and a parallel I/O subsystem was added. DOE started funding the development of this system and it became known as Blue Gene/L (L for Light); development of the original Blue Gene system continued under the name Blue Gene/C (C for Cyclops) and, later, Cyclops64.

In November 2004 a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list, with a Linpack performance of 70.72 TFLOPS.[1] It thereby overtook NEC's Earth Simulator, which had held the title of the fastest computer in the world since 2002. From 2004 through 2007 the Blue Gene/L installation at LLNL[9] gradually expanded to 104 racks, achieving 478 TFLOPS Linpack and 596 TFLOPS peak. The LLNL BlueGene/L installation held the first position in the TOP500 list for 3.5 years, until in June 2008 it was overtaken by IBM's Cell-based Roadrunner system at Los Alamos National Laboratory, which was the first system to surpass the 1 PetaFLOPS mark. The system was built at IBM's plant in Rochester, Minnesota.

While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. In November 2006, there were 27 computers on the TOP500 list using the Blue Gene/L architecture. All these computers were listed as having an architecture of eServer Blue Gene Solution. For example, three racks of Blue Gene/L were housed at the San Diego Supercomputer Center.

While the TOP500 measures performance on a single benchmark application, Linpack, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer ever to run over 100 TFLOPS sustained on a real world application, namely a three-dimensional molecular dynamics code (ddcMD), simulating solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This achievement won the 2005 Gordon Bell Prize.

In June 2006, NNSA and IBM announced that Blue Gene/L achieved 207.3 TFLOPS on a quantum chemical application (Qbox).[10] At Supercomputing 2006,[11] Blue Gene/L was awarded the winning prize in all HPC Challenge Classes of awards.[12] In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of a second (the network was run at 1/10 of normal speed for 10 seconds).[13]

The name Blue Gene comes from what it was originally designed to do: help biologists understand the processes of protein folding and gene development.[14] "Blue" is a traditional moniker that IBM uses for many of its products and the company itself. The original Blue Gene design was renamed "Blue Gene/C" and eventually Cyclops64. The "L" in Blue Gene/L comes from "Light" as that design's original name was "Blue Light". The "P" version was designed to be a petascale design. "Q" is just the letter after "P". There is no Blue Gene/R.[15]

The Blue Gene/L supercomputer was unique in several respects.[16]

The Blue Gene/L architecture was an evolution of the QCDSP and QCDOC architectures. Each Blue Gene/L Compute or I/O node was a single ASIC with associated DRAM memory chips. The ASIC integrated two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline-double-precision Floating Point Unit (FPU), a cache sub-system with built-in DRAM controller and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS (gigaFLOPS). The two CPUs were not cache coherent with one another.

Compute nodes were packaged two per compute card, with 16 compute cards plus up to 2 I/O nodes per node board. There were 32 node boards per cabinet/rack.[17] By the integration of all essential sub-systems on a single chip, and the use of low-power logic, each Compute or I/O node dissipated low power (about 17 watts, including DRAMs). This allowed aggressive packaging of up to 1024 compute nodes, plus additional I/O nodes, in a standard 19-inch rack, within reasonable limits of electrical power supply and air cooling. The performance metrics, in terms of FLOPS per watt, FLOPS per m² of floorspace and FLOPS per unit cost, allowed scaling up to very high performance. With so many nodes, component failures were inevitable. The system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), to allow the machine to continue to run.
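
The per-node and system-level peak figures quoted in this section are consistent with a simple calculation, sketched below. The figure of 4 double-precision floating point operations per cycle per core is an assumption inferred from the stated 5.6 GFLOPS node peak (2 cores x 700 MHz x 4), not a value quoted directly in the text.

/* Rough arithmetic tying together the Blue Gene/L figures quoted above. */
#include <stdio.h>

int main(void)
{
    double clock_ghz       = 0.7;    /* 700 MHz PowerPC 440 cores          */
    double flops_per_cycle = 4.0;    /* assumed: double-pipeline DP FPU    */
    double cores_per_node  = 2.0;
    double nodes_per_rack  = 1024.0;
    double racks_llnl      = 104.0;  /* final LLNL installation            */

    double node_gflops = clock_ghz * flops_per_cycle * cores_per_node; /* 5.6  */
    double rack_tflops = node_gflops * nodes_per_rack / 1000.0;        /* ~5.7 */

    printf("node peak  %.1f GFLOPS\n", node_gflops);
    printf("rack peak  %.1f TFLOPS\n", rack_tflops);
    printf("104 racks  %.0f TFLOPS peak\n", rack_tflops * racks_llnl); /* ~596 */
    return 0;
}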

Each Blue Gene/L node was attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for fast barriers. The I/O nodes, which ran the Linux operating system, provided communication to storage and external hosts via an Ethernet network. The I/O nodes handled filesystem operations on behalf of the compute nodes. Finally, a separate and private Ethernet network provided access to any node for configuration, booting and diagnostics. To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive integer power of 2, with at least 2^5 = 32 nodes. To run a program on Blue Gene/L, a partition of the computer first had to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition nodes were released for future programs to use.

Blue Gene/L compute nodes used a minimal operating system supporting a single user program. Only a subset of POSIX calls was supported, and only one process could run at a time on a node in co-processor mode, or one process per CPU in virtual mode. Programmers needed to implement green threads in order to simulate local concurrency. Application development was usually performed in C, C++, or Fortran using MPI for communication. However, some scripting languages such as Ruby[18] and Python[19] have been ported to the compute nodes.

In June 2007, IBM unveiled Blue Gene/P, the second generation of the Blue Gene series of supercomputers and designed through a collaboration that included IBM, LLNL, and Argonne National Laboratory's Leadership Computing Facility.[20]

The design of Blue Gene/P is a technology evolution from Blue Gene/L. Each Blue Gene/P Compute chip contains four PowerPC 450 processor cores, running at 850 MHz. The cores are cache coherent and the chip can operate as a 4-way symmetric multiprocessor (SMP). The memory subsystem on the chip consists of small private L2 caches, a central shared 8 MB L3 cache, and dual DDR2 memory controllers. The chip also integrates the logic for node-to-node communication, using the same network topologies as Blue Gene/L, but at more than twice the bandwidth. A compute card contains a Blue Gene/P chip with 2 or 4 GB DRAM, comprising a "compute node". A single compute node has a peak performance of 13.6 GFLOPS. 32 compute cards are plugged into an air-cooled node board. A rack contains 32 node boards (thus 1024 nodes, 4096 processor cores).[21] By using many small, low-power, densely packaged chips, Blue Gene/P exceeded the power efficiency of other supercomputers of its generation, and at 371 MFLOPS/W Blue Gene/P installations ranked at or near the top of the Green500 lists in 2007–2008.[2]

The following is an incomplete list of Blue Gene/P installations. Per November 2009, the TOP500 list contained 15 Blue Gene/P installations of 2-racks (2048 nodes, 8192 processor cores, 23.86 TFLOPS Linpack) and larger.[1]

The third supercomputer design in the Blue Gene series, Blue Gene/Q, has a peak performance of 20 petaflops,[37] reaching a LINPACK benchmark performance of 17 petaflops. Blue Gene/Q continues to expand and enhance the Blue Gene/L and /P architectures.

The Blue Gene/Q Compute chip is an 18-core chip. The 64-bit A2 processor cores are 4-way simultaneously multithreaded, and run at 1.6 GHz. Each processor core has a SIMD quad-vector double-precision floating point unit (IBM QPX). 16 processor cores are used for computing, and a 17th core for operating system assist functions such as interrupts, asynchronous I/O, MPI pacing and RAS. The 18th core is used as a redundant spare, used to increase manufacturing yield. The spared-out core is shut down in functional operation. The processor cores are linked by a crossbar switch to a 32 MB eDRAM L2 cache, operating at half core speed. The L2 cache is multi-versioned, supporting transactional memory and speculative execution, and has hardware support for atomic operations.[38] L2 cache misses are handled by two built-in DDR3 memory controllers running at 1.33 GHz. The chip also integrates logic for chip-to-chip communications in a 5D torus configuration, with 2 GB/s chip-to-chip links. The Blue Gene/Q chip is manufactured on IBM's copper SOI process at 45 nm. It delivers a peak performance of 204.8 GFLOPS at 1.6 GHz, drawing about 55 watts. The chip measures 19×19 mm (359.5 mm²) and comprises 1.47 billion transistors. The chip is mounted on a compute card along with 16 GB DDR3 DRAM (i.e., 1 GB for each user processor core).[39]

A Q32[40] compute drawer contains 32 compute cards, each water cooled.[41] A "midplane" (crate) contains 16 Q32 compute drawers for a total of 512 compute nodes, electrically interconnected in a 5D torus configuration (4x4x4x4x2). Beyond the midplane level, all connections are optical. Racks have two midplanes, thus 32 compute drawers, for a total of 1024 compute nodes, 16,384 user cores and 16 TB RAM.[41]
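
The chip- and rack-level numbers above likewise follow from a short calculation, sketched below. The figure of 8 floating point operations per cycle per core is an assumption consistent with the stated 204.8 GFLOPS chip peak (16 compute cores x 1.6 GHz x 8), not a value quoted directly in the text.

/* Rough arithmetic behind the Blue Gene/Q figures quoted above. */
#include <stdio.h>

int main(void)
{
    double clock_ghz       = 1.6;
    double flops_per_cycle = 8.0;     /* assumed: 4-wide QPX fused multiply-add */
    double compute_cores   = 16.0;    /* 17th/18th cores not counted            */
    double nodes_per_rack  = 1024.0;  /* 2 midplanes x 512 nodes                */

    double chip_gflops = clock_ghz * flops_per_cycle * compute_cores; /* 204.8 */
    printf("chip peak  %.1f GFLOPS\n", chip_gflops);
    printf("rack peak  %.1f TFLOPS\n", chip_gflops * nodes_per_rack / 1000.0);
    printf("user cores per rack: %.0f, RAM per rack: %.0f GB\n",
           compute_cores * nodes_per_rack, 16.0 * nodes_per_rack);
    return 0;
}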

Separate I/O drawers, placed at the top of a rack or in a separate rack, are air cooled and contain 8 compute cards and 8 PCIe expansion slots for Infiniband or 10 Gigabit Ethernet networking.[41]

At the time of the Blue Gene/Q system announcement in November 2011, an initial 4-rack Blue Gene/Q system (4096 nodes, 65536 user processor cores) achieved #17 in the TOP500 list[1] with 677.1 TeraFLOPS Linpack, outperforming the original 2007 104-rack BlueGene/L installation described above. The same 4-rack system achieved the top position in the Graph500 list[3] with over 250 GTEPS (giga traversed edges per second). Blue Gene/Q systems also topped the Green500 list of most energy efficient supercomputers with up to 2.1 GFLOPS/W.[2]

In June 2012, Blue Gene/Q installations took the top positions in all three lists: TOP500,[1] Graph500 [3] and Green500.[2]

The following is an incomplete list of Blue Gene/Q installations. Per June 2012, the TOP500 list contained 20 Blue Gene/Q installations of 1/2-rack (512 nodes, 8192 processor cores, 86.35 TFLOPS Linpack) and larger.[1] At a (size-independent) power efficiency of about 2.1 GFLOPS/W, all these systems also populated the top of the June 2012 Green 500 list.[2]

Record-breaking science applications have been run on the BG/Q, the first to cross 10 petaflops of sustained performance. The cosmology simulation framework HACC achieved almost 14 petaflops with a 3.6 trillion particle benchmark run,[60] while the Cardioid code,[61][62] which models the electrophysiology of the human heart, achieved nearly 12 petaflops with a near real-time simulation, both on Sequoia. A fully compressible flow solver has also achieved 14.4 PFLOP/s (originally 11 PFLOP/s) on Sequoia, 72% of the machine's nominal peak performance.[63]

Excerpt from:

Blue Gene - Wikipedia