How do supercomputers work?

by Chris Woodford. Last updated: June 26, 2018.

Roll back time a half-century or so and the smallest computer in the world was a gargantuan machine that filled a room. When transistors and integrated circuits were developed, computers could pack the same power into microchips as big as your fingernail. So what if you build a room-sized computer today and fill it full of those same chips? What you get is a supercomputer: a computer that's millions of times faster than a desktop PC and capable of crunching the world's most complex scientific problems. What makes supercomputers different from the machine you're using right now? Let's take a closer look!

Photo: This is Titan, a supercomputer based at Oak Ridge National Laboratory. At the time of writing in 2018, it's the world's seventh most powerful machine (it was the third most powerful in 2017). The world's current most powerful machine, Summit, is over five times more powerful! Picture courtesy of Oak Ridge National Laboratory, US Department of Energy, published on Flickr in 2012 under a Creative Commons Licence.

Before we make a start on that question, it helps if we understand what a computer is: it's a general-purpose machine that takes in information (data) by a process called input, stores and processes it, and then generates some kind of output (result). A supercomputer is not simply a fast or very large computer: it works in an entirely different way, typically using parallel processing instead of the serial processing that an ordinary computer uses. Instead of doing one thing at a time, it does many things at once.

Chart: Who has the most supercomputers? Almost 90 percent of the world's 500 most powerful machines can be found in just six countries: China, the USA, Japan, Germany, France, and the UK. Drawn in January 2018 using the latest data from TOP500, November 2017.

What's the difference between serial and parallel? An ordinary computer does one thing at a time, so it does things in a distinct series of operations; that's called serial processing. It's a bit like a person sitting at a grocery store checkout, picking up items from the conveyor belt, running them through the scanner, and then passing them on for you to pack in your bags. It doesn't matter how fast you load things onto the belt or how fast you pack them: the speed at which you check out your shopping is entirely determined by how fast the operator can scan and process the items, which is always one at a time. (Since computers first appeared, most have worked by simple, serial processing, inspired by a basic theoretical design called a Turing machine, originally conceived by Alan Turing.)

A typical modern supercomputer works much more quickly by splitting problems into pieces and working on many pieces at once, which is called parallel processing. It's like arriving at the checkout with a giant cart full of items, but then splitting your items up between several different friends. Each friend can go through a separate checkout with a few of the items and pay separately. Once you've all paid, you can get together again, load up the cart, and leave. The more items there are and the more friends you have, the faster it gets to do things by parallel processing, at least in theory. Parallel processing is more like what happens in our brains.

Artwork: Serial and parallel processing: Top: In serial processing, a problem is tackled one step at a time by a single processor. It doesn't matter how fast different parts of the computer are (such as the input/output or memory), the job still gets done at the speed of the central processor in the middle. Bottom: In parallel processing, problems are broken up into components, each of which is handled by a separate processor. Since the processors are working in parallel, the problem is usually tackled more quickly even if the processors work at the same speed as the one in a serial system.
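Here's a minimal sketch of that idea in Python (a toy, not anything a real supercomputer would actually run; production machines use tools like MPI and OpenMP instead): the serial version totals a list of numbers one at a time, while the parallel version splits the list into chunks, hands each chunk to a separate worker process, and then reassembles the partial answers at the end.

# A toy illustration of serial versus parallel processing, using only
# the Python standard library.
from concurrent.futures import ProcessPoolExecutor

def sum_chunk(chunk):
    # Each "friend" (worker process) totals its own share of the items.
    return sum(chunk)

if __name__ == "__main__":
    numbers = list(range(1, 1_000_001))

    # Serial: a single processor works through every item in turn.
    serial_total = 0
    for n in numbers:
        serial_total += n

    # Parallel: split the problem into chunks, hand each chunk to a
    # separate worker, then reassemble the partial results at the end.
    chunk_size = 250_000
    chunks = [numbers[i:i + chunk_size]
              for i in range(0, len(numbers), chunk_size)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        partial_sums = list(pool.map(sum_chunk, chunks))
    parallel_total = sum(partial_sums)   # the "get together again" step

    print(serial_total == parallel_total)   # True: same answer either way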

Most of us do quite trivial, everyday things with our computers that don't tax them in any way: looking at web pages, sending emails, and writing documents use very little of the processing power in a typical PC. But if you try to do something more complex, like changing the colors on a very large digital photograph, you'll know that your computer does, occasionally, have to work hard to do things: it can take a minute or so to do really complex operations on very large digital photos. If you play computer games, you'll be aware that you need a computer with a fast processor chip and quite a lot of "working memory" (RAM), or things really slow down. Add a faster processor or double the memory and your computer will speed up dramatically, but there's still a limit to how fast it will go: one processor can generally only do one thing at a time.

Now suppose you're a scientist charged with forecasting the weather, testing a new cancer drug, or modeling how the climate might be in 2050. Problems like that push even the world's best computers to the limit. Just like you can upgrade a desktop PC with a better processor and more memory, so you can do the same with a world-class computer. But there's still a limit to how fast a processor will work and there's only so much difference more memory will make. The best way to make a difference is to use parallel processing: add more processors, split your problem into chunks, and get each processor working on a separate chunk of your problem in parallel.

Once computer scientists had figured out the basic idea of parallel processing, it made sense to add more and more processors: why have a computer with two or three processors when you can have one with hundreds or even thousands? Since the 1990s, supercomputers have routinely used many thousands of processors in what's known as massively parallel processing; at the time I'm updating this, in June 2018, the supercomputer with more processors than any other in the world, the Sunway TaihuLight, has around 40,960 processing modules, each with 260 processor cores, which means 10,649,600 processor cores in total!

Unfortunately, parallel processing comes with a built-in drawback. Let's go back to the supermarket analogy. If you and your friends decide to split up your shopping to go through multiple checkouts at once, the time you save by doing this is obviously reduced by the time it takes you to go your separate ways, figure out who's going to buy what, and come together again at the end. We can guess, intuitively, that the more processors there are in a supercomputer, the harder it will probably be to break up problems and reassemble them to make maximum efficient use of parallel processing. Moreover, there will need to be some sort of centralized management system or coordinator to split the problems, allocate and control the workload between all the different processors, and reassemble the results, which will also carry an overhead.

With a simple problem like paying for a cart of shopping, that's not really an issue. But imagine if your cart contains a billion items and you have 65,000 friends helping you with the checkout. If you have a problem (like forecasting the world's weather for next week) that seems to split neatly into separate sub-problems (making forecasts for each separate country), that's one thing. Computer scientists refer to complex problems like this, which can be split up easily into independent pieces, as embarrassingly parallel computations (EPC), because they are trivially easy to divide.
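As a hedged illustration, an embarrassingly parallel job in Python might look like the sketch below; forecast_for_country is a made-up placeholder for whatever self-contained calculation each piece needs, and because no piece depends on any other, the workers never have to talk to each other.

# An embarrassingly parallel job: every piece is independent, so the
# workers need no communication at all.
from concurrent.futures import ProcessPoolExecutor

def forecast_for_country(country):
    # A stand-in for an expensive but completely self-contained calculation.
    return country, f"forecast for {country}"

if __name__ == "__main__":
    countries = ["China", "France", "Germany", "Japan", "UK", "USA"]
    # Each country's forecast is independent, so all of them can be
    # farmed out at once and collected at the end.
    with ProcessPoolExecutor() as pool:
        results = dict(pool.map(forecast_for_country, countries))
    print(results["Japan"])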

But most problems don't cleave neatly that way. The weather in one country depends to a great extent on the weather in other places, so making a forecast for one country will need to take account of forecasts elsewhere. Often, the parallel processors in a supercomputer will need to communicate with one another as they solve their own bits of the problems. Or one processor might have to wait for results from another before it can do a particular job. A typical problem worked on by a massively parallel computer will thus fall somewhere between the two extremes of a completely serial problem (where every single step has to be done in an exact sequence) and an embarrassingly parallel one; while some parts can be solved in parallel, other parts will need to be solved in a serial way. A law of computing (known as Amdahl's law, for computer pioneer Gene Amdahl) explains how the part of the problem that remains serial effectively determines the maximum improvement in speed you can get from using a parallel system.
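Amdahl's law itself is a one-line formula. If a fraction p of a job can be parallelized and it runs on n processors, the best possible speedup is roughly 1 / ((1 - p) + p / n). The short Python sketch below shows the consequence: the serial part puts a hard ceiling on the gain, however many processors you throw at the problem.

# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
# of the work that can run in parallel and n is the number of processors.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 16, 1024, 1_000_000):
    print(n, "processors:", round(amdahl_speedup(0.95, n), 2), "times faster")

# Even with a million processors, a job that is 95 percent parallel can
# never run much more than 20 times faster, because 1 / (1 - 0.95) = 20.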

You can make a supercomputer by filling a giant box with processors and getting them to cooperate on tackling a complex problem through massively parallel processing. Alternatively, you could just buy a load of off-the-shelf PCs, put them in the same room, and interconnect them using a very fast local area network (LAN) so they work in a broadly similar way. That kind of supercomputer is called a cluster. Google does its web searches for users with clusters of off-the-shelf computers dotted in data centers around the world.

Photo: Supercomputer cluster: NASA's Pleiades ICE Supercomputer is a cluster of 112,896 cores made from 185 racks of Silicon Graphics (SGI) workstations. Picture by Dominic Hart courtesy of NASA Ames Research Center.

A grid is a supercomputer similar to a cluster (in that it's made up of separate computers), but the computers are in different places and connected through the Internet (or other computer networks). This is an example of distributed computing, which means that the power of a computer is spread across multiple locations instead of being located in one, single place (that's sometimes called centralized computing).

Grid supercomputing comes in two main flavors. In one kind, we might have, say, a dozen powerful mainframe computers in universities linked together by a network to form a supercomputer grid. Not all the computers will be actively working in the grid all the time, but generally we know which computers make up the network. The CERN Worldwide LHC Computing Grid, assembled to process data from the LHC (Large Hadron Collider) particle accelerator, is an example of this kind of system. It consists of two tiers of computer systems, with 11 major (tier-1) computer centers linked directly to the CERN laboratory by private networks, which are themselves linked to 160 smaller (tier-2) computer centers around the world (mostly in universities and other research centers), using a combination of the Internet and private networks.

The other kind of grid is much more ad-hoc and informal and involves far more individual computers, typically ordinary home computers. Have you ever taken part in an online computing project such as SETI@home, GIMPS, FightAIDS@home, Folding@home, MilkyWay@home, or ClimatePrediction.net? If so, you've allowed your computer to be used as part of an informal, ad-hoc supercomputer grid. This kind of approach is called opportunistic supercomputing, because it takes advantage of whatever computers just happen to be available at the time. Grids like this, which are linked using the Internet, are best for solving embarrassingly parallel problems that easily break up into completely independent chunks.

You might be surprised to discover that most supercomputers run fairly ordinary operating systems much like the ones running on your own PC, although that's less surprising when we remember that a lot of modern supercomputers are actually clusters of off-the-shelf computers or workstations. The most common supercomputer operating system used to be Unix, but it's now been superseded by Linux (an open-source, Unix-like operating system originally developed by Linus Torvalds and thousands of volunteers). Since supercomputers generally work on scientific problems, their application programs are sometimes written in traditional scientific programming languages such as Fortran, as well as popular, more modern languages such as C and C++.

Photo: Supercomputers can help us crack the most complex scientific problems, including modeling Earth's climate. Picture courtesy of NASA on the Commons.

As we saw at the start of this article, one essential feature of a computer is that it's a general-purpose machine you can use in all kinds of different ways: you can send emails on a computer, play games, edit photos, or do any number of other things simply by running a different program. If you're using a high-end cellphone, such as an Android phone or an iPhone or an iPod Touch, what you have is a powerful little pocket computer that can run programs by loading different "apps" (applications), which are simply computer programs by another name. Supercomputers are slightly different.

Typically, supercomputers have been used for complex, mathematically intensive scientific problems, including simulating nuclear missile tests, forecasting the weather, simulating the climate, and testing the strength of encryption (computer security codes). In theory, a general-purpose supercomputer can be used for absolutely anything.

While some supercomputers are general-purpose machines that can be used for a wide variety of different scientific problems, some are engineered to do very specific jobs. Two of the most famous supercomputers of recent times were engineered this way. IBM's Deep Blue machine from 1997 was built specifically to play chess (against Russian grand master Garry Kasparov), while its later Watson machine (named for IBM's founder, Thomas Watson, and his son) was engineered to play the game Jeopardy. Specially designed machines like this can be optimized for particular problems; so, for example, Deep Blue would have been designed to search through huge databases of potential chess moves and evaluate which move was best in a particular situation, while Watson was optimized to analyze tricky general-knowledge questions phrased in (natural) human language.

Look through the specifications of ordinary computers and you'll find their performance is usually quoted in MIPS (million instructions per second), which is how many fundamental programming commands (read, write, store, and so on) the processor can manage. It's easy to compare two PCs by comparing the number of MIPS they can handle (or even their processor speed, which is typically rated in gigahertz or GHz).

Supercomputers are rated a different way. Since they're employed in scientific calculations, they're measured according to how many floating point operations per second (FLOPS) they can do, which is a more meaningful measurement based on what they're actually trying to do (unlike MIPS, which is a measurement of how they are trying to do it). Since supercomputers were first developed, their performance has been measured in successively greater numbers of FLOPS, as the table below illustrates:

The example machines listed in the table are described in more detail in the chronology, below.
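For a rough sense of scale, FLOPS follow the ordinary metric prefixes: a gigaflop machine manages billions (10^9) of floating point operations per second, a teraflop machine trillions (10^12), a petaflop machine 10^15, and an exaflop machine 10^18. The little Python helper below, purely for illustration, turns a raw FLOPS figure into those units; the example number is Titan's widely quoted benchmark score of about 17.59 petaflops.

# Convert a raw FLOPS figure into the usual prefixed units.
PREFIXES = [("exa", 1e18), ("peta", 1e15), ("tera", 1e12),
            ("giga", 1e9), ("mega", 1e6), ("kilo", 1e3)]

def human_flops(flops):
    for name, scale in PREFIXES:
        if flops >= scale:
            return f"{flops / scale:.2f} {name}flops"
    return f"{flops:.0f} flops"

print(human_flops(17.59e15))   # prints "17.59 petaflops"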
