Supercomputing has come a long way since its beginnings in the 1960s. Initially, many supercomputers were based on mainframes; however, their cost and complexity were significant barriers to entry for many institutions. The idea of using multiple low-cost PCs over a network to provide a cost-effective form of parallel computing led research institutions along the path of high-performance computing (HPC) clusters, starting with "Beowulf" clusters in the 1990s.
Beowulf clusters are very much the predecessors to today's HPC clusters. The fundamentals of the Beowulf architecture are still relevant to modern-day HPC deployments; however, multiple desktop PCs have been replaced with purpose-built, high-density server platforms. Networking has significantly improved, with high-bandwidth, low-latency InfiniBand (or, as a nod to the past, increasingly Ethernet), and high-performance parallel filesystems such as Spectrum Scale, Lustre and BeeGFS have been developed to allow the storage to keep up with the compute. The development of excellent, often open-source, tools for managing high-performance distributed computing has also made adoption a lot easier.
More recently, we have witnessed the advancement of HPC from the original CPU-based clusters to systems that do the bulk of their processing on Graphics Processing Units (GPUs), resulting in the growth of GPU-accelerated computing.
While HPC was scaling up with more compute resources, data was growing at a far faster pace. Since 2010, there has been a huge explosion in unstructured data from sources like webchats, cameras, sensors, video communications and so on. This has presented big data challenges for storage, processing, and transfer. Newer technology paradigms such as big data, parallel computing, cloud computing, Internet of Things (IoT) and artificial intelligence (AI) came into the mainstream to cope with the problems caused by the data onslaught.
What these paradigms all have in common is that they can be parallelized to a high degree. HPC's GPU parallel computing has been a real game-changer for AI, because GPUs can process all this data in parallel in a short amount of time. As workloads have grown, so too have GPU parallel computing and AI machine learning. Image analysis is a good example of how the power of GPU computing can support an AI project. With one GPU it would take 72 hours to process an imaging deep learning model, but it only takes 20 minutes to run the same AI model on an HPC cluster with 64 GPUs.
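As a quick worked calculation, the figures quoted above can be turned into a speedup and a speedup per GPU. This is only arithmetic on the two numbers given in the text; the variable names are illustrative and nothing here is a benchmark.

```python
# Figures taken from the example in the text.
single_gpu_minutes = 72 * 60   # 72 hours on one GPU
cluster_minutes = 20           # 20 minutes on the cluster
cluster_gpus = 64              # GPUs in the cluster

# Speedup: how many times faster the cluster run is.
speedup = single_gpu_minutes / cluster_minutes

# Speedup divided by GPU count (1.0 would mean perfect linear scaling).
speedup_per_gpu = speedup / cluster_gpus

print(f"speedup: {speedup:.0f}x")
print(f"speedup per GPU: {speedup_per_gpu:.3f}")
```

The quoted figures work out to a 216x speedup, or 3.375 per GPU. That is more than GPU count alone would normally explain, so it is reasonable to assume (the text does not say) that the cluster in the example also benefits from faster GPUs, storage or networking than the single-GPU machine.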
Beowulf is still relevant to AI workloads. Storage, networking, and processing are all important to make AI projects work at scale. This is where AI can make use of the large-scale, parallel environments that HPC infrastructure (with GPUs) provides to help process workloads quickly. Training an AI model takes far more time than testing one. Coupling AI with HPC matters because it significantly speeds up the training stage and boosts the accuracy and reliability of AI models, while keeping training time to a minimum.
The right software is needed to support the HPC and AI combination. Traditional products and applications are being used to run AI workloads from within HPC environments, as many share the same requirements for aggregating large pools of resources and managing them. However, everything from the underlying hardware, the schedulers used and the Message Passing Interface (MPI) to how software is packaged is beginning to change towards more flexible models, and a rise in hybrid environments is a trend that we expect to see continue.
As traditional use cases for HPC applications are so well established, changes often happen relatively slowly, and updates for many HPC applications are only necessary every 6 to 12 months. AI development, on the other hand, is happening so fast that updates and new applications, tools and libraries are being released almost daily.
If you employed the same update strategies to manage your AI as you do for your HPC platforms, you would get left behind. That is why a solution like NVIDIA's DGX containerized platform allows you to quickly and easily keep up to date with rapid developments from NVIDIA GPU Cloud (NGC), an online catalog of AI and HPC tools packaged in easy-to-consume containers.
It is becoming standard practice within the HPC community to use a containerized platform for managing instances that are beneficial for AI deployment. Containerization has accelerated support for AI workloads on HPC clusters.
AI models can be used to predict the outcome of a simulation without having to run the full, resource-intensive simulation. By using an AI model in this way, the input variables or design points of interest can be narrowed down to a candidate list quickly and at much lower cost. These candidate variables can then be run through the known simulation to verify the AI model's prediction.
Quantum molecular simulations (QMS), chip design and drug discovery are areas where this technique is increasingly being applied. IBM also recently launched a product that does exactly this, known as IBM Bayesian Optimization Accelerator (BOA).
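The screen-then-verify workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular product works: `expensive_simulation` is a hypothetical stand-in for a costly solver, and a simple piecewise-linear interpolant takes the place of a trained AI model.

```python
import math


def expensive_simulation(x: float) -> float:
    """Hypothetical stand-in for a costly simulation; lower output is better."""
    return (x - 0.3) ** 2 + 0.05 * math.sin(12 * x)


def fit_surrogate(xs, ys):
    """Build a cheap piecewise-linear predictor from a few simulation runs."""
    points = sorted(zip(xs, ys))

    def predict(x: float) -> float:
        # Clamp to the end values outside the sampled range.
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        # Interpolate linearly inside the interval that contains x.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    return predict


# 1. Run the full simulation at a handful of design points (11 runs).
train_x = [i / 10 for i in range(11)]
train_y = [expensive_simulation(x) for x in train_x]

# 2. Fit the cheap surrogate to those results.
surrogate = fit_surrogate(train_x, train_y)

# 3. Screen a much larger candidate set using only the surrogate.
candidates = [i / 1000 for i in range(1001)]
shortlist = sorted(candidates, key=surrogate)[:5]

# 4. Verify only the shortlisted candidates with the real simulation.
verified = {x: expensive_simulation(x) for x in shortlist}
best_x = min(verified, key=verified.get)
print(f"best verified design point: {best_x}")
```

In this sketch the full simulation runs 16 times (11 training runs plus 5 verifications) instead of once for each of the 1,001 candidates, which is where the cost saving comes from. A real project would replace the interpolant with a trained model and choose training points more carefully.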
Start with a few simple questions: How big is my problem? How fast do I want my results back? How much data do I have to process? How many users are sharing the resource?
HPC techniques will help with the management of an AI project if the existing dataset is substantial, or if multiple users are causing contention on the infrastructure. If you have reached the point where you need to put four GPUs in a workstation and this is becoming a bottleneck, you need to consult an HPC integrator with experience in scaling up infrastructure for these types of workloads.
If you are running AI workloads on a large machine or on multiple machines with GPUs, your AI infrastructure might look more like HPC infrastructure than you realize. There are HPC techniques, software and other aspects that can really help to manage that infrastructure. The infrastructure looks quite similar, but there are some clever ways of installing and managing it that are specifically geared towards AI modeling.
Storage is very often overlooked when organizations are building infrastructure for AI workloads, and you may not be getting the full ROI on your AI infrastructure if your compute is waiting for your storage to be freed up. It is important to seek the best advice for sizing and deploying the right storage solution for your cluster.
Big data doesn't necessarily need to be that big; it is simply data that has reached the point where it becomes unmanageable for an organization. When you can't get what you want out of it, it has become too big for you. HPC can provide the compute power to deal with the large amounts of data in AI workloads.
It is an exciting time for both HPC and AI, as we are seeing incremental adaptation by both technologies. The challenges are getting bigger every day, with newer and more distinct problems that need faster solutions, such as countering cyber-attacks, discovering new vaccines and detecting enemy missiles.
It will be interesting to see what happens next in terms of bringing 100% containerized environments onto HPC clusters, using technologies such as Singularity and Kubernetes.
Schedulers today initiate jobs and wait until they finish, which may not be ideal for AI environments. Newer schedulers monitor real-time performance, execute jobs based on priority and runtime, and will be able to work alongside containerization technologies and environments such as Kubernetes to orchestrate the resources needed.
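To make "jobs based on priority and runtime" concrete, here is a minimal sketch of a single scheduling pass. It does not represent any real scheduler: the `Job` fields, the job names, and the convention that a lower priority number is more urgent are all assumptions made for illustration.

```python
import heapq
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    priority: int          # assumed convention: lower number = more urgent
    est_runtime_min: int   # estimated runtime in minutes
    gpus: int              # GPUs requested


def schedule(jobs, free_gpus):
    """Order jobs by (priority, estimated runtime) and start those that fit."""
    # The index i breaks ties so that Job objects are never compared directly.
    heap = [(j.priority, j.est_runtime_min, i, j) for i, j in enumerate(jobs)]
    heapq.heapify(heap)

    started, deferred = [], []
    while heap:
        _, _, _, job = heapq.heappop(heap)
        if job.gpus <= free_gpus:
            free_gpus -= job.gpus
            started.append(job.name)
        else:
            # Not enough GPUs free right now; leave it for a later pass.
            deferred.append(job.name)
    return started, deferred


jobs = [
    Job("train-large", priority=1, est_runtime_min=600, gpus=8),
    Job("inference-test", priority=1, est_runtime_min=5, gpus=1),
    Job("hyperparam-sweep", priority=2, est_runtime_min=120, gpus=4),
    Job("notebook", priority=3, est_runtime_min=30, gpus=1),
]

started, deferred = schedule(jobs, free_gpus=8)
print("started:", started)
print("deferred:", deferred)
```

With 8 free GPUs, the short high-priority job starts first, the 8-GPU training job no longer fits and is deferred, and the smaller jobs fill the remaining GPUs. A production scheduler would also need to stop large jobs from being deferred indefinitely, for example by reserving resources for them.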
Storage will become increasingly important to support large deployments, as vast volumes of data need to be stored, classified, labeled, cleansed, and moved around quickly. Infrastructure such as flash storage and networking becomes vital to your project, alongside storage software that can scale with demand.
HPC and AI will continue to have an impact on organizations and on each other, and their symbiotic relationship will only grow stronger as traditional HPC users and AI infrastructure modelers realize the full potential of each other.
Vibin Vijay, AI Product Specialist, OCF
Read more:
Solving the data conundrum with HPC and AI - ITProPortal