For the average AI shop, sparse models and cheap memory will win – The Register

Posted: June 11, 2022 at 2:13 am

As compelling as the leading large-scale language models may be, the fact remains that only the largest companies have the resources to actually deploy and train them at meaningful scale.

For enterprises eager to leverage AI for competitive advantage, a cheaper, pared-down alternative may be a better fit, especially if it can be tuned to particular industries or domains.

That's where an emerging set of AI startups hope to carve out a niche: by building sparse, tailored models that, while perhaps not as powerful as GPT-3, are good enough for enterprise use cases and run on hardware that ditches expensive high-bandwidth memory (HBM) for commodity DDR.

German AI startup Aleph Alpha is one such example. Founded in 2019, the Heidelberg-based company's Luminous natural-language model boasts many of the same headline-grabbing features as OpenAI's GPT-3: copywriting, classification, summarization, and translation, to name a few.

The startup has teamed up with Graphcore to explore and develop sparse language models on the British chipmaker's hardware.

"Graphcore's IPUs present an opportunity to evaluate advanced technological approaches such as conditional sparsity," Aleph Alpha CEO Jonas Andrulis said in a statement. "These architectures will undoubtedly play a role in Aleph Alpha's future research."

Conditionally sparse models (sometimes called mixture-of-experts or routed models) only process data against the applicable parameters, something that can significantly reduce the compute resources needed to run them.

"For example, if a language model was trained on all the languages on the internet, and then is asked a question in Russian, it wouldn't make sense to run that data through the entire model, only the parameters related to the Russian language," explained Graphcore CTO Simon Knowles in an interview with The Register.
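To make the routing idea concrete, here is a toy top-k mixture-of-experts layer in plain Python. It is a minimal sketch under stated assumptions, not Aleph Alpha's or Graphcore's implementation: the class name `RoutedLayer`, the softmax gate, the random weights, and the sizes are all hypothetical choices for illustration. A gate scores every expert for a given input, and only the top-k experts' weight matrices are ever multiplied; the rest sit untouched in memory.

```python
import math
import random


def matvec(weights, x):
    """Dense matrix-vector product; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class RoutedLayer:
    """Hypothetical toy mixture-of-experts layer with top-k routing."""

    def __init__(self, num_experts, dim, k=1, seed=0):
        rng = random.Random(seed)
        self.k = k
        # One gating row per expert: scores how relevant that expert is to the input.
        self.gate = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Each expert owns its own dim x dim weight matrix.
        self.experts = [
            [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, x):
        probs = softmax(matvec(self.gate, x))
        # Keep only the k highest-scoring experts; the others are never computed.
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.k]
        norm = sum(probs[i] for i in chosen)
        out = [0.0] * len(x)
        for i in chosen:
            y = matvec(self.experts[i], x)
            weight = probs[i] / norm  # renormalize gate weights over the chosen experts
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out, chosen


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(16)]
layer = RoutedLayer(num_experts=8, dim=16, k=1)
out, chosen = layer.forward(x)
print(f"expert(s) used: {chosen}; share of expert weights touched: {layer.k / len(layer.experts):.3f}")
```

With one expert chosen out of eight, only an eighth of the expert parameters take part in the computation, which is the saving Knowles describes; setting `k` equal to the number of experts turns the layer back into a dense one.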

"It's completely obvious. This is how your brain works, and it's also how an AI ought to work," he said. "I've said this many times, but if an AI can do many things, it doesn't need to access all of its knowledge to do one thing."

Knowles, whose company builds accelerators tailored for these kinds of models, unsurprisingly believes they're the future of AI. "I'd be surprised if, by next year, anyone is building dense-language models," he added.

Sparse language models aren't without their challenges. One of the most pressing, according to Knowles, has to do with memory. The HBM used in high-end GPUs to achieve the bandwidth and capacity these models require is expensive and attached to an even more expensive accelerator.

This isn't an issue for dense-language models, where you might need all of that compute and memory, but it poses a problem for sparse models, which favor memory over compute, he explained.

Interconnect tech, like Nvidia's NVLink, can be used to pool memory across multiple GPUs, but if the model doesn't require all that compute, the GPUs could be left sitting idle. "It's a really expensive way to buy memory," Knowles said.

Graphcore's accelerators attempt to sidestep this challenge by borrowing a technique as old as computing itself: caching. Each IPU features a relatively large SRAM cache (1GB) to satiate the bandwidth requirements of these models, while raw capacity is achieved using large pools of inexpensive DDR4 memory.

"The more SRAM you've got, the less DRAM bandwidth you need, and this is what allows us to not use HBM," Knowles said.
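Knowles's point about SRAM and DRAM bandwidth can be illustrated with a small cache simulation. This is a sketch under loudly stated assumptions: it uses a least-recently-used (LRU) eviction policy and a hypothetical, Zipf-like access trace in which a few "hot" weight blocks are reused far more often than the rest. It does not model how Graphcore actually manages IPU memory; it only shows that, on the same trace, a larger on-chip store sends fewer requests to off-chip DRAM.

```python
import random
from collections import OrderedDict


def dram_fetches(trace, cache_blocks):
    """Count accesses that miss an LRU cache of `cache_blocks` entries and must go to DRAM."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: served from on-chip SRAM
        else:
            misses += 1  # miss: fetched from off-chip DRAM
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def skewed_trace(num_blocks, length, seed=0):
    """Hypothetical access pattern: block i is requested with weight 1 / (i + 1)."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) for i in range(num_blocks)]
    return rng.choices(range(num_blocks), weights=weights, k=length)


trace = skewed_trace(num_blocks=10_000, length=100_000)
for cache_blocks in (100, 1_000, 5_000):
    misses = dram_fetches(trace, cache_blocks)
    print(f"{cache_blocks:>5} cached blocks -> {misses / len(trace):.1%} of accesses go to DRAM")
```

Because LRU has the inclusion property, growing the cache can never increase the miss count on a fixed trace, so DRAM traffic (and therefore the DRAM bandwidth needed) falls as SRAM grows. How steeply it falls depends entirely on how skewed the real access pattern is.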

By decoupling memory from the accelerator, it becomes far less expensive (the cost of a few commodity DDR modules) for enterprises to support larger AI models.

In addition to supporting cheaper memory, Knowles claims the company's IPUs also have an architectural advantage over GPUs, at least when it comes to sparse models.

Instead of running on a small number of large matrix multipliers (like you find in a tensor processing unit), Graphcore's chips feature a large number of smaller matrix math units that can address memory independently.

This provides greater granularity for sparse models, where "you need the freedom to fetch relevant subsets, and the smaller the unit you're obliged to fetch, the more freedom you have," he explained.
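The cost of a coarse fetch unit can be shown with simple arithmetic. In this sketch, which is purely illustrative and not a model of IPU or GPU memory systems, memory is assumed to be read in aligned chunks of `unit_rows` rows, and a hypothetical sparse model needs 64 scattered rows from a 65,536-row weight table. The helper name `rows_fetched` and all sizes are assumptions.

```python
def rows_fetched(needed_rows, unit_rows):
    """Rows actually moved when memory must be read in aligned chunks of `unit_rows` rows."""
    units_touched = {row // unit_rows for row in needed_rows}  # distinct chunks containing a needed row
    return len(units_touched) * unit_rows


# Hypothetical workload: every 1,024th row of a 65,536-row table is needed.
needed = set(range(0, 65_536, 1_024))
for unit in (1, 16, 256, 4_096):
    moved = rows_fetched(needed, unit)
    print(f"fetch unit {unit:>5} rows: moved {moved:>6} rows for {len(needed)} useful ones")
```

With one-row units, exactly the 64 useful rows move; with 4,096-row units, every chunk contains a needed row, so the entire table is read to use one row in 1,024 of it. That over-fetch is the loss of "freedom" Knowles attributes to large, coarse-grained units.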

Put together, Knowles argues, this approach enables Graphcore's IPUs to train large AI/ML models with hundreds of billions or even trillions of parameters at substantially lower cost than GPUs.

However, the enterprise AI market is still in its infancy, and Graphcore faces stiff competition in this space from larger, more established rivals.

So while development of ultra-sparse, cut-rate language models is unlikely to abate anytime soon, it remains to be seen whether it'll be Graphcore's IPUs or someone else's accelerator that ends up powering enterprise AI workloads.
