{"id":1027265,"date":"2023-08-04T10:42:55","date_gmt":"2023-08-04T14:42:55","guid":{"rendered":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/is-running-ai-on-cpus-making-a-comeback-techhq-2.php"},"modified":"2023-08-04T10:42:55","modified_gmt":"2023-08-04T14:42:55","slug":"is-running-ai-on-cpus-making-a-comeback-techhq-2","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/neural-networks\/is-running-ai-on-cpus-making-a-comeback-techhq-2.php","title":{"rendered":"Is running AI on CPUs making a comeback? &#8211; TechHQ"},"content":{"rendered":"<p>If somebody told you that a refurbished laptop could eclipse the performance of an NVIDIA A100 GPU when training a 200 million-parameter neural network, you'd want to know the secret. Running AI routines on CPUs is supposed to be slow, which is why GPUs are in high demand and NVIDIA shareholders are celebrating. But maybe it's not that simple.<\/p>\n<p>Part of the issue is that the development and availability of GPUs, which can massively parallelize matrix multiplications, has made it possible to brute-force progress in AI. Bigger is better when it comes to both the amount of data used to train neural networks and the size of the models, reflected in the number of parameters.<\/p>\n<p>Considering state-of-the-art large language models (LLMs) such as OpenAI's GPT-4, the number of parameters is now measured in the billions. And training what is, in effect, a vast, multi-layered equation &#8211; by first specifying model weights at random and then refining those parameters through backpropagation and gradient descent &#8211; is now firmly GPU territory.<\/p>\n<p>Nobody runs high-performance AI routines on CPUs, or at least that's the majority view. 
The growth in model size, driven by gains in accuracy, has led users to overwhelmingly favor much faster GPUs to carry out the billions of calculations needed in each forward and backward pass.<\/p>\n<p>But the scale of the latest generative AI models is putting this brute-force GPU approach to the test. And many developers no longer have the time, money, or computing resources to compete &#8211; fine-tuning the billions of artificial neurons that make up the many-layered networks.<\/p>\n<p>Experts in the field are asking if there's another, more efficient way of training neural networks to perform tasks such as image recognition, product recommendation, and natural language processing (NLP) search.<\/p>\n<p>Artificial neural networks are often compared to the workings of the human brain. But the comparison is a loose one: the human brain operates on the power of a dim light bulb, whereas state-of-the-art AI models require vast amounts of power, have worryingly large carbon footprints, and require large amounts of cooling.<\/p>\n<p>That being said, the human brain consumes a considerable amount of energy compared with other organs in the body. But its orders-of-magnitude, GPU-beating efficiency stems from the fact that the brain's chemistry recruits only the neurons that it needs &#8211; rather than having to perform calculations in bulk.<\/p>\n<p>AI developers are trying to mimic those brain-like efficiencies in computing hardware by engineering architectures known as spiking neural networks, in which neurons behave more like accumulators and fire only when repeatedly prompted. But it's a work in progress.<\/p>\n<p>However, it's long been known that training AI algorithms could be made much more efficient. Matrix multiplications assume dense computations, but researchers showed a decade ago that keeping just the top ten percent of neuron activations still produces high-quality results. 
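<\/p>\n<p>As a rough sketch of that finding (illustrative NumPy code, not the researchers' implementation &#8211; the layer sizes here are made up), the snippet below computes a layer's activations densely and then keeps only the top ten percent:<\/p>\n<pre><code>import numpy as np\n\nrng = np.random.default_rng(0)\nW = rng.standard_normal((1000, 128))  # weights of a 1000-neuron layer\nx = rng.standard_normal(128)          # one input vector\na = W @ x                             # dense activations\nk = max(1, a.size \/\/ 10)              # top ten percent of neurons\ntop = np.argpartition(a, -k)[-k:]     # indices of the k largest activations\nsparse = np.zeros_like(a)\nsparse[top] = a[top]                  # keep only the strongest responses<\/code><\/pre>\n<p>Of course, this version still performs the full matrix multiplication; the real savings only arrive if the top neurons can be found without computing every activation first.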
<\/p>\n<p>The issue is that to identify the top ten percent you would still have to run all of those sums in bulk, which would remain wasteful. But what if you could look up a list of the most active neurons based on a given input?<\/p>\n<p>And it's the answer to this question that opens up the path to running AI on CPUs, which is potentially game-changing &#8211; as the observation that a refurbished laptop can eclipse the performance of an NVIDIA A100 GPU hints at.<\/p>\n<p>So what is this magic? At the heart of the approach is the use of hash tables, which famously support lookups in constant time (or thereabouts). In other words, the time taken to find an entry in a hash table is independent of the number of entries stored. And Google puts this principle to work in its web search.<\/p>\n<p>For example, if you type \"Best restaurants in London\" into Google Chrome, that query &#8211; thanks to hashing, which turns the input into a unique fingerprint &#8211; provides the index to a list of topical websites that Google has filed away at that location. And it's why, despite having billions of websites stored in its vast index, Google can deliver search results to users in a matter of milliseconds.<\/p>\n<p>And, just as your search query in effect provides a lookup address for Google, a similar approach can be used to identify which artificial neurons are most strongly associated with a piece of training data, such as a picture of a cat.<\/p>\n<p>In neural networks, hash tables can be used to tell the algorithm which activations need to be calculated, dramatically reducing the computational burden to a fraction of that of brute-force methods &#8211; which makes it possible to run AI on CPUs.<\/p>\n<p>In fact, the class of hash functions that turns out to be most useful is dubbed locality-sensitive hash (LSH) functions. 
Regular hash functions are great for fast memory addressing and duplicate detection, whereas locality-sensitive hash functions provide near-duplicate detection.<\/p>\n<p>LSH functions can be used to hash data points that are near to each other &#8211; in other words, similar &#8211; into the same buckets with high probability. And this, in terms of deep learning, dramatically improves sampling performance during model training.<\/p>\n<p>Hash functions can also be used to improve the user experience once models have been trained. Computer scientists based in the US at Rice University, Texas, and Stanford University, California, together with the Pocket LLM pioneer ThirdAI, have proposed a method dubbed HALOS (Hashing Large Output Space for Cheap Inference), which speeds up the process without compromising model performance.<\/p>\n<p>As the team explains, HALOS reduces inference to sub-linear computation by selectively activating only a small set of likely-to-be-relevant output-layer neurons. \"Given a query vector, the computation can be focused on a tiny subset of the large database,\" write the authors in their conference paper. \"Our extensive evaluations show that HALOS matches or even outperforms the accuracy of given models with 21x speed up and 87% energy reduction.\"<\/p>\n<p>Commercially, this approach is helping merchants such as Wayfair &#8211; an online retailer that enables customers to find millions of products for their homes. Over the years, the firm has worked hard to improve its recommendation engine, noting a study by Amazon showing that even a 100-millisecond delay in serving results can put a noticeable dent in sales. 
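<\/p>\n<p>To make the LSH idea above concrete, here is a minimal sketch (again illustrative, not ThirdAI's production code) of random-hyperplane hashing, in which the sign pattern of a vector's projections serves as its bucket key &#8211; so similar vectors tend to collide in the same bucket:<\/p>\n<pre><code>import numpy as np\n\nrng = np.random.default_rng(1)\nplanes = rng.standard_normal((8, 64))  # eight random hyperplanes in 64-d space\n\ndef lsh_bucket(v):\n    # each projection contributes one bit of the bucket key\n    bits = planes @ v > 0\n    return tuple(bits.tolist())\n\na = rng.standard_normal(64)\nb = a + 0.01 * rng.standard_normal(64)  # a near-duplicate of a\n# a and b will usually, though not always, share a bucket key<\/code><\/pre>\n<p>Grouping neurons under such keys is what lets training look up the likely-active neurons for a given input rather than evaluating all of them.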
<\/p>\n<p>And, sticking briefly with online shopping habits, more recent findings published by Akamai report that over half of mobile website visitors will leave a page that takes more than three seconds to load &#8211; food for thought, as half of consumers are said to browse for products and services on their smartphones.<\/p>\n<p>All of this puts pressure on the claim that clever use of hash functions can enable AI to run on CPUs. But the approach more than lived up to expectations, as Wayfair has confirmed in a blog post. \"We were able to train our version three classifier model on commodity CPUs, while at the same time achieving a markedly lower latency rate,\" commented Weiyi Sun, Associate Director of Machine Learning at the company.<\/p>\n<p>Plus, as the computer scientists described in their study, the use of hash-based processing algorithms accelerated inference too.<\/p>\n<p>Here is the original post: <\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/techhq.com\/2023\/08\/is-running-ai-on-cpus-making-a-comeback\" title=\"Is running AI on CPUs making a comeback? - TechHQ\">Is running AI on CPUs making a comeback? - TechHQ<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If somebody told you that a refurbished laptop could eclipse the performance of an NVIDIA A100 GPU when training a 200 million-parameter neural network, you'd want to know the secret. 
<a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/neural-networks\/is-running-ai-on-cpus-making-a-comeback-techhq-2.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[1238175],"tags":[],"class_list":["post-1027265","post","type-post","status-publish","format-standard","hentry","category-neural-networks"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1027265"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=1027265"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1027265\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=1027265"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=1027265"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=1027265"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}