{"id":1122961,"date":"2024-03-14T00:11:07","date_gmt":"2024-03-14T04:11:07","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/meta-hooks-up-with-hammerspace-for-advanced-ai-infrastructure-project-blocks-and-files-blocks-files\/"},"modified":"2024-03-14T00:11:07","modified_gmt":"2024-03-14T04:11:07","slug":"meta-hooks-up-with-hammerspace-for-advanced-ai-infrastructure-project-blocks-and-files-blocks-files","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/artificial-general-intelligence\/meta-hooks-up-with-hammerspace-for-advanced-ai-infrastructure-project-blocks-and-files-blocks-files\/","title":{"rendered":"Meta hooks up with Hammerspace for advanced AI infrastructure project  Blocks and Files &#8211; Blocks &amp; Files"},"content":{"rendered":"<p><p>Meta has confirmed Hammerspace is its data orchestration software supplier, supporting 49,152 Nvidia H100 GPUs split into two equal clusters.<\/p>\n<p>The parent of Facebook, Instagram and other social media platforms says its long-term vision is to create artificial general intelligence (AGI) that is open and built responsibly so that it can be widely available for everyone to benefit from. The blog authors say: &#8220;Marking a major investment in Meta&#8217;s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads.&#8221;<\/p>\n<p>Hammerspace has been saying for some weeks that it has a huge hyperscaler AI customer, which we suspected to be Meta, and now Meta has described the role of Hammerspace in two Llama 3 AI training systems.
<\/p>\n<p>Meta&#8217;s bloggers say: &#8220;These clusters support our current and next generation AI models, including Llama 3, the successor to Llama 2, our publicly released LLM, as well as AI research and development across GenAI and other areas.&#8221;<\/p>\n<p>A precursor AI Research SuperCluster (RSC), with 16,000 Nvidia A100 GPUs, was used to build Meta&#8217;s gen 1 AI models and continues to play an important role in the development of Llama and Llama 2, as well as advanced AI models for applications ranging from computer vision, NLP, and speech recognition, to image generation, and even coding. That cluster uses Pure Storage FlashArray and FlashBlade all-flash arrays.<\/p>\n<p>Meta&#8217;s two newer and larger clusters are diagrammed in the blog.<\/p>\n<p>They support models larger and more complex than could be supported in the RSC and pave the way for advancements in GenAI product development and AI research. The scale here is overwhelming as they help handle hundreds of trillions of AI model executions per day.<\/p>\n<p>The two clusters each start with 24,576 Nvidia H100 GPUs. One has an RDMA over Converged Ethernet (RoCE) 400 Gbps network, using Arista 7800 switches with Wedge400 and Minipack2 OCP rack switches, while the other has an Nvidia Quantum2 400 Gbps InfiniBand setup.<\/p>\n<p>Meta&#8217;s Grand Teton OCP hardware chassis houses the GPUs, which rely on Meta&#8217;s Tectonic distributed, flash-optimized, exabyte-scale storage system.<\/p>\n<p>This is accessed through a Meta-developed Linux Filesystem in Userspace (FUSE) API and used for AI model data needs and model checkpointing. The blog says: &#8220;This solution enables thousands of GPUs to save and load checkpoints in a synchronized fashion (a challenge for any storage solution) while also providing a flexible and high-throughput exabyte scale storage required for data loading.&#8221;
<\/p>\n<p>Meta has partnered with Hammerspace &#8220;to co-develop and land a parallel network file system (NFS) deployment to meet the developer experience requirements for this AI cluster &#8230; Hammerspace enables engineers to perform interactive debugging for jobs using thousands of GPUs as code changes are immediately accessible to all nodes within the environment. When paired together, the combination of our Tectonic distributed storage solution and Hammerspace enable fast iteration velocity without compromising on scale.&#8221;<\/p>\n<p>The Hammerspace diagram above provides its view of the co-developed AI cluster storage system.<\/p>\n<p>Both the Tectonic and Hammerspace-backed storage deployments use Meta&#8217;s YV3 Sierra Point server fitted with the highest-capacity E1.S format SSDs available. These are OCP servers customized to achieve the right balance of throughput capacity per server, rack count reduction, and associated power efficiency as well as fault tolerance.<\/p>\n<p>Meta is not stopping here. The blog authors say: &#8220;This announcement is one step in our ambitious infrastructure roadmap. By the end of 2024, we&#8217;re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s.&#8221;
<\/p>\n<p>Go here to see the original: <\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/blocksandfiles.com\/2024\/03\/13\/meta-hammerspace-ai\" title=\"Meta hooks up with Hammerspace for advanced AI infrastructure project  Blocks and Files - Blocks &amp; Files\">Meta hooks up with Hammerspace for advanced AI infrastructure project  Blocks and Files - Blocks &amp; Files<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> Meta has confirmed Hammerspace is its data orchestration software supplier, supporting 49,152 Nvidia H100 GPUs split into two equal clusters. The parent of Facebook, Instagram and other social media platforms says its long-term vision is to create artificial general intelligence (AGI) that is open and built responsibly so that it can be widely available for everyone to benefit from <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/artificial-general-intelligence\/meta-hooks-up-with-hammerspace-for-advanced-ai-infrastructure-project-blocks-and-files-blocks-files\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1214666],"tags":[],"class_list":["post-1122961","post","type-post","status-publish","format-standard","hentry","category-artificial-general-intelligence"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1122961"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1122961"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1122961\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1122961"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1122961"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1122961"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}