The Milan Epyc 7003 processors, the third generation of AMD's revitalized server CPUs, are now in the field, and we await the entry of the Ice Lake Xeon SPs from Intel for the next jousting match in the datacenter to begin.
The stakes are high for both companies, which are vying for what seems to be a reasonably elastic demand for compute capacity in aggregate around the world, even if there are eddies where demand slows and chutes where it accelerates. There is room enough for Intel and AMD in the market, but it is the technical and economic jousting between these two that is going to make this fun and help spur future competition in the years to come.
We did our announcement day first pass on the Milan SKU stack, with the salient feeds and speeds, and slots and watts, of the 19 new processors, and we also covered the actual launch event for the Milan chips by chief executive officer Lisa Su and her launch crew, which included Forrest Norrod, general manager of AMD's Datacenter and Embedded Solutions Group; Mark Papermaster, the company's chief technical officer; and Dan McNamara, general manager of AMD's server business.
Now it is time to get into the weeds for a little bit and talk about the Milan architecture and how this processor's Zen 3 cores are delivering 19 percent higher instructions per clock than the Rome processor's Zen 2 cores from August 2019. It is hard to squeeze more and more performance out of a core while maintaining compatibility, but Intel, AMD, IBM, and Arm Holdings are clever engineering companies and they often go back to the drawing board and rethink how the elements of a core are organized and pipelined. They always seem to find new ways to do things better, and it really is a testament to human engineering that this is true.
Someday, we presume, AI will be used to create blocks of logic and data from transistors and place them in a 2D or 3D chip layout and do a better job than people and their EDA tools; we talked about Google's research in this area last year, in fact, and IP block placement in an EDA tool makes the games of Chess and Go look like a joke. So far, people are a necessary part of the process of designing a processor, so today is not that day. Ironically, better compute engines will hasten that day, and perhaps chip designers should not be so eager for such big improvements. . . . But, if history shows anything, you can't stop progress because people just plain have faith in it. For better or worse. Often both. And we here at The Next Platform are no different in this regard, so don't think we are taking some highbrow view. Consider us raised eyebrows with the occasional furrowed brows. We admire what engineers do; we worry about what people do with what they create sometimes.
Mike Clark, an AMD Fellow who cut his teeth on the single-core K5 processor, the first in-house designed AMD X86 chip from back in March 1996, and who is the lead architect on the Zen 3 cores, walked us through the nitty gritty detail of the Zen 3 core that is at the heart of the Milan system on chip complex. Let's dive in.
Right off the bat, this is a whole new, ground-up redesign of the core, similar to what Intel is doing with Ice Lake Xeon SPs and their Sunny Cove cores and what IBM will be doing with the Power10 processor and its brand new core later this year. And the reason is simple: Everyone needs to push the IPC as hard as possible to boost single threaded performance, and then make tradeoffs in the SKUs between high clock speeds across a small number of cores and lower clock speeds across a larger number of cores to hit performance targets that are better on each class of workloads and those in between than their respective Rome Epyc 7002, Cascade Lake Xeon SP, and Power9 predecessors. You can't just have more cores with a new generation, and you need to show better thermal efficiency at different performance points, too.
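To make that tradeoff concrete, here is a minimal back-of-the-envelope sketch. It assumes throughput scales as cores times clock speed times IPC, which ignores memory bandwidth and thermal limits, and the two SKU shapes are hypothetical illustrations rather than real Milan parts. Only the 19 percent IPC uplift comes from AMD.

```python
def relative_throughput(cores: int, clock_ghz: float, ipc: float) -> float:
    """Idealized throughput: cores x clock x IPC (ignores memory and thermal limits)."""
    return cores * clock_ghz * ipc


ZEN2_IPC = 1.00  # normalized baseline
ZEN3_IPC = 1.19  # the 19 percent uplift AMD cites for Zen 3

# Hypothetical SKU shapes, chosen only to illustrate the tradeoff.
few_fast = {"cores": 16, "clock_ghz": 3.5}
many_slow = {"cores": 64, "clock_ghz": 2.0}

for name, sku in (("few fast cores", few_fast), ("many slow cores", many_slow)):
    total = relative_throughput(ipc=ZEN3_IPC, **sku)
    per_core = total / sku["cores"]
    print(f"{name}: total={total:.2f}, per core={per_core:.3f}")
```

In this toy model the many-core shape wins on aggregate throughput while the high-clock shape wins per core, which is the tension the SKU stack has to balance.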
So how did AMD get that 19 percent better IPC with the Zen 3 cores used in the Milan server chips? By doing a whole lot of things all at the same time, as you will see. And when you contrast this with the lack of IPC improvements as the Sunny Cove cores are coming years late to market because of Intel's delays with its 10 nanometer processes that these Sunny Cove cores and their Ice Lake processors were tied to, it really shows:
"Here are some of the top-level performance improvements," says Clark. "We improved branch prediction, and not just accuracy, but actually being able to get the correct target address out sooner and the correct target instructions out sooner and feeding them to the machine so we get more throughput, more performance. We have beefed up the width of integer throughput. We have doubled the intake floating point for inference, as we see those workloads evolving going forward and we are reacting to that. And by pulling the eight Zen 3 cores under the larger 32 MB L3 complex, we have better communication paths and we have more cache available for lighter-threaded workloads and therefore we can reduce the effective latency to memory and provide more performance."
The Zen 3 core still has two-way simultaneous multithreading, as we pointed out in our initial coverage, and AMD has resisted the temptation to add more threads to goose performance as IBM does with its Power architecture, which can dynamically switch among 2, 4, or 8 threads per core. (Sometimes, the threading in the Power9 and Power10 chips is set in firmware and the cores are fat or skinny, depending.)
If you look on the right in the chart above at the Zen 3 block diagram, you can see there are two ways into the machine. The 32 KB instruction cache is still driven by a decoder that can drive four instructions per clock cycle into the op queue. And the other way into the chip is through the branch predictor on the far right that can put instructions into the op cache and deliver eight macro ops per cycle. The dispatcher decouples the two sides of the Zen 3 pipeline (integer and floating point) and can do six macro ops per cycle to either unit.
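The interplay of those three widths can be sketched with a toy model. The widths are the ones quoted above; everything else is our assumption, in particular that each cycle is fed entirely by one path or the other, that one decoded instruction counts as one macro op, and that the dispatcher is the only downstream limit. The function name is ours, not AMD's.

```python
DECODE_WIDTH = 4    # instructions per cycle through the decoder path
OP_CACHE_WIDTH = 8  # macro ops per cycle through the op cache path
DISPATCH_WIDTH = 6  # macro ops per cycle the dispatcher can send on


def sustained_ops_per_cycle(op_cache_cycle_share: float) -> float:
    """Average ops per cycle reaching the back end in this toy model.

    op_cache_cycle_share is the fraction of cycles served by the op cache;
    the remaining cycles are served by the decoder (an assumption).
    """
    if not 0.0 <= op_cache_cycle_share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    supply = (op_cache_cycle_share * OP_CACHE_WIDTH
              + (1.0 - op_cache_cycle_share) * DECODE_WIDTH)
    # The front end cannot push more than the dispatcher can accept.
    return min(supply, DISPATCH_WIDTH)


for share in (0.0, 0.25, 0.5, 1.0):
    print(share, sustained_ops_per_cycle(share))
```

Under these assumptions, the op cache only needs to serve half of the cycles before the six-wide dispatcher becomes the limit, which is one way to read why the op cache path is built wider than dispatch.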
That front end to the integer and floating point units in the Zen 3 core has a lot of tweaks, starting with an L1 cache branch target buffer that is twice the size of the one in the Zen 2 core, at 1,024 entries.
Clark says that the branch predictor on the front end has more bandwidth, which means it can pull more branches out per clock cycle. The Zen 3 core also features what Clark calls a "no bubble" branch prediction mechanism, which he explains thus:
"When you pull out a target address from the branch predictor, you then need to obviously put that back into the branch predictor to get the next address. Typically, that turnaround time creates a bubble. We have a unique mechanism where we can eliminate that bubble cycle and therefore be able to continuously pull out branch targets every cycle. We do still get some branches wrong in the execution units, but getting those addresses back and getting the target instructions of the machine, we improve the latency of that from Zen 2."
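The effect Clark describes can be sketched with simple cycle counting. This assumes a run of back-to-back predicted branches, one lookup cycle per target, and a turnaround bubble of a single cycle between consecutive lookups. Clark does not say how long the bubble is, so that figure is purely illustrative.

```python
def cycles_for_branch_targets(n_branches: int, bubble_cycles: int) -> int:
    """Cycles to pull n back-to-back targets out of a branch predictor.

    Each target takes one lookup cycle, and every lookup after the first
    waits out the turnaround bubble (bubble_cycles is an assumed parameter).
    """
    if n_branches <= 0:
        return 0
    return n_branches + (n_branches - 1) * bubble_cycles


with_bubble = cycles_for_branch_targets(100, bubble_cycles=1)
no_bubble = cycles_for_branch_targets(100, bubble_cycles=0)
print(with_bubble, no_bubble)
```

With a one-cycle bubble, a long chain of predicted branches delivers targets at roughly half the rate of the no bubble case, so removing the bubble matters most for branch-dense code.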
There are also some efficiency improvements in the op cache, namely faster sequencing of fetches and finer-grained switching of op cache pipelines, that help the Zen 3 front end drive that 19 percent IPC improvement (which is an average across a bunch of different workloads that were used to gauge IPC on cores in the Opteron days and that are now used on Zen cores in the Epyc era).
So that's the front end: the air intake manifold and fuel lines in a car engine analogy, we suppose. What about the integer and floating point cylinders? Here is a zoom into the execution engine in the Zen 3 core:
With the Zen 3 core, there is a much wider integer unit now, with four ALUs and dedicated branch and store units, as you can see from the chart above comparing and contrasting with the Zen 2 block diagram in the chart further up in this story.
Here is a drill down into the integer execution unit, which Clark says has a design goal of having larger structures to extract more instruction level parallelism (ILP) from applications to feed this part of the execution engine; its units, in general, have lower latency, too. The combined effect is more integer IPC.
As you can see, everything increases by a little bit or a lot, bringing a different set of throughput and balance to the combined set of units.
(We wonder if the engineers play a kind of video game, tweaking this or that in a simulator to see the effects, or if the EDA tools do this work, as well. We suspect the former and that, like much design, there is a knack for it, and as an architecture hardens a bit, you throw it out and start over. This Zen 3 core does not look that different from a Zen 2 core to our eyes, and certainly not like the jump from Sledgehammer to Bulldozer to Piledriver to Steamroller cores.)
With Zen 3, there are four integer scheduler units instead of seven with Zen 2 (why seven, which is so not base 2 and therefore violates our sensibilities?), and the same eight ports come out of the integer register file as with the Zen 2 integer unit. Rather than the schedulers being paired to an arithmetic logic unit (ALU) or an address generation unit (AGU), they are shared, allowing for balanced use across workloads. There are still four ALUs on the integer block, as you can see, but one of them has its own branch unit embedded in it and another one has a store unit embedded in it. Similarly, there are still three AGUs, but one has a store unit embedded in it. And there is a branch unit pulled out separately.
"It's still the same number of ALUs, but they are much more available and have much higher utilization," says Clark. "With queue combinations, with shared ALU/AGU schedulers, which we can pick from independently, the pickers can get a better view of more operations to therefore find more instruction level parallelism in the workloads. And by offloading those extra store data and branches, those things don't really return things back to the register file so you don't really have to take the cost of having more write ports into the register file, just more read ports."
With the Zen 3 core, the floating point unit is also wider, with six pipelines to be able to accept the input from that six-wide dispatch unit.
The floating point multiply/accumulate and add units have store units pulled out separately now, too. The reorder buffer has been increased in size so the Zen 3 core has a larger window to get more floating point instructions in flight. And, as with the integer units, the floating point to integer conversion units and store units are separated out from the add and multiply/accumulate units so they don't collide or cause backups while still preserving the number of add and multiply/accumulate units compared to Zen 2. The floating point register file is 256 bits wide (same as with the Zen 2, which had a pair of 128-bit registers), and importantly for AI inference workloads, the INT8 bandwidth is twice that of the Zen 2 core, with two IMACs and two ALU pipes. The Zen 3 core can do two 256-bit multiply accumulate operations per cycle.
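Those two 256-bit multiply accumulate operations per cycle translate into peak floating point rates with a little arithmetic. The sketch below assumes one element per vector lane and counts a fused multiply-add as two floating point operations, which is the usual convention. It covers only FP32 and FP64, since the article does not give enough detail to model the INT8 path the same way.

```python
VECTOR_BITS = 256  # width of each multiply accumulate operation (from the article)
MAC_PIPES = 2      # multiply accumulate operations per cycle (from the article)


def macs_per_cycle(element_bits: int) -> int:
    """Peak multiply accumulates per cycle per core for a given element width."""
    lanes = VECTOR_BITS // element_bits  # elements packed into one 256-bit vector
    return lanes * MAC_PIPES


def flops_per_cycle(element_bits: int) -> int:
    """Peak FLOPs per cycle, counting each multiply accumulate as two operations."""
    return 2 * macs_per_cycle(element_bits)


for name, bits in (("FP64", 64), ("FP32", 32)):
    print(name, macs_per_cycle(bits), "MACs/cycle,", flops_per_cycle(bits), "FLOPs/cycle")
```

Multiplying the per-cycle figure by clock speed and core count gives a theoretical peak for a given SKU, though real code rarely sustains it.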
If you are going to chew on more data and instructions, you have to be able to load and store more data and instructions, so the load/store units in the Zen 3 core have also been beefed up:
The Zen 3 core can do three loads per cycle or two stores per cycle, compared to two loads and one store per cycle for the Zen 2 core (tied together, that is an "and" statement, not an "or" statement). The load/store units have higher bandwidth per clock and, like the integer and floating point units, have greater flexibility in what they can do at any given time. Which drives up the ILP to get to that higher IPC.
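As a rough illustration of what those peak rates mean, the sketch below counts the minimum cycles needed for a stream made up purely of loads or purely of stores. It uses only the per-cycle figures quoted above and assumes nothing else limits the core. Mixed streams are left out because the article does not spell out how loads and stores combine in a single Zen 3 cycle.

```python
from math import ceil

# Peak operations per cycle, as quoted in the article.
PEAK_PER_CYCLE = {
    "zen2": {"loads": 2, "stores": 1},
    "zen3": {"loads": 3, "stores": 2},
}


def min_cycles(core: str, op: str, count: int) -> int:
    """Lower bound on cycles for `count` operations of a single kind."""
    return ceil(count / PEAK_PER_CYCLE[core][op])


for op in ("loads", "stores"):
    z2 = min_cycles("zen2", op, 600)
    z3 = min_cycles("zen3", op, 600)
    print(f"600 {op}: Zen 2 needs at least {z2} cycles, Zen 3 at least {z3}")
```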
Importantly, the Zen 3 core has six translation lookaside buffer (TLB) walkers, which walk the page tables to refill the TLB, a cache of the translations from virtual memory addresses to physical memory addresses in the DDR4 DRAM attached to each processor. This increased TLB capability, says Clark, helps deal with server workloads that have a lot of random accesses to main memory or that have applications that have large memory footprints that span multiple pages of main memory.
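To see why random accesses over a large footprint lean so hard on the walkers, here is a toy simulation. It is a sketch under loud assumptions: a single-level, fully associative TLB with LRU replacement, 4 KB pages, and a hypothetical 64 entries. None of these are Zen 3's actual TLB parameters, and `ToyTLB` and `miss_rate` are names we made up for illustration.

```python
import random
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; 4 KB pages are an assumption


class ToyTLB:
    """Fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.cache = OrderedDict()  # page number -> placeholder translation
        self.hits = 0
        self.misses = 0

    def access(self, virtual_address: int) -> None:
        page = virtual_address // PAGE_SIZE
        if page in self.cache:
            self.cache.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1  # a real core would start a page table walk here
        self.cache[page] = True
        if len(self.cache) > self.entries:
            self.cache.popitem(last=False)  # evict the least recently used page


def miss_rate(footprint_bytes: int, entries: int = 64,
              accesses: int = 20000, seed: int = 0) -> float:
    """Fraction of uniformly random accesses that miss the toy TLB."""
    rng = random.Random(seed)
    tlb = ToyTLB(entries)
    for _ in range(accesses):
        tlb.access(rng.randrange(footprint_bytes))
    return tlb.misses / accesses


print("fits in TLB reach:", miss_rate(64 * PAGE_SIZE))
print("64 MiB footprint: ", miss_rate(64 * 1024 * 1024))
```

When the footprint fits within the TLB's reach, only the first touch of each page misses. Once it is far larger, nearly every random access misses, and each miss needs a table walk, so having more walkers lets more of those walks overlap.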
And finally, the Zen 3 core has a bunch of instructions that are added, as follows:
Next up, we will be taking a look at the competitive landscape as AMD sees it for the Milan Epyc 7003 processors.
Here is the original post:
Deep Dive Into AMD's Milan Epyc 7003 Architecture - The Next Platform