Deep Dive Into AMD's Milan Epyc 7003 Architecture – The Next Platform

The Milan Epyc 7003 processors, the third generation of AMD's revitalized server CPUs, are now in the field, and we await the entry of the Ice Lake Xeon SPs from Intel for the next jousting match in the datacenter to begin.

The stakes are high for both companies, which are vying for what seems to be a reasonably elastic demand for compute capacity in aggregate around the world, even if there are eddies where demand slows and chutes where it accelerates. There is room enough for Intel and AMD in the market, but it is the technical and economic jousting between these two that is going to make this fun and help spur future competition in the years to come.

We did our announcement day first pass on the Milan SKU stack, with the salient feeds and speeds, and slots and watts, of the 19 new processors, and we also covered the actual launch event for the Milan chips by chief executive officer Lisa Su and her launch crew, which included Forrest Norrod, general manager of AMD's Datacenter and Embedded Solutions Group; Mark Papermaster, the company's chief technical officer; and Dan McNamara, general manager of AMD's server business.

Now it is time to get into the weeds for a little bit and talk about the Milan architecture and how this processor's Zen 3 cores are delivering 19 percent higher instructions per clock than the Rome processor's Zen 2 cores from August 2019. It is hard to squeeze more and more performance out of a core while maintaining compatibility, but Intel, AMD, IBM, and Arm Holdings are clever engineering companies and they often go back to the drawing board and rethink how the elements of a core are organized and pipelined. They always seem to find new ways to do things better, and it really is a testament to human engineering that this is true.

Someday, we presume, AI will be used to create blocks of logic and data from transistors and place them in a 2D or 3D chip layout and do a better job than people and their EDA tools; we talked about Google's research in this area last year, in fact, and IP block placement in an EDA tool makes the games of Chess and Go look like a joke. So far, people are a necessary part of the process of designing a processor, so today is not that day. Ironically, better compute engines will hasten that day, and perhaps chip designers should not be so eager for such big improvements. . . . But, if history shows anything, you can't stop progress because people just plain have faith in it. For better or worse. Often both. And we here at The Next Platform are no different in this regard, so don't think we are taking some highbrow view. Consider us raised eyebrows with the occasional furrowed brows. We admire what engineers do; we worry about what people do with what they create sometimes.

Mike Clark, an AMD Fellow who cut his teeth on the single-core K5 processors, the first in-house designed AMD X86 chip from back in March 1996, and the lead architect on the Zen 3 cores, walked us through the nitty gritty detail of the Zen 3 core that is at the heart of the Milan system on chip complex. Let's dive in.

Right off the bat, this is a whole new, ground-up redesign of the core, similar to what Intel is doing with Ice Lake Xeon SPs and their Sunny Cove cores and what IBM will be doing with the Power10 processor and its brand new core later this year. And the reason is simple: Everyone needs to push the IPC as hard as possible to boost single threaded performance, and then make tradeoffs in the SKUs between high clock speeds across a small number of cores and lower clock speeds across a larger number of cores to hit performance targets that are better on each class of workloads, and those in between, than their respective Rome Epyc 7002, Cascade Lake Xeon SP, and Power9 predecessors. You can't just have more cores with a new generation, and you need to show better thermal efficiency at different performance points, too.
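To make that tradeoff concrete, here is a minimal sketch, assuming performance scales simply as cores times clock times IPC (real workloads rarely scale this cleanly). The two SKUs and their clock speeds are hypothetical and are not entries from the Milan SKU stack; only the 19 percent IPC uplift comes from AMD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    """A toy processor SKU. All instances below are hypothetical."""
    name: str
    cores: int
    clock_ghz: float
    relative_ipc: float  # 1.0 means the IPC of the previous-generation core

    def single_thread(self) -> float:
        # Per-core performance is proportional to clock times IPC.
        return self.clock_ghz * self.relative_ipc

    def throughput(self) -> float:
        # Idealized: assumes the workload scales perfectly across all cores.
        return self.cores * self.single_thread()


ZEN3_RELATIVE_IPC = 1.19  # the 19 percent uplift AMD cites over Zen 2

# Hypothetical SKUs: few fast cores versus many slower cores.
few_fast = Sku("few-fast", cores=16, clock_ghz=3.5, relative_ipc=ZEN3_RELATIVE_IPC)
many_slow = Sku("many-slow", cores=64, clock_ghz=2.0, relative_ipc=ZEN3_RELATIVE_IPC)

for sku in (few_fast, many_slow):
    print(f"{sku.name}: single-thread {sku.single_thread():.2f}, "
          f"throughput {sku.throughput():.2f}")
```

In this toy model the few-fast part wins on single-thread performance and the many-slow part wins on aggregate throughput, which is the shape of the tradeoff chip makers spread across a SKU stack.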

So how did AMD get that 19 percent better IPC with the Zen 3 cores used in the Milan server chips? By doing a whole lot of things all at the same time, as you will see. And when you contrast this with the lack of IPC improvements as the Sunny Cove cores are coming years late to market because of Intel's delays with its 10 nanometer processes that these Sunny Cove cores and their Ice Lake processors were tied to, it really shows:

"Here are some of the top-level performance improvements," says Clark. "We improved branch prediction, and not just accuracy, but actually being able to get the correct target address out sooner and the correct target instructions out sooner and feeding them to the machine so we get more throughput, more performance. We have beefed up the width of integer throughput. We have doubled the intake floating point for inference, as we see those workloads evolving going forward and we are reacting to that. And by pulling the eight Zen 3 cores under the larger 32 MB L3 complex, we have better communication paths and we have more cache available for lighter-threaded workloads and therefore we can reduce the effective latency to memory and provide more performance."
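Clark's point about effective latency can be sketched with the usual weighted-average formula: accesses that hit in the L3 pay the cache latency, and the rest pay the trip to DRAM. This is a rough illustration only. The latencies and hit rates below are hypothetical placeholders, not AMD figures.

```python
def effective_latency_ns(l3_hit_rate: float, l3_ns: float, dram_ns: float) -> float:
    """Average latency seen by accesses that reach the L3 cache."""
    if not 0.0 <= l3_hit_rate <= 1.0:
        raise ValueError("l3_hit_rate must be between 0 and 1")
    return l3_hit_rate * l3_ns + (1.0 - l3_hit_rate) * dram_ns


# Hypothetical latencies, chosen only to make the arithmetic easy to follow.
L3_NS = 15.0
DRAM_NS = 100.0

# Hypothetical hit rates for a lightly threaded job that can reach
# a smaller versus a larger slice of L3 cache.
smaller_l3 = effective_latency_ns(0.60, L3_NS, DRAM_NS)
larger_l3 = effective_latency_ns(0.80, L3_NS, DRAM_NS)

print(f"smaller L3: {smaller_l3:.1f} ns, larger L3: {larger_l3:.1f} ns")
```

The more of a thread's working set that fits in the cache it can reach, the higher the hit rate and the lower the average latency, even though DRAM itself is no faster.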

The Zen 3 core still has two-way simultaneous multithreading, as we pointed out in our initial coverage, and AMD has resisted the temptation to add more threads to goose performance as IBM does with its Power architecture, which can dynamically switch among 2, 4, or 8 threads per core. (Sometimes, the threading in the Power9 and Power10 chips is set in firmware and the cores are fat or skinny, depending.)

If you look on the right in the chart above at the Zen 3 block diagram, you can see there are two ways into the machine. The 32 KB instruction cache is still driven by a decoder that can drive four instructions per clock cycle into the op queue. The other way into the chip is through the branch predictor on the far right, which can put instructions into the op cache and deliver eight macro ops per cycle. The dispatcher decouples the two sides of the Zen 3 pipeline, integer and floating point, and can do six macro ops per cycle to either unit.

That front end to the integer and floating point units in the Zen 3 core has a lot of tweaks, starting with an L1 branch target buffer that is twice the size of the one in the Zen 2 core, at 1,024 entries.
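To see why doubling a branch target buffer matters, here is a minimal sketch of a direct-mapped BTB. This is an assumption for simplicity: real BTBs are more elaborate, and nothing here reflects AMD's actual indexing or replacement scheme. The 768-branch loop is a made-up workload sized to fit in 1,024 entries but not in 512.

```python
class DirectMappedBTB:
    """Toy branch target buffer: one slot per index, no associativity."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        # Each slot holds a (branch_pc, target) pair, or None when empty.
        self.table = [None] * entries

    def lookup_and_update(self, pc: int, actual_target: int) -> bool:
        """Return True if the BTB already held the right target for this branch."""
        slot = (pc >> 2) % self.entries  # index with low-order address bits
        hit = self.table[slot] == (pc, actual_target)
        self.table[slot] = (pc, actual_target)  # install or refresh the entry
        return hit


def hit_count(entries: int, branch_pcs: list, passes: int) -> int:
    """Run the same sequence of branches several times and count BTB hits."""
    btb = DirectMappedBTB(entries)
    hits = 0
    for _ in range(passes):
        for pc in branch_pcs:
            hits += btb.lookup_and_update(pc, pc + 64)
    return hits


# Hypothetical workload: a loop body containing 768 distinct branches.
branches = [0x1000 + 4 * i for i in range(768)]

hits_512 = hit_count(512, branches, passes=10)
hits_1024 = hit_count(1024, branches, passes=10)
print(f"512 entries: {hits_512} hits, 1,024 entries: {hits_1024} hits")
```

With 512 entries, two thirds of the branches keep evicting each other and never hit. With 1,024 entries every branch hits after the first pass.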

Clark says that the branch predictor on the front end has more bandwidth, which means it can pull more branches out per clock cycle. The Zen 3 core also features what Clark calls a no bubble branch prediction mechanism, which he explains thus:

"When you pull out a target address from the branch predictor, you then need to obviously put that back into the branch predictor to get the next address. Typically, that turnaround time creates a bubble. We have a unique mechanism where we can eliminate that bubble cycle and therefore be able to continuously pull out branch targets every cycle. We do still get some branches wrong in the execution units, but getting those addresses back and getting the target instructions of the machine we improve the latency of that from Zen 2."
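The cost of that bubble is easy to count in a toy model. The sketch below assumes each prediction takes one cycle and that every predicted-taken branch adds a fixed number of dead turnaround cycles. Both the one-cycle bubble and the mix of taken branches are assumptions for illustration, not figures from AMD.

```python
def prediction_cycles(taken_flags: list, bubble_cycles: int) -> int:
    """Cycles needed to produce one prediction per branch in the sequence."""
    cycles = 0
    for taken in taken_flags:
        cycles += 1  # the prediction itself
        if taken:
            # Assumed model: feeding the target back into the predictor
            # stalls the next lookup for bubble_cycles cycles.
            cycles += bubble_cycles
    return cycles


# Hypothetical branch stream: three of every four branches are taken.
stream = [True, True, False, True] * 250

with_bubble = prediction_cycles(stream, bubble_cycles=1)
no_bubble = prediction_cycles(stream, bubble_cycles=0)
print(f"with bubble: {with_bubble} cycles, no bubble: {no_bubble} cycles")
```

In this model, removing the bubble turns 1,750 cycles into 1,000 for the same 1,000 predictions, which is the kind of front-end bandwidth gain Clark is describing.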

There are also some efficiency improvements in the op cache (faster sequencing of fetches and finer-grained switching of op cache pipelines) that help the Zen 3 front end drive that 19 percent IPC improvement (which is an average across a bunch of different workloads that were used to gauge IPC on cores in the Opteron days and that are now used on Zen cores in the Epyc era).

So that's the front end (the air intake manifold and fuel lines in a car engine analogy, we suppose). What about the integer and floating point cylinders? Here is a zoom into the execution engine in the Zen 3 core:

With the Zen 3 core, there is a much wider integer unit now, with four ALUs and dedicated branch and store units, as you can see from the chart above comparing and contrasting with the Zen 2 block diagram in the chart further up in this story.

Here is a drill down into the integer execution unit, which Clark says has a design goal of having larger structures to extract more instruction level parallelism (ILP) from applications to feed this part of the execution engine; its units, in general, have lower latency, too. The combined effect is more integer IPC.

As you can see, everything increases by a little bit or a lot, bringing a different set of throughput and balance to the combined set of units.

(We wonder if the engineers play a kind of video game, tweaking this or that in a simulator to see the effects, or if the EDA tools do this work, as well. We suspect the former, and that, like much design, there is a knack for it, and as an architecture hardens a bit, you throw it out and start over. This Zen 3 core does not look that different from a Zen 2 core to our eyes, and certainly not like the jump from Sledgehammer to Bulldozer to Piledriver to Steamroller cores.)

With Zen 3, there are four integer scheduler units instead of seven with Zen 2 (why seven, which is so not base 2 and therefore violates our sensibilities?), and the same eight ports come out of the integer register file as with the Zen 2 integer unit. Rather than the schedulers being paired to an arithmetic logic unit (ALU) or an address generation unit (AGU), they are shared, allowing for balanced use across workloads. There are still four ALUs on the integer block, as you can see, but one of them has its own branch unit embedded in it and another one has a store unit embedded in it. Similarly, there are still three AGUs, but one has a store unit embedded in it. And there is a branch unit pulled out separately.

"It's still the same number of ALUs, but they are much more available and have much higher utilization," says Clark. "With queue combinations, with shared ALU/AGU schedulers, which we can pick from independently, the pickers can get a better view of more operations to therefore find more instruction level parallelism in the workloads. And by offloading those extra store data and branches, those things don't really return things back to the register file, so you don't really have to take the cost of having more write ports into the register file, just more read ports."

With the Zen 3 core, the floating point unit is also wider, with six pipelines to be able to accept the input from that six-wide dispatch unit.

The floating point multiply/accumulate and add units have store units pulled out separately now, too. The reorder buffer has been increased in size so the Zen 3 core has a larger window to get more floating point instructions in flight. And, as with the integer units, the floating point to integer conversion units and store units are separated out from the add and multiply/accumulate units so they don't collide or cause backups, while still preserving the number of add and multiply/accumulate units compared to Zen 2. The floating point register file is 256 bits wide (same as with the Zen 2, which had a pair of 128-bit registers), and importantly for AI inference workloads, the INT8 bandwidth is twice that of the Zen 2 core, with two IMACs and two ALU pipes. The Zen 3 core can do two 256-bit multiply/accumulate operations per cycle.
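Two 256-bit multiply/accumulate operations per cycle translates into peak floating point rates with a little arithmetic, sketched below. Counting each multiply/accumulate as two operations is the usual convention. The 64-core count and 2.0 GHz clock in the socket example are hypothetical round numbers, not a specific Milan SKU, and peak rates are a ceiling that real code seldom reaches.

```python
VECTOR_BITS = 256  # width of each multiply/accumulate pipe, per the article
MAC_PIPES = 2      # multiply/accumulate operations per cycle, per the article


def peak_flops_per_cycle(element_bits: int) -> int:
    """Peak floating point operations per core per clock cycle."""
    lanes = VECTOR_BITS // element_bits  # elements packed into one vector
    return lanes * MAC_PIPES * 2         # a multiply plus an add per lane


def peak_gflops(cores: int, clock_ghz: float, element_bits: int) -> float:
    """Peak GFLOPS for a socket, assuming every core sustains the peak rate."""
    return cores * clock_ghz * peak_flops_per_cycle(element_bits)


fp64_per_cycle = peak_flops_per_cycle(64)
fp32_per_cycle = peak_flops_per_cycle(32)

# Hypothetical socket: 64 cores running at 2.0 GHz.
socket_fp64 = peak_gflops(cores=64, clock_ghz=2.0, element_bits=64)

print(f"FP64: {fp64_per_cycle} flops/cycle, FP32: {fp32_per_cycle} flops/cycle")
print(f"hypothetical socket peak: {socket_fp64:.0f} FP64 GFLOPS")
```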

If you are going to chew on more data and instructions, you have to be able to load and store more data and instructions, so the load/store units in the Zen 3 core have also been beefed up:

The Zen 3 core can do three loads per cycle or two stores per cycle, compared to two loads and one store per cycle for the Zen 2 core (tied together; that is an "and" statement, not an "or" statement). The load/store units have higher bandwidth per clock and, like the integer and floating point units, have greater flexibility in what they can do at any given time, which drives up the ILP to get to that higher IPC.
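Here is a rough way to see what that flexibility buys, as a sketch under stated assumptions. It computes a lower bound on the cycles needed to issue a batch of loads and stores given per-cycle limits. The per-type limits come from the figures above; the combined cap of three memory operations per cycle for both cores is our assumption, and the model ignores access widths, cache misses, and dependencies.

```python
from math import ceil


def min_issue_cycles(loads: int, stores: int,
                     max_loads: int, max_stores: int, max_total: int) -> int:
    """Lower bound on cycles to issue the given loads and stores."""
    return max(
        ceil(loads / max_loads),             # limited by load issue slots
        ceil(stores / max_stores),           # limited by store issue slots
        ceil((loads + stores) / max_total),  # limited by total memory ops
    )


# Per-cycle limits. The max_total of 3 is an assumption for this sketch.
ZEN2 = dict(max_loads=2, max_stores=1, max_total=3)
ZEN3 = dict(max_loads=3, max_stores=2, max_total=3)

# Hypothetical batches of memory operations: (loads, stores).
workloads = {
    "load-heavy": (600, 0),
    "store-heavy": (0, 600),
    "two loads per store": (400, 200),
}

for name, (loads, stores) in workloads.items():
    zen2 = min_issue_cycles(loads, stores, **ZEN2)
    zen3 = min_issue_cycles(loads, stores, **ZEN3)
    print(f"{name}: Zen 2 model {zen2} cycles, Zen 3 model {zen3} cycles")
```

Under these assumptions, a mix of exactly two loads per store sees no change, while load-heavy and store-heavy batches finish in fewer cycles. That is one way to read "greater flexibility."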

Importantly, the Zen 3 core has six translation lookaside buffer (TLB) walkers, which walk the page tables to refill the TLB, the cache that maps virtual memory addresses to physical memory addresses in the DDR4 DRAM attached to each processor. This increased TLB capability, says Clark, helps deal with server workloads that have a lot of random accesses to main memory or that have applications with large memory footprints that span many pages of main memory.
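A toy simulation shows why random accesses lean so hard on the walkers. The sketch below models a small, fully associative TLB with least-recently-used replacement and counts how many page table walks two access patterns trigger. The 64-entry size, 4 KB pages, footprint, and the two-walker comparison point are all assumptions for illustration and do not describe the Zen 3 TLB hierarchy.

```python
from collections import OrderedDict
from math import ceil
import random

PAGE_SIZE = 4096  # bytes; 4 KB pages assumed


class TinyTLB:
    """Toy fully associative TLB with least-recently-used replacement."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.pages = OrderedDict()  # cached virtual page numbers, oldest first
        self.walks = 0              # page table walks triggered by misses

    def access(self, virtual_address: int) -> None:
        page = virtual_address // PAGE_SIZE
        if page in self.pages:
            self.pages.move_to_end(page)  # hit: mark as most recently used
            return
        self.walks += 1                   # miss: a walker fetches the mapping
        self.pages[page] = True
        if len(self.pages) > self.entries:
            self.pages.popitem(last=False)  # evict the least recently used page


def count_walks(addresses: list, entries: int = 64) -> int:
    tlb = TinyTLB(entries)
    for address in addresses:
        tlb.access(address)
    return tlb.walks


FOOTPRINT = 256 * PAGE_SIZE  # hypothetical 1 MB working set
sequential = list(range(0, FOOTPRINT, 64))
rng = random.Random(0)       # fixed seed so the run is repeatable
scattered = [rng.randrange(FOOTPRINT) for _ in sequential]

seq_walks = count_walks(sequential)
rand_walks = count_walks(scattered)
print(f"sequential: {seq_walks} walks, scattered: {rand_walks} walks")


def walk_rounds(outstanding_misses: int, walkers: int) -> int:
    """Rounds needed if each walker services one miss at a time."""
    return ceil(outstanding_misses / walkers)


# Twelve misses in flight: six walkers versus a hypothetical two.
print(walk_rounds(12, 6), walk_rounds(12, 2))
```

The sequential sweep walks once per page, while the scattered pattern misses on most accesses. With many misses outstanding at once, the number of walkers bounds how many can be serviced in parallel.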

And finally, the Zen 3 core has a bunch of instructions that are added, as follows:

Next up, we will be taking a look at the competitive landscape as AMD sees it for the Milan Epyc 7003 processors.
