{"id":215758,"date":"2017-04-08T16:46:41","date_gmt":"2017-04-08T20:46:41","guid":{"rendered":"http:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/xeon-e3-a-lesson-in-moores-law-and-dennard-scaling-the-next-platform.php"},"modified":"2017-04-08T16:46:41","modified_gmt":"2017-04-08T20:46:41","slug":"xeon-e3-a-lesson-in-moores-law-and-dennard-scaling-the-next-platform","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/moores-law\/xeon-e3-a-lesson-in-moores-law-and-dennard-scaling-the-next-platform.php","title":{"rendered":"Xeon E3: A Lesson In Moore&#8217;s Law And Dennard Scaling &#8211; The Next Platform"},"content":{"rendered":"<p><p> April 6, 2017 Timothy Prickett Morgan <\/p>\n<p> If you want an object lesson in the interplay between Moore's Law, Dennard scaling, and the desire to make money from selling chips, you need look no further than the past several years of Intel's Xeon E3 server chip product lines. <\/p>\n<p> The Xeon E3 chips are particularly illustrative because Intel has kept the core count constant for these processors, which are used in a variety of gear, from workstations (remote and local) and entry servers to storage controllers to microservers employed at hyperscalers, and even for certain HPC workloads (like Intel's own massive EDA chip design and validation farms). In a sense, it is now the Xeon E3, not the workhorse Xeon E5, that is literally driving Moore's Law in terms of chip design. (Ironic, isn't it?)
<\/p>\n<p> In the wake of the recent Kaby Lake Xeon E3 v6 server chip announcement, which we covered here in detail, we decided to take a look at how the Xeon E3 has evolved over time, complete with detailed tables and charts comparing the performance and price\/performance of this family of single-socket server chips over their lifetime, and specifically compared to the Nehalem Xeon 5500 processors from March 2009, which represent the resurgence, both economically and technically, of the Xeon platform in the datacenter after a few years of AMD's Opterons gaining considerable share. <\/p>\n<p> To get started, let's just line up the feeds and speeds of the various generations of chips, ranging from the Sandy Bridge chips from 2012 through the Kaby Lake chips this year. <\/p>\n<p> As we have done for past Xeon family comparisons, we have calculated the aggregate and relative oomph of each processor by multiplying the clock speed by the core count to give a kind of aggregate peak clocks for each chip. This is called Raw Clocks in our tables, and you can reckon a cost per gigahertz of clock speed to get a very rough relative performance metric. We have also ginned up a more precise relative performance metric, called Rel Perf, that takes into account the instructions per clock (IPC) enhancements from each Xeon core generation, and then scales this with the clock speed enhancements and core count expansion in the Xeon lines. We created this Rel Perf metric for the first time when comparing the Xeon E5 processors, from the Nehalem Xeon 5500 processors in 2009 through the Broadwell Xeon E5 v4 processors that came out this time last year. We reckoned the relative performance of each processor SKU across all of the families against the performance of the Nehalem E5540, a four-core processor with eight threads and a 2.53 GHz clock speed.
The top-bin Broadwell Xeon E5-2699 v4 processor, which has 22 cores running at 2.2 GHz, for example, has 6.34X the performance of this baseline Nehalem E5540 processor. (Intel did not have the distinction between the E3 for uniprocessor and the E5 for dual-socket machines back then.) <\/p>\n<p> The relative performance metric presumes, of course, that the workload is not memory capacity or memory bandwidth constrained. Meaning, it fits in a relatively small memory footprint and is not bandwidth sensitive. A lot of workloads are like this, particularly at hyperscalers and HPC shops. <\/p>\n<p> Here is the full lineup of the Kaby Lake Xeons just unveiled last week: <\/p>\n<\/p>\n<p> As you can see, the top-bin Kaby Lake chip, which has four cores running at 3.9 GHz, has only 2.18X the performance of that baseline Nehalem E5540 processor. About 54 percent of that 118 percent performance increase comes from clock speed alone, enabled by the shrink from 45 nanometer to 14 nanometer processes. The rest of that performance bump (and this is really a gauge of integer performance) is due to improvements in IPC in the cores and tweaks to the cache hierarchy. Floating point performance has increased by leaps and bounds over these years, of course, and so has the integrated GPU performance, which can be used to do calculations with OpenCL if you are adventurous. <\/p>\n<p> You will note that the L3 cache sizes on the Xeon E3s do not change much; the cache is usually 8 MB, sometimes 4 MB or 6 MB in special cases. <\/p>\n<p> With the core count and L3 cache constant, and only the IPC and the process changing, there is just not the same room to expand performance (as measured by throughput) as there is when you let Moore's Law push the core counts up and the clock speeds down.
What is interesting is that the successive process shrinks have allowed Intel to boost the bang for the buck on E3-class chips considerably over the past eight years. The Nehalem E5540, which definitely could be deployed in a single-socket machine, cost $744, or $744 per unit of relative performance as we reckon it, since it is the touchstone in our comparisons. As you can see, the top bin (and therefore most expensive in terms of performance) Kaby Lake E3-1280 v6 part costs $612, which yields a rating of $280 per unit of relative performance. That is a factor of 2.65X better bang for the buck. And for the mainstream Kaby Lake Xeon E3 chips (those that have HyperThreading activated on their cores), the price\/performance averages around $142 per unit of relative performance, a factor of 5.4X better price\/performance compared to that baseline Nehalem Xeon E5540. The Broadwell Xeon E5s have a bang for the buck that ranges from a low of $159 to a high of $649 per unit of relative performance. In other words, the top bin parts in the Xeon E5 line have lots more throughput, but they have lower clocks and they have not shown the same kind of price\/performance improvement. <\/p>\n<p> This is how Intel has really benefitted from its manufacturing process prowess. Intel has, we think, been able to wring a lot more profit out of its Xeon E5 parts, and the middle line (operating profits) of the Data Center Group shows it. Our point is this: Intel has profits to burn when and if the ARM server chip makers and AMD with its X86 alternatives get aggressive. How it will make up the profits it sacrifices to maintain market share remains to be seen.
But it will take some cajoling to keep the server makers of the world in line, and this time around, the hyperscalers do exactly and precisely what the hell they want to, unlike in 2003, when the Opteron plan was unveiled, and 2005, when Opterons were shipping in volume and kicking Intel's tail. <\/p>\n<p> Here is how the Skylake Xeon E3 processors, including the specialized ones that were announced last June for datacenter and media processing uses and implemented in 14 nanometer processes like the Kaby Lake Xeon E3s, line up: <\/p>\n<\/p>\n<p> The Skylake Xeon E3s were focused more on low power consumption than on performance for these specialized parts, but the more standard Skylake Xeon E3s have higher wattages without much higher relative performance. <\/p>\n<\/p>\n<p> Things got interesting back in the Broadwell generation, shown above and also using 14 nanometer wafer etching, when Intel did regular single-socket Xeon E3 processors and also kicked out specialized Xeon D system-on-chip designs for Facebook and, we presume, others. The Xeon D chips were single-socket processors, but they had an integrated southbridge on the package and a lot more cores. They also had much higher price tags, and Intel charged a hefty premium for low voltage versions that had higher performance. These are not Xeon E3 processors, per se, but they are closer to a Xeon E3 than they are to a Xeon E5. <\/p>\n<p> Here is how the Haswell and Ivy Bridge Xeon E3s, implemented in 22 nanometer processes, stack up: <\/p>\n<\/p>\n<p> And finishing up, here are the 32 nanometer Sandy Bridge Xeon E3 parts and the 45 nanometer Nehalem 5500 parts: <\/p>\n<\/p>\n<p> That is a lot of tabular data to chew on, so we made some arts and charts to get some general trends.
The first is a trend line showing the performance and price\/performance of the top bin Xeon E3 parts over time compared to the top bin Nehalem X5570, which had four cores running at 2.93 GHz, and the top bin Xeon D-1581, which had sixteen cores running at 1.9 GHz. As with other Xeon processors, the price\/performance curves have flattened out. <\/p>\n<\/p>\n<p> And that curve, dear friends, represents Intel's profits. We think Intel has been able to make chips a lot more cheaply over this same period of time, and the company spent the better part of a whole day last week bragging about this very fact. In great and glorious detail. We think Intel always suspected it would eventually get competition again in datacenter compute, and it has been making hay, tons and tons of it, while the sun was shining on its fields alone. <\/p>\n<p> This is really smart. Even right up to the moment that it encourages intense competition that causes a compute price war, which we think is coming this year. It is better to make the money between 2010 and 2017 than not make it, that is for sure. <\/p>\n<p> As you can see from the top bin chart, the Xeon D really sticks out, and it offers about the same bang for the buck as the real Haswell and Broadwell Xeon E3 parts. Over the past eight years, performance has gradually trended up, but again, it has only slightly more than doubled for the real four-core Xeon E3 variants. (FYI: We have picked the Xeon E3 chips without a graphics processor in the package or on the die wherever possible to get the rawest X86 compute comparison.) <\/p>\n<p> Now, let's step back from the top bin and see how it looks: <\/p>\n<\/p>\n<p> As you can see, the bang for the buck for these chips has fallen lower, but the performance and price\/performance curves are not that different. And the Xeon D does not stick out so much like a sore thumb, either.
(It is also interesting that there is not a Skylake or Kaby Lake Xeon D. Hmmmm. . . .) <\/p>\n<p> And at the low end of the Xeon E3 lineup, the performance gains are choppier, and so is the price\/performance: <\/p>\n<\/p>\n<p> In fact, after the Nehalems, Intel kept the price\/performance dead steady, except for the Broadwell E3-1265L and the Skylake E3-1265L, both of which are specialized low voltage parts that fit the performance profile we were looking at. You could draw any number of charts from this data, and you have our full permission to do so. Have fun. <\/p>\n<p> Categories: Compute, Enterprise, HPC, Hyperscale <\/p>\n<p> Tags: Intel, Kaby Lake, Skylake, Xeon, Xeon D, Xeon E3 <\/p>\n<p>More: <\/p>\n<p><a target=\"_blank\" href=\"https:\/\/www.nextplatform.com\/2017\/04\/06\/xeon-e3-lesson-moores-law-dennard-scaling\/\" title=\"Xeon E3: A Lesson In Moore's Law And Dennard Scaling - The Next Platform\">Xeon E3: A Lesson In Moore's Law And Dennard Scaling - The Next Platform<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> April 6, 2017 Timothy Prickett Morgan If you want an object lesson in the interplay between Moore's Law, Dennard scaling, and the desire to make money from selling chips, you need look no further than the past several years of Intel's Xeon E3 server chip product lines. The Xeon E3 chips are particularly illustrative because Intel has kept the core count constant for these processors, which are used in a variety of gear, from workstations (remote and local) and entry servers to storage controllers to microservers employed at hyperscalers, and even for certain HPC workloads (like Intel's own massive EDA chip design and validation farms).
In a sense, it is now the Xeon E3, not the workhorse Xeon E5, that is literally driving Moore's Law in terms of chip design. <a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/moores-law\/xeon-e3-a-lesson-in-moores-law-and-dennard-scaling-the-next-platform.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[14],"tags":[],"class_list":["post-215758","post","type-post","status-publish","format-standard","hentry","category-moores-law"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/215758"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=215758"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/215758\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=215758"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=215758"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=215758"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}