{"id":177820,"date":"2017-02-15T21:23:16","date_gmt":"2017-02-16T02:23:16","guid":{"rendered":"http:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/why-googles-spanner-database-wont-do-as-well-as-its-clone-the-next-platform\/"},"modified":"2017-02-15T21:23:16","modified_gmt":"2017-02-16T02:23:16","slug":"why-googles-spanner-database-wont-do-as-well-as-its-clone-the-next-platform","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/cloning\/why-googles-spanner-database-wont-do-as-well-as-its-clone-the-next-platform\/","title":{"rendered":"Why Google&#8217;s Spanner Database Won&#8217;t Do As Well As Its Clone &#8211; The Next Platform"},"content":{"rendered":"<p><p>    February 15, 2017 Timothy    Prickett Morgan  <\/p>\n<p>    Google has proven time and again it is on the extreme bleeding    edge of invention when it comes to scale out architectures that    make supercomputers look like toys. But what would the world    look like if the search engine giant had started selling    capacity on its vast infrastructure back in 2005, before Amazon    Web Services launched, and then shortly thereafter started    selling capacity on its high level platform services? And what    if it had open sourced these technologies, as it has done with    the Kubernetes container controller?  <\/p>\n<p>    The world would be surely different, and the reason it is not    is because there is a lesson to be learned, one that applies    equally well to traditional HPC systems for simulation and    modeling as well as the Web application, analytics, and    transactional systems forged by hyperscalers and cloud    builders. And the lesson is this: Making a tool or system that    was created for a specific task more general purpose and    enterprise-grade  meaning mere mortals, not just Site    Reliability Engineers at hyperscalers can make it work and keep    it working  is very, very hard. And just because something    scales up and out does not mean that it scales down, as it    needs to do be appropriate for enterprises.  <\/p>\n<p>    That, in a nutshell, is why it has taken nearly ten years since    Google first started development of Spanner and five years from    when Google released its paper on this globe-spanning,    distributed SQL database to when it is available as a service    on Googles Cloud Platform public cloud, aimed at more generic    workloads than its own AdWords and Google Play.  <\/p>\n<p>    If this was easy, Google would have long since done it or    someone else cloning Googles ideas would have, and thus    relational databases that provide high availability, horizontal    scalability, and transactional consistency on a vast scale    would be normal. They are not, and that is why the availability    of Spanner on Cloud Platform is a big deal.  <\/p>\n<p>    It would have been bigger news if Google had open sourced    Spanner or some tool derived from Spanner, much as it has done    with the guts of its Borg cluster and container controller        through the Kubernetes project, and that may yet happen as    Cockroach Labs, the New York City startup that is cloning    Spanner much as     Yahoo did with Googles MapReduce to create Hadoop or the    HBase and Cassandra NoSQL databases     that were derived from ideas in Googles BigTable NoSQL    database.  
<\/p>\n<p>    To put it bluntly, it would have been more interesting to see    Google endorse CockroachDB and support it on Cloud Platform,    creating an open source community as well as a cloud service    for its Cloud Platform customers. But, as far as we know, it    did not do that. (We will catch up with the Cockroach Labs    folks, who all came from Google, to see what they think about    all this.) And we think that the groundswell of support for    Kubernetes, which Google open sourced and let go, is a great    example of how to build a community with momentum very fast.  <\/p>\n<p>    For all we know, Google will eventually embrace CockroachDB as    a service on Cloud Platform not just for external customers but    for internal Google workloads as well,     much as is starting to happen with Kubernetes jobs running on    Cloud Platform through the Container Engine service among    Googlers.  <\/p>\n<p>    Back in 2007, Google was frustrated by the limitations of its    Megastore NoSQL and BigTable NoSQL databases, which were fine    in that they provided horizontal scalability and reasonably    fast performance, but Google wanted to also have these data    services be more like traditional relational databases and also    have them be geographically distributed for high availability    and for maximum throughput on a set of global applications that    also ran geographically. And so it embarked on a means to take    BigTable, which had been created back in 2004 to house data    stored for Googles eponymous search engines as well as Gmail    and other servers, and allow it to span global distances and    still be usable as a single database for Googles developers,    who could care less about how a database or datastore is    architected and implemented so long as it gets closer and    closer to the SQL-based relational database that is the    foundation of enterprise computing.  <\/p>\n<\/p>\n<p>    And, by the way, a pairing of relational data models and    database schemas with the SQL query language that was invented    by IBM nearly forty years ago and cloned by Oracle, Sybase,    Informix, and anyone else you can think of including Postgres    and MySQL. Moreover, IBM has been running clustered databases    on its mainframes for as long as we can remember  they are    called Parallel Sysplexes  and they can be locally clustered    as well as geographically distributed and run a cluster of DB2    database instances as if there were one giant, logical    database. Just like Spanner. Google databases like Spanner may    dwarf those that can be implemented on IBM mainframes, but    Google was not the first company to come up with this stuff.    Contrary to what Silicon Valley may believe.  <\/p>\n<p>    With any relational database, the big problem when many users    (be they people or applications) is deciding who has access to    the data and who can change that data as they are sharing the    database. There are very sophisticated timestamping and locking    mechanisms for deciding who has the right to change data and    what that data is  these are the so-called     ACID properties of databases. Google luminary Eric Brewer,    who is vice president of infrastructure at Google and who    helped create many of the data services at the search engine    giant, coined the CAP Theorem back in 1998 and the ideas were    developed by the database community in the following years. 
The gist of the CAP Theorem is that all distributed databases have to worry about three things (consistency, availability, and partition tolerance) and that, no matter what you do, you can have no more than two of these properties fully implemented at any time in the datastore or database. Brewer explained this theorem in some detail in a blog post related to the Cloud Spanner service Google has just launched, and also explained that the theorem is about having 100 percent of two of these properties, and that in the real world, as with NoSQL and NewSQL databases, the real issue is how close you can get to 100 percent on all three and still have a workable, usable database that is reliable enough to run enterprise applications.

With Spanner, after a decade of work, Google has been able to achieve this. (You can read all about Spanner in the paper that Google released back in October 2012.) Part of the reason Google can do this is that it has developed a sophisticated timestamping scheme for the globally distributed parts of Spanner that creates a kind of universal and internally consistent time, synchronized by Google's own network and not dependent on outside services like the Network Time Protocol (NTP) that servers use to keep relatively in sync. Google needed finer-grained control of timestamping with Spanner, so it came up with a scheme based on atomic clocks and GPS receivers in its datacenters that could provide a kind of superclock spanning all of its datacenters, ordering transactions across the distributed systems. This feature, which Google calls TrueTime, is neat, but the real thing that makes Spanner work at the speed and scale that it does is the internal Google network that lashes those datacenters to the same heartbeat of time as it passes.

Brewer said as much in a white paper about Spanner and TrueTime that was published in conjunction with the Cloud Spanner service this week:

"Many assume that Spanner somehow gets around CAP via its use of TrueTime, which is a service that enables the use of globally synchronized clocks. Although remarkable, TrueTime does not significantly help achieve CA; its actual value is covered below. To the extent there is anything special, it is really Google's wide-area network, plus many years of operational improvements, that greatly limit partitions in practice, and thus enable high availability."

The CA here refers to Consistency and Availability, and these are possible because Google has a very high throughput, global fiber optic network linking its datacenters, with at least three links between each datacenter and the network backbone, called B4. This means that Spanner partitions that are being written to, and that are trying to replicate data to other Spanner partitions running around the Google facilities, have many paths to reach each other and eventually get all of the data synchronized. Eventually here means tens of milliseconds: not tens of nanoseconds, like a port-to-port hop on a very fast switch, and not hundreds of milliseconds, which is the time it takes for a human being to notice an application moving too slowly.
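To make the TrueTime idea above a bit more concrete, here is a minimal Python sketch of the interval clock and the "commit wait" rule described in the 2012 Spanner paper. The tt_now and tt_after names, the uncertainty bound, and the replication stub are hypothetical stand-ins of ours, not Google's actual API.

```python
import time

# Assumed clock uncertainty bound; the Spanner paper reports average
# uncertainty of a few milliseconds, kept tight by GPS and atomic clocks.
EPSILON = 0.007  # seconds

def tt_now():
    """Hypothetical TrueTime-style clock: returns an interval
    (earliest, latest) guaranteed to bracket the true absolute time."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def tt_after(ts):
    """True only once the true time is definitely past ts."""
    earliest, _ = tt_now()
    return earliest > ts

def apply_to_replicas(data, ts):
    # Stand-in for the Paxos-replicated write Spanner would perform here.
    print("committed %r at %.6f" % (data, ts))

def commit(data):
    # Take the commit timestamp at the latest edge of the uncertainty
    # window, then "commit wait": block until every correct clock agrees
    # that timestamp is in the past. Any transaction that starts later,
    # anywhere in the system, is guaranteed a strictly larger timestamp,
    # which is what gives Spanner externally consistent ordering.
    _, commit_ts = tt_now()
    while not tt_after(commit_ts):
        time.sleep(EPSILON / 10)
    apply_to_replicas(data, commit_ts)
    return commit_ts

commit({"row": 1, "balance": 42})
```

The cost of the scheme is visible in the loop: the writer stalls for roughly twice the clock uncertainty before acknowledging a commit, which is why Google goes to such lengths to keep that uncertainty down to single-digit milliseconds.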
<\/p>\n<p>    The important thing about Spanner is that it is a database with    SQL semantics that allows reads without any locking of the    database and massive scalability on local Spanner slices to    thousands (and we would guess tens of thousands) of server    nodes, with very fast replication on a global scale to many    difference Spanner slices. When we pressed Google about the    local and global scalability limits on the Cloud Spanner    service, a Google spokesperson said: Technically speaking,    there are no limits to Cloud Spanners scale.  <\/p>\n<p>    Ahem. If we had a dollar for every time someone told    us that. . . . What we know from the original paper is that    Spanner was designed to, in theory, scale across millions of    machines across hundreds of datacenters and juggle trillions of    rows of data in its database. What Google has done in practice,    that is another thing.  <\/p>\n<p>    We also asked how many geographically distributed copies of the    database are provided through the Cloud Spanner service, and    this was the reply: Everything is handled automatically, but    customers have full view into where their data is located via    our UI\/menu.  <\/p>\n<p>    We will seek to get better answers to these and other    questions.  <\/p>\n<p>    The other neat thing about the paper that Brewer released this    week is that it provided some availability data for Spanner as    it is running inside of Google, and this chart counts    incidents  unexpected things that happened  rather    than failures  times when Spanner was unavailable    itself. Incidents can cause failures, but not always, and    Google claims that Spanner is available more than 99.999    percent (so called 5 9s) of the time.  <\/p>\n<\/p>\n<p>    As you can see from the chart above, the most frequent cause of    incidents relating to Spanner running internally were user    errors, such as overloading the system or not configuring    something correctly; in this case, only that user is affected    and everyone else using Spanner is woefully unaware of the    issue. (Not my circus, not my monkeys. . . .) The    cluster incidents, which made up 12.1 percent of Spanner    incidents, were when servers or datacenter power or other    components crashed, and often a Site Reliability Engineer is    needed to fix something here. The operator incidents are when    SREs do something wrong, and yes, that happens. The bugs, which    are true software errors, presumably in Spanner code as well as    applications, and Brewer said that the two biggest outages    (meaning the time and impact) were related to such software    errors. Networking errors for Spanner are when the network goes    kaplooey, and it usually caused datacenters or regions with    Spanner nodes to be cut off from the rest of the Spanner    cluster. To be a CA system in the CAP Theorem categorization,    the A has to be pretty good and not caused by the network    partitions being an issue.  <\/p>\n<p>    With under 8 percent of Spanner failures being due to network    and partition issues and with north of 5 9s availability, you    can make a pretty good argument that Spanner  and therefore    the Cloud Spanner service  being a pretty good fuzzy CAP    database, not just hewing to the CP definition that both    Spanner and CockroachDB technically fall under.  
<\/p>\n<p>    The inside Spanner at Google underlies hundreds of its    applications and petabytes of capacity and churns through tens    of millions of queries per second, and it is obviously battle    tested enough for Google to trust other applications on it    besides its own.  <\/p>\n<p>    At the moment, Cloud Spanner is only available as a beta    service to Cloud Platform customers, and Google is not talking    about a timeline for when it will be generally available, but    we expect a lot more detail at the Next 17 conference in early    March that Google is hosting. What we know for sure is that    Google is aiming Cloud Spanner at customers who are, like it    was a decade ago, frustrated by MySQL databases that are    chopped up into shards, as Google was using at the time as its    relational datastore, as well as those who have chosen the    Postgres path once Oracle bought MySQL. The important thing is    that Spanner and now Cloud Spanner support distributed    transactions, schemas, and DDL statements as well as SQL    queries and JDBC drivers that are commonly used in the    enterprise to tickle databases. Cloud Spanner has libraries for    the popular languages out there, including Java, Node.js, Go,    and Python.  <\/p>\n<\/p>\n<p>    As is the case with any cloud data service out there, putting    data in is free, but moving it around different regions is not    and neither would be downloading it off Cloud Platform to    another service or a private datacenter, should that be    necessary.  <\/p>\n<p>    Categories: Cloud, Hyperscale, Store  <\/p>\n<p>    Tags: BigTable, Cloud Platform, Cloud Spanner, Google, Megastore, Spanner  <\/p>\n<p>    Memristor Research Highlights Neuromorphic Device    Future  <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>More:<\/p>\n<p><a target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.nextplatform.com\/2017\/02\/15\/googles-spanner-database-wont-well-clone\/\" title=\"Why Google's Spanner Database Won't Do As Well As Its Clone - The Next Platform\">Why Google's Spanner Database Won't Do As Well As Its Clone - The Next Platform<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> February 15, 2017 Timothy Prickett Morgan Google has proven time and again it is on the extreme bleeding edge of invention when it comes to scale out architectures that make supercomputers look like toys. But what would the world look like if the search engine giant had started selling capacity on its vast infrastructure back in 2005, before Amazon Web Services launched, and then shortly thereafter started selling capacity on its high level platform services? And what if it had open sourced these technologies, as it has done with the Kubernetes container controller?  