What database does Google use?

261,463

Solution 1

Bigtable

A Distributed Storage System for Structured Data

Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.

Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

Some features

  • fast and extremely large-scale DBMS
  • a sparse, distributed multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases.
  • designed to scale into the petabyte range
  • it works across hundreds or thousands of machines
  • it is easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration
  • each table has multiple dimensions (one of which is a field for time, allowing versioning)
  • tables are optimized for GFS (Google File System) by being split into multiple tablets - segments of the table as split along a row chosen such that the tablet will be ~200 megabytes in size.

Architecture

BigTable is not a relational database. It does not support joins nor does it support rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a time stamp. There can be multiple versions of a cell with different time stamps. The time stamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time."

In order to manage the huge tables, Bigtable splits tables at row boundaries and saves them as tablets. A tablet is around 200 MB, and each machine saves about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing. If one table is receiving many queries, it can shed other tablets or move the busy table to another machine that is not so busy. Also, if a machine goes down, a tablet may be spread across many other servers so that the performance impact on any given machine is minimal.

Tables are stored as immutable SSTables and a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.

The locations of Bigtable tablets are stored in cells. The lookup of any particular tablet is handled by a three-tiered system. The clients get a point to a META0 table, of which there is only one. The META0 table keeps track of many META1 tablets that contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.

Implementation

BigTable is built on Google File System (GFS), which is used as a backing store for log and data files. GFS provides reliable storage for SSTables, a Google-proprietary file format used to persist table data.

Another service that BigTable makes heavy use of is Chubby, a highly-available, reliable distributed lock service. Chubby allows clients to take a lock, possibly associating it with some metadata, which it can renew by sending keep alive messages back to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.

There are three primary server types of interest in the Bigtable system:

  1. Master servers: assign tablets to tablet servers, keeps track of where tablets are located and redistributes tasks as needed.
  2. Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100MB - 200MB). If a tablet server fails, then a 100 tablet servers each pickup 1 new tablet and the system recovers.
  3. Lock servers: instances of the Chubby distributed lock service. Lots of actions within BigTable require acquisition of locks including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access control checking.

Example from Google's research paper:

alt text

A slice of an example table that stores Web pages. The row name is a reversed URL. The contents column family contains the page contents, and the anchor column family contains the text of any anchors that reference the page. CNN's home page is referenced by both the Sports Illustrated and the MY-look home pages, so the row contains columns named anchor:cnnsi.com and anchor:my.look.ca. Each anchor cell has one version; the contents column has three versions, at timestamps t3, t5, and t6.

API

Typical operations to BigTable are creation and deletion of tables and column families, writing data and deleting columns from a row. BigTable provides this functions to application developers in an API. Transactions are supported at the row level, but not across several row keys.


Here is the link to the PDF of the research paper.

And here you can find a video showing Google's Jeff Dean in a lecture at the University of Washington, discussing the Bigtable content storage system used in Google's backend.

Solution 2

It's something they've built themselves - it's called Bigtable.

http://en.wikipedia.org/wiki/BigTable

There is a paper by Google on the database:

http://research.google.com/archive/bigtable.html

Solution 3

Spanner is Google's globally distributed relational database management system (RDBMS), the successor to BigTable. Google claims it is not a pure relational system because each table must have a primary key.

Here is the link of the paper.

Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

Another database invented by Google is Megastore. Here is the abstract:

Megastore is a storage system developed to meet the requirements of today's interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore's semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.

Solution 4

As others have mentioned, Google uses a homegrown solution called BigTable and they've released a few papers describing it out into the real world.

The Apache folks have an implementation of the ideas presented in these papers called HBase. HBase is part of the larger Hadoop project which according to their site "is a software platform that lets one easily write and run applications that process vast amounts of data." Some of the benchmarks are quite impressive. Their site is at http://hadoop.apache.org.

Solution 5

Although Google uses BigTable for all their main applications, they also use MySQL for other (perhaps minor) apps.

Share:
261,463
Perry
Author by

Perry

I am now a full-time carer but until recently have been head of infrastructure and operations, a full stack software engineer and a devops engineer. With a wealth of technical skills and natural passion for technology I am also Microsoft certified. A lover of C# but not tied to exclusively the Microsoft stack and have enjoyed writing some cloud automation code in both Python and Node.js. Currently, I am a big fan of the .NET Core cross-platform development platform and am taking an interest in Blazor WebAssembly. I write code in Visual Studio Code, Sublime Text and the full Microsoft Visual Studio IDE. I also like to switch between macOS, Windows and Linux for development. I enjoy full stack development, devops and cloud-based work and am enjoying writing front end code in VueJS. Commercially I have worked with the Amazon Web Services stack and using various technologies have thrived in providing start-ups with a scalable continuous deployment architecture and delivered proof of concept, prototype applications and innovate software solutions.

Updated on July 22, 2022

Comments

  • Perry
    Perry almost 2 years

    Is it Oracle or MySQL or something they have built themselves?

  • OscarRyz
    OscarRyz about 15 years
    Do anyone know if it was that built from scratch or based on some product? I heard somewhere I don't remember where, that google used Oracle once, but they drop it because they need some modifications that Oracle won't do nor allow them to do. I'll try to get the link.
  • Matt J
    Matt J about 15 years
    It's from scratch, like most of their other core competencies (web server, GFS, ...).
  • josh3736
    josh3736 almost 14 years
    @smoothdeveloper's link is dead; read an archived copy here: web.archive.org/web/20071102233627/http://xooglers.blogspot.‌​com/…
  • helios
    helios over 12 years
    I was looking for info about the compression algorithms (BMDiff and Zippy) and found that now Zippy is called Snappy and it's published in Google Code: code.google.com/p/snappy
  • user
    user over 10 years
    Google also use Oracle - reference needed.
  • Mikko Rantalainen
    Mikko Rantalainen about 10 years
    It's a shame that Spanner is closed source project. According to the description, I'd love to use that for my projects, too.
  • deltonio2
    deltonio2 about 9 years
    Now they use Spanner, the successor to BigTable
  • Admin
    Admin about 9 years
    @user cloud.google.com/sql/docs ? If developers can use MySQL, Google must, at least, have created a "database translator" with MySQL and Bigtable.
  • dualed
    dualed about 9 years
    @MikkoRantalainen You may want to check out the Apache Hadoop ecosystem or CockroachDB (though Cockroach is alpha)
  • Mikko Rantalainen
    Mikko Rantalainen about 9 years
    Thanks, CockroachDB looks interesting. I have to test it to see what kind of performance it has. Features look like the stuff I would like to have.
  • stuckedoverflow
    stuckedoverflow about 8 years
    So, it looks similar to nosql database such Mongodb or Marklogic.
  • Miscreant
    Miscreant almost 6 years
    Spanner has been available for everyone to use on Google Cloud since 2017: cloud.google.com/spanner
  • Shivam Jha
    Shivam Jha over 4 years
    Link is 404 not found
  • S. Liu
    S. Liu almost 2 years
    however Google is no longer in the list of MySQL Customers...