Google Cloud Bigtable vs Google Cloud Datastore


Solution 1

Based on experience with Datastore and reading the Bigtable docs, the main differences are:

  • Bigtable was originally designed for HBase compatibility, but now has client libraries in multiple languages. Datastore was originally geared more toward Python/Java/Go web app developers on App Engine
  • Bigtable is 'a bit more IaaS' than Datastore in that it isn't 'just there': you have to provision and configure a cluster first.
  • Bigtable supports only one index: the row key (the equivalent of the entity key in Datastore)
    • This means queries can only look up or scan rows by key, unlike Datastore's indexed properties
  • Bigtable supports atomicity only within a single row; there are no multi-row transactions
  • Mutations and deletions that span multiple rows are not atomic in Bigtable, whereas Datastore provides eventual or strong consistency, depending on the read/query method
  • The billing model is very different:
    • Datastore charges for read/write operations, storage and bandwidth
    • Bigtable charges for 'nodes', storage and bandwidth
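Because the row key is the only index, Bigtable schema design mostly comes down to choosing a key whose sort order matches your queries. The sketch below is a toy in-memory model (the `ToyRowKeyStore` class and the `user#<id>#<date>` key scheme are hypothetical, not the Bigtable client API): rows are kept sorted by key, so a point lookup or a prefix scan is cheap and anything else is a full scan.

```python
import bisect


class ToyRowKeyStore:
    """In-memory toy of a Bigtable-like table: rows kept sorted by row key,
    with no secondary indexes (hypothetical class, not the Bigtable client API)."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep the key space sorted
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        # Point lookup by row key: the one access path that is always cheap.
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        # Binary-search to the first key >= prefix, then read contiguous rows.
        i = bisect.bisect_left(self._keys, prefix)
        result = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            result.append((key, self._rows[key]))
            i += 1
        return result


# Hypothetical key design "user#<id>#<date>": one user's events are contiguous.
store = ToyRowKeyStore()
store.put("user#42#2020-11-01", {"event": "login"})
store.put("user#42#2020-11-02", {"event": "logout"})
store.put("user#7#2020-11-01", {"event": "login"})
print([key for key, _ in store.scan_prefix("user#42#")])
# → ['user#42#2020-11-01', 'user#42#2020-11-02']
```

Asking for every "login" event, by contrast, has no index to use and means scanning every row. The trailing `#` in the prefix matters too: scanning `user#4` would also return rows for user 42.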

Solution 2

Bigtable is optimized for high volumes of data and analytics

  • In its default single-cluster configuration, Cloud Bigtable doesn’t replicate data across zones or regions (data within the cluster is replicated and durable). This makes Bigtable faster, more efficient and much cheaper, though less durable and available
  • It uses the HBase API, so there is less risk of lock-in and no new paradigm to learn if you already know HBase
  • It is integrated with open-source Big Data tools, so you can analyze the data stored in Bigtable with most of the analytics tools customers already use (Hadoop, Spark, etc.)
  • Bigtable is indexed by a single row key
  • A Bigtable cluster runs in a single zone

Cloud Bigtable is designed for larger companies and enterprises that often have larger data needs and complex backend workloads.

Datastore is optimized to serve high-value transactional data to applications

  • Cloud Datastore has extremely high availability with replication and data synchronization
  • Datastore, because of its versatility and high availability, is more expensive
  • Datastore is slower at writing data due to synchronous replication
  • Datastore has much better functionality around transactions and queries (since secondary indexes exist)
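Whether Datastore really is more expensive depends on sustained traffic, because Datastore bills per operation while Bigtable bills per provisioned node. The rough break-even sketch below ignores storage and bandwidth (billed by both) and each node's throughput ceiling; the prices are made-up placeholders, not real Google Cloud prices, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730                       # average month, approximately
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600


def node_billed_cost(nodes, price_per_node_hour):
    """Monthly compute cost when billing is per provisioned node (Bigtable-style)."""
    return nodes * price_per_node_hour * HOURS_PER_MONTH


def op_billed_cost(ops_per_sec, price_per_100k_ops):
    """Monthly operation cost when billing is per read/write (Datastore-style)."""
    return ops_per_sec * SECONDS_PER_MONTH / 100_000 * price_per_100k_ops


def break_even_ops_per_sec(nodes, price_per_node_hour, price_per_100k_ops):
    """Sustained ops/s at which both models cost the same (storage/bandwidth ignored)."""
    cost_per_op_per_sec = SECONDS_PER_MONTH / 100_000 * price_per_100k_ops
    return node_billed_cost(nodes, price_per_node_hour) / cost_per_op_per_sec


# Placeholder prices for illustration only, not actual list prices.
NODES, NODE_HOUR, PER_100K = 3, 1.00, 0.10
print(node_billed_cost(NODES, NODE_HOUR))                          # 2190.0
print(round(break_even_ops_per_sec(NODES, NODE_HOUR, PER_100K)))   # 833
```

Below the break-even rate, per-operation billing is cheaper; above it, a fixed set of nodes wins, as long as those nodes can actually sustain the load.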

Solution 3

Bigtable and Datastore are extremely different. Yes, the Datastore is built on top of Bigtable, but that does not make it anything like it. That is like saying that because a car is built on top of wheels, a car is not much different from wheels.

Bigtable and Datastore provide very different data models and very different semantics in how the data is changed.

The main difference is that the Datastore provides SQL-database-like ACID transactions on subsets of the data known as entity groups (though the query language GQL is much more restrictive than SQL). Bigtable is strictly NoSQL and comes with much weaker guarantees.
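The difference between single-row atomicity and multi-entity transactions can be sketched with a toy in-memory table. `ToyTable`, `mutate_row` and the `fail` flag are hypothetical names for illustration, not a real client API. Each row's batch commits all-or-nothing, but a "transfer" touching two rows is two independent commits, so a failure in the second leaves the first applied. If both entities were updated in one Datastore transaction, neither change would be committed.

```python
class ToyTable:
    """Toy table whose only atomic unit is a single row (hypothetical, not a real client)."""

    def __init__(self, rows):
        self.rows = rows  # row key -> {column: value}

    def mutate_row(self, row_key, mutations, fail=False):
        """Apply (column, value) pairs to one row all-or-nothing.
        fail=True simulates an error on the last mutation of the batch."""
        staged = dict(self.rows.get(row_key, {}))
        for i, (column, value) in enumerate(mutations):
            if fail and i == len(mutations) - 1:
                raise RuntimeError("simulated failure")
            staged[column] = value
        self.rows[row_key] = staged  # commit only after every mutation succeeded


table = ToyTable({"acct#a": {"balance": 100}, "acct#b": {"balance": 0}})

# 1) A failure inside one row's batch leaves that row untouched.
try:
    table.mutate_row("acct#a", [("balance", 50), ("note", "pending")], fail=True)
except RuntimeError:
    pass
print(table.rows["acct#a"])  # {'balance': 100}

# 2) A "transfer" across two rows is two separate commits: no rollback if the second fails.
table.mutate_row("acct#a", [("balance", 70)])
try:
    table.mutate_row("acct#b", [("balance", 30)], fail=True)
except RuntimeError:
    pass
print(table.rows)  # {'acct#a': {'balance': 70}, 'acct#b': {'balance': 0}}
```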

Solution 4

I am going to try to summarize all the answers above, plus what is covered in the Coursera course Google Cloud Platform Big Data and Machine Learning Fundamentals:

+---------------------+----------------------------------------------+-------------------------------------------+
| Category            | Bigtable                                     | Datastore                                 |
+---------------------+----------------------------------------------+-------------------------------------------+
| Technology          | Based on HBase (uses the HBase API)          | Built on Bigtable itself                  |
| Access metaphor     | Key/value (column families), like HBase      | Persistent hashmap                        |
| Read                | Scan rows                                    | Filter objects on property                |
| Write               | Put row                                      | Put object                                |
| Update granularity  | No in-place update (write a new row instead) | Can update an attribute                   |
| Capacity            | Petabytes                                    | Terabytes                                 |
| Index               | Row key only (design the key carefully)      | Any property of the object can be indexed |
| Usage and use cases | High-throughput, scalable, flat data         | Structured data for Google App Engine     |
+---------------------+----------------------------------------------+-------------------------------------------+
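One way to read the "no in-place update" row of the table: in Bigtable, a write to a cell adds a new timestamped version instead of modifying the old value, and older versions are dropped later by a garbage-collection policy. The toy model below sketches that behaviour; the class, the `family:qualifier` column name and the integer clock are hypothetical stand-ins, not the Bigtable client API.

```python
import itertools


class ToyVersionedRow:
    """Toy Bigtable-style row: every write adds a timestamped cell version
    (hypothetical class, not the Bigtable client API)."""

    _clock = itertools.count(1)  # stand-in for a monotonically increasing timestamp

    def __init__(self):
        self.cells = {}  # column -> [(timestamp, value), ...], newest first

    def write(self, column, value):
        # No in-place update: the old value stays until garbage collection removes it.
        self.cells.setdefault(column, []).insert(0, (next(self._clock), value))

    def read_latest(self, column):
        return self.cells[column][0][1]

    def versions(self, column):
        return [value for _, value in self.cells[column]]

    def gc_keep_latest(self, max_versions):
        # Mimics a "keep the N newest versions" garbage-collection rule.
        for column in self.cells:
            self.cells[column] = self.cells[column][:max_versions]


row = ToyVersionedRow()
row.write("profile:email", "old@example.com")
row.write("profile:email", "new@example.com")
print(row.versions("profile:email"))  # ['new@example.com', 'old@example.com']
row.gc_keep_latest(1)
print(row.versions("profile:email"))  # ['new@example.com']
```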


Solution 5

If you read the papers: Bigtable is the system described in the Bigtable paper, and Datastore is Megastore. Datastore is Bigtable plus replication, transactions, and indexes (and is much more expensive).
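The "Bigtable plus indexes" idea can be made concrete with a toy: secondary indexes become extra rows whose keys start with the property value, so an equality filter turns into a prefix scan followed by key lookups. This is a simplified illustration, not the actual Megastore or Datastore index layout; the class, method names and key formats are hypothetical.

```python
class ToyIndexedStore:
    """Toy of secondary indexes layered on a key-only store: each property value
    gets an extra "index row". Key formats are hypothetical; values must not contain '#'."""

    def __init__(self):
        self.rows = {}  # one flat key space, as in a key-only store

    def _index_key(self, kind, name, value, entity_id):
        return f"idx#{kind}#{name}#{value}#{entity_id}"

    def put_entity(self, kind, entity_id, props):
        entity_key = f"ent#{kind}#{entity_id}"
        # Remove index rows for the previous property values, if any.
        for name, value in (self.rows.get(entity_key) or {}).items():
            del self.rows[self._index_key(kind, name, value, entity_id)]
        self.rows[entity_key] = dict(props)
        for name, value in props.items():
            self.rows[self._index_key(kind, name, value, entity_id)] = None

    def query_eq(self, kind, name, value):
        # Equality filter = prefix scan over index rows, then fetch each entity by key.
        # (A real sorted store would range-scan; sorted() keeps this sketch short.)
        prefix = f"idx#{kind}#{name}#{value}#"
        ids = [k[len(prefix):] for k in sorted(self.rows) if k.startswith(prefix)]
        return [self.rows[f"ent#{kind}#{entity_id}"] for entity_id in ids]


s = ToyIndexedStore()
s.put_entity("User", "1", {"city": "Paris", "plan": "free"})
s.put_entity("User", "2", {"city": "Oslo", "plan": "free"})
s.put_entity("User", "1", {"city": "Oslo", "plan": "pro"})
print(s.query_eq("User", "city", "Oslo"))
# → [{'city': 'Oslo', 'plan': 'pro'}, {'city': 'Oslo', 'plan': 'free'}]
```

Note that the entity row and its index rows are written separately here; if a write stopped halfway they would disagree, which is one reason a layer like this needs transactions on top of single-row atomicity.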

Author: Andrei F

Updated on November 29, 2020

Comments

  • Andrei F
    Andrei F over 3 years

    What is the difference between Google Cloud Bigtable and Google Cloud Datastore / App Engine datastore, and what are the main practical advantages/disadvantages? AFAIK Cloud Datastore is built on top of Bigtable.

  • Daniel Roseman
    Daniel Roseman almost 9 years
    You were doing well until the last paragraph. The datastore provides transactions, but they are nothing like SQL and definitely not ACID.
  • user2771609
    user2771609 almost 9 years
    @DanielRoseman Actually, it very much does. Here is a quote from the paper on Megastore (on which Datastore is built): "Each Megastore entity group functions as a mini-database that provides serializable ACID semantics." "we partition the datastore and replicate each partition separately, providing full ACID semantics within partitions". (research.google.com/pubs/pub36971.html)
  • Zig Mandel
    Zig Mandel almost 9 years
    I think it's misleading to call it SQL. A subset at most: it has no efficient count/group, all queries must use indexes, etc.
  • user2771609
    user2771609 almost 9 years
    Query language and transaction isolation are different things; you seem to be mixing them up. I am making a claim about the latter (ACID transactions). In your comment you are assuming I am talking about the former. Perhaps some hyphens will clarify? I'll explicitly mention the query language issue to remove any doubt.
  • Aram Paronikyan
    Aram Paronikyan over 7 years
    As of November 2016, the same is true for Java
  • Brandon DuRette
    Brandon DuRette over 5 years
    Bigtable now replicates across zones to provide availability in the face of a zonal outage: cloudplatform.googleblog.com/2018/07/…
  • zyxue
    zyxue over 5 years
    I thought transactions were not a strong selling point for Datastore. From its doc (cloud.google.com/datastore/docs/concepts/transactions): "A transaction is a set of Google Cloud Datastore operations on one or more entities in up to 25 entity groups." Also, Datastore is built on top of Bigtable, right?
  • benji
    benji over 5 years
    Is it really more expensive? The minimum for Bigtable is 3 nodes; with 10GB of HDD storage that's $1400/mo. Seems pretty high, no?
  • Justin Zhang
    Justin Zhang over 5 years
    @ben, in my past experience it was. Datastore is charged per operation instead of per hour. (If you don't use it much, then yes, you don't pay much for Datastore. But if you have high traffic, I think Bigtable is much cheaper.) I think Bigtable claims 10k ops per second per node? In reality I found it to be lower, around 1-2k, but 3 nodes is still > 5k/s. If you sustain that throughput for a month and map it to Datastore pricing, it's probably much higher than $1.4k.
  • gstackoverflow
    gstackoverflow about 4 years
    MegaStore link is broken