Storing JSON in database vs. having a new column for each key

mysql sql sql-server database optimization

199,191

Solution 1

Updated 4 June 2017

Given that this question/answer have gained some popularity, I figured it was worth an update.

When this question was originally posted, MySQL had no support for JSON data types and the support in PostgreSQL was in its infancy. Since 5.7, MySQL now supports a JSON data type (in a binary storage format), and PostgreSQL JSONB has matured significantly. Both products provide performant JSON types that can store arbitrary documents, including support for indexing specific keys of the JSON object.

However, I still stand by my original statement that your default preference, when using a relational database, should still be column-per-value. Relational databases are still built on the assumption of that the data within them will be fairly well normalized. The query planner has better optimization information when looking at columns than when looking at keys in a JSON document. Foreign keys can be created between columns (but not between keys in JSON documents). Importantly: if the majority of your schema is volatile enough to justify using JSON, you might want to at least consider if a relational database is the right choice.

That said, few applications are perfectly relational or document-oriented. Most applications have some mix of both. Here are some examples where I personally have found JSON useful in a relational database:

When storing email addresses and phone numbers for a contact, where storing them as values in a JSON array is much easier to manage than multiple separate tables
Saving arbitrary key/value user preferences (where the value can be boolean, textual, or numeric, and you don't want to have separate columns for different data types)
Storing configuration data that has no defined schema (if you're building Zapier, or IFTTT and need to store configuration data for each integration)

I'm sure there are others as well, but these are just a few quick examples.

Original Answer

If you really want to be able to add as many fields as you want with no limitation (other than an arbitrary document size limit), consider a NoSQL solution such as MongoDB.

For relational databases: use one column per value. Putting a JSON blob in a column makes it virtually impossible to query (and painfully slow when you actually find a query that works).

Relational databases take advantage of data types when indexing, and are intended to be implemented with a normalized structure.

As a side note: this isn't to say you should never store JSON in a relational database. If you're adding true metadata, or if your JSON is describing information that does not need to be queried and is only used for display, it may be overkill to create a separate column for all of the data points.

Solution 2

Like most things "it depends". It's not right or wrong/good or bad in and of itself to store data in columns or JSON. It depends on what you need to do with it later. What is your predicted way of accessing this data? Will you need to cross reference other data?

Other people have answered pretty well what the technical trade-off are.

Not many people have discussed that your app and features evolve over time and how this data storage decision impacts your team.

Because one of the temptations of using JSON is to avoid migrating schema and so if the team is not disciplined, it's very easy to stick yet another key/value pair into a JSON field. There's no migration for it, no one remembers what it's for. There is no validation on it.

My team used JSON along side traditional columns in postgres and at first it was the best thing since sliced bread. JSON was attractive and powerful, until one day we realized that flexibility came at a cost and it's suddenly a real pain point. Sometimes that point creeps up really quickly and then it becomes hard to change because we've built so many other things on top of this design decision.

Overtime, adding new features, having the data in JSON led to more complicated looking queries than what might have been added if we stuck to traditional columns. So then we started fishing certain key values back out into columns so that we could make joins and make comparisons between values. Bad idea. Now we had duplication. A new developer would come on board and be confused? Which is the value I should be saving back into? The JSON one or the column?

The JSON fields became junk drawers for little pieces of this and that. No data validation on the database level, no consistency or integrity between documents. That pushed all that responsibility into the app instead of getting hard type and constraint checking from traditional columns.

Looking back, JSON allowed us to iterate very quickly and get something out the door. It was great. However after we reached a certain team size it's flexibility also allowed us to hang ourselves with a long rope of technical debt which then slowed down subsequent feature evolution progress. Use with caution.

Think long and hard about what the nature of your data is. It's the foundation of your app. How will the data be used over time. And how is it likely TO CHANGE?

Solution 3

Just tossing it out there, but WordPress has a structure for this kind of stuff (at least WordPress was the first place I observed it, it probably originated elsewhere).

It allows limitless keys, and is faster to search than using a JSON blob, but not as fast as some of the NoSQL solutions.

uid   |   meta_key    |   meta_val
----------------------------------
1         name            Frank
1         age             12
2         name            Jeremiah
3         fav_food        pizza
.................

EDIT

For storing history/multiple keys

uid   | meta_id    |   meta_key    |   meta_val
----------------------------------------------------
1        1             name            Frank
1        2             name            John
1        3             age             12
2        4             name            Jeremiah
3        5             fav_food        pizza
.................

and query via something like this:

select meta_val from `table` where meta_key = 'name' and uid = 1 order by meta_id desc

Solution 4

the drawback of the approach is exactly what you mentioned :

it makes it VERY slow to find things, since each time you need to perform a text-search on it.

value per column instead matches the whole string.

Your approach (JSON based data) is fine for data you don't need to search by, and just need to display along with your normal data.

Edit: Just to clarify, the above goes for classic relational databases. NoSQL use JSON internally, and are probably a better option if that is the desired behavior.

Solution 5

Basically, the first model you are using is called as document-based storage. You should have a look at popular NoSQL document-based database like MongoDB and CouchDB. Basically, in document based db's, you store data in json files and then you can query on these json files.

The Second model is the popular relational database structure.

If you want to use relational database like MySql then i would suggest you to only use second model. There is no point in using MySql and storing data as in the first model.

To answer your second question, there is no way to query name like 'foo' if you use first model.

View more solutions

199,191

ShuklaSannidhya

Updated on July 08, 2022

Comments

ShuklaSannidhya almost 2 years
I am implementing the following model for storing user related data in my table - I have 2 columns - uid (primary key) and a meta column which stores other data about the user in JSON format.
```
 uid   | meta
--------------------------------------------------
 1     | {name:['foo'], 
       |  emailid:['[email protected]','[email protected]']}
--------------------------------------------------
 2     | {name:['sann'], 
       |  emailid:['[email protected]','[email protected]']}
--------------------------------------------------
```
Is this a better way (performance-wise, design-wise) than the one-column-per-property model, where the table will have many columns like uid, name, emailid.

What I like about the first model is, you can add as many fields as possible there is no limitation.

Also, I was wondering, now that I have implemented the first model. How do I perform a query on it, like, I want to fetch all the users who have name like 'foo'?

Question - Which is the better way to store user related data (keeping in mind that number of fields is not fixed) in database using - JSON or column-per-field? Also, if the first model is implemented, how to query database as described above? Should I use both the models, by storing all the data which may be searched by a query in a separate row and the other data in JSON (is a different row)?

Update

Since there won't be too many columns on which I need to perform search, is it wise to use both the models? Key-per-column for the data I need to search and JSON for others (in the same MySQL database)?
Colin M about 11 years

Out of curiosity, what made you assume that his model is non-relational. The information he put above seems very relational to me.
ShuklaSannidhya about 11 years

So you mean, I should use both. Key-per-column for the data i need to search and JSON for others, right?
Nick Andriopoulos about 11 years

yes. that way, you get the required performance from searching the data-per-column fields, and grab the JSON blob to use in code when needed.
ShuklaSannidhya about 11 years

Is it wise to use both the models? Key-per-column for the data I need to search and JSON for others (in the same database)?
ShuklaSannidhya about 11 years

Since there won't be too many columns on which I need to perform search, is it wise to use both the models? Key-per-column for the data I need to search and JSON for others (in the same MySQL database)?
Girish about 11 years

@Sann - haha. That's data duplication. You will have to make sure that both the pieces of data are always the same. Even if the one of the data is different at any point of time, then your data is not clean and might lead to serious problem. So, my answer is NO
Colin M about 11 years

@Sann You should use a column per value for data that you want to read or query often. Putting someone's name in JSON doesn't make sense because, even though you're not likely to query based on it, you're likely to need it very often. That's a lot of wasteful decoding on your application-side. Unless you really feel like your data is better represented as JSON (and trust me, it's probably not), you shouldn't resort to that.
ShuklaSannidhya about 11 years

But redundancy isn't costly when the redundant data is small, say, there are only two fields on which I need to perform search, so I create two new columns for them, [maybe] remove them from my JSON data [/maybe]. That won't be costly duplication right?
Girish about 11 years

If you are looking at performance, then MongoDB and CouchDB provide faster read and write operations than MySql because they do not offer a lot of features in relational databases which are not required in most of the use cases.
Bruno about 11 years

I'd be curious to see if a NoSQL solution really performs better than a relational query on a properly index key. I'd suspect it should be more or less the same on a 1-level example like this.
ShuklaSannidhya about 11 years

+1. I've noticed it too! But it gives you a huge table (in terms of rows). Also you can not store multiple values, say, if user changes his/her name, but I want to preserve the old name too, in that case I'll need JSON type data model.
Bruno about 11 years

@Sann, if you wanted to keep the old value in JSON, you'd also have to rename the key: you can do with with an EAV (which is what this example is) or JSON. It's not particularly different.
Adam about 11 years

It does give you a huge table, but as for duplicate values, you run into the same problem with JSON - you can't have duplicate keys at the same level (e.g. two "name" keys) and expect predictable behavior.
ShuklaSannidhya about 11 years

Sure you can't have duplicate keys, but can have an array associated with that key. Check out the emailid key in the example I've given in my question.
Travis J over 9 years

This seems good for things which are single values, but complex structures (which to me seem the whole reason to use json in a db) are not represented well here. If there was a deeply nested object it would get rather ugly to store the index purely as a string. At that point it would make more sense to simply store the json since each key value would probably represent the json structure.
markbratanov about 9 years

Couldn't the benefit be storing JSON objects / callbacks from an API? For example, instead of calling youtube's API for URL, thumb, etc, you could just query your local DB (mysql, lite, etc) for the JSON object? I don't know, makes sense to me, especially if you are trying to cache or make an app run faster. But I'm no professional :/
ted about 8 years

"virtually impossible to query" - today psql allows you to search and index its jsonb
Colin M about 8 years

@ted true. However, at the time of writing this answer that was not really available. Also, this question references MySQL in which is capability is not present.
ted about 8 years

@ColinM, yes, I realize my comment is 3 years younger your post. The reason I left it is because it may be helpful and decision changing for others. As for the reference to MySQL: could be true, but have "For relational databases" in your answer =P
Shri over 7 years

So in a user database table, with details like name, email, city, and some settings (like theme, colors etc ) and other profile data (like interests and keywords) for the user, Does it makes sense to have seperate columns for name email etc, and a single json column for all settings?
Bruno over 7 years

In addition to what I said above, it can be worth looking at the operators for the JSONB datatype in PostgreSQL 9.4 and above.
Alex over 7 years

Just to give a further update to this post, MySQL introduced a JSON data type with release 5.7, which allows querying. I don't know enough to comment about it's performance.
尤川豪 over 5 years

This is ''EAV model (entity-attribute-value) model.
PBo about 5 years

SQL Server 2016 now also allows you to store JSON rows and search for data within the JSON docs.microsoft.com/en-us/sql/relational-databases/json/…