Putting a JSON string as field data in MySQL


Solution 1

Escape the input properly and you are fine. I must add, though, that this is where XML is a better format than JSON, since it also lets you use the data inside the XML in your queries:

<?xml version="1.0" encoding="UTF-8" ?>
<user>
   <gender>male</gender>
   <birthday>8-Jan-1991</birthday>
   <country>UK</country>
   <city>London</city>
</user> 

You can then select on it:

SELECT ExtractValue(data, '//gender') AS gender
FROM users
WHERE name = 'john' AND ExtractValue(data, '//country') != 'UK';

http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html#function_extractvalue

Solution 2

But is it safe?

As long as you properly escape the input, by using an appropriate library to access the database (or at least mysql_real_escape_string), then yes, it is safe. Or at least, in terms of someone hacking the database, it is no riskier than storing anything else.
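The safest form of "using an appropriate library" is to bind the JSON string as a query parameter instead of escaping it by hand. A minimal sketch, assuming Python's standard-library `sqlite3` module as a stand-in for a MySQL driver (in PHP, PDO or mysqli prepared statements play the same role; the old `mysql_*` functions were removed in PHP 7.0). The `note` field is hypothetical and exists only to hold characters that would break naive string concatenation:

```python
import json
import sqlite3

# Assumption: sqlite3 stands in for a MySQL driver.
# Placeholder syntax differs by driver (sqlite3 and mysqli use ?, PDO uses ? or :name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, data TEXT)")

# "note" is a hypothetical field with quotes and a semicolon in it.
profile = {
    "gender": "male",
    "birthday": "8-Jan-1991",
    "country": "UK",
    "city": "London",
    "note": "O'Brien said \"hi\"; DROP TABLE users; --",
}

# The JSON string travels as a bound parameter, never spliced into the SQL
# text, so its quotes and semicolons cannot alter the statement.
conn.execute(
    "INSERT INTO users (name, data) VALUES (?, ?)",
    ("john", json.dumps(profile)),
)

row = conn.execute("SELECT data FROM users WHERE name = ?", ("john",)).fetchone()
restored = json.loads(row[0])
print(restored["note"])
```

The value comes back byte-for-byte, and the table still exists afterwards.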

Are there any disadvantages or problems with using this method to store data in MySQL?

Yes, here are a few:

  • It's not possible, or at least much harder, to query against anything in the "data" column. Say you want all users who live in London: you'd have to fetch the "data" column of every row in the table and do the searching in PHP.

  • It's also not possible to sort by anything in the "data" column when querying. It would have to be done in PHP.

  • You have to make sure yourself that the stored data is in the correct format. You should do this anyway, but it does remove a layer of protection the database would otherwise give you against storing "bad" data.
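To make the first two points concrete, here is a minimal sketch of what "do the searching (and sorting) in PHP" amounts to, written in Python with `sqlite3` standing in for MySQL. The extra users and their data are made up for illustration:

```python
import json
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, data TEXT)")
# Hypothetical sample rows in the question's JSON layout.
rows = [
    ("john", {"gender": "male", "birthday": "8-Jan-1991", "country": "UK", "city": "London"}),
    ("anna", {"gender": "female", "birthday": "2-Mar-1988", "country": "UK", "city": "Leeds"}),
    ("mary", {"gender": "female", "birthday": "15-Jul-1985", "country": "UK", "city": "London"}),
]
conn.executemany(
    "INSERT INTO users (name, data) VALUES (?, ?)",
    [(name, json.dumps(d)) for name, d in rows],
)

# The database cannot see inside "data", so every row is fetched and decoded.
londoners = []
for name, data in conn.execute("SELECT name, data FROM users"):
    profile = json.loads(data)
    if profile.get("city") == "London":
        londoners.append((name, profile))

# Sorting by birthday also happens in application code; the "8-Jan-1991"
# format has to be parsed first because it does not sort correctly as text.
londoners.sort(key=lambda item: datetime.strptime(item[1]["birthday"], "%d-%b-%Y"))
print([name for name, _ in londoners])
```

Every query of this kind is a full table scan plus a decode per row, which is exactly the cost the list above describes.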

It looks like you have essentially turned MySQL into a NoSQL database. Although my experience with them is limited, NoSQL databases are able to index and sort data inside the stored documents/JSON to some extent. As a relational database, MySQL can't: it can only sort and index the defined columns. You're getting the worst of MySQL (the difficulty of scaling) without using any of its advantages, namely being able to run complex queries.

That being said, if you are sure you'll never need to run such queries, storing things as JSON might make it easier to move to NoSQL later.

Edit: If you're concerned about using up space with empty columns, you can always add tables, say a user-addresses table. This is actually quite a good way to stay future-friendly in case you ever need more than one address per user.
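A minimal sketch of that extra-table idea, assuming a hypothetical `user_addresses` table with one row per address (again `sqlite3` stands in for MySQL; the column names are illustrative). Users without an address simply have no rows, and "everyone in London" becomes an ordinary indexed join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Hypothetical user_addresses table: zero, one, or many addresses per user,
-- with no empty address columns on users.
CREATE TABLE user_addresses (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    country TEXT NOT NULL,
    city    TEXT NOT NULL
);
CREATE INDEX idx_user_addresses_city ON user_addresses(city);
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'john'), (2, 'anna')")
conn.executemany(
    "INSERT INTO user_addresses (user_id, country, city) VALUES (?, ?, ?)",
    [(1, "UK", "London"), (1, "FR", "Paris"), (2, "UK", "Leeds")],
)

# "All users who live in London" is now a plain join the database can index.
londoners = conn.execute("""
    SELECT DISTINCT u.name
    FROM users u JOIN user_addresses a ON a.user_id = u.id
    WHERE a.city = ?
""", ("London",)).fetchall()
print(londoners)
```

Note that john has two addresses here without any schema change, which is the future-friendliness mentioned above.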

Solution 3

Try to add new columns; decoding JSON on every read is expensive. But if your PHP application cannot afford downtime, or you can't add more columns for some reason, you can do one of the following:

  • Convert the data for your pseudo-columns into a PHP array, serialise it to a string (see serialize()) and store it in a MySQL TEXT column (MySQL has no CLOB type; TEXT is its equivalent).
  • Same as above, but use the igbinary PECL package (http://pecl.php.net/package/igbinary) for serialization and deserialization, and store the result in a MySQL BLOB column.

Solution 4

If I were you, I would simply add the new columns for the data set.

Using JSON inside a MySQL field isn't bad. It's saved me a lot of grief. But it does introduce a good bit of overhead and limits what functionality you can use from the database engine. Constantly manipulating SQL schema is not the best thing to do, but neither is decoding JSON objects when you don't have to.

If the data schema is fairly static, like your example where you store a user's gender, birthday, etc., it's best to use columns. Then you can manipulate the data quickly and easily, directly in SQL: sort, filter, create indexes for faster lookups, etc. Since the data schema is fairly static, JSON doesn't really gain you anything except maybe the few minutes it would take to create the columns, and over the life of the application you lose far more than that in machine cycles.
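For comparison with the JSON version, a minimal sketch of the column-based layout for the question's user data. It uses `sqlite3` as a stand-in; in MySQL, `birthday` would normally be a `DATE`. Here it is assumed to be stored as ISO 8601 text so it sorts correctly, and the extra users are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attribute gets its own column instead of living inside a JSON string.
conn.execute("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        gender   TEXT,
        birthday TEXT,   -- ISO 8601 'YYYY-MM-DD' so it sorts correctly
        country  TEXT,
        city     TEXT
    )
""")
conn.execute("CREATE INDEX idx_users_country_city ON users(country, city)")
conn.executemany(
    "INSERT INTO users (name, gender, birthday, country, city) VALUES (?, ?, ?, ?, ?)",
    [("john", "male", "1991-01-08", "UK", "London"),
     ("anna", "female", "1988-03-02", "UK", "Leeds"),
     ("mary", "female", "1985-07-15", "UK", "London")],
)

# Filtering and sorting happen inside the database and can use the index.
rows = conn.execute("""
    SELECT name, birthday FROM users
    WHERE country = ? AND city = ?
    ORDER BY birthday
""", ("UK", "London")).fetchall()
print(rows)
```

No decoding step is needed, and adding a new lookup is a matter of adding an index rather than rewriting application code.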

Where I use JSON in MySQL fields is where the data schema is very fluid. As a test engineer, this is pretty much the norm. For example, in one of my current projects, the list of target metrics (which are stored in MySQL) changes very regularly, depending on what issues are being addressed or what performance characteristics are being tweaked. It's a regular event for the development engineers to ask for new metrics, and they of course expect this to all get neatly displayed and changes to be made quickly. So instead of futzing with the SQL schema on a daily basis, I store the static schema (test type, date, product version, etc) as columns, but the ever-fluid test result data as a JSON object. This means I can still query the data using SQL statements based on test type, version, date, etc, but never have to touch the table schema when integrating new metrics. For display of the actual test data, I simply iterate the results and decode the JSON objects into arrays and go from there. As this project expands, I'll eventually implement memcached to cache everything.
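The hybrid layout described above (static attributes as columns, fluid metrics as one JSON object) can be sketched like this. The table, column, and metric names are hypothetical, and `sqlite3` stands in for MySQL:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Stable attributes are real columns; the ever-changing metrics are one JSON blob.
conn.execute("""
    CREATE TABLE test_results (
        id              INTEGER PRIMARY KEY,
        test_type       TEXT NOT NULL,
        run_date        TEXT NOT NULL,
        product_version TEXT NOT NULL,
        metrics         TEXT NOT NULL   -- JSON object, no fixed schema
    )
""")
conn.execute(
    "CREATE INDEX idx_results_type_version ON test_results(test_type, product_version)"
)

def record(test_type, run_date, version, metrics):
    """Store one run; the metrics dict is serialized as-is."""
    conn.execute(
        "INSERT INTO test_results (test_type, run_date, product_version, metrics) "
        "VALUES (?, ?, ?, ?)",
        (test_type, run_date, version, json.dumps(metrics)),
    )

record("load", "2014-05-01", "2.3", {"p95_ms": 120, "errors": 0})
# A new metric shows up: no ALTER TABLE is needed.
record("load", "2014-05-02", "2.4", {"p95_ms": 110, "errors": 1, "gc_pause_ms": 8})

# Filter and sort with SQL on the static columns, then decode the JSON for display.
results = [
    (version, json.loads(metrics_json))
    for version, metrics_json in conn.execute(
        "SELECT product_version, metrics FROM test_results "
        "WHERE test_type = ? ORDER BY run_date",
        ("load",),
    )
]
for version, metrics in results:
    print(version, sorted(metrics.items()))
```

The trade-off is the one described above: you can't filter on `gc_pause_ms` in SQL, but you never touch the schema when a metric is added.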

This also has the side effect of bundling the 100+ test metrics into one text blob, which I zlib-compress as a whole to about 10% of its original size. That adds up to quite significant storage savings, as we're already past a million rows.
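A minimal sketch of compressing the whole metrics blob before storing it, using Python's `zlib` and `sqlite3` as stand-ins. The metric names and values are made up, so the ratio this prints will not necessarily match the ~10% quoted above:

```python
import json
import sqlite3
import zlib

# Hypothetical: 120 metrics with repetitive names, roughly like one test run.
metrics = {f"metric_{i:03d}_latency_ms": i % 7 for i in range(120)}
raw = json.dumps(metrics).encode("utf-8")
packed = zlib.compress(raw, 9)

conn = sqlite3.connect(":memory:")
# Compressed bytes are binary, so they belong in a BLOB column
# (BLOB/MEDIUMBLOB in MySQL), not a TEXT column.
conn.execute("CREATE TABLE test_results (id INTEGER PRIMARY KEY, metrics BLOB)")
conn.execute("INSERT INTO test_results (metrics) VALUES (?)", (packed,))

# On read: decompress, then decode the JSON.
stored = conn.execute("SELECT metrics FROM test_results").fetchone()[0]
restored = json.loads(zlib.decompress(stored).decode("utf-8"))
print(len(raw), len(packed), f"{len(packed) / len(raw):.0%}")
```

Repetitive key names are what make this kind of blob compress well; the database itself only ever sees opaque bytes.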

Author: Paranoid


Updated on June 07, 2022

Question

    I have an idea: turn a (multidimensional or flat) array into a JSON string using PHP's json_encode and store that string in my MySQL database.

    For example, I have a table called 'users' with 3 fields: id, name, data.

    Using PHP, I would like to fetch user John's data: SELECT data FROM users WHERE name='john'

    Now the value/text for 'data' field will be like this: {"gender":"male","birthday":"8-Jan-1991","country":"UK","city":"London"}

    I will decode the 'data' field using PHP's json_decode and then convert the resulting stdClass object into an array using one of my own PHP functions. Then I can show John's information wherever I want, like this: $user['data']['country'].

    This saves me the hassle of creating extra fields in the database for country, city, birthday, etc. But is it safe? Are there any disadvantages or problems with using this method to store data in MySQL?
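The round trip described in the question can be sketched as follows, with Python's `json` and `sqlite3` standing in for PHP and MySQL. In PHP itself, passing `true` as the second argument of `json_decode` returns an associative array directly, so a custom stdClass-to-array helper isn't required; Python's `json.loads` likewise returns a dict:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, data TEXT)")

# Encode the array/dict to JSON and store it in the "data" field.
data = {"gender": "male", "birthday": "8-Jan-1991", "country": "UK", "city": "London"}
conn.execute("INSERT INTO users (name, data) VALUES (?, ?)", ("john", json.dumps(data)))

# SELECT data FROM users WHERE name='john', then decode straight into a dict.
(data_json,) = conn.execute("SELECT data FROM users WHERE name = ?", ("john",)).fetchone()
user = {"name": "john", "data": json.loads(data_json)}
print(user["data"]["country"])  # UK
```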