How to store a list in a column of a database table

245,520

Solution 1

No, there is no "better" way to store a sequence of items in a single column. Relational databases are designed specifically to store one value per row/column combination. In order to store more than one value, you must serialize your list into a single value for storage, then deserialize it upon retrieval. There is no other way to do what you're talking about (because what you're talking about is a bad idea that should, in general, never be done).

I understand that you think it's silly to create another table to store that list, but this is exactly what relational databases do. You're fighting an uphill battle and violating one of the most basic principles of relational database design for no good reason. Since you state that you're just learning SQL, I would strongly advise you to avoid this idea and stick with the practices recommended to you by more seasoned SQL developers.

The principle you're violating is called first normal form, which is the first step in database normalization.

At the risk of oversimplifying things, database normalization is the process of defining your database based upon what the data is, so that you can write sensible, consistent queries against it and be able to maintain it easily. Normalization is designed to limit logical inconsistencies and corruption in your data, and there are a lot of levels to it. The Wikipedia article on database normalization is actually pretty good.

Basically, the first rule (or form) of normalization states that your table must represent a relation. This means that:

  • You must be able to differentiate one row from any other row (in other words, you table must have something that can serve as a primary key. This also means that no row should be duplicated.
  • Any ordering of the data must be defined by the data, not by the physical ordering of the rows (SQL is based upon the idea of a set, meaning that the only ordering you should rely on is that which you explicitly define in your query)
  • Every row/column intersection must contain one and only one value

The last point is obviously the salient point here. SQL is designed to store your sets for you, not to provide you with a "bucket" for you to store a set yourself. Yes, it's possible to do. No, the world won't end. You have, however, already crippled yourself in understanding SQL and the best practices that go along with it by immediately jumping into using an ORM. LINQ to SQL is fantastic, just like graphing calculators are. In the same vein, however, they should not be used as a substitute for knowing how the processes they employ actually work.

Your list may be entirely "atomic" now, and that may not change for this project. But you will, however, get into the habit of doing similar things in other projects, and you'll eventually (likely quickly) run into a scenario where you're now fitting your quick-n-easy list-in-a-column approach where it is wholly inappropriate. There is not much additional work in creating the correct table for what you're trying to store, and you won't be derided by other SQL developers when they see your database design. Besides, LINQ to SQL is going to see your relation and give you the proper object-oriented interface to your list automatically. Why would you give up the convenience offered to you by the ORM so that you can perform nonstandard and ill-advised database hackery?

Solution 2

You can just forget SQL all together and go with a "NoSQL" approach. RavenDB, MongoDB and CouchDB jump to mind as possible solutions. With a NoSQL approach, you are not using the relational model..you aren't even constrained to schemas.

Solution 3

What I have seen many people do is this (it may not be the best approach, correct me if I am wrong):

The table which I am using in the example is given below(the table includes nicknames that you have given to your specific girlfriends. Each girlfriend has a unique id):

nicknames(id,seq_no,names)

Suppose, you want to store many nicknames under an id. This is why we have included a seq_no field.

Now, fill these values to your table:

(1,1,'sweetheart'), (1,2,'pumpkin'), (2,1,'cutie'), (2,2,'cherry pie')

If you want to find all the names that you have given to your girl friend id 1 then you can use:

select names from nicknames where id = 1;

Solution 4

Simple answer: If, and only if, you're certain that the list will always be used as a list, then join the list together on your end with a character (such as '\0') that will not be used in the text ever, and store that. Then when you retrieve it, you can split by '\0'. There are of course other ways of going about this stuff, but those are dependent on your specific database vendor.

As an example, you can store JSON in a Postgres database. If your list is text, and you just want the list without further hassle, that's a reasonable compromise.

Others have ventured suggestions of serializing, but I don't really think that serializing is a good idea: Part of the neat thing about databases is that several programs written in different languages can talk to one another. And programs serialized using Java's format would not do all that well if a Lisp program wanted to load it.

If you want a good way to do this sort of thing there are usually array-or-similar types available. Postgres for instance, offers array as a type, and lets you store an array of text, if that's what you want, and there are similar tricks for MySql and MS SQL using JSON, and IBM's DB2 offer an array type as well (in their own helpful documentation). This would not be so common if there wasn't a need for this.

What you do lose by going that road is the notion of the list as a bunch of things in sequence. At least nominally, databases treat fields as single values. But if that's all you want, then you should go for it. It's a value judgement you have to make for yourself.

Solution 5

In addition to what everyone else has said, I would suggest you analyze your approach in longer terms than just now. It is currently the case that items are unique. It is currently the case that resorting the items would require a new list. It is almost required that the list are currently short. Even though I don't have the domain specifics, it is not much of a stretch to think those requirements could change. If you serialize your list, you are baking in an inflexibility that is not necessary in a more-normalized design. Btw, that does not necessarily mean a full Many:Many relationship. You could just have a single child table with a foreign key to the parent and a character column for the item.

If you still want to go down this road of serializing the list, you might consider storing the list in XML. Some databases such as SQL Server even have an XML data type. The only reason I'd suggest XML is that almost by definition, this list needs to be short. If the list is long, then serializing it in general is an awful approach. If you go the CSV route, you need to account for the values containing the delimiter which means you are compelled to use quoted identifiers. Persuming that the lists are short, it probably will not make much difference whether you use CSV or XML.

Share:
245,520

Related videos on Youtube

JnBrymn
Author by

JnBrymn

@JnBrymn Author of Taming Search manning.com/turnbull/ I'm all about search relevance.

Updated on January 21, 2022

Comments

  • JnBrymn
    JnBrymn over 2 years

    So, per Mehrdad's answer to a related question, I get it that a "proper" database table column doesn't store a list. Rather, you should create another table that effectively holds the elements of said list and then link to it directly or through a junction table. However, the type of list I want to create will be composed of unique items (unlike the linked question's fruit example). Furthermore, the items in my list are explicitly sorted - which means that if I stored the elements in another table, I'd have to sort them every time I accessed them. Finally, the list is basically atomic in that any time I wish to access the list, I will want to access the entire list rather than just a piece of it - so it seems silly to have to issue a database query to gather together pieces of the list.

    AKX's solution (linked above) is to serialize the list and store it in a binary column. But this also seems inconvenient because it means that I have to worry about serialization and deserialization.

    Is there any better solution? If there is no better solution, then why? It seems that this problem should come up from time to time.

    ... just a little more info to let you know where I'm coming from. As soon as I had just begun understanding SQL and databases in general, I was turned on to LINQ to SQL, and so now I'm a little spoiled because I expect to deal with my programming object model without having to think about how the objects are queried or stored in the database.

    Thanks All!

    John

    UPDATE: So in the first flurry of answers I'm getting, I see "you can go the CSV/XML route... but DON'T!". So now I'm looking for explanations of why. Point me to some good references.

    Also, to give you a better idea of what I'm up to: In my database I have a Function table that will have a list of (x,y) pairs. (The table will also have other information that is of no consequence for our discussion.) I will never need to see part of the list of (x,y) pairs. Rather, I will take all of them and plot them on the screen. I will allow the user to drag the nodes around to change the values occasionally or add more values to the plot.

    • Janac Meena
      Janac Meena over 2 years
      A value that is atomically accessed/modified, is really just a single value. For example, no one would object to storing a URL (which is really just a list of chars) as a field in a db table. I disagree with the answer below that "if you do this now, you'll do it again later". It sounds like you've thought about whether or not this scenario makes sense for your usecase. The minute you find yourself needing to add constraints on the list, or individual access of a list item, is when you should normalize your db, and create a JOIN table.
  • JnBrymn
    JnBrymn almost 14 years
    So you believe strongly that storing a list in a column is a bad idea, but you fail to mention why. Since I'm just starting out with SQL, a little bit of the "why" would be very helpful indeed. For instance, you say that I'm "fighting an uphill battle and violating one of the most basic principles of relational database design for no good reason" ... so what is the principle? Why are the reasons that I cited "no good"? (specifically, the sorted and atomic nature of my lists)
  • JnBrymn
    JnBrymn almost 14 years
    I though about that. It still means that I would have to serialize and deserialize... but I suspect that's doable. I wish there was some condoned way to do what I want, but I suspect there isn't.
  • Toby
    Toby almost 14 years
    Basically, it comes down to years of experience condensed into best practices. The basic principal in question is known as 1st Normal Form.
  • JnBrymn
    JnBrymn almost 14 years
    Thanks Adam. Very informative. Good point with your last question.
  • coolgeek
    coolgeek almost 14 years
    +1 for anticipating future changes - always design your data model to be extensible.
  • Marjan Venema
    Marjan Venema almost 14 years
    I can only agree with this. In my view there is only one exception to the rule of not storing multiple values in a single column. An attribute that is a set of individual enumerate values. Eg. TEnum = (mcUgly, mcEvil, mcBad) and a property Character: set of TEnum. So Character can be [mcUgly, mcBad] or [mcEvil, mcBad], or ... Character can be stored as an integer. Storing it as a csv of the individual enumerate values can be more self-explanatory. Only reason it would be acceptable is that Character is still a single attribute and the enum values are finite (hardcoded).
  • Edson Horacio Junior
    Edson Horacio Junior about 9 years
    Just a tip to select that list considerind you created another table to store it: select array_to_string(array_agg(list_table.column_name), ', ') from main_table inner join list_table on main_table.id = list_table.id This will mount a "list" of column_name concatenated by ,, hope that is useful.
  • VoronoiPotato
    VoronoiPotato about 8 years
    capnproto.org is a way to not have to serialize and deserialize, similarly quick (in comparison to csv or xml) in case capnproto is not supported in your language of choice msgpack.org/index.html
  • Lynn
    Lynn almost 8 years
    “[…] and you won't be derided by other SQL developers when they see your database design.” There are very good reasons to respect First Normal Form (and your answer mentions them), but peer pressure/“that’s how things are done around here” is not among them.
  • Lisandro
    Lisandro over 7 years
    The real answer might be not using a relational database at all. Checkout Redis or other in-memory DBs. They can store sets, lists and hashes in a single key.
  • Haakon Løtveit
    Haakon Løtveit almost 7 years
    We already store bunches of lists in database columns every day. They're called "char" and "varchar". Of course in Postgres, they're also called text. What the 1NF really says is that you shouldn't ever want to split up the information in any field into smaller fields, and if you do that, you've dun goofed. So you don't store name, you store personal name, middle names, and family names (depending on localization), and stitch those together. Otherwise we'd not store strings of text at all. On the other hand, all he wants is a string of strings. And there are ways to do that.
  • Demurgos
    Demurgos over 6 years
    The main reason why you want to avoid storing it in a single column is to preserve referential integrity. Having a table for the join allows your database to protect you from removing an item from the DB and still having an invalid ID in your serialized list.
  • Janac Meena
    Janac Meena over 4 years
    Seasoned front end web developers said to always separate style, form and logic in web apps. ReactJS and Angular broke those rules and took over the front end web scene. Seasoned networking engineers said that dropbox and cloud storage was not going to be worth the cost/profit effort (search hackernews for dropbox and you'll find the comments from 2008). With ever rule, comes exceptions and this is what our job really is, to discern whether our user's needs will be met in a better way if we break the rules or if we follow them
  • Gert Arnold
    Gert Arnold almost 4 years
    But this is a very specific case: a fixed number of fixed-length items. Even then, it makes a simple search like "all products having at least color x" harder than standard SQL would.
  • Antonin GAVREL
    Antonin GAVREL almost 4 years
    As I stated multiple times, I don't use it for color, the field I use it to should not be indexed nor used as a condition, and yet it is a critical one
  • Gert Arnold
    Gert Arnold almost 4 years
    I know, I'm trying to indicate that this is highly specific. If any small additional requirement sneaks in it quickly becomes more awkward than standard solutions. The vast majority of people that get tempted to store lists in one db field are probably better off not doing it.
  • Shai Cohen
    Shai Cohen about 3 years
    this is the "bare bones" "nuts & bolts" reason.
  • Orenger
    Orenger about 2 years
    I like this answer, however, lets say nickname is a common thing and we give hundreds of nicknames - we would have a big table with no meaning its kinda mixing SQL with NOSQL and i think SQL should address that issue