Ways to save enums in database

Solution 1

We never store enumerations as numerical ordinal values anymore; it makes debugging and support way too difficult. We store the actual enumeration value converted to string:

public enum Suit { Spade, Heart, Diamond, Club }

Suit theSuit = Suit.Heart;

String szQuery = String.format(
          "INSERT INTO Customers (Name, Suit) VALUES ('Ian Boyd', '%s')",
          theSuit.name());

and then read back with:

Suit theSuit = Suit.valueOf(reader["Suit"]);
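
As a rough Java sketch of the same round trip, the helper below converts a Suit to the text stored in the column and back. The class name SuitColumn and the fallback parameter are assumptions made for illustration, not part of the original answer. In real JDBC code the string would normally be bound as a parameter with PreparedStatement.setString rather than formatted into the SQL text.

```java
enum Suit { Spade, Heart, Diamond, Club }

// Hypothetical helper: maps a Suit to its column text and back.
final class SuitColumn {
    private SuitColumn() {}

    // Value to store in the Suit column: the constant's declared name.
    static String toDb(Suit suit) {
        return suit.name();
    }

    // Parses a stored name; returns the fallback for NULL or unrecognized text.
    static Suit fromDb(String stored, Suit fallback) {
        if (stored == null) {
            return fallback;
        }
        try {
            return Suit.valueOf(stored.trim());
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        String stored = toDb(Suit.Heart);
        System.out.println(stored);                      // Heart
        System.out.println(fromDb(stored, Suit.Spade));  // Heart
        System.out.println(fromDb("Hart", Suit.Spade));  // Spade (fallback)
    }
}
```

Whether a silent fallback or a thrown exception is the better reaction to an unrecognized name depends on the application; the fallback is only one option.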

The problem in the past was staring at Enterprise Manager and trying to decipher:

Name          Suit
------------  ----
Kylie Guénin  2
Ian Boyd      1

versus

Name          Suit
------------  -------
Kylie Guénin  Diamond
Ian Boyd      Heart

The latter is much easier. The former required getting at the source code and finding the numerical values that were assigned to the enumeration members.

Yes, it takes more space, but the enumeration member names are short, hard drives are cheap, and the readability is well worth it when you are tracking down a problem.

Additionally, if you use numerical values, you are tied to them. You cannot nicely insert or rearrange the members without having to force the old numerical values. For example, changing the Suit enumeration to:

public enum Suit { Unknown, Heart, Club, Diamond, Spade }

would have to become:

public enum Suit { 
      Unknown = 4,
      Heart = 1,
      Club = 3,
      Diamond = 2,
      Spade = 0 }

in order to maintain the legacy numerical values stored in the database.
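
The `= 4` style of assignment shown above is not Java syntax. A minimal Java sketch of the same idea, assuming a hypothetical dbCode field and fromDbCode lookup (names chosen here for illustration), pins each constant to its legacy number so the declaration order can change freely:

```java
enum Suit {
    // Declaration order is the new one; the numbers are the legacy stored values.
    Unknown(4), Heart(1), Club(3), Diamond(2), Spade(0);

    private final int dbCode;

    Suit(int dbCode) {
        this.dbCode = dbCode;
    }

    // Number written to the database for this constant.
    int dbCode() {
        return dbCode;
    }

    // Resolves a stored number back to its constant.
    static Suit fromDbCode(int code) {
        for (Suit suit : values()) {
            if (suit.dbCode == code) {
                return suit;
            }
        }
        throw new IllegalArgumentException("Unknown suit code: " + code);
    }
}
```

With this shape, Suit.Spade.ordinal() is 4 while Suit.Spade.dbCode() stays 0, so rows written before the reordering still resolve to the same suit.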

How to sort them in the database

The question comes up: let's say I wanted to order the values. Some people may want to sort them by the enum's ordinal value. Of course, ordering the cards by the numerical value of the enumeration is meaningless:

SELECT Suit FROM Cards
ORDER BY SuitID; --where SuitID is integer value(4,1,3,2,0)

Suit
------
Spade
Heart
Diamond
Club
Unknown

That's not the order we want - we want them in enumeration order:

SELECT Suit FROM Cards
ORDER BY CASE SuitID
    WHEN 4 THEN 0 --Unknown first
    WHEN 1 THEN 1 --Heart
    WHEN 3 THEN 2 --Club
    WHEN 2 THEN 3 --Diamond
    WHEN 0 THEN 4 --Spade
    ELSE 999 END

The same work that is required if you save integer values is required if you save strings:

SELECT Suit FROM Cards
ORDER BY Suit; --where Suit is an enum name

Suit
-------
Club
Diamond
Heart
Spade
Unknown

But that's not the order we want - we want them in enumeration order:

SELECT Suit FROM Cards
ORDER BY CASE Suit
    WHEN 'Unknown' THEN 0
    WHEN 'Heart'   THEN 1
    WHEN 'Club'    THEN 2
    WHEN 'Diamond' THEN 3
    WHEN 'Spade'   THEN 4
    ELSE 999 END

My opinion is that this kind of ranking belongs in the user interface. If you are sorting items based on their enumeration value: you're doing something wrong.

But if you really wanted to do that, I would create a Suits dimension table:

Suit     SuitID  Rank  Color
-------  ------  ----  -----
Unknown  4       0     NULL
Heart    1       1     Red
Club     3       2     Black
Diamond  2       3     Red
Spade    0       4     Black

This way, when you want to change your cards to use Kissing Kings New Deck Order you can change it for display purposes without throwing away all your data:

Suit     SuitID  Rank  Color  CardOrder
-------  ------  ----  -----  ---------
Unknown  4       0     NULL   NULL
Spade    0       1     Black  1
Diamond  2       2     Red    1
Club     3       3     Black  -1
Heart    1       4     Red    -1

Now we are separating an internal programming detail (enumeration name, enumeration value) from a display setting meant for users:

SELECT Cards.Suit
FROM Cards
   INNER JOIN Suits ON Cards.Suit = Suits.Suit
ORDER BY Suits.Rank,
   Cards.Rank*Suits.CardOrder
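
The earlier opinion that ranking belongs in the user interface could look roughly like this in Java. The rank numbers are taken from the second Suits table above; the class name SuitDisplayOrder and the use of an in-memory map are assumptions for the sake of a small sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

enum Suit { Unknown, Heart, Club, Diamond, Spade }

// Hypothetical presentation-layer helper: display order lives here,
// not in the enum declaration and not in the stored values.
final class SuitDisplayOrder {
    private static final Map<Suit, Integer> RANK = new EnumMap<>(Suit.class);
    static {
        RANK.put(Suit.Unknown, 0);
        RANK.put(Suit.Spade, 1);
        RANK.put(Suit.Diamond, 2);
        RANK.put(Suit.Club, 3);
        RANK.put(Suit.Heart, 4);
    }

    private SuitDisplayOrder() {}

    // Returns a new list sorted by display rank; the input is left untouched.
    static List<Suit> sorted(List<Suit> suits) {
        List<Suit> copy = new ArrayList<>(suits);
        copy.sort(Comparator.comparingInt((Suit s) -> RANK.get(s)));
        return copy;
    }
}
```

Changing the display order then means editing the rank map (or the Rank column it mirrors), with no effect on the rows already stored.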
    

Solution 2

Unless you have specific performance reasons to avoid it, I would recommend using a separate table for the enumeration. Use foreign key integrity unless the extra lookup really kills you.

Suits table:

suit_id suit_name
1       Clubs
2       Hearts
3       Spades
4       Diamonds

Players table:

player_name suit_id
Ian Boyd           4
Shelby Lake        2

  1. If you ever refactor your enumeration to be classes with behavior (such as priority), your database already models it correctly
  2. Your DBA is happy because your schema is normalized (storing a single integer per player, instead of an entire string, which may or may not have typos).
  3. Your database values (suit_id) are independent from your enumeration value, which helps you work on the data from other languages as well.
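
One cost of a separate table is that the enum and the table can drift apart. A possible guard, sketched here under the assumption that the rows of the Suits table have already been loaded into a Map of suit_id to suit_name, is a start-up check that reports enum constants with no matching row. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

enum Suit { Clubs, Hearts, Spades, Diamonds }

// Hypothetical start-up check comparing the enum with the lookup table rows.
final class SuitTableCheck {
    private SuitTableCheck() {}

    // Returns the names of enum constants that have no row in the table.
    static Set<String> missingFromTable(Map<Integer, String> rows) {
        Set<String> missing = new TreeSet<>();
        for (Suit suit : Suit.values()) {
            missing.add(suit.name());
        }
        missing.removeAll(rows.values());
        return missing;
    }
}
```

An empty result means every constant has a row; anything else can be logged or treated as a fatal configuration error.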

Solution 3

As you say, ordinal is a bit risky. Consider for example:

public enum Boolean {
    TRUE, FALSE
}

public class BooleanTest {
    @Test
    public void testEnum() {
        assertEquals(0, Boolean.TRUE.ordinal());
        assertEquals(1, Boolean.FALSE.ordinal());
    }
}

If you stored this as ordinals, you might have rows like:

> SELECT STATEMENT, TRUTH FROM CALL_MY_BLUFF

"Alice is a boy"      1
"Graham is a boy"     0

But what happens if you updated Boolean?

public enum Boolean {
    TRUE, FILE_NOT_FOUND, FALSE
}

This means all your lies will be misinterpreted as 'file-not-found'.

Better to just use a string representation.
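
The drift can be reproduced in a few lines. In this sketch the two versions of the enum are given separate, made-up names (TruthV1 and TruthV2) so both can exist in one program; a value written under the old version is then read back under the new one, once by ordinal and once by name.

```java
enum TruthV1 { TRUE, FALSE }
enum TruthV2 { TRUE, FILE_NOT_FOUND, FALSE }

final class OrdinalDrift {
    private OrdinalDrift() {}

    // Reads a stored ordinal using the new enum definition.
    static TruthV2 readByOrdinal(int storedOrdinal) {
        return TruthV2.values()[storedOrdinal];
    }

    // Reads a stored name using the new enum definition.
    static TruthV2 readByName(String storedName) {
        return TruthV2.valueOf(storedName);
    }

    public static void main(String[] args) {
        int storedOrdinal = TruthV1.FALSE.ordinal();  // 1, written by the old code
        String storedName = TruthV1.FALSE.name();     // "FALSE", written by the old code

        System.out.println(readByOrdinal(storedOrdinal)); // FILE_NOT_FOUND
        System.out.println(readByName(storedName));       // FALSE
    }
}
```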

Solution 4

I would argue that the only safe mechanism here is to use the String name() value. When writing to the DB, you could use a sproc to insert the value and when reading, use a View. In this manner, if the enums change, there is a level of indirection in the sproc/view to be able to present the data as the enum value without "imposing" this on the DB.

Solution 5

We just store the enum name itself - it's more readable.

We did experiment with storing specific values for enums that have a limited set of values. For example, this enum has a small set of statuses, each represented by a char (more meaningful than a numeric value):

public enum EmailStatus {
    EMAIL_NEW('N'), EMAIL_SENT('S'), EMAIL_FAILED('F'), EMAIL_SKIPPED('K'), UNDEFINED('-');

    private char dbChar = '-';

    EmailStatus(char statusChar) {
        this.dbChar = statusChar;
    }

    public char statusChar() {
        return dbChar;
    }

    public static EmailStatus getFromStatusChar(char statusChar) {
        switch (statusChar) {
        case 'N':
            return EMAIL_NEW;
        case 'S':
            return EMAIL_SENT;
        case 'F':
            return EMAIL_FAILED;
        case 'K':
            return EMAIL_SKIPPED;
        default:
            return UNDEFINED;
        }
    }
}

When you have a lot of values, you need a Map inside your enum to keep that getFromXYZ method small.
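
A minimal sketch of that Map-based variant, reusing the EmailStatus constants from above. The BY_CHAR field name is an assumption; the map is filled once in a static initializer, which runs after all constants have been constructed.

```java
import java.util.HashMap;
import java.util.Map;

enum EmailStatus {
    EMAIL_NEW('N'), EMAIL_SENT('S'), EMAIL_FAILED('F'), EMAIL_SKIPPED('K'), UNDEFINED('-');

    // Lookup table from database char to constant, built once at class load.
    private static final Map<Character, EmailStatus> BY_CHAR = new HashMap<>();
    static {
        for (EmailStatus status : values()) {
            BY_CHAR.put(status.dbChar, status);
        }
    }

    private final char dbChar;

    EmailStatus(char dbChar) {
        this.dbChar = dbChar;
    }

    char statusChar() {
        return dbChar;
    }

    // Unrecognized chars map to UNDEFINED, as in the switch version.
    static EmailStatus getFromStatusChar(char statusChar) {
        return BY_CHAR.getOrDefault(statusChar, UNDEFINED);
    }
}
```

Adding a new status then only requires a new constant; the lookup method does not change.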

Author by

user20298

Updated on November 18, 2021

Comments

  • user20298
    user20298 over 2 years

    What is the best way to save enums into a database?

    I know Java provides name() and valueOf() methods to convert enum values into a String and back. But are there any other (flexible) options to store these values?

    Is there a smart way to make enums into unique numbers (ordinal() is not safe to use)?

    Update:

    Thanks for all awesome and fast answers! It was as I suspected.

    However, a note to 'toolkit': that is one way. The problem is that I would have to add the same methods to each Enum type I create. That's a lot of duplicated code and, at the moment, Java does not support any solutions for this (a Java enum cannot extend other classes).

    • palantus
      palantus over 15 years
      Why is ordinal() not safe to use?
    • Sherm Pendley
      Sherm Pendley over 15 years
      What kind of database? MySQL has an enum type, but I don't think it's standard ANSI SQL.
    • oxbow_lakes
      oxbow_lakes over 15 years
      Because any enumerative additions must then be put on the end. Easy for an unsuspecting developer to mess this up and cause havoc
    • palantus
      palantus over 15 years
      I see. Guess it's a good thing I don't deal with databases much, because I probably wouldn't have thought of that until it was too late.
  • ddimitrov
    ddimitrov over 15 years
    toString is often overridden to provide display value. name() is a better choice as it's by definition the counterpart of valueOf()
  • mistertodd
    mistertodd over 15 years
    What is this "name()" method you speak of? What is this "valueOf()" method you speak of? The only way i can find to convert an enumeration member to a string is with the .ToString() method of the enumeration variable.
  • flicken
    flicken over 15 years
    @anonymousstackoverflowuser.openid.org: See java.sun.com/j2se/1.5.0/docs/api/java/lang/Enum.html#name()
  • Helios
    Helios over 15 years
    Excellent implementation and answer. Many thanks for the advice
  • sath garcia
    sath garcia almost 15 years
    I'm using a hybrid approach of your solution and @Ian Boyd's solution with great success. Thanks for the tip!
  • mamu
    mamu almost 15 years
    I strongly disagree with this, if enum persistence is required then should not persist names. as far as reading it back goes it is even simpler with value instead of name can just typecast it as SomeEnum enum1 = (SomeEnum)2;
  • mistertodd
    mistertodd almost 15 years
    mamu: What happens when the numeric equivalents change?
  • Jason
    Jason over 14 years
    Nice to use this with a default enum value to fall back on in deserialize. For example, catch the IllegalArgEx and return Suit.None.
  • Jason
    Jason over 14 years
    While I agree it is nice to have it normalized, and constrained in the DB, this does cause updates in two places to add a new value (code and db), which might cause more overhead. Also, spelling mistakes should be nonexistent if all updates are done programmatically from the Enum name.
  • xanadont
    xanadont over 14 years
    @basszero If in .Net, for flags you want to use the [Flags] attribute: msdn.microsoft.com/en-us/library/…
  • Nick Spacek
    Nick Spacek over 12 years
    Another vote for names, and we were able to run delta scripts to convert old numerical values to the newer name-based (using MySQL, so that helped).
  • mistertodd
    mistertodd over 12 years
    Say the average enumeration name length is 7 characters. Your enumID is four bytes, so you have an extra three bytes per row by using names. 3 bytes x 1 million rows is 3MB.
  • Steve Perkins
    Steve Perkins about 12 years
    I agree with the comment above. An alternative enforcement mechanism at the database level would be to write a constraint trigger, which would reject inserts or updates that try to use an invalid value.
  • maaartinus
    maaartinus over 10 years
    @IanBoyd: But an enumId surely fits in two bytes (longer enums are not possible in Java) and most of them fit in a single byte (which some DB support). The saved space is negligible, but the faster comparison and the fixed length should help.
  • Tautvydas
    Tautvydas about 10 years
    I would discourage anyone using this approach. Tying yourself to string representation limits code flexibility and refactoring. You should better use unique ids. Also storing strings wastes storage space.
  • Omid Aminiva
    Omid Aminiva over 8 years
    and if need to query the enum you can use Enum.valueOf(Suit .class, c.getString(c.getColumnIndex("Suit")))
  • ebyrob
    ebyrob over 7 years
    Why would I want to declare the same information in two places? Both in CODE public enum foo {bar} and CREATE TABLE foo (name varchar); that can easily get out of sync.
  • Kuchi
    Kuchi over 7 years
    If you don't want to maintain a switch statement and can ensure that dbChar is unique you could use something like: public static EmailStatus getFromStatusChar(char statusChar) { return Arrays.stream(EmailStatus.values()) .filter(e -> e.statusChar() == statusChar) .findFirst() .orElse(UNDEFINED); }
  • afk5min
    afk5min about 7 years
    If we take the accepted answer at face value, that is that the enum names are only used for manual investigations, then this answer is indeed the best option. Also, if you go on changing enumeration order or values or names, you will always have much more problems than maintaining this extra table. Especially when you only need it (and may choose to create only temporarily) for debugging and support.
  • SebastianRiemer
    SebastianRiemer almost 6 years
    What about sorting? If I want to sort results in the query by the stored enum, when choosing the String-representation it would be implicitly defined by the names of the enumeration, whereas if we implement an int-value for each enumeration value, the ordering is left to the programmer.
  • mistertodd
    mistertodd almost 6 years
    @SebastianRiemer The ordering isn't left to the programmer. If the programmer wanted to insert a new enumeration value in the middle: i can't; not without breaking all existing values. But say the programmer did mistakenly save the int values of the enum, and now you wanted to do sorting. If i want to sort the results in the query, but i only have access to the corresponding int value: how would i do it? Simple: ORDER BY CASE WidgetID WHEN 1 THEN 1 WHEN 2 THEN 3 WHEN 3 THEN 2 END. Same for by name.
  • SebastianRiemer
    SebastianRiemer almost 6 years
    To clarify consider this example, enumeration with these values: apple, banana, grapefruit, orange; when using the name we can either sort ASC or DESC; ASC: apple, banana, grapefruit, orange; DESC: orange, grapefruit, banana, apple; whereas if i choose to give them a dedicated integer-value (by constructor and member field of type int) like this: grapefruit(1), banana(2), orange(3), apple(4) i am free to choose which ordering I want to apply; inserting new values isn't regarded in my comment and this example
  • mistertodd
    mistertodd almost 6 years
    @SebastianRiemer Yes, if you insert strings: you cannot sort them as integers.
  • Andrey M. Stepanov
    Andrey M. Stepanov over 5 years
    In your example Suit theSuit = Suit.valueOf(reader["Suit"]); you probably meant Suit theSuit = Suit.valueOf(reader["Spade"]); or smth?
  • mistertodd
    mistertodd over 5 years
    @AndreyM.Stepanov The database doesn't contain a column named Spade
  • Luis Gouveia
    Luis Gouveia over 4 years
    I agree with this solution because it improves readability. I also agree that database space isn't an issue here. However, I am surprised nobody discussed the fact that filtering by an INT is quicker than filtering a varchar(50). This is for me the only weak spot of this solution.
  • mistertodd
    mistertodd over 4 years
    @LuisGouveia I'd be curious to see the database system that has anything more than theoretical performance issues filtering on varchar(50) as opposed to int.
  • Luis Gouveia
    Luis Gouveia over 4 years
    @IanBoyd, don't take it personally. Like I told you, your solution is the best one. However, if you doubt my comment, you can see the following accepted answer: stackoverflow.com/questions/2346920/…
  • mistertodd
    mistertodd over 4 years
    @LuisGouveia I wasn't doubting a theoretical difference. I was looking at a practical difference. In other words: it's a premature micro-optimization whose extra time is 'in the noise' as we say. See the other answer - the one with actual benchmarks (stackoverflow.com/a/39650204/12597).
  • Luis Gouveia
    Luis Gouveia over 4 years
    @IanBoyd, I agree with you, the difference is small, but only negligible if the sizes of the columns are similar, which is the case in the example you're showing me. However, a code can easily use a varchar(50) - 54 bytes, which is not comparable with a bigint - 8 bytes. I believe the time spent would double (or more) if in the example given we would use a 50 char string, but please tell me if you believe I'm wrong.
  • mistertodd
    mistertodd over 4 years
    @LuisGouveia I agree with you that the time could double. Causing a query that takes 12.37 ms to instead take 12.3702 ms. That's what i mean by "in the noise". You run the query again and it takes 13.29 ms, or 11.36 ms. In other words, the randomness of the thread scheduler will drastically swamp any micro optimization you theoretically have that is in no way visible to anyone in any way ever.
  • Luis Gouveia
    Luis Gouveia over 4 years
    @IanBoyd, I see your point. I thought the difference was small but not negligible. Your point is clear: the difference is in fact negligible. Thank you! I think the whole point of our discussion was summarized in this thread and its accepted answer: stackoverflow.com/questions/183201/…
  • Zoidberg
    Zoidberg over 4 years
    Anyone ever think of what a junior could do in this code without knowing? Changing the order of enums seems benign and wouldn't really be caught in code review. Changing an existing name of an ENUM that is known to be stored in the db is something that will be blatant in code review and will be addressed. Have to think more on the side of maintainability in this case, because the downside of getting the ENUM type wrong could be disastrous. Especially if the update is released, and new data (with new ordinals) is mixed with old data (with previous ordinals), good luck fixing that!
  • Zoidberg
    Zoidberg over 4 years
    I'd also like to add that ORMs typically won't even LOAD existing data that has a mismatched enum name and you'll know immediately should the name of an enum change when it goes to production, where as an ordinal change can hum along silently causing untold damage to your data.