Column oriented database vs row oriented database

13,436

Solution 1

I am having trouble with your terminology. You are describing an EAV structure (standing for Entity-Attribute-Value).

Aside: A "column-oriented" database usually refers to a database that stores each column separately from others (when I learned about databases, this was called "vertical partitioning", but I don't think that caught on). Examples include Paracel and Vertica.

An entity-attribute-value database is storing each attribute for an entity as a separate row.

The first problem that you have with your particular structure is typing. Some of the attributes are strings and some are numbers. This becomes a management nightmare in an EAV world. Either you store everything as strings (losing the ability to type check values and to guarantee that arithmetic words) or you include multiple columns for different types with a type column (making queries much more complicated).

Similarly, constraints and foreign key references are much harder to implement. Also, because you are repeating the entity id and attribute id on each row, the data often takes up more space. NULL values are typically quite space efficient.

On the OLTP side, you have another problem. When you want to insert an entity, you typically want to insert a bunch of attributes as well. One insert has now turned into many inserts, and you'll want to start wrapping these in transactions, affecting performance.

Given all these shortcomings, you might think never use EAV models. There is a place for them. They are particularly useful when attributes are changing over time. Say, if you have an application where users can put in their own information with tags. In such cases, a hybrid approach is the best solution. Use a regular relational table with many columns for the common information. Use an EAV table for optional information for each entity.

Solution 2

Source: WIKI

  1. Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
  2. Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
  3. Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
  4. Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

In practice, row-oriented storage layouts are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. Column-oriented storage layouts are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve a smaller number of highly complex queries over all data (possibly terabytes).

Solution 3

In addition to the problems Gordon Linoff mentions, EAV data models are also fiendishly hard to query - find all cars where the make is BMW and the months between 12 and 24 and the cost < 10000 becomes a huge jumble of nasty SQL, especially if you're doing string comparison on numbers...

Share:
13,436
Ron
Author by

Ron

Updated on June 04, 2022

Comments

  • Ron
    Ron about 2 years

    I have used row oriented database design for long time and except for datawarehouse projects and Big data samples, I have not used column oriented database design for OLTP app.

    My row oriented table looks like

    ID, Make, Model, Month, Miles, Cost
    1   BMW   Z3     12     12000  100
    

    Some people in our team advocating column oriented database design. They suggest that all the column names should be property names in a Property table. Then another table Quote will have two columns PropertyName and PropertyValue.

    In the .net code, we read each key and compare and convert to strongly typed object. The code is really getting messy.

    if (qwi.DomainCode == typeof(CoreBO.Base.iQQConstants.MBPCollateralInfo).Name)
         {
            if (qwi.RefCode == iQQConstants.MBPCollateralInfo.ENGINETYPE)
            {
               Aspiration = qwi.Value;
            }
            else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.FUELTYPE)
            {
               FuelType = qwi.Value;
            }
            else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.MAKE)
            {
               Make = qwi.Value;
            }
            else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.MILEAGE)
            {
               int reading = 0;
               bool success = int.TryParse(qwi.Value, out reading);
               if (success)
               {
                  OdometerReading = reading;
               }
    }
    }
    

    The arguement for this column oriented design is that we won't have to change table schema and the stored proc(we are still using stored proc instead of Entity Framework).

    Seems like we are heading into real problem. Is Column oriented design well accepted in the industry.

    • nvogel
      nvogel almost 11 years
      The question appears to have nothing to do with "column-oriented" (column store) DBMSs. The question is actually about the EAV design pattern.
    • Mariusz
      Mariusz over 8 years
      "we are still using stored proc instead of Entity Framework" I played with EF in many smaller and mid size projects and don't think anymore that EF is a way to go. From what I tested and understand EF needs to have data in the memory to be able to manipulate it (i.e. update) which is a door stopper quite often when serious work is considered.
    • Mariusz
      Mariusz over 8 years
      "we are still using stored proc instead of Entity Framework" EF seems to need the data in memory to be able to modify it, so something as simple as: UPDATE dbo.MyTable SET Active = 1 WHERE Active = 0 for large amount of data may be not trivial using EF (but I love it also as it kicks the butt big time)
  • dnoeth
    dnoeth almost 11 years
    Of course your correct, but the question didn't ask for columnar storage :-)