Designing a SQL schema for a combination of many-to-many relationship (variations of products)

Solution 1

Applying normalization to your problem gives the schema below. You can run it and see the results on SQL Fiddle.

CREATE TABLE products (
    product_id  int AUTO_INCREMENT PRIMARY KEY,
    name        varchar(20),
    description varchar(30)
);

INSERT INTO products
    (name, description)
VALUES
    ('Rug', 'A cool rug' ),
    ('Cup', 'A coffee cup');

-- ========================================

CREATE TABLE variants (
    variant_id int AUTO_INCREMENT PRIMARY KEY,
    variant    varchar(50)
);

INSERT INTO variants
    (variant)
VALUES
    ('color'),
    ('material'),
    ('size');

-- ========================================

CREATE TABLE variant_value (
    value_id   int AUTO_INCREMENT PRIMARY KEY,
    variant_id int,
    value      varchar(50)
);

INSERT INTO variant_value
    (variant_id, value)
VALUES
    (1, 'red'),
    (1, 'blue'),
    (1, 'green'),
    (2, 'wool'),
    (2, 'polyester'),
    (3, 'small'),
    (3, 'medium'),
    (3, 'large');

-- ========================================

CREATE TABLE product_variants (
    product_variants_id int AUTO_INCREMENT PRIMARY KEY,
    product_id          int,
    productvariantname  varchar(50),
    sku                 varchar(50),
    price               decimal(10,2)  -- exact decimal type; float can round money values
);

INSERT INTO product_variants
    (product_id, productvariantname, sku, price)
VALUES
    (1, 'red-wool', 'a121', 50),
    (1, 'red-polyester', 'a122', 50);

-- ========================================

CREATE TABLE product_details (
    product_detail_id   int AUTO_INCREMENT PRIMARY KEY,
    product_variants_id int,
    value_id            int
);

INSERT INTO product_details
    (product_variants_id, value_id)
VALUES
    (1, 1),
    (1, 4),
    (2, 1),
    (2, 5);
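
To show how this schema answers the "blue product under $10" kind of search from the question, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than MySQL, so the column types are simplified, and the helper name find_skus is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema above (types simplified).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE variants (variant_id INTEGER PRIMARY KEY, variant TEXT);
CREATE TABLE variant_value (value_id INTEGER PRIMARY KEY, variant_id INT, value TEXT);
CREATE TABLE product_variants (product_variants_id INTEGER PRIMARY KEY, product_id INT,
                               productvariantname TEXT, sku TEXT, price NUMERIC);
CREATE TABLE product_details (product_detail_id INTEGER PRIMARY KEY,
                              product_variants_id INT, value_id INT);

INSERT INTO products (name, description) VALUES ('Rug', 'A cool rug'), ('Cup', 'A coffee cup');
INSERT INTO variants (variant) VALUES ('color'), ('material'), ('size');
INSERT INTO variant_value (variant_id, value) VALUES
    (1, 'red'), (1, 'blue'), (1, 'green'), (2, 'wool'), (2, 'polyester'),
    (3, 'small'), (3, 'medium'), (3, 'large');
INSERT INTO product_variants (product_id, productvariantname, sku, price) VALUES
    (1, 'red-wool', 'a121', 50), (1, 'red-polyester', 'a122', 50);
INSERT INTO product_details (product_variants_id, value_id) VALUES
    (1, 1), (1, 4), (2, 1), (2, 5);
""")


def find_skus(variant, value, max_price):
    """Return (product name, sku, price) rows for SKUs that carry the given
    variant value and cost less than max_price. Hypothetical helper."""
    rows = conn.execute("""
        SELECT p.name, pv.sku, pv.price
        FROM product_variants pv
        JOIN products p         ON p.product_id = pv.product_id
        JOIN product_details pd ON pd.product_variants_id = pv.product_variants_id
        JOIN variant_value vv   ON vv.value_id = pd.value_id
        JOIN variants v         ON v.variant_id = vv.variant_id
        WHERE v.variant = ? AND vv.value = ? AND pv.price < ?
        ORDER BY pv.sku
    """, (variant, value, max_price))
    return rows.fetchall()


red_under_100 = find_skus("color", "red", 100)
blue_under_10 = find_skus("color", "blue", 10)  # no blue SKU exists in the sample data
```

product_details is the link table here: each row ties one sellable variant to one of its attribute values, so the search is a chain of joins and never parses text.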

Solution 2

Part of your issues stem from a confusion between product and SKU.

When you sell an "XYZ pullover, size M, blue model", that corresponds to an SKU. It is marketed as an XYZ pullover (the product), which has a set of attributes (size and color), each with its own set of potential values. Not all possible combinations of those values yield valid deliverables: you won't find absurdly thin and long jeans. So there are four things to model: SKUs, products, attributes, and attribute values.

And when a user wants a $10 blue pullover, he's actually looking for an SKU within a product category.

I hope the above clears up your confusion and where your problem and question stem from.

In terms of schema, you want something like this:


products

  • #product_id
  • name
  • description

Optionally, also add:

  • price
  • in_stock

This is a marketing-related table. Nothing else. If anything outside of marketing uses a product in your application, you'll end up in a world of pain down the road.

The price, if present, is a master price used to populate the field when it's null in SKUs. This makes price entry more user-friendly.

in_stock is a hopefully self-explanatory flag, ideally maintained by a trigger. It should be true if any SKU related to that product is in stock.
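
As a rough sketch of the master-price fallback and the in_stock rule, the snippet below uses SQLite through Python's sqlite3 module (an assumption, since the question uses MySQL). It computes in_stock with a query instead of a trigger to keep the example short, and the table name skus and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price NUMERIC);
CREATE TABLE skus (sku_id INTEGER PRIMARY KEY, product_id INT, price NUMERIC, stock INT);

-- Hypothetical sample data: SKU 1 has no price of its own and no stock.
INSERT INTO products (product_id, name, price) VALUES (1, 'XYZ pullover', 30);
INSERT INTO skus (sku_id, product_id, price, stock) VALUES (1, 1, NULL, 0), (2, 1, 25, 3);
""")

# A NULL SKU price falls back to the product's master price.
effective_prices = conn.execute("""
    SELECT s.sku_id, COALESCE(s.price, p.price)
    FROM skus s
    JOIN products p ON p.product_id = s.product_id
    ORDER BY s.sku_id
""").fetchall()

# The product counts as in stock if any of its SKUs has stock left.
in_stock = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM skus WHERE product_id = ? AND stock > 0)",
    (1,),
).fetchone()[0]
```

A trigger-maintained flag would store the same result on the products row, so that listings do not need the subquery.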


product_attributes

  • product_id
  • #attribute_id
  • name

product_attribute_values

  • attribute_id
  • #value_id
  • value

This just holds things like Color, Size, etc., along with their values like blue, red, S, M, L.

Note the product_id field: create a new set of attributes and values per product. Sizes change depending on the product. Sometimes it's S, M, L, etc.; other times, it'll be 38, 40, 42, and what not. Sometimes, Size is enough; other times, you need Width and Length. Blue might be a valid color for this product; another might offer Navy, Royal Blue, Teal and what not. Do NOT assume that there is any relationship between one product's attributes and those of another; the similarities, when they exist, are entirely cosmetic and coincidental.


SKUs

  • product_id
  • #sku_id
  • price

Optionally, add:

  • name
  • barcode
  • stock

This corresponds to the deliverables that get shipped.

It's actually the most important table underneath. This, rather than the product_id, is almost certainly what should get referenced in customer orders. It's also what should get referenced for stock-keeping and so forth. (The only exception I've ever seen to the latter two points is when you sell something really generic. But even then, the better way to deal with this in my experience is to toss in an n-m relationship between interchangeable SKUs.)

The name field, if you add it, is primarily for convenience. If left null, use app-side code to make it correspond to the generic product's name, expanded if necessary with the relevant attribute names and values. Filling it in lets you replace the generic name ("Levi's 501, W: 32, L: 32, Color: Dark Blue") with something more natural ("Levi's 501, 32x32, Dark Blue").
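
The app-side naming rule could look something like the sketch below. Both function names are hypothetical, and the attribute pairs are assumed to arrive already sorted in display order.

```python
def default_sku_name(product_name, attributes):
    """Build the generic name from the product name and (attribute, value) pairs."""
    if not attributes:
        return product_name
    details = ", ".join(f"{attr}: {value}" for attr, value in attributes)
    return f"{product_name}, {details}"


def display_name(sku_name, product_name, attributes):
    """Prefer the SKU's own name; fall back to the generated one when it is null."""
    if sku_name is not None:
        return sku_name
    return default_sku_name(product_name, attributes)


attrs = [("W", "32"), ("L", "32"), ("Color", "Dark Blue")]
generated = display_name(None, "Levi's 501", attrs)
overridden = display_name("Levi's 501, 32x32, Dark Blue", "Levi's 501", attrs)
```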

In case it matters, stock is better maintained using a trigger in the long run, with a double-entry bookkeeping schema in the background. This lets you distinguish between in stock and available for shipment today (which is the figure that you actually want here) and in stock but already sold, among the multitudes of real-world scenarios that you'll encounter. Oh, and... it's occasionally a numeric, rather than an integer, if you ever need to sell anything measured in kilos or liters. If so, be sure to add an extra is_int flag, to avoid customers sending you orders for .1 laptops.


product_variants

  • product_id
  • #sku_id
  • #attribute_id
  • value_id

This links the deliverable's id with the corresponding attributes and values, for the sake of generating default names.

The primary key is on (sku_id, attribute_id).

You might find the product_id field an aberration. It is, unless you add foreign keys referencing:

  • SKUs (product_id, sku_id)
  • product_attributes (product_id, attribute_id)
  • product_attribute_values (attribute_id, value_id)

(Don't forget the extra unique indexes on the corresponding tuples if you decide to add these foreign keys.)
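
To illustrate what the composite foreign keys buy you, the sketch below builds the four tables in SQLite through Python's sqlite3 module (an assumption; the question uses MySQL, where the same constraints can be declared on InnoDB tables). The sample rows and the helper try_link are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_attributes (
    attribute_id INTEGER PRIMARY KEY,
    product_id   INT NOT NULL REFERENCES products (product_id),
    name         TEXT,
    UNIQUE (product_id, attribute_id)      -- target of the composite foreign key
);
CREATE TABLE product_attribute_values (
    value_id     INTEGER PRIMARY KEY,
    attribute_id INT NOT NULL REFERENCES product_attributes (attribute_id),
    value        TEXT,
    UNIQUE (attribute_id, value_id)
);
CREATE TABLE skus (
    sku_id     INTEGER PRIMARY KEY,
    product_id INT NOT NULL REFERENCES products (product_id),
    price      NUMERIC,
    UNIQUE (product_id, sku_id)
);
CREATE TABLE product_variants (
    product_id   INT NOT NULL,
    sku_id       INT NOT NULL,
    attribute_id INT NOT NULL,
    value_id     INT NOT NULL,
    PRIMARY KEY (sku_id, attribute_id),
    FOREIGN KEY (product_id, sku_id)
        REFERENCES skus (product_id, sku_id),
    FOREIGN KEY (product_id, attribute_id)
        REFERENCES product_attributes (product_id, attribute_id),
    FOREIGN KEY (attribute_id, value_id)
        REFERENCES product_attribute_values (attribute_id, value_id)
);

-- Hypothetical data: Width belongs to the jeans, Color to the pullover.
INSERT INTO products (product_id, name) VALUES (1, 'Jeans'), (2, 'Pullover');
INSERT INTO product_attributes (attribute_id, product_id, name)
    VALUES (1, 1, 'Width'), (2, 2, 'Color');
INSERT INTO product_attribute_values (value_id, attribute_id, value)
    VALUES (1, 1, '32'), (2, 2, 'Blue');
INSERT INTO skus (sku_id, product_id, price) VALUES (1, 1, 80), (2, 2, 10);
""")


def try_link(product_id, sku_id, attribute_id, value_id):
    """Try to attach an attribute value to a SKU; return False if a key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO product_variants VALUES (?, ?, ?, ?)",
                (product_id, sku_id, attribute_id, value_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_link(1, 1, 1, 1)        # jeans SKU gets the jeans' own Width value
rejected = try_link(1, 1, 2, 2)  # jeans SKU with the pullover's Color attribute
```

With product_id repeated in product_variants, the database itself refuses to attach another product's attribute to a SKU. Without it, that check would have to live in application code.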


Three additional remarks in conclusion.

Firstly, I'd like to stress once again that, in terms of flow, not all combinations of attributes and values yield a valid deliverable. Width might be 28-42 and length might be 28-42, but you probably won't see seriously skinny 28x42 jeans. You're best off NOT automatically populating every possible variation of every product by default: add UI to enable/disable them as needed, make it checked by default, alongside name, barcode and price fields. (Name and price will usually be left blank; but one day, you'll need to organize a sale on blue pullovers only, on grounds that the color is discontinued, while you continue to sell the other options.)
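
One way to offer every potential variation in the UI without storing any of them is to generate the combinations on the fly. The sketch below assumes the attributes arrive as a plain dict; candidate_variants is a hypothetical helper, and the disabled combination is made up for illustration.

```python
from itertools import product


def candidate_variants(attributes):
    """Yield every attribute/value combination as a dict. Nothing is stored."""
    names = list(attributes)
    for combo in product(*(attributes[name] for name in names)):
        yield dict(zip(names, combo))


rug = {"color": ["red", "blue", "green"], "material": ["wool", "polyester"]}
candidates = list(candidate_variants(rug))

# Only the combinations an operator leaves enabled would become SKU rows.
disabled = {"color": "green", "material": "polyester"}
enabled = [c for c in candidates if c != disabled]
```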

Secondly, keep in mind, if you ever need to additionally manage product options, that many are actually product attributes in disguise, and that those that aren't yield new SKUs that must also be taken into account when it comes to stock-keeping. A bigger HD option for a laptop, for instance, is really a variant of the same product (Normal vs Large HD size) that is masquerading as an option due to (very valid) UI considerations. In contrast, wrapping the laptop as a Christmas gift is a genuine option that references a completely separate SKU in bookkeeping terms (e.g. .8m of gift wrap) -- and, should you ever need to come up with average marginal costs, a fraction of staff time.

Lastly, you'll need to come up with an ordering method for your attributes, their values, and the subsequent variants. For this, the easiest is to toss in an extra position field in the attributes and values tables.

Solution 3

I would use 4 tables:

generic_product: product_id, name, description 

e.g. 1, 'rug', 'a coffee rug' / 2, 'mug', 'a coffee mug'

generic_product_property: product_id, property_id, property_name 

e.g. 1, 10, 'color' / 1, 11, 'material'

sellable_product: sku, product_id, price 

e.g. 'A121', 1, 50.00 / 'A122', 1, 45.00

sellable_product_property: sku, property_id, property_value 

e.g. 'A121', 10, 'red' / 'A121', 11, 'wool' / 'A122', 10, 'green' / 'A122', 11, 'wool'

This lets your users define any property they want for your sellable products.

Your application will have to ensure with its business logic that sellable_products are described completely (check that for every applicable generic product property the sellable product property is defined).
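
The completeness check could be sketched as a single query that lists every property a sellable product is missing. It is shown here with SQLite through Python's sqlite3 module (an assumption; the question uses MySQL), and the sample data deliberately leaves out the material of 'A122' so that the query has something to report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE generic_product_property (product_id INT, property_id INT, property_name TEXT);
CREATE TABLE sellable_product (sku TEXT PRIMARY KEY, product_id INT, price NUMERIC);
CREATE TABLE sellable_product_property (sku TEXT, property_id INT, property_value TEXT);

INSERT INTO generic_product_property VALUES (1, 10, 'color'), (1, 11, 'material');
INSERT INTO sellable_product VALUES ('A121', 1, 50.00), ('A122', 1, 45.00);
-- 'A122' has no material on purpose, to show an incomplete description.
INSERT INTO sellable_product_property VALUES
    ('A121', 10, 'red'), ('A121', 11, 'wool'), ('A122', 10, 'green');
""")

# Pair every SKU with every property of its generic product, then keep the
# pairs that have no matching value row.
missing = conn.execute("""
    SELECT sp.sku, gpp.property_name
    FROM sellable_product sp
    JOIN generic_product_property gpp
           ON gpp.product_id = sp.product_id
    LEFT JOIN sellable_product_property spp
           ON spp.sku = sp.sku AND spp.property_id = gpp.property_id
    WHERE spp.sku IS NULL
    ORDER BY sp.sku, gpp.property_id
""").fetchall()
```

An empty result means every sellable product is described completely.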

Solution 4

SKU is your primary key. You can set up foreign key relationships to the variants table with sku. Forget about productid entirely.

CREATE TABLE x (sku varchar(50) PRIMARY KEY, price decimal(10,2), description varchar(30));  -- column types are assumed

Solution 5

In general terms, you're looking for what's called a grouper or a junk dimension. Basically it's just a row for every combination. @sahalMoidu's schema looks like it should give you what you are asking for.

But before getting too hung up on normalization, you need to know if the db is there for storing data (transactional, etc) or for getting data out (dimensional, reporting, etc). Even if it is a transactional database, you have to ask yourself what you are trying to accomplish by normalization.

Author by

Zaki Aziz

Updated on October 02, 2021

Comments

  • Zaki Aziz
    Zaki Aziz over 2 years

    I hope the title is somewhat helpful. I'm using MySQL as my database

    I am building a database of products and am not sure how to handle storing prices/SKU of variations of a product. A product may have unlimited variations, and each variation combination has its own price/SKU/etc..

    This is how I have my products/variations table set up at the moment:

    PRODUCTS
    +--------------------------+
    | id | name | description  |
    +----+------+--------------+
    | 1  | rug  | a cool rug   |
    | 2  | cup  | a coffee cup |
    +----+------+--------------+
    
    PRODUCT_VARIANTS
    +----+------------+----------+-----------+
    | id | product_id | variant  | value     |
    +----+------------+----------+-----------+
    | 1  | 1          | color    | red       |
    | 2  | 1          | color    | blue      |
    | 3  | 1          | color    | green     |
    | 4  | 1          | material | wool      |
    | 5  | 1          | material | polyester |
    | 6  | 2          | size     | small     |
    | 7  | 2          | size     | medium    |
    | 8  | 2          | size     | large     |
    +----+------------+----------+-----------+
    
    (`product_variants.product_id` is a foreign key referencing `products.id`)
    

    I've created an SQLFiddle with this sample data: http://sqlfiddle.com/#!2/2264d/1

    The user is allowed to enter any variation name (product_variants.variant) and can assign any value to it (product_variants.value). There should not be a limit on the number of variations/values a user may enter.

    This is where my problem arises: storing prices/SKU for each variation without adding a new table/column every time someone adds a product with a variant that did not exist before.

    Each variant may have the same price but the SKU is unique to each product. For example, Product 1 has 6 different combinations (3 colors * 2 materials) and Product 2 only has 3 different combinations (3 sizes * 1).

    I've thought about storing the combinations as a text, i.e:

    +------------+-----------------+-------+------+
    | product_id | combination     | price | SKU  |
    +------------+-----------------+-------+------+
    | 1          | red-wool        | 50.00 | A121 |
    | 1          | red-polyester   | 50.00 | A122 |
    | 1          | blue-wool       | 50.00 | A123 |
    | 1          | blue-polyester  | 50.00 | A124 |
    | 1          | green-wool      | 50.00 | A125 |
    | 1          | green-polyester | 50.00 | A126 |
    | 2          | small           | 4.00  | CD12 |
    | 2          | medium          | 4.00  | CD13 |
    | 2          | large           | 3.50  | CD14 |
    +------------+-----------------+-------+------+
    

    But there must be a better, normalized, way of representing this data. Hypothetical situation: I want to be able to search for a blue product that is less than $10. With the above database structure it is not possible to do without parsing the text and that is something I want to avoid.

    Any help/suggestions are appreciated =)

  • ChuckCottrill
    ChuckCottrill over 10 years
    Also, you can use nullval (coalesce) to provide default price from base product when variant sku price not given.
  • Trey Stout
    Trey Stout over 9 years
    Sorry for replying to an older answer @Denis, but such is SO. I have a question regarding how a SKU tracks its collection of attribute values. So if I made color and material two attributes on a chair product, then set 2 possible values for each attribute. When I go to make my first SKU, shouldn't the SKU table hold references to the various attribute value ids? Thank you for a great write-up, I'm just confused as to your final table (product_variants)
  • Denis de Bernardy
    Denis de Bernardy over 9 years
    @TreyStout: In my view, it's primarily a UI problem. Even in a trigger oriented app, I wouldn't populate the variants table automatically when attributes and their values are created, because doing so gets messy and ends up doing a lot of computation for nothing. Instead, I'd have the product creation interface populate a full list of potential SKUs on the fly, based on the attributes and their values. Those which a human operator assigns an SKU code to (i.e. they're actual products rather than potentially existing ones) are then the only ones I'd then store into the database.
  • Brendan Vogt
    Brendan Vogt almost 9 years
    @DenisdeBernardy I am busy implementing your solution on a project of mine, and was wondering if you might go and look at my question that I posted at: stackoverflow.com/questions/30995983/…. I hope to hear from you :)
  • Mads Nielsen
    Mads Nielsen almost 9 years
    @ChuckCottrill: I would not recommend that, since you'd mix your business logic into your data layer
  • Carlo
    Carlo about 8 years
    Can you explain the use of the product_details table?
  • Laxman
    Laxman almost 8 years
    Instead of two tables products and product_variants, will it be better if a single table product_variants is used with product_id on table product_variants referring back to the parent product id.
  • Green
    Green over 7 years
    Can you explain the use of the product_details table?
  • Benny Thadikaran
    Benny Thadikaran over 5 years
    @Green Perhaps you already figured - The product_details table, contains the foreign keys for product_variants and variants_value. The product_details table should allow joining all other tables together.
  • CodeTrooper
    CodeTrooper over 5 years
    Where would one keep the stock of a product? And the stock of a specific variant?
  • CodeTrooper
    CodeTrooper over 5 years
    How would the stock be managed here? I would imagine since we have variants per product_id, there would be a stock per each variation? But then how would products that don't have any variation handle stock?
  • andcl
    andcl over 5 years
    What if product_attributes were not limited to only several values (held in product_attribute_values table) and could be user inputted to fully customise their products? Think of a product whose width could be customised through a number input field. Any ideas?
  • Rick James
    Rick James over 5 years
    You will eventually realize that you "over-normalized" by having variant_value being separate. EAV schema pattern is bad enough; that just makes it worse.
  • bfl
    bfl over 5 years
    Say you want to get all products with the color "blue", how would you handle that with this design?
  • vikrant
    vikrant over 3 years
    @CodeTrooper I would like to know the answer too
  • DmitriBodiu
    DmitriBodiu over 3 years
    value of a productAttribute is varchar? what would be the best way to query products where bestBeforeDate is between a specific range? I have BestBeforeDate attribute.
  • DmitriBodiu
    DmitriBodiu over 3 years
    value of an Attribute is varchar? what would be the best way to query products where bestBeforeDate is between a specific range? I have BestBeforeDate attribute (variant).
  • DmitriBodiu
    DmitriBodiu over 3 years
    value of an property_value is varchar? what would be the best way to query products where bestBeforeDate is between a specific range? I have BestBeforeDate attribute (variant).
  • xwoker
    xwoker over 3 years
    This depends on the representation of the bestBeforeDate. If it's currentTimeMillis, it's trivial; otherwise it depends on the capabilities of your database. You could also have multiple typed property_value fields (e.g. one of type datetime). Then it would be a trade-off between a generic solution and the best one for your specific use cases.
  • aasutossh
    aasutossh over 2 years
    @Carlo it looks like products in stock.
  • vinayak
    vinayak over 2 years
    How would you write a query to fetch color red + material wool and polyester from the database? Putting relation keys in brackets separated by commas is not a good idea.