Star-Schema Design

35,568

Solution 1

Using star schemas for a data warehouse system gets you several benefits and in most cases it is appropriate to use them for the top layer. You may also have an operational data store (ODS) - a normalised structure that holds 'current state' and facilitates operations such as data conformation. However there are reasonable situations where this is not desirable. I've had occasion to build systems with and without ODS layers, and had specific reasons for the choice of architecture in each case.

Without going into the subtlties of data warehouse architecture or starting a Kimball vs. Inmon flame war the main benefits of a star schema are:

  • Most database management systems have facilities in the query optimiser to do 'Star Transformations' that use Bitmap Index structures or Index Intersection for fast predicate resolution. This means that selection from a star schema can be done without hitting the fact table (which is usually much bigger than the indexes) until the selection is resolved.

  • Partitioning a star schema is relatively straightforward as only the fact table needs to be partitioned (unless you have some biblically large dimensions). Partition elimination means that the query optimiser can ignore patitions that could not possibly participate in the query results, which saves on I/O.

  • Slowly changing dimensions are much easier to implement on a star schema than a snowflake.

  • The schema is easier to understand and tends to involve less joins than a snowflake or E-R schema. Your reporting team will love you for this

  • Star schemas are much easier to use and (more importantly) make perform well with ad-hoc query tools such as Business Objects or Report Builder. As a developer you have very little control over the SQL generated by these tools so you need to give the query optimiser as much help as possible. Star schemas give the query optimiser relatively little opportunity to get it wrong.

Typically your reporting layer would use star schemas unless you have a specific reason not to. If you have multiple source systems you may want to implement an Operational Data Store with a normalised or snowflake schema to accumulate the data. This is easier because an ODS typically does not do history. Historical state is tracked in star schemas where this is much easier to do than with normalised structures. A normalised or snowflaked Operational Data Store reflects 'current' state and does not hold a historical view over and above any that is inherent in the data.

ODS load processes are concerned with data scrubbing and conforming, which is easier to do with a normalised structure. Once you have clean data in an ODS, dimension and fact loads can track history (changes over time) with generic or relatively simple mechanisms relatively simply; this is much easier to do with a star schema, Many ETL tools (for example) provide built-in facilities for slowly changing dimensions and implementing a generic mechanism is relatively straightforward.

Layering the system in this way providies a separation of responsibilities - business and data cleansing logic is dealt with in the ODS and the star schema loads deal with historical state.

Solution 2

There is an ongoing debate in the datawarehousing litterature about where in the datawarehouse-architecture the Star-Schema design should be applied.

In short Kimball advocates very highly for using only the Star-Schema design in the datawarehouse, while Inmon first wants to build an Enterprise Datawarehouse using normalized 3NF design and later use the Star-Schema design in the datamarts.

In addition here to you could also say that Snowflake schema design is another approach.

A fourth design could be the Data Vault Modeling approach.

Solution 3

Star schemas are used to enable high speed access to large volumes of data. The high performance is enabled by reducing the amount of joins needed to satsify any query that may be made against the subject area. This is done by allowing data redundancy in dimension tables.

You have to remember that the star schema is a pattern for the top layer for the warehouse. All models also involve staging schemas at the bottom of the warehouse stack, and some also include a persistant transformed merged staging area where all source systems are merged into a 3NF modelled schema. The various subject areas sit above this.

Alternatives to star schemas at the top level include a variation, which is a snowflake schema. A new method that may bear out some investigation as well is Data Vault Modelling proposed by Dan Linstedt.

Solution 4

The thing about star schemas is they are a natural model for the kinds of things most people want to do with a data warehouse. For instance it is easy to produce reports with different levels of granularity (month or day or year for example). It is also efficient to insert typical business data into a star schema, again a common and important feature of a data warehouse.

You certainly can use any kind of database you want but unless you know your business domain very well it is likely that your reports will not run as efficiently as they could if you had used a star schema.

Solution 5

Star schemas are a natural fit for the last layer of a data warehouse. How you get there is another question. As far as I know, there are two big camps, those of Bill Inmon and Ralph Kimball. You might want to look at the theories of these two guys if/when you decide to go with a star.

Also, some reporting tools really like the star schema setup. If you are locked into a specific reporting tool, that might drive what the reporting mart looks like in your warehouse.

Share:
35,568
user1066101
Author by

user1066101

Software Architect, aspiring writer. Programmer for well over 30 years, about 70% of my working life. Blog: S.Lott-Software Architect. Books: Building Skills. Technorati: SLott. LinkedIn: Profile. Ohloh: s_lott.

Updated on July 08, 2022

Comments

  • user1066101
    user1066101 almost 2 years

    Is a Star-Schema design essential to a data warehouse? Or can you do data warehousing with another design pattern?

  • user1066101
    user1066101 about 15 years
    "other than relational database engines"... Interesting. What design pattern do they use for the data? A star schema or some other kind of design?
  • csaba
    csaba about 15 years
    Multidimensional (MOLAP) databases store their data in various multidimensional array structures. Conceptually, in my interpretation, when building a data warehouse we build a conceptual data model first (with dimensions and data cubes), then we map it to the logical level (tables and constraints), which is then implemented on phisical level (files on disks, handled by the DBMS). MOLAP engines however map the conceptual model directly to phisical level. As star schema is the logical model of relational dws, therefore it is omitted in a MOLAP environment.
  • Hamish Grubijan
    Hamish Grubijan almost 14 years
    It is basically object-oriented design in SQL ;)
  • Marcus D
    Marcus D about 13 years
    There is also Data Vault Modelling (en.wikipedia.org/wiki/Data_Vault_Modeling) now as a layer below your data marts.