One of the goals for a business analyst is telling a complete story with data that will deliver insights on how business has been doing or how to improve. This is done through business intelligence dashboards that pull data from a data warehouse.
The challenge for business users comes in when they try to connect information from different fact tables within a data warehouse. This typically requires a request to a data engineer who creates a data mart within the data warehouse consisting of multiple star schemas that can lead to data loss or data duplication.
This complexity drove Francesco Puppini to examine a data warehouse schema design that would better support the end user. And he’s addressed this in the book he coauthored with Bill Inmon, The Unified Star Schema: An Agile and Resilient Approach to Data Warehouse and Analytics Design, published by Technics Publications.
What are star and snowflake schemas?
Before discussing the revolution of the Unified Star Schema, let’s look at the star schema itself.
A star schema is the simplest dimensional modeling form. The schema consists of facts and dimension. Facts center on events, while dimensions reference information related to facts. Dimensions relate to facts in a way that resembles a star.
Additionally, snowflake schemas relate dimensions to other dimensions and continue to branch out. According to Puppini, snowflake schemas are a best fit when data has no pitfalls.
“And data always has pitfalls,” he said.
Benefits of the Unified Star Schema
“When you write data in a database, it’s organized a certain way,” Puppini said. “But the moment you read it, it’s better to organize it in a different way.”
The main challenge with star and snowflake schemas for end users is a lack of independence for business users in finding insights. In order to perform complex queries with data with a data warehouse schema design like these, end users need a data engineer to join tables manually. It’s similar to calling a plumber every time you need a glass of water, Puppini said. Additionally, merging fact tables with a join typically causes duplicate data when joining incompatible tables.
The core problem for data duplication is the fan trap, he said. The fan trap happens when two or more one-to-many joins are linked. This creates conflicts with the data, which leads to duplication of data. Historically, this has been fixed by a manual deduplication or by these ad hoc joins.
The key to the Unified Star Schema is the bridge that connect fact tables. In the book, Puppini compares the bridge to a telephone switchboard. The bridge is central in the Unified Star Schema with all the fact tables and dimensions around it, and every table connects to the bridge. This then limits the number of transformations to the data.
“The problem of data warehouses today is that they are overworking,” Puppini said. “They are over-transforming.”
By using the bridge and performing a union instead of a join, this allows users to reach the data without transforming it.
“The more you transform data, the more you make a mess,” he said. “I just took the minimum number of steps necessary to put the information together.”
Check out this excerpt from The Unified Star Schema by Bill Inmon and Francesco Puppini, published by Technics Publications, to learn more about implementation and use cases for the Unified Star Schema.