-
Notifications
You must be signed in to change notification settings - Fork 31
Home
I created Transformalize to transform normalized relational data into star schema. This is something you do when you build data warehouses.
After doing a few manually, I realized it could be automated, or configuration based. At the start, I knew I wanted these features:
- 100% configurable
- An ETL (Extract, Transform, and Load) data pipeline
- Automated Input and output for SQL Server
- Rebuild the output from scratch
- Just update the output with new data from the input
- Simple inline transformations
- Calculated fields
As time passed, I realized I wanted these features also:
- Configuration inheritance
- Razor Templates to transform the configuration in different ways
- Actions to perform before and after the process
- Aggregation, to flatten one to many relationships
- Custom Javascript, Razor Templates, and C# transformations.
- Other input and output options
###Relational, Normalized Model### Basically, you'll find that relational databases are often made up of many tables. There are so many tables because the designer is trying to keep all the different types of information separate. He doesn't want the same data duplicated across tables, because it creates the possibility for inconsistency. He could reduce the number of tables by adding more columns, but he would limit himself by the number of columns added. So, you guessed it, he makes more tables instead. More or less, this process of finding the right place to put the right data is normalization, and usually it translates into "the more normalized it is, the more tables you have."
With a normalized database, the tables are related to each other by keys (a unique identifier for a record in the table). One table joins to another, that table joins to another, and so on and so forth.
###Star Model### To create a star model, you pick out one important part of the data. Pick a key fact that you're interested in. It should have some numbers in it. For example, in an e-commerce database, it might be an order detail record with quantities and prices. Then, you pull the important fact, and the keys that relate to the fact into a single table, so that there is no more than one relationship between the fact, and any other related table. This de-normalizes the data, but that's okay because you're not using it for OLTP (online transaction processing) anymore. Instead, you're just reading the data. You get a read performance boost, plus, with the help of a single view that joins the star together, your queries won't need any joins at all.