Lesson 68. Prepopulating A Junk Dimension
Last updated
Last updated
The mathematical properties of junk dimensions are fascinating to me. Let me explain.
Junk dimensions are usually a collection of flags and other values with low cardinality. The records in a junk dimension represent unique combinations of all the columns. Since the number of possible values in each column is finite, using basic combinatorics, you can calculate how many records are going to be in that table if you know all the values a priori.
If we let x represent the number of possible values in a column, then total number of records in a table is defined by:
Total number of records = x1 * x2 * x3*…xn
Suppose we had a table with the following profile.
This means that the total number of records in this table will be:
5 * 3 * 2 = 30
This is not just an academic exercise. You can use this information to determine if there will be too many records in this table and maybe consider splitting up some of the columns into smaller logical groupings.
But the real fun of all this is loading the data completely before the first load even hits! Yeah, yeah. We could just let the table populate naturally but what fun is that?!
In our example above, we already know what all the values are. Additionally, there are not that many. We can take advantage of CROSS JOIN to put the data together and load the table!
Watch While I Science The **** Out Of This
In [ ]:
Column Name
Number Of Possible Values
Color
5
Size
3
isActiveProduct
2