Date Tables… yep that again. I’ve been here before when I blogged about whether you should use M or DAX to generate your date table in Power BI.
But the real answer, once again referencing Roche’s Maxim is to do it further upstream in a Data Warehouse.
Some source systems might have a date table in them already that you can import into the Data Warehouse, but I’ve certainly worked in lots of scenarios previously where no source table existed and I’ve needed to magic one up with some SQL sorcery.
My go-to method for this used to be a recursive CTE (Common Table Expression) – see example here.
However, having recently had the pleasure of dabbling with Databricks and Azure Synapse Analytics serverless SQL pools, this method came a cropper. Databricks runs on Spark, and neither Spark SQL nor Synapse serverless SQL pools support recursive CTEs. So I needed a new technique and I thought I’d share the method I landed on.
Creating Data Records
The method still utilises CTEs, just not the recursive kind, so running this in Azure Synapse Analytics using T-SQL date functions works really well.
/* generates a list with 10 values */
WITH
List AS (
SELECT
n
FROM
(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)
),
/* generates 10k rows by cross joining List with itself */
Range AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 AS RowNumber
FROM
List a
CROSS JOIN List b
CROSS JOIN List c
CROSS JOIN List d
),
/* query to specify date range */
DateRange AS (
SELECT
CAST(CONCAT(YEAR(GETDATE())-5,'-01-01') AS DATE) AS StartDate /* should always start on 1st January */,
CAST(CONCAT(YEAR(GETDATE())+1, '-12-31') AS DATE) AS EndDate /* should always end on 31st December*/
),
/* query to generate dates between given start date and end date */
Calendar AS (
SELECT
DATEADD(DAY, r.RowNumber, dr.StartDate ) AS Date
FROM
Range r
CROSS JOIN DateRange dr
WHERE
DATEADD(DAY, r.RowNumber, dr.StartDate ) <= dr.EndDate
)
/* date table query */
SELECT
c.Date,
YEAR(c.Date) AS Year,
CONCAT('Qtr ',DATEPART(QUARTER, c.Date)) AS Quarter,
DATENAME(MONTH, c.Date) AS Month,
DAY(c.Date) AS Day,
MONTH(c.Date) AS MonthNumber,
DATEPART(WEEKDAY, c.Date) AS DayOfWeek,
DATENAME(WEEKDAY, c.Date) AS DayOfWeekName,
ROW_NUMBER() OVER (PARTITION BY YEAR(c.Date) ORDER BY c.Date) AS DayOfYear,
DATEPART(WEEK, c.Date) AS Week,
DATEPART(ISO_WEEK, c.Date) AS ISOWeek
FROM
Calendar c
ORDER BY
c.Date
The principle behind this starts by creating 10 dummy records in a CTE. By cross joining (…instantly makes me think of Chris Webb’s blog) this table with itself, the number of rows is multiplied. Cross joining 3 times gives 10,000 rows. If your date table needs more days than this, you could add a 4th cross join to get to 100,000, though 10k rows gives you coverage for over 27 years, so in most cases this will suffice.
I’ve then defined the start and end dates for my table in another CTE. You could choose to hard code these, or maybe you could query another table in your data warehouse to get relevant min and max dates. In this example I’ve chosen a rolling 7 year window by looking backwards 5 years from today’s date and then forwards to the end of the next calendar year. For time intelligence functions to work in Power BI you need a contiguous list of dates that cover full years, so I always anchor the start date to 1st January and end date to 31st December.
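If you do want to drive the window from your data rather than a rolling window, the DateRange CTE is the only part that changes. Here's a sketch of that variant – `dbo.FactSales` and `OrderDate` are placeholder names, so substitute your own fact table and date column:

```sql
/* Alternative DateRange CTE: derive the window from a fact table.
   "dbo.FactSales" and "OrderDate" are placeholders - use your own names. */
DateRange AS (
    SELECT
        DATEFROMPARTS(YEAR(MIN(OrderDate)), 1, 1)   AS StartDate, /* snap back to 1st January */
        DATEFROMPARTS(YEAR(MAX(OrderDate)), 12, 31) AS EndDate    /* snap forward to 31st December */
    FROM
        dbo.FactSales
)
```

Note this still anchors the range to full calendar years, so Power BI time intelligence keeps working.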
I then use the DATEADD function to offset my start date by the row number of my 10k data set to generate the list of dates, limiting the output to only return records up to and including my end date – for my rolling 7-year date table this means my 10k rows are reduced to 2,556.
Finally I query this list of dates and add additional attributes such as Year, Quarter, Month, Day etc…
Of course this still works for regular T-SQL databases too. If you are using Databricks you might need to swap out some functions for Databricks-friendly alternatives.
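In fact, in Spark SQL you can skip the cross-join trick altogether, because `sequence()` combined with `explode()` will generate the date list directly. Here's a hedged sketch (assumes Spark 3.x; I've only mapped the attributes with obvious equivalents):

```sql
/* Spark SQL sketch: sequence() + explode() generates the date list directly,
   and Spark's date functions stand in for the T-SQL ones */
WITH Calendar AS (
    SELECT explode(sequence(
        make_date(year(current_date()) - 5, 1, 1),   /* 1st Jan, 5 years back */
        make_date(year(current_date()) + 1, 12, 31), /* 31st Dec, next year */
        interval 1 day
    )) AS Date
)
SELECT
    Date,
    year(Date)                    AS Year,
    concat('Qtr ', quarter(Date)) AS Quarter,
    date_format(Date, 'MMMM')     AS Month,
    day(Date)                     AS Day,
    month(Date)                   AS MonthNumber,
    dayofweek(Date)               AS DayOfWeek,      /* 1 = Sunday in Spark */
    date_format(Date, 'EEEE')     AS DayOfWeekName,
    dayofyear(Date)               AS DayOfYear,
    weekofyear(Date)              AS Week            /* ISO week in Spark */
FROM Calendar
```

Be aware the day-of-week and week numbering conventions differ between Spark and T-SQL, so check them against your reporting requirements.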
There is a plethora of additional date attributes you could add to this, with advice readily available on the interwebs on how they can be applied (I find week ending, week commencing, 4-4-5 periods or alternative financial year definitions are often requirements).
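As a flavour of those, here's a small T-SQL sketch of week commencing, week ending and financial year columns, demonstrated on a couple of sample dates. The week boundary depends on your DATEFIRST setting (the default of 7 means weeks start on Sunday), and I've assumed an April-to-March financial year labelled by its starting year – adjust both to your own definitions:

```sql
/* Example extra attributes on sample dates. WEEKDAY depends on DATEFIRST
   (default 7 = Sunday start); financial year assumes April-March. */
SELECT
    d.Date,
    DATEADD(DAY, 1 - DATEPART(WEEKDAY, d.Date), d.Date) AS WeekCommencing,
    DATEADD(DAY, 7 - DATEPART(WEEKDAY, d.Date), d.Date) AS WeekEnding,
    YEAR(DATEADD(MONTH, -3, d.Date)) AS FinancialYear /* labelled by starting year */
FROM
    (VALUES (CAST('2024-03-31' AS DATE)), (CAST('2024-04-01' AS DATE))) d(Date)
```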
Creating Time of Day records
If your Power BI model also requires a Time table (we all know separate date and time tables are best practice, right?) you can use a similar technique.
/* generates a list with 10 values */
WITH
List AS (
SELECT
n
FROM
(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)
),
/* generates 100k rows by cross joining List with itself */
Range AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 AS RowNumber
FROM
List a
CROSS JOIN List b
CROSS JOIN List c
CROSS JOIN List d
CROSS JOIN List e
),
/* query to create 24 hours worth of records */
Time AS (
SELECT
DATEADD(SECOND, r.RowNumber, '1900-1-1' ) AS Time
FROM
Range r
WHERE
DATEADD(SECOND, r.RowNumber, '1900-1-1' ) < '1900-1-2'
)
/* time table query */
SELECT
CAST(t.Time AS TIME(0)) AS Time,
CAST(DATEADD(MINUTE, DATEDIFF(MINUTE, 0, t.Time), 0) AS TIME(0)) AS Minute,
CAST(DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Time), 0) AS TIME(0)) AS Hour,
FORMAT(t.Time, 'tt') AS [AM/PM]
FROM
Time t
For a Time table at second grain, 86,400 records are required, so that 4th cross join is needed (though you could choose a coarser grain, such as minutes). There’s no need to calculate start and end dates – you just need to cover a 24 hour period, so I nominally chose to include all seconds between 1st January 1900 and 2nd January 1900. I then applied the functions to return the time attributes I wanted – again, you may choose to add to or remove from these as you see fit.
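If minute grain is enough for your model, the variant below only needs 1,440 rows, so the three cross joins from the date script (10k rows) are plenty – a minimal sketch:

```sql
/* Minute-grain time table: 1,440 rows, so three cross joins suffice */
WITH
List AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)
),
Range AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS RowNumber
    FROM List a CROSS JOIN List b CROSS JOIN List c CROSS JOIN List d
),
Time AS (
    SELECT DATEADD(MINUTE, r.RowNumber, '1900-01-01') AS Time
    FROM Range r
    WHERE r.RowNumber < 1440 /* 24 hours x 60 minutes */
)
SELECT
    CAST(t.Time AS TIME(0)) AS Time,
    DATEPART(HOUR, t.Time)  AS Hour,
    FORMAT(t.Time, 'tt')    AS [AM/PM]
FROM Time t
```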
How you manage these date and time constructs is really up to you. A lot of people will create a table in their database or data lake and run an ETL pipeline to execute the code to load the data. I actually prefer to just keep these as views. The data volume is low, so these are quick to load into a Power BI model and it means I can add new attributes easily without having to worry about physically creating new columns in a table, or having to re-run pipelines. By dynamically specifying the start and end dates I can even add additional records to the views without worrying about triggering an ETL process.
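Persisting the date query as a view is just a matter of wrapping it in CREATE VIEW and dropping the ORDER BY, which isn't allowed in a view definition. A trimmed sketch, with `dbo.DimDate` as a placeholder name and only the Date column shown (append the other attributes to the final SELECT):

```sql
/* View wrapper sketch - "dbo.DimDate" is a placeholder name.
   Add the remaining attribute columns to the final SELECT. */
CREATE VIEW dbo.DimDate
AS
WITH
List AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)
),
Range AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS RowNumber
    FROM List a CROSS JOIN List b CROSS JOIN List c CROSS JOIN List d
),
DateRange AS (
    SELECT
        CAST(CONCAT(YEAR(GETDATE()) - 5, '-01-01') AS DATE) AS StartDate,
        CAST(CONCAT(YEAR(GETDATE()) + 1, '-12-31') AS DATE) AS EndDate
)
SELECT
    DATEADD(DAY, r.RowNumber, dr.StartDate) AS Date
FROM
    Range r
    CROSS JOIN DateRange dr
WHERE
    DATEADD(DAY, r.RowNumber, dr.StartDate) <= dr.EndDate;
/* no ORDER BY - it isn't permitted in a view; sort in the consuming query */
```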
As ever, these scripts are available on my GitHub page.
Let me know what you think.