Time to conquer the next great Power BI debate! By request of fellow PBI blogger Mr Bernat Agulló on Twitter (you can check out his blog https://www.esbrina-ba.com/ here) I’m going to dive into whether you should use Dax or M to create calendar tables in Power BI.
In real life, I’m kind of spoiled, because my answer is neither! I live in an Enterprise BI world where all my data has always come via a data warehouse, so my default mode of operation is to maintain a calendar in there and import it across all of my models.
But if that isn’t an option, what is the best choice? Well, for personal projects outside of work, my preference is to use M. Why? Well if we consider Roche’s Maxim, that data transforms should happen as far upstream as possible and as far downstream as necessary (Burning Suit wrote a great blog on this here: https://www.burningsuit.co.uk/blog/2021/03/where-do-i-make-data-transformations-in-power-bi-roches-maxim/) then using M over DAX more closely fits that bill.
And my gut instinct was always that M would utilise the Vertipaq engine better, with the data being compressed as part of the model processing, therefore reducing the amount of memory being used by the model, and consequently producing a better performing data model. This seemed like a theory that would be pretty easy to prove, so I did some testing.
Memory Usage
This is the power query script that I have saved as a template for my calendars:
let
//set calendar start date here or by using parameter
StartDate = #date(Date.Year(DateTime.LocalNow())-5, 1, 1),
//calculates last day of the next year
EndDate = #date(Date.Year(DateTime.LocalNow())+1, 12, 31),
//works out number of days between start date and end date
Duration = Duration.Days(EndDate - StartDate)+1,
//creates list of dates based on specified start date and calculated duration
Calendar = Table.TransformColumnTypes(Table.FromList(List.Dates(StartDate, Duration, #duration(1, 0, 0, 0)),Splitter.SplitByNothing(),{"Calendar Date"}, null, ExtraValues.Error),{{"Calendar Date", type date}}),
#"Added Day of Week Name" = Table.AddColumn(Calendar, "Calendar Day of Week Name", each Date.DayOfWeekName([Calendar Date]), Text.Type),
#"Added Day of Week Number" = Table.AddColumn(#"Added Day of Week Name", "Calendar Day of Week Number", each Date.DayOfWeek([Calendar Date])+1, Int64.Type),
#"Added Calendar Month Name" = Table.AddColumn(#"Added Day of Week Number", "Calendar Month Name", each Date.MonthName([Calendar Date]), Text.Type),
#"Added Calendar Month Number" = Table.AddColumn(#"Added Calendar Month Name", "Calendar Month Number", each Date.Month([Calendar Date]), Int64.Type),
#"Added Calendar Quarter Name" = Table.AddColumn(#"Added Calendar Month Number", "Calendar Quarter Name", each "Q" & Text.From(Date.QuarterOfYear([Calendar Date])), Text.Type),
#"Added Calendar Quarter Number" = Table.AddColumn(#"Added Calendar Quarter Name", "Calendar Quarter Number", each Date.QuarterOfYear([Calendar Date]), Int64.Type),
#"Added Calendar Year" = Table.AddColumn(#"Added Calendar Quarter Number", "Calendar Year", each Date.Year([Calendar Date]), Int64.Type),
#"Added Calendar Year Quarter" = Table.AddColumn(#"Added Calendar Year", "Calendar Year Quarter", each Text.From([Calendar Year]) & "-" & [Calendar Quarter Name]),
#"Added Calendar Year Month" = Table.AddColumn(#"Added Calendar Year Quarter", "Calendar Year Month", each Text.From([Calendar Year]) & "-" & Date.ToText([Calendar Date], "MMM")),
#"Added CalendarYearMonthSort" = Table.AddColumn(#"Added Calendar Year Month", "CalendarYearMonthNumberSort", each Text.From([Calendar Year]) & Text.PadStart(Text.From([Calendar Month Number]), 2, "0"))
in
#"Added CalendarYearMonthSort"
It dynamically goes back 5 full calendar years, and then forwards to the end of the next calendar year, so it covers 7 years altogether which I find is about the right length of time to cover topical data (and I can always adjust it as I see fit).
I then created this DAX script to create a calculated table covering the same time period and with the same attributes:
Calendar =ADDCOLUMNS (
CALENDAR (
DATE ( YEAR ( TODAY () ) – 5, 1, 1 ),
DATE ( YEAR ( TODAY () ) + 1, 12, 31 )
),
“Calendar Day of Week Name”, FORMAT ( [Date], “dddd” ),
“Calendar Day Of Week Number”, WEEKDAY ( [Date], 2 ),
“Calendar Month Name”, FORMAT ( [Date], “MMMM” ),
“Calendar Month Number”, MONTH ( [Date] ),
“Calendar Quarter Name”,
“Q”
& TRUNC ( ( MONTH ( [Date] ) – 1 ) / 3 ) + 1,
“Calendar Quarter Number”,
TRUNC ( ( MONTH ( [Date] ) – 1 ) / 3 ) + 1,
“Calendar Year”, YEAR ( [Date] ),
“Calendar Year Quarter”,
YEAR ( [Date] ) & “-Q”
& TRUNC ( ( MONTH ( [Date] ) – 1 ) / 3 ) + 1,
“Calendar Year Month”,
YEAR ( [Date] ) & “-“
& FORMAT ( [Date], “MMM” ),
“CalendarYearMonthSortOrder”, YEAR ( [Date] ) & FORMAT ( [Date], “MM” )
)
Using the Vertipaq Analyser in DAX Studio allowed me to see the difference in memory usage. And the results were pretty astounding:
Using DAX makes the memory used by the Calendar nearly 50 times bigger!
Performance
But what does that do to performance?
I loaded my FactOnlineSales table into both models – that’s around 13M rows.
And then I tested performance using the following DAX:
DEFINEMEASURE ‘FactOnlineSales'[Sales Amount] =
SUM ( ‘FactOnlineSales'[SalesAmount] )
MEASURE ‘FactOnlineSales'[Sales Orders] =
DISTINCTCOUNT ( ‘FactOnlineSales'[SalesOrderNumber] )
EVALUATE
SUMMARIZECOLUMNS (
‘Calendar'[Calendar Year],
–‘Calendar'[Calendar Year Quarter],
–‘Calendar'[Calendar Year Month],
–‘Calendar'[Calendar Date],
“Sales Amount”, [Sales Amount],
“Sales Orders”, [Sales Orders]
)
Using the benchmarking feature in DAX Studio I could run the query multiple times, and I tested at multiple granularities; so first I grouped by year, then by quarter, then by month and then finally by day.
And here are the results:
Showing ultimately, from a performance perspective, it doesn’t really make much difference. Across all permutations the results were within a tenth of a second, with sometimes the DAX calendar performing better and sometimes the M calendar performing better.
And I guess by their very nature, any calendar table is always likely to be pretty small, so is nitpicking over the memory footprint really necessary?
Probably not. However, the purist in me feels like I should always do everything I can to keep that memory usage as small as possible, so I plan to keep continuing to use M, the added bonus also being that it gives me the opportunity to publish it as a Dataflow for re-use across other models.
Once again though, it’s probably one where it’s easy to sit firmly on the fence, and another case of choosing your path based on personal preference.
So once again, I ask, which camp are you in?
2 Comments
Donald Parish · May 28, 2021 at 4:15 pm
I used https://www.sqlbi.com/tools/dax-date-template/ to handle 4-5-4 with 53 week year and holidays
Creating Date and/or Time Tables with SQL - Greyskull Analytics · April 1, 2022 at 7:26 pm
[…] Date Tables… yep that again. I’ve been here before when I blogged about whether you should use M or DAX to generate your date table in Power BI. […]