New dawn or false dawn?

It’s only mid-January, but I feel like I’ve already seen 1,000 claims this year that the Medallion Architecture is a fad, is just marketing hype, and is just the same way we’ve always done it.
But I don’t agree.
I wonder whether folks with this opinion have actually done a medallion implementation on a Lakehouse?
The History
I’d agree that it’s not necessarily a revolutionary approach. It’s more an evolution of the way we’ve done things in the past.
Historically, when loading a data warehouse, data transformation would happen in flight: data was landed into staging tables and then loaded to its target. This often happened with incremental patterns that would load just the new data to staging, followed by some kind of merging technique to get the data to its final destination. Data in staging tables was ephemeral and tended to be cleared down either before or after the data load. In an on-premises world, this was often because we had disk space to worry about. I’d describe this as an old-school ETL (Extract, Transform, Load) approach.
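To make that staging-then-merge pattern concrete, here’s a minimal sketch using SQLite’s upsert syntax as a stand-in for the warehouse MERGE techniques of the era. The table names, columns, and rows are all invented for illustration:

```python
import sqlite3

# Hypothetical sketch of the classic staging-then-merge load:
# new/changed rows land in a staging table, an upsert merges them
# into the target, and staging is cleared down afterwards.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_customer (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
""")

# The target already holds one customer from a previous load.
conn.execute("INSERT INTO dim_customer VALUES (1, 'Ada', 'London')")

# Incremental extract: one updated row, one brand-new row.
conn.executemany("INSERT INTO staging_customer VALUES (?, ?, ?)",
                 [(1, "Ada", "Manchester"), (2, "Grace", "Leeds")])

# The "merging technique": upsert staging into the target.
# (WHERE TRUE is required by SQLite when upserting from a SELECT.)
conn.execute("""
    INSERT INTO dim_customer (id, name, city)
    SELECT id, name, city FROM staging_customer WHERE TRUE
    ON CONFLICT(id) DO UPDATE SET name = excluded.name, city = excluded.city
""")

# Staging is ephemeral: clear it down after the load.
conn.execute("DELETE FROM staging_customer")
conn.commit()

rows = conn.execute("SELECT id, name, city FROM dim_customer ORDER BY id").fetchall()
print(rows)  # [(1, 'Ada', 'Manchester'), (2, 'Grace', 'Leeds')]
```

Note how nothing of the intermediate state survives: once staging is cleared, the history of what arrived in each load is gone — which is exactly what the later layered approaches change.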
Then came all the buzz around data lakes, schema on read and the death of data modelling, and for a fleeting moment, I wondered if my chosen specialism was going to become obsolete.
That certainly wasn’t the case, but as we moved to the cloud, we did transition to what became known as “The Modern Data Stack” and the ELT pattern. Part of me always resisted the whole ETL v ELT argument. It felt a bit “you say tomato and I say tomato”. But ultimately I think I recognise that this approach was more about loading data to staging untransformed, and doing the transformation inside the data platform, rather than on top of the source systems. The staging layer would often be a data lake, with the final destination being a cloud data warehouse. Taking advantage of cheap storage in data lakes, you could keep copies of all your data staging extracts and build up an auditable history, as well as the capability to “replay” data loads based on different points in time.
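That “replay” capability usually comes from keeping each extract under a date-partitioned path in the lake. A minimal sketch of the idea — the paths, dataset name, and rows here are all invented, and a real implementation would read Parquet or similar from object storage rather than a dict:

```python
# Hypothetical lake layout: every extract is retained under a
# load_date-partitioned path, so staging is never overwritten.
extracts = {
    "lake/raw/orders/load_date=2024-01-01": [{"order_id": 1, "amount": 100}],
    "lake/raw/orders/load_date=2024-02-01": [{"order_id": 1, "amount": 120},
                                             {"order_id": 2, "amount": 80}],
}

def replay(as_of: str) -> dict:
    """Rebuild the staging view as it looked on a given load date
    by folding in every extract up to and including that date."""
    state = {}
    for path in sorted(extracts):
        if path.split("load_date=")[1] <= as_of:
            for row in extracts[path]:
                state[row["order_id"]] = row  # later extracts win
    return state

print(replay("2024-01-15"))  # {1: {'order_id': 1, 'amount': 100}}
```

Because no extract is ever deleted, any historical load can be reproduced — the auditable history the old clear-down-staging world couldn’t offer.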
I have to call out at this point my hatred of the name “Modern Data Stack”. Prefixing any approach with the word modern was always a recipe for disaster, and we now find ourselves in the oxymoronic position of describing MDS as being an old (or at least dying) way of doing things.
The New Approach
The Medallion Architecture for me is the next step in this evolution. Taking advantage of open table formats that allow data lakes to mimic the behaviour of relational databases, and leaning into the mantra that “storage is cheap”, we start to see further layers of persisted data being added into the architecture (though use of the word “architecture” can also be contentious – it’s more of a data processing pattern). The Medallion approach has risen in line with the birth of the Data Lakehouse.
There are many different variations on how to implement Medallion. As is too often the case with data professionals, we have failed to agree on an industry standard. But if I describe the implementations I’ve been involved in, they typically look something like this:
A landing zone that keeps data in its original format and is akin to the staging area of the modern data stack.
A bronze layer that converts data to an open table format such as Delta or Iceberg. This is an append only layer, with full history and versions of data preserved.
A silver layer that applies cleansing rules and de-duplication of data. The structure of data here still mimics source systems. This can become almost like an operational data store.
A gold layer that models the data into a structure for analytical use cases, often a dimensional model. This fulfils the Data Warehouse portion of the Data Lakehouse portmanteau.
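The flow through those layers can be sketched in a few lines of plain Python. This is illustrative only — the layer names follow the post, but the records, the cleansing rule, and the gold-level rollup are invented for the example, and a real implementation would use an engine like Spark over Delta or Iceberg tables:

```python
from collections import defaultdict

# Bronze: append-only, every version of every record preserved.
bronze = [
    {"order_id": 1, "customer": " ada ", "amount": 100, "_version": 1},
    {"order_id": 1, "customer": "Ada",   "amount": 120, "_version": 2},  # a correction
    {"order_id": 2, "customer": "Grace", "amount": 80,  "_version": 1},
]

# Silver: de-duplicate (keep the latest version of each order) and
# cleanse (trim and case-fix names). Structure still mirrors the source.
latest = {}
for row in bronze:
    key = row["order_id"]
    if key not in latest or row["_version"] > latest[key]["_version"]:
        latest[key] = row
silver = [
    {"order_id": r["order_id"],
     "customer": r["customer"].strip().title(),
     "amount": r["amount"]}
    for r in latest.values()
]

# Gold: model for analytical use - here a simple revenue-per-customer
# rollup stands in for a full dimensional model.
gold = defaultdict(int)
for row in silver:
    gold[row["customer"]] += row["amount"]

print(dict(gold))  # {'Ada': 120, 'Grace': 80}
```

The point of the sketch is the separation of concerns: bronze never loses history, silver is where quality rules live, and gold is shaped purely for consumption.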
In all honesty, I have tended to steer away from the bronze, silver, gold labelling for these processing areas. I prefer something more descriptive such as Raw, Cleansed, Curated. And you shouldn’t necessarily feel limited to only 3 areas. I may add an Enriched zone in there too, introducing additional data points to my Cleansed zone without fully modelling it into facts and dimensions. Simon Whiteley has a good, ranty video on the subject that you should check out here:
The Schmäh
There has undoubtedly been some element of marketing schmäh applied to the rise of Medallion.
If you’re not familiar with the term schmäh, it’s a concept I got from Arnold Schwarzenegger – I can’t for the life of me track down where – I thought it was in his book “Be Useful: Seven Tools for Life” but I can’t now find the reference. He does talk about it in the Netflix documentary series “Arnold”, and you can see a clip here:
Pronounced sh-may, it’s basically the art of glossing things up with a little bit of bull-shittery.
It’s unclear who first coined the Medallion Architecture term. It’s often cited as being Databricks, and certainly the concept is mooted heavily in their marketing literature. Was the use of precious metals to describe each processing zone unnecessary? Perhaps. Was somebody trying to add more sparkle and glamour to these ideas? (I mean, come on folks, surely data engineering and analytics is sexy enough already?!)
It’s also worthy of note that it’s not an approach that’s exclusive to Databricks. As we move towards Data Lakehouses becoming the de facto data architecture of the era, I see other products referencing it, such as Microsoft Fabric and Dremio. And though I can’t find any direct references to it by Snowflake (woe betide anyone from Snowflake using terminology favoured by Databricks), there’s certainly material out there by Snowflake practitioners endorsing Medallion Architecture as a valid approach.
The Label
I do think that giving this approach a clear name is useful though. When describing the pattern, I’ll sometimes call it a “multi-hop data management and processing pattern”, but that’s quite a mouthful and using “Medallion” instead has now reached a level of adoption that most people in the industry will know what you are talking about.
Did the pattern exist before the marketing hype emerged? Maybe. I’m sure someone, somewhere, took the approach first. But wide use of the pattern has coincided with the adoption of the “Medallion” nomenclature.
My Take
If you’re an experienced data professional who has yet to dive into Medallion, then there’s definitely nothing to be afraid of. You will certainly come across concepts that are familiar as well as ways of working that may be similar to the ways you work today. And yes, some of those existing concepts may have been dressed up with slightly flowery new language.
But if you’re someone who’s being dismissive of this approach, claiming it is merely “the way we’ve always done it”, I politely disagree. I think that Medallion Architecture has enough distinct features to warrant its own chapter in the history of data platform approaches and labelling the whole pattern as a snake-oil-esque marketing ploy is doing it a disservice.
For the time being at least, I believe Medallion Architecture is here to stay.
5 Comments
Bubba · January 18, 2025 at 6:12 pm
Interesting.
Stuart Cuthbertson · January 23, 2025 at 8:33 am
Great article, Johnny, and I love the word schmäh 😁
Totally agree with your overall point. That said: my first data job in 2009 (migrating hospital data between very different OLTP databases, rather than anything analytic) was already using a “multi-hop data management and processing pattern”, on SQL Server 2005. And that company had been doing that for some years before I joined. There was nothing lakey or open about it, which is key, but I guess it was basically an ELT medallionesque approach.
Johnny Winter · January 23, 2025 at 8:38 am
Yeah, I feel like I’ve managed to live under a rock… In my 20-ish year career I’d never experienced the multi-hop approach until the advent of Lakehouses, but having exchanged dialogue with a few folks, perhaps it was more common than I realised. So having gotten all preachy about it, I feel a little tail between my legs now. That being said, I still maintain that its widespread adoption is a relatively new thing
Johnny Winter · January 23, 2025 at 8:39 am
And schmäh is my word of the week. I’ve had need for it in a few different contexts so far (not all particularly positive)
Stuart Cuthbertson · January 25, 2025 at 12:22 pm
For sure, I think the move to standardising on reasonably open/portable data formats, and the whole storage/compute separation, is also a meaningful factor. The combination of what you do, how you do it, and what you do it WITH, all have to come together.