An introduction to building end-to-end analytics solutions – a Packt publication

The lovely folks over at Packt have kindly sent me a copy of their book ‘Fundamentals of Analytics Engineering’ to review. If you’re interested in picking up a copy, you can find it on the Packt website. Yeah, yeah, the title image for this piece is AI generated… the real cover looks like the one below:

The authors
The book is written by quite the ensemble cast: Dumky De Wilde, Fanny Kassapian, Jovan Gligorevic, Juan Manuel Perafan, Lasse Benninga, Ricardo Angel Granados Lopez and Taís Laurindo Pereira. The authors are all colleagues who worked together at the consultancy Xebia.
Having worked my way through it over the last few weeks, here’s what I think.
The gripes
Spoiler alert! This is going to be a largely positive review, so I’m just going to get a couple of mini complaints out there first.
The blurb
Firstly the blurb. Directly from that blurb is the following section:
“With practical guidance, you’ll also learn how to build a simple data platform using Airbyte for ingestion, Google BigQuery for warehousing, dbt for transformations, and Tableau for visualization”
This description kind of got my back up a bit – it implies that the book is quite technology specific. I personally don’t have much interest in any of the above technologies other than dbt. Was reading this going to be a waste of time?
It also gives the impression that Analytics Engineering only uses these technologies. I’ve complained about this before, but I do not believe that any one technology vendor should be allowed to have a monopoly on a specific job title. The rise of Analytics Engineering is definitely tightly associated with dbt, but I personally don’t think the job title is exclusive to using that tech.
Fortunately the book doesn’t actually fall into this trap. There is one chapter (chapter 8) that works its way through a practical example of implementing an analytics solution, and whilst it does use the technologies listed, it also calls out the fact that this is just one example used to demonstrate how to put some of the theory into practice, and that alternatives are available.
The title
Which kind of leads me on to my second bugbear: the title. Calling this “Fundamentals of Analytics Engineering” really does the contents of the book a disservice. The subtitle “An introduction to building end-to-end analytics solutions” gets nearer to describing the subject this book covers. Perhaps an alternative title akin to “Everything you should know about that’s adjacent to what you should know as an Analytics Engineer” would be more accurate, but yeah, I get it. That’s a lot less snappy.
I feel some sympathy for whoever made the call on the title. Finding something that truly encapsulates the book’s material whilst being concise enough to be considered a realistic title must have been hard. I feel that the content is aimed at Analytics Engineers, but really it covers the entire Modern Data Stack. I personally believe that to be a truly great Analytics Engineer, you shouldn’t just stay in your lane and only understand the tasks you are responsible for. You should have an appreciation of the wider stack, see the bigger picture and understand the inputs and outputs that analytics engineering relies on and feeds into.
This book does exactly that.
The book
The book is split into 5 sections:
Introduction to Analytics Engineering
Building Data Pipelines
Hands on guide to Building a Data Platform
DataOps
Data Strategy
Introduction to Analytics Engineering
This section serves as a great primer to Analytics Engineering as a discipline, helping frame what analytics engineers do, how and why the term emerged and the shift in technologies that contributed to this.
The section also has a chapter introducing the Modern Data Stack (MDS). It has seemed in recent times that it’s somewhat trendy to declare the Modern Data Stack as dead. Vendors currently seem to be clamouring to build more integrated platforms, and perhaps the more modular MDS wouldn’t be my first choice for a greenfield project, but the implementation of MDS over the past several years certainly means it has a blast radius big enough to be a topical and valid architecture choice for a while to come.
Building Data Pipelines
When I think “building data pipelines” I think more of data ingestion and ETL. This section covers far more than that, giving practical advice on implementing a full data stack.
There’s a chapter on data ingestion, which is more a data engineering task than an analytics engineering one, but this high-level look at the fundamentals is still a great intro for aspiring analytics engineers, who will inevitably need to work closely with data engineers.
There’s a chapter on Data Warehousing, giving a history lesson on data warehouses and the move to the cloud.
The chapter on data modelling, an often neglected subject, is another great piece of context for analytics engineers to understand. It covers 3NF, dimensional modelling and data vault.
Transforming Data is the chapter that I would most associate with the role of an analytics engineer. Here is where I expected a bigger focus for the rest of the book. What does get covered is still relatively high level. It covers the basics well, but if you are expecting a deep dive here, this isn’t what you will get.
The final chapter in the section covers serving, including not just visualisation, but also how to go about making sure you visualise the right things by choosing actionable metrics as opposed to vanity metrics. A great overview of how important it is to make sure the outputs of your data platform actually provide value. It includes one of my favourite soundbites:
“There is nothing wrong with using data visualization as a form of art. However, the true power of information design is how it facilitates comprehension”
Hands on guide to Building a Data Platform
This is the aforementioned hands-on guide. As already touched on, despite my fears, this wasn’t as tech heavy as I thought it was going to be, and in fact was done really well, focussing on a fictional use case for a stroopwafel seller (who doesn’t love a stroopwafel!?). It definitely focussed more on theory than tech, which, for me, was a great approach.
DataOps
This section covers a lot of what allows analytics engineering to earn its “engineer” tag. It includes things like orchestration, observability, source control, code sharing, automation and deployments. I’d have liked to have seen a call out to the DataOps Manifesto and references to some of the wider (and fundamental) DataOps principles. Despite missing this, the essentials needed for building a data platform are all nicely covered.
Data Strategy
The final section has two chapters: Driving Business Adoption and Data Governance. I think including a section on adoption was a great move. The best data platform in the world is useless if no one is going to use it. And Data Governance, even described by the authors as being “not the sexiest topic”, is certainly an essential consideration. Again, this section doesn’t go massively into depth, but does call out things to think about.
Conclusion
There’s an epilogue to round out the book, which gives a very brief summary of all you’ve learned but also has a wonderful little section calling out how to make your analytics engineering career future proof. There are three pieces of advice for this:
Tip #1 – keep learning and developing your skills
Tip #2 – network and engage with the community
Tip #3 – showcase your work and build a portfolio
All fantastic bits of advice for an aspiring data professional.
Final Thoughts
I’m really glad I had the opportunity to read this. I enjoyed the breadth of subject matter. For me it goes well beyond just analytics engineering. For huge parts of the book I found myself nodding along, and there are many fantastic explanations in here that fledgling data professionals would definitely benefit from. There was perhaps the odd section where I didn’t necessarily agree entirely with some of the ideas or principles being presented, but not to the point where I’d feel aggrieved enough to dismiss the rest of the contents of the book.
So overall, would I recommend it? Absolutely yes. I think for data professionals in the current era, especially those who focus more on the analytics end of the stack, this is a great foundational text, and if you’re new to the industry, you could certainly do a lot worse than read this as an introduction to a lot of very important fundamental topics.
Finally, a quick shout out to Nivedita Singh from Packt, who gave me the opportunity to receive a copy of the book and do this review.