Although they’ve been generally available for over two years now, it’s only in recent months that I’ve finally had some hands-on experience using Dataflows in the Power BI service.
I think they’re a good addition and I can definitely see their value. The Power Query experience, with the query folding indicator, is really nice, as are features such as the diagram view.
There are however still a few areas that have frustrated me.
Dataflow Issues
One thing is that by default, when you connect to a Dataflow from Power BI desktop, the connection utilises the Dataflow’s unique ID.
On the face of it, that ought not to be a problem, but there are at least two issues I faced (one of which has since been “fixed” anyway) where this became troublesome.
The first is what happens if you need to replace your Dataflow from scratch. There is a known feature (bug?) whereby you cannot remove redundant data sources from Dataflows. This caused problems when trying to schedule Dataflow refreshes, especially when the redundant data source is an on-premises source that no longer exists. I found that the Power BI Dataset still expects a gateway to be mapped – when the source doesn’t exist anymore, that’s impossible! The solution was to delete and re-create the Dataflow. Exporting the JSON definition of the Dataflow (which seems to omit the rogue data source) and re-importing it was the quickest method, as I didn’t have to rewrite all of my Power Query steps from scratch.
The problem with this “fix”, though, is that the re-imported Dataflow has a new unique ID. The connection in the Dataset no longer works, as the unique ID it points at has been deleted. You have to go back to your Dataset and redo the connection to point at the “new” Dataflow.
The second problem was caused by using the deployment pipelines in the Power BI service. I had my Dataset and Dataflows all published in the same Workspace. However, when I deployed my development Dataset to the test Workspace (and subsequently onwards to the production Workspace) the Dataset connection still pointed back to the Dataflows in the development Workspace. More recently, support for Dataflows as part of deployment pipelines has been added (although at the time of writing it’s still in preview) and this behaviour does seem to have been sorted.
An alternative to Dataflow ID
However, at the time, it prompted me to look at alternatives.
Some Twitter interactions with Fred Kaffenberger (@freder1ck) and Mike Allen (@MikeAinOz) prompted me to look at a method for connecting to a Dataflow using its name rather than its ID.
So I wrote a Power Query function; this is what I came up with:
let
    fnConnectToDataflow =
        (Workspace as text, Dataflow as text, Entity as text) =>
        let
            Source = PowerPlatform.Dataflows(null),
            Workspaces = Source{[Id = "Workspaces"]}[Data],
            Workspace_Name = Workspaces{[workspaceName = Workspace]}[Data],
            Dataflow_Name = Workspace_Name{[dataflowName = Dataflow]}[Data],
            Entity_Name = Dataflow_Name{[entity = Entity, version = ""]}[Data]
        in
            Entity_Name
in
    fnConnectToDataflow
It works great.
You need to pass it three parameters: Workspace Name, Dataflow Name and Entity Name (it’s possible to have multiple “tables” output from a Dataflow; Entity is the technical name for these tables).
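For example, to pull an entity into a query you would invoke the function like this (the Workspace, Dataflow and Entity names here are just placeholders – swap in your own):

let
    // Hypothetical names – replace with your own Workspace, Dataflow and Entity
    Source = fnConnectToDataflow("My Workspace", "My Dataflow", "Customers")
in
    Source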
Using this technique, if for whatever reason you find yourself needing to delete and replace a Dataflow, then provided you stick to the same naming convention, your Dataset will continue to connect to the new version.
I’ve also been parameterising the Workspace when I create the connection in my Dataset. This allows me to easily switch the Workspace location of the Dataflow I’m connecting to. Using Deployment rules in the Power BI Service Pipelines I was then able to make sure that when I promoted my Dataset between development, test and production Workspaces, it always picked up the Dataflows from the correct location.
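As a sketch of that setup (the parameter name here is my own choice), with a text parameter, say WorkspaceName, defined in the Dataset, the call becomes:

let
    // WorkspaceName is a Power BI text parameter; a deployment rule in the
    // pipeline can override its value for each stage (dev, test, production)
    Source = fnConnectToDataflow(WorkspaceName, "My Dataflow", "Customers")
in
    Source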
I’ve been using this technique for a few months now (though I’ve been slack about writing a blog on it) but with the introduction of the new Dataflows connector in the August release of Power BI Desktop, I’ve revisited my function to make sure it uses this new connector (which, as Chris Webb explains here, supports query folding).
You can access it on my GitHub page here.
Let me know what you think!
7 Comments
Bernat Agulló · August 25, 2021 at 7:51 pm
Very nice indeed! Totally stealing this one too.
Taras Koniukhov · August 26, 2021 at 6:48 pm
Super handy solution to work explicitly with dataflows. Walking away with it. Thank you for sharing.
Taras Koniukhov · September 8, 2021 at 12:53 pm
It stopped working for me with the message:
Expression.Error: The name ‘PowerPlatform.Dataflows’ wasn’t recognized. Make sure it’s spelled correctly.
BTW, Johnny, why are you using the connector “PowerPlatform.Dataflows” and not “PowerBI.Dataflows”?
Johnny Winter · September 9, 2021 at 3:11 pm
Hey Taras, so MSFT recently released a new Dataflows connector, which is the PowerPlatform.Dataflows version you can see here. If that isn’t working for you in PBI desktop it would suggest that you’re not using the latest version. The advantage of the new connector is that if your dataflow is using the advanced compute engine (a premium capacity feature) then it will now support query folding, which means you can more effectively do incremental refreshes of your datasets. If you have to use an older version of PBI desktop, you’re right, you need to use PowerBI.Dataflows instead, which then uses a slightly different syntax in the subsequent applied steps. My original version of this function did exactly that (you should still be able to view that version in the history on my github page).
Taras Koniukhov · September 10, 2021 at 1:02 pm
Hi Johnny,
Checked it. Reinstalling Power BI Desktop fixed the problem.
Appreciate your help!
Many thanks
Taras Koniukhov · September 8, 2021 at 1:16 pm
The “PowerPlatform.Dataflows” connector stopped working for me in Power BI Desktop, but it’s still working in the Power BI service ¯\_(ツ)_/¯
Plagiarising your initial solution:
==========================================
let
    fnConnectToDataflow =
        (Workspace as text, Dataflow as text, Entity as text) =>
        let
            Source = PowerBI.Dataflows(null),
            Step1Workspace = Table.SelectRows(Source, each ([workspaceName] = Workspace))[Data]{0},
            Step2Dataflow = Table.SelectRows(Step1Workspace, each ([dataflowName] = Dataflow))[Data]{0},
            Step3Entity = Table.SelectRows(Step2Dataflow, each ([entity] = Entity))[Data]{0}
        in
            Step3Entity
in
    fnConnectToDataflow
Alfons · July 14, 2023 at 2:59 pm
Nice approach. We are trying to implement it, as deploying dataflows & datasets using pipelines by default (with IDs in the power queries) led us to some errors. My only concern is that after applying it, the dataset lineage no longer shows the dependency on the dataflows. Is this the expected behaviour?
Thx