Recently, I have been using Power Query to scrape data from web pages to load into Power BI for some side projects (more to come as these are completed, hint hint). The data from the web pages is good but inconsistencies do exist. Sounds like a pretty normal day for a data nerd, right?
Let me explain a few of the details and do my best to not overload you. In my scenario, Webpage A contains a list of items, and for each item, a separate webpage page contains more details on each item. The list of items contains a hyperlink linking to the corresponding page of details. So there is a query that retrieves the list of all of the items and hyperlinks from Webpage A. This query then calls a function to follow the hyperlink and retrieve all of the respective details. The inconsistency I ran into was that the details for List Item A may not always be the same as List Item B. Perhaps data was not captured in the same way for both List Items A and B, perhaps someone edited the HTML for Details Webpage B and removed some information. Whatever the reason, I want to include the data when it is present in the details. In cases where the detail is not present, a null value is acceptable.
Since each of the details will be transformed into an individual column, Power Query is expecting the same number of columns with consistent column names from each of the queries to the details pages. If this is not the case, Power Query will throw an error.
I came across Table.HasColumns() in the Power BI Community forum. With a little bit of additional research, it became clear that this would solve the problem. It’s a pretty straight forward solution, here’s the code:
if Table.HasColumns(#"Reference_To_Previous_Step", "ColumnName") then #"Reference_To_Previous_Step" else Table.AddColumn(#"Reference_To_Previous_Step", "ColumnName", each null)
The code is running a simple test to determine if a column named, in our example, “ColumnName” exists. If so, we pickup at the end of the previous step and continue on with the remaining steps in the query. If “ColumnName” does not exist, then a new column is added to the table, named “ColumnName”, and given a value of whatever is specified after each. In our example, “ColumnName” contains a null value. You could enter a number, text, calculation, or another M function here, whatever meets your need to populate the column with data.
This is a simple way of solving the problem of the disappearing data column in Power Query!
Read more on Table.HasColumns():
Official M Language documentation: https://docs.microsoft.com/en-us/powerquery-m/table-hascolumns
Chris Webb’s blog post on this function: https://blog.crossjoin.co.uk/2015/07/08/checking-columns-are-present-in-power-query/