What is a calculated column
Written by: Danny Galvan
A calculated column is a type of DAX calculation we can use for our Power BI reports.
The DAX calculations we write in Power BI will typically fall into 1 of 2 categories:
- calculated columns
- calculated measures
Calculated Measures can get complex, so we will start with calculated columns to build our understanding from the ground up. When I mentioned in a previous post that, DAX can look like an Excel formula for a newcomer, I was mostly referring to calculated columns. When I was first learning this stuff, that analogy helped to get me started.
Calculated columns can be useful for many cases but I would venture to say that most users will only create calculated columns to either:
- manipulate some string of text (I.e. concatenation) OR
- create a special category used as a flag or on a slicer/filter
There are many more cases but these two are quite common, especially if we want to quickly put a report or dashboard into use.
Let’s make a simple calculated column and see what it does. Although I mentioned that string and categorical data is most commonly used for calculated columns, I am going to use numerical values to illustrate a point and hopefully give you a deeper understanding.
Inside of Power BI Desktop, I navigate to the Data tab. From there, click on Modeling ribbon, and you should see “New Column” under Calculations. Click New Column and you should get the same as seen below.
Now we can give it a useful name and type our formula into the formula bar. Pay attention to the structure of this calculated column. We have a table name, two column names, and the subtraction operator used to perform a calculation.
So, in this example above, I named the calculated column “Gross Profit”. The formula simply subtracts the value of TotalCost from SalesAmount. The table from which these two fields belong to is Sales, as highlighted in orange.
Now I hit Enter and suddenly my calculation appears as a new column in the data set. No surprises here. We made a calculation and out comes a new column, hence the name calculated column.
Here is what this calculation told Power BI to do…
This calculation only cares about one row at a time. On each row, we take one value and subtract it from another to get a new value. Due to sorting by a specific product (a foreign key), our Gross Profit value is $3,775.60 on each row we see in the photo snippet. If every row of this data represented a unique transaction by a customer, then our Gross Profit would most likely be different for each row.
The main takeaway is that the calculation focuses on one row of data at a time because we did not include any functions which tell Power BI to calculate over multiples rows of data at a time. This is an important distinction when writing DAX. Each Gross Profit value is derived from the data on its own unique row. It does not care about the row above it or below it. If we want to calculate over more than one row at a time, or a specific group of data, we need to use DAX functions which aggregate our data.
What if I use an Aggregate function in a calculated column?
If our calculated column included an aggregate function (e.g SUM, AVG) and looked like this:
Total Sales = SUM(‘Sales’[SalesAmount])
… Power BI won’t give you an error. It will simply create a new column with the same value repeating on every row. Why? Because we told Power BI to sum all the SalesAmount values in our Sales table. That can only be one number.
What’s even worse is that it only calculates this once and stores the value. This is how calculated columns work. So, if you try to use this calculated column in a graph or matrix, it’s not going to recalculate for you based on the context (context is a very important concept for DAX) you created on your report, because the value is already set and computed.
If you try and use this new column to do other calculations, it’s worthless. For example, if you drag it into a graph in Power BI it’s just going to re-sum every Gross Profit value based on the filter category you selected. You would most likely notice an abnormally high number in your graph.
Since the value would be the same for the whole table, it would be more like an attribute of the table than a metric of the data. The solution to this problem is calculated measures, the logical next step in learning DAX.
Hopefully by now when you think about calculated columns, you have some basic idea of where it fits into Power BI. Knowing some basic theory on behavior and limitation of calculated columns should certainly help. Now go experiment with your own!