Write Filter-based Measures

team-member-1

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Tags:
Warning: Invalid argument supplied for foreach() in /home/customer/www/learndatainsights.com/public_html/wp-content/themes/twentytwentyone-child/single.php on line 152

Filter Context

The easiest way to understand the role of DAX Measures in Power BI is to think of them as advanced pivot table formulas that work in a slightly different way compared to standard formulas in the Excel grid.

Consider a standard pivot table in Excel as per the below screenshot. You can drag a numerical value into the values bucket and choose an aggregation (e.g. sum, count, average). Then each number that appears in the Values area of the pivot table is calculated according to whatever headings are in the rows/columns of the pivot table as well as any slicers that are applied.

So in the example shown in the screenshot the value of 484.95 is calculated as the sum of sales for Product Altes Lussekofta, Customer Casual Clothing, and an Order Date of 4th Jan 2008. These together provide the filter context for the calculation.

However, there are a number of limitations with having only simple calculations like Sum(Sales) in a pivot table.  Rather than having the filters specified in rows/columns or by slicers, we might want to put the filter conditions directly into the formula so that we can compare values with different filters applied. For example, we might want to compare January (and every other month) sales in 2009 vs those in 2008 for each customer/product combination within a single pivot table or graph. Having a year slicer will apply to the entire pivot table so it doesn’t give us the ability to control the filter conditions applied to individual cell values.

We might also want to show something a little more sophisticated than just the Sum(Sales). For example, if we wanted a Gross Profit Margin ([Sales – Direct Cost]/Sales) instead then that isn’t easy to do in a standard Excel Pivot table. Utilising the functionality of pivot table formulas might take us some way to achieving this but it is limited and clunky to use.

For a ratio like this, whilst we could calculate it row by row in our source table and then average it across rows, that isn’t the same result as calculating it for each subset of data – taking the total sales and total cost for a given customer/product combination and using these values to work out the ratio isn’t the same as  calculating it row by row on a transaction table.

DAX allows us to specify filter conditions in formulas, make use of the data model (so, unlike with a standard pivot table, not all our data has to sit in one big source table), and apply sophisticated logic such as time intelligence e.g. to dynamically give us the growth since last year for each row/column combination.

So we can have a simple DAX measure that is simply

Total Sales =
SUM ( Sales[SalesAmount] )

but we could also then have another measure that references our Calendar table to work out last year’s value e.g.

Total Sales Last Year =
CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( CalendarTable[Date] ) )

And a third that calculates the growth e.g.:

Sales Growth =
DIVIDE ( ( [Total Sales] – [Total Sales Last Year] ), [Total Sales Last Year] )

And then just show the sales growth measure on our report – e.g. a bar chart with customer name on the X-axis and sales growth on the Y-axis, showing us which customers have had the highest sales growth in the period being considered. This functionality allows us to drive real insights into our data in a way that is quite difficult to achieve with Excel.

But using DAX measures requires us to think of the data as it is structured in the Values area of a pivot table, with all the filter contexts being applied to it, rather than as references to particular cells in a spreadsheet.

Note also that one DAX measure can reference others. In this way, you can build a chain of measures in your data model: base measures with simple aggregations of your key numeric columns and then any more complex measures then build upon these base measures.

Key DAX Aggregations

Most DAX functions work by aggregating values in columns or tables on subsets of data. Sum and Average are the most common aggregation functions e.g. Total Sales Revenue = SUM(Sales[Sales Amount]) or Average Revenue Per Transaction = AVERAGE(Sales[Sales Amount]). These are often a good idea to set up as base measures as the first step in building out your DAX formulas, as these can then referenced in other, more complex formulas where required.

In addition to COUNT and COUNTA, DAX provides two new functions for counting records which are incredibly useful: COUNTROWS and DISTINCTCOUNT.

COUNTROWS works well when you are referencing a dimension table. For example, to count the number of products then as long as your products table has only one row per product (Which it should) then this is as simple as # Products = COUNTROWS(Product).

However, this will give you the total number of products in this table (the total number of products available), which is not necessarily the same as the number of products which actually sold.

A more common scenario might be to say count the number of products which sold during a particular time frame or at a particular location. Consider the below data model diagram: when your  relationships are all set up as single direction in this way (which is advisable) then filtering for say a particular year and using

# Products =
COUNTROWS ( Products )

won’t give you the number of products which sold in that year because the data filter will pass from the Date table to the SalesFact table but not back to Product.

Rather than just enabling bi-directional filters (or using the CROSSFILTER DAX function), consider instead changing the measure for #Products from using COUNTROWS to DISTINCTCOUNT on the SalesFact table.

#Products v2 =
DISTINCTCOUNT ( SalesFact[Product ID] )

This gives you the unique number of Product IDs that sit in the SalesFact table and as such will respond to other filters.

When considering which DAX functions to use, it is important to consider the data model and the flow of data between the tables. The v2 of the measure above won’t work to consider the converse – the number of products that didn’t sell. For that to work, you would need to ensure your products table includes all products (not just those that sold in a particular period) and use COUNTROWS on the products table instead – an example is shown below.

Nonetheless, DISTINCTCOUNT is a very useful DAX function. Consider the requirement to calculate the average sale value per day (that could e.g. be viewed per location and month). This could be calculated as the Total Sales Value divided by the number of days for which sales occurred in that month, i.e.:

Total Sales =
SUM ( Sales[Sales Amount] )

# Days Sold =
DISTINCTCOUNT ( Sales[DateKey] )

Avg Sales Value Per Day =
DIVIDE ( [Total Sales], [# Days Sold] )

Other aggregations on a column include MAX, MIN, and PRODUCT. For a complete list of functions see the Microsoft Docs DAX Reference or the SQL BI DAX Guide

 

The structure of a DAX Formula

A DAX formula looks very similar to structured references using Tables in Excel.

Columns are referred to using the table name followed by the column name in square brackets – e.g. Sales[SalesAmount]. 

Other measures are also referred to in square brackets – e.g. [Total Sales] – though in order to avoid confusion never put a table name prefix when referring to a measure; that way it can be distinguished from a reference to a field.

You are also able to use Variables in Power BI when writing a DAX formula in order to calculate a value (in the filter context specified), or a filter condition, and then refer to them in subsequent lines. For example, the Sales Growth formula could have been written as:

Sales Growth =
VAR Sales = [Total Sales]
VAR PYSales = [Total Sales Last Year]
RETURN
    DIVIDE ( Sales – PYSalesPYSales )

As well as significantly improving readability with long and complex DAX formulas (which is very important!), the use of variables can also improve performance. In this example, the Total Sales Last Year value is referenced twice in the end calculation though it is only calculated once when it was assigned as a variable.

It is important to recognise that variables can only be defined once in a DAX calculation and are immutable – its value is determined in the expression and in the filter context which applies when it is set, and this cannot then be changed.

For example consider:

Sales Last Year =
VAR Sales = [Total Sales]
RETURN
    CALCULATE ( SalesSAMEPERIODLASTYEAR ( CalendarTable[Date] ) )

This would not return the expected result. The variable Sales will be assigned a static value representing the Sum of Sales in the current context (i.e. with any filter context present in the report at the time it is run). Let’s say that the value is £20k. The return expression will then reference the £20k value rather than the measure, and since SAMEPERIODLASTYEAR cannot be applied to a static value, it will just return £20k rather than the corresponding sales amount last year.

 

 

Setting the Filter Context

You don’t need to set the filter context using DAX for each subset of data you analyse because the filter context is already set by your report and data model. A simple DAX measure of Total Sales = Sum(Sales[SalesAmount]) will give different values depending on how it used and for each point – i.e. by the filter context applied. The filter context is defined by the axis of your charts (or rows/columns of a table), any slicers, visual/page/report level filters applied, cross-filtering/highlighting between charts as well as the flow of data in your data model.

Whilst this means that there’s a lot to think about when considering what a given DAX formula will return at any given point, it means, for the most part, your formula itself can (and should) be quite simple. There is nothing wrong with just having a simple Total Sales formula and showing this in different ways using different visuals and front-end filters. Sometimes, however, you might want to override the filter context that is being applied through a slicer (e.g. if your slicer is for a year and set to a value of 2018 but you want to override this to show the previous year value instead), or just to apply filters that are not being explicitly set in the report.

The DAX function used to set/override a filter context is CALCULATE. This calculates a measure of expression for a given filter context. The syntax of the function is:

CALCULATE(<expression>,<filter1>,<filter2>…)

So to calculate Sales for the year 2017 (regardless of what year is set through the filter context on the front end), you would write

2017 Sales =
CALCULATE ( [Total Sales], CalendarTable[Year] = 2017 )

i.e. calculating the total sales for dates where the year attribute of the date (as specified in the CalendarTable) is 2017.

For this to work, there needs to be a relationship between the CalendarTable and the Sales table (on the date field) – otherwise setting a filter on the Calendar Table won’t pass through to Sales and impact the result of Total Sales. By default, DAX utilises the relationships defined in the data model though it is possible to override these using certain DAX functions if required.

The pattern I use for setting the filter context in any DAX formula is to define the filter conditions as variables using CALCULATETABLE, FILTER or ALL and then use these as part of a CALCULATE function in the return of the formula.

The syntax for CALCULATETABLE is the same as CALCULATE but passing in a table (rather than a measure/expression) as the first argument:

So I would actually write the above formula as:

2017 Sales =
VAR YearFilter =
    CALCULATETABLE ( CalendarTable, CalendarTable[Year] = 2017 )
RETURN
    CALCULATE ( [Total Sales], YearFilter )

Where there are multiple filter conditions to be applied across different tables my preference is to define each as a separate variable and then pass them in via CALCULATE. So for example, if we wanted to filter on both year and Product, we might do something like

2014 Product A Sales =
VAR YearFilter =
    CALCULATETABLE ( CalendarTable, CalendarTable[Year] = 2014 )
VAR ProductFilter =
    CALCULATETABLE ( Product, Product[Product Name] = “Product A” )
RETURN
    CALCULATE ( [Total Sales], YearFilterProductFilter )

 

CALCULATETABLE vs FILTER

CALCULATETABLE and FILTER are examples of table functions – they return an entire table of values with the filter conditions applied. Since measures return a scalar value, they can’t be used in isolation in a measure, but they can be used as parameter arguments in the CALCULATE function.

The 2017 Sales formula could also be written using FILTER instead of CALCULATETABLE e.g.

2017 Sales Using Filter =
VAR YearFilter =
    FILTER ( CalendarTable, CalendarTable[Year] = 2017 )
RETURN
    CALCULATE ( [Total Sales], YearFilter )

The reason we would want to use CALCULATETABLE rather than FILTER is because it is generally considerably faster and works for the vast majority of filtering scenarios in DAX. Defining filter variables using CALCULATETABLE (or passing the conditions directly into a CALCULATE statement) should be the default option.

CALCULATETABLE is faster because it always works on single columns that can make better use of the columnar storage engine behind Power BI; by contrast, FILTER is an iterator function which calculates results row by row for all columns in scope. Passing in a single column as the first argument for a FILTER function (e.g. FILTER(ALL(Date[Date])) is considerably faster than passing in an entire table (e.g. FILTER(ALL(Date)). Using a single column with FILTER is equivalent to passing the arguments in as a filter condition in CALCULATE (or using CALCULATETABLE) though my preference is still to use CALCULATETABLE or conditions directly in CALCULATE where possible.

The FILTER function is more flexible, however, and there are a small number of cases where this additional flexibility is required for the calculation.

CALCULATETABLE can only specify conditions on one field at a time and can only filter rows for where that field compares to a particular static value.

FILTER can specify conditions on multiple fields simultaneously and can also use measures as part of the filter condition.

For example, if we wanted to calculate Sales for 2016 and 2017, we could write this as

YearFilter =
FILTER (
    CalendarTable,
    CalendarTable[Year] = 2017
        || CalendarTable[Year] = 2016
)

or

YearFilter =
CALCULATETABLE (
    CalendarTable,
    CalendarTable[Year] = 2017
        || CalendarTable[Year] = 2016
)

and either would work, but

July 2017 Sales Incorrect =

CALCULATETABLE (
    CalendarTable,
    CalendarTable[Year] = 2017
        && CalendarTable[MonthNo] = 7
)

would return an error whereas

July 2017 Sales Correct =
FILTER (
    CalendarTable,
    CalendarTable[Year] = 2017
        && CalendarTable[MonthNo] = 7
)

would work fine.

The operator || means ‘Or’ and && means ‘And’. For 2016-2017 Sales we need to use || because we want dates from both years, and in July 2017 Sales we need to use && because we want to restrict it by both Year and Month.

Note that for the July 2017 Sales example, it is still possible to use CALCULATETABLE by passing in the conditions as separate arguments (separated by a comma) i.e:

YearFilter =
CALCULATETABLE (
    CalendarTable,
    CalendarTable[Year] = 2014,
    CalendarTable[MonthNo] = 7
)

Where FILTER becomes necessary to use over CALCULATETABLE is where you specify more complex conditions for the filtering. For example, consider the example discussed above where you want to calculate the number of products that didn’t sell. This could be written as:

# Products Not sold =
VAR NotSoldFilter =
    FILTER ( ‘Product’, COUNTROWS ( RELATEDTABLE ( SalesFact ) ) = 0 )
RETURN
    CALCULATE ( COUNTROWS ( Product )NotSoldFilter )

CALCULATETABLE wouldn’t work in this case (it would come up with the error The True/False expression does not specify a column. Each True/False expressions used as a table filter expression must refer to exactly one column).

In addition, FILTER is required when a measure is used as a filter condition. For example we could define a measure

# Sales =
COUNTROWS ( SalesFact )

and then define the Products Not Sold Measure as:

# Products Not sold v2 =
VAR NotSoldFilter =
    FILTER ( ‘Product’, [# Sales] = 0 )
RETURN
    CALCULATE ( COUNTROWS ( ‘Product’ )NotSoldFilter )

Which would return the same result and, at least for more complex measures, is a convenient way to apply filters. Any time a measure is used as a filter condition, the FILTER function (Rather than CALCULATETABLE) needs to be used.

However, caution needs to be exercised when doing so because it implements something called context transition – the row context applied through FILTER is converted into an equivalent filter context. Normally the two return equivalent results which is fine; however if there are non-unique rows in the table being iterated over (In this case the sales table), whilst FILTER works fine in a pure row context, non-unique rows will be counted only once when it is converted into a filter context.  For this reason it is generally safer to re-write the filter conditions directly rather than rely on using a measure to apply the filters.

Removing Filters

Finally, consider the scenario when we want to remove filters. For example, we might have a year slicer on our report but if we want to consider all time periods in our measure (ignoring the slicer selection) then we can use the ALL function on the Date table e.g.:

Total Sales All time =
VAR IgnoreTimeFilter =
    ALL ( Date )
RETURN
    CALCULATE ( [Total Sales], IgnoreTimeFilter )

We can also use ALL where we only want to ignore a selected field rather than a whole table. If we had the year slicer with sales broken down by month, and we wanted to calculate the proportion of sales in each month we could use:

Total Sales =
SUM ( Sales[Sales Amount] )

Full Year Sales =
VAR IgnoreMonthFilter =
    ALL ( ‘Date'[MonthNo] )
RETURN
    CALCULATE ( [Total Sales], IgnoreMonthFilter )

Proportion of Full Year Sales =
DIVIDE ( [Total Sales], [Full Year Sales] )

In this case, the year filter would still apply but within each year the full year sales would look ignore any particular context for a month and consider all the months when calculating sales.

ALL is most useful in this scenario – as a denominator for a ratio. Other examples might include e.g. considering sales in a particular city as a proportion of total sales in a broader region.

Sometimes the ALL function is a little too crude in removing all the filters. Consider the scenario where we have a date range slicer on our report; we might want to consider the proportion of sales within the date range selected by the user. Applying ALL to the denominator of our ratio will ignore the date range selection, so we can use the ALLSELECTED function instead. This function removes any direct filters applied on the table/field though keeps those applied indirectly e.g through slicers or visual/page/report filters.

Other similar functions include ALLEXCEPT and ALLNOBLANKROW which work as their names suggest. However, there are many complexities when working with these functions and it is a good idea to try these measures first  in a matrix with a hierarchy of columns in the Rows section in order to check that the measure is calculating the correct totals at each level of the hierarchy.

 

 

 

Want to provide feedback on this blog post or enquire about having training for your company/team? Fill in the form below and let us know!

Related Articles

Design a Data Model

Rishi Sapra
0

Power BI How to build a star schema. This topic looks at how to understand what structure of data tables you need, and how to model them using a star schema When data is imported into Power BI, the resulting data model sits at its heart. It drives the kind of analysis that can be… Continue reading Design a Data Model

Read More

Example Reports

Rishi Sapra
0

Power BI Example Reports | Visual Best practice design This report was the winner of the Complex Data Preparation Challenge for the 2017 Data Insight Summit in Seattle. The challenge involved taking a number of disparate data sets from the Hawaii Tourism Authority (on visitor numbers to Hawaii) and turning them into a visualization which… Continue reading Example Reports

Read More

Manage Table Relationships

Rishi Sapra
0

Power BI How-to Manage Table Relationships Once we have created our fact and dimension tables in the query editor, we then define the table relationships in the relationships pane of Power BI, either by dragging a field in one table over the top of another one or by selecting ‘Manage Relationships’ in the modelling tab:… Continue reading Manage Table Relationships

Read More

Leave a comment