Why is it important to care about efficient workbooks? You never get a second chance at a first impression. Load times are one of the very first aspects of our dashboards visible to the end user. We can begin to lose our audience before they even begin to experience the dashboard.
This blogpost is the second of a series that will provide you with the best practices to keep in mind on your journey to create efficient workbooks along with flow charts to help simplify the process of determining where to focus your optimization efforts. The series highlights points from the white paper Designing Efficient Production Dashboards by Ben Bausili published by Tableau. There a link downloading at the end of this post.
This blogpost, in particular, will help guide you with best practices in creating the needed calculations. However, it is not intended to be an introduction to calculations in Tableau.
Calculations Best Practices
Calculations are where we can control much of the complexity and speed of queries to the underlying data source. Reducing the complexity by materializing calculations or doing it in advance in your data source are two of the most effective ways to achieve great performance. However, there are many things in calculations that can add up to impact performance, and even small things will add up. Therefore, we will provide you with general guidelines to keep in mind when writing calculated fields as well as tips and tricks on how to structure your calculations efficiently.
Tip no. 1: Materialize your calculations
Calculations in Tableau typically run when a user requests or interacts with a dashboard. If we can do the calculation before the user arrives, we can save them time by delivering a faster dashboard. There are several ways you can achieve this speed boost. The easiest is to use a Tableau Hyper Extract, as Tableau will materialize your calculations (i.e. calculate them and store the results in the Extract) for you when you create your extract or you can ask Tableau to do it by going to the Data menu in Tableau Desktop, selecting your data source, and then Extract > Compute Calculations Now. On a live connection, you can perform calculations in the data source. Row-level calculations can easily be added to a database view or table.
Tip no. 2: Aggregate your data when possible
The closer the calculation is to each individual row of data, the slower it will perform. The difference comes down to order of operations. Take for example a use case where you are allowing a user to select a measure to display. The calculation to the left requires the user input to be compared before aggregation on each and every row. However, by adding, for instance, sum aggregations, it allows Tableau to aggregate the values before checking the user selection, which results in fewer conditional checks.
For LOD calculations, look at the number of unique dimensions members in the calculation and reduce if possible. For table calculations, the more marks you have in the view the longer it will take to calculate.
Tip no. 3: Consider when to use Native Features vs. Custom Calculations
Native features in Tableau makes developing analytics fast and easy. In certain cases, it is faster to do a custom calculation, e.g., grouping. In many others, the native features have been optimized to the point that they will be substantially faster than anything custom, e.g. advanced analytics from the analytical pane. One sometimes surprising best practice is to create calculations that use CASE statements to group items instead of Tableau’s built-in group functionality. The built-in grouping loads the entire domain of the dimension, whereas your calculated field only loads the named members of the domain. In testing, grouping methods performed in the following order (from best to worst):
1. CASE Statements
2. Tableau Sets
3. Native Group function
4. If / Else If Statements
Tip no. 4: Reduce the number of nested calculations
Creating calculations built upon other calculations can often happen as you iterate and validate your work, but each layer can add complications and additional processing. This is especially true when you layer IF statements or other performance intensive functions. However, do not consolidate all your calculations into a single block as readability and your ability to maintain calculations overtime is important.
Tip no. 5: Use IN instead of OR
The left statement will return TRUE if Project contains the values “Project 1” or “Project 2” and FALSE otherwise. From a performance perspective, when a data source is passed a series of OR statements, it evaluates each condition individually before giving the result of the entire statement. If we use IN instead, it can be assumed that we are checking a single dimension against a list, which is a much simpler task:
Tip no. 6: String searches and manipulation
Incorporating IF statements is slower than allowing a function such as CONTAINS to return a Boolean result. Take for example a case where you want to look up a product name:
If you can change this sort of calculation into a Filter or Set, you may find even better results in performance.
Tip no. 7: Test the most frequent outcomes first
When Tableau processes your conditional statements, it stops as soon as it finds a match. Putting the most likely outcomes at the top will mean the majority of cases in your data will stop sooner. Often, it is worse for performance to have multiple IF statements nested together, but in cases where nested statements help the most frequent outcomes to be evaluated quickly, they might result in better processing.
Tip no. 8: Tableau will not materialize a calculation with a parameter
This means that anything you have put in your calculated field will not gain the benefits of being precomputed and stored in your Extract. If you have more complicated calculations happening within the larger calculation including a parameter, consider breaking the calculation apart so that Tableau can materialize pieces of it. Consider the calculation below that allows a user to provide a conversion rate (parameter). Instead, we could break out the last part of the calculation, i.e. ([Sales] – [Expenses]), into its own calculation and refer to it in the calculation shown below. Thus, Tableau would be able to materialize it.
Tip no. 9: Consider alternatives to Table calculations
Table calculations are executed by Tableau after the result of a query is retrieved. If a table calculation is performing poorly, it is often because the query result being returned to Tableau is large. In those cases, consider pushing back some aspects of the calculation to a query or data source. One way to do this within Tableau is a level of detail (LOD) calculation.
Table calculations are often not necessary. For example, if we want to find the average sale per store, this can be solved using the following table calculation: Window_AVG(Sum([Sales]). This would require a visualization that showed sales for each store, so the table calculation can take the appropriate average. This could potentially result in a very large dataset and a lot of marks in our view. However, two more efficient options to consider are:
1. Use a Tableau extract and aggregate sales to each Store. Then we can do a standard calculation of AVG([Sales])
2. Calculate the average in a standard calculation by taking a count of stores. SUM([Sales]) / COUNTD([Stores])
Tip no. 10: Materialize string manipulation in your data
FIND, REPLACE, REGEX and so forth are all comparatively expensive functions to use in calculations. You will see better performance by making sure your data has the string values you need to use already in it. Hence, it is best to look at how these calculations can be added to your underlying data or materialized in your Extract.
Tip no. 11: MIN and MAX are faster than ATTR and AVG
There is often a need to display a column that contains duplicate values and would display the same result for AVG, MIN, MAX, and ATTR. In those cases, MIN and MAX will be computationally faster, as Tableau for instance calculates both a MIN and a MAX for the ATTR calculation.
Tip no. 12: COUNTD is slow
There is a lot going on in this simple calculation to ensure it only counts the distinct items. When possible, use COUNT, [Number of Records], or the CNT function that allows you to count the rows of a single table.
Tip no. 13: ELSEIF > ELSE IF
ELSEIF statements are viewed as a single statement and can often be transformed into a CASE statement for the underlying query, while ELSE IF statements are considered 2 separate logical tests and result in nested logical statements in queries.
Looking for more?
This blogpost extracts the key takeaways from the white paper Designing Efficient Production Dashboards by Ben Bausili. The white paper is built upon the experience and expertise of consultants across the globe and is addressing the likely areas you will achieve large improvements with performance in Tableau. For more detailed advice and insights, download the white paper here
Troubleshooting flow chart that can help simplify the process of where to focus your optimization efforts related to calculations. The flow chart is from the report, Designing Efficient Production Dashboards, from Tableau and Interworks.