Do you sometimes experience high latency when working with Tableau workbooks and dashboards? When handling a lot of data in one workbook the user experience sometimes becomes less smooth due to performance issues within Tableau. Over the coming weeks, we will provide you with a series of three blog posts that will give you a roadmap to great performance in Tableau workbooks along with flow charts to help simplify the process of determining where to focus your optimization efforts. More specifically, the key areas that will be in focus are:
- Data sources
- Workbook design
This blogpost will be tackling performance problems related to the first step: data sources. You can find the troubleshooting flow chart at the end of this article where you can also download an extensive white paper from Tableau that has inspired this series.
Data Source Best Practices
Tableau has an extensive ability to connect to data across many different platforms. For most connections Tableau can either query the data live or create a Hyper Data Extract (a .hyper file).
Querying live means that Tableau will generate queries to the underlying data source as you build visualizations and dashboards. Thus, if the underlying data source is slow, it will have a huge impact on your dashboard. In such cases, Tableau Hyper Extract will be your secret weapon for performance. In general, most environments benefit from using extracts and extracts are, therefore, an important part of your toolkit to improve performance.
A few recommendations apply broadly to all data sources and model types, including doing proper data prep, summarizing data, excluding unused data, and using Hyper when possible. More specifically:
- Data Speed Is Key: Your visualizations can only perform as fast as your underlying data sources, and that is the giant roadblock that can turn great dashboards into headaches. Many data sources have unique optimization strategies, such as database indexes. You can consult your administrators to discuss possible options, but one technique that always works is focusing your data source i.e., limiting the size of the data. Smaller data means less info to process and transmit. Use only the data that is needed at the grain necessary for the worksheet to perform its analysis.
- Use Hyper Extracts When Possible: Tableau Hyper Extract is a powerful tool that leverages data source filters, aggregating to visible dimensions, and hides unused columns to shrink the size of your extracts and accelerate your dashboards.
- Use Native Drivers When Possible: For the best performance, stick with native drivers. When a native driver is not available, you can turn to JDBC or ODBC if your data source supports it or check the extensive list of web data connectors for additional options.
- Optimize and Materialize Your Calculations: Perform calculations in the database when possible. When a calculation is created and stored in the database, you are reducing overhead in Tableau. This is especially the case for row-level calculations that do not need to be done by Tableau, such as Profit = Sales – Cost. Aggregate calculations are an example of something that often needs to be done specific to a user’s requests and should be calculated within Tableau.
Hyper Extract Specific Recommendations
The most impactful way to improve performance is to focus your Extract to the exact needs of your workbook. Tableau gives you a lot of built-in options for managing your extract. We recommend considering these in particular:
- Pre-aggregate data and “aggregate data for visible dimensions”. When you create a data extract, Tableau also gives you the option to aggregate your data for all visible dimensions, i.e., an aggregated extract. This type of extract does not contain the row-level data. Rather, it contains only the aggregated data. Hence, it is smaller than a standard data extract, and it creates another level of efficiency in generating fast performance.
- Use the appropriate data storage. You can store data in the extract as either logical tables or physical tables, which is tied to the layers of the new data model as described in another blogpost in this series. The physical table storage option is the most granular, as level of detail is at the row level of merged physical tables. The logical storage will group everything as you see it on the logical layer, so it will complete any joins you have defined on the physical layer and store them as single tables in the extract. You can think of logical storage as an aggregate of physical storage based on how you have defined the relationships on the logical layer. In cases where you have defined physical joins, this could result in larger than expected data volumes
- Use extract filters. There are many ways to filter your data in Tableau. Extract filters provide the most performance benefit. You can add an extract filter in the create extract dialog. This filter will leave out any data from the extract, making it smaller and faster.
- Remove unneeded columns by pressing the “Hide All Unused Fields” button when you create a Tableau Extract. Tableau will remove any field in your data that is not referenced by your Tableau workbook. For best results, make sure you have cleaned up your production workbook of any unused calculations, sheets, or dashboards so that Tableau can truly optimize it for the production workload.
- Embedded extracts provide additional optimization: Embedded extracts (i.e., publishing a workbook with an extract) have two important performance advantages: 1) they are optimized for your workbook, and 2) they reduce roundtrips over the network. When you embed the extract, Tableau analyzes your workbook and optimizes the extract by creating temporary tables to speed up elements of your dashboards. For instance, you might have multiple filters set to show only relevant values. Tableau can create smaller datasets to make these look-ups easier to perform.
- Materialize calculations in your extracts: If you are using a hyper extract, Tableau will automatically materialize your calculations (if possible). This essentially pre-calculates the results before a user interacts with the dashboard. This technique can be applied to any data source, however, with a bit of manual work. By taking the time to build out row level calculations or even more complicated calculations in your data source, you are offloading the processing from Tableau. These are particularly effective when dealing with string calculations, which are more resource-heavy than numeric or date calculations.
Live Data Connections Specific Recommendations
When using a live data connection, you will be very dependent on the underlying data source’s ability to process your requests in a timely manner. In cases where you are connecting to a relational database, your database administrator has a role in helping performance by doing things such as index tuning and proper data modeling. That said, as workbook authors, we do have some best practices we can follow to ensure we have the best possible performance in Tableau.
- Use referential integrity: If your underlying database supports the concept of foreign keys, Tableau will automatically be able to assume referential integrity allowing Tableau to do join culling. I.e., if a query requires only data from one table, the other joined tables will not be referenced, and hence, Tableau will drop it from the query without affecting the results.
- Leverage the relational data model: the relational data model allows for more flexible relationships between tables. Testing showed performance was better on more intensive calculations, like COUNTD() when it could leverage a small dimension table in place of one large table or traditional join.
- Consider using multiple data sources: When dealing with complex data sources at different levels of detail, it is often better to use them separately than trying to force them to coexist with the complex queries that results in. However, if all the sheets in your dashboard are the same level of granularity, a single table is likely preferable.
- Advice for talking to data engineers and database administrators: If you have optimized what you can in Tableau and your performance is hitting a bottleneck specifically on queries, then it might be time to talk to your data team about tuning the data source. Here are some things they can do to help (not all will apply to every data source type): move complex calculations to the database, set indexes on join dimensions, use a star schema when possible, set appropriate primary and foreign keys, set Index on filtering dimensions, and cast data types in data source.
This blogpost extracts the key takeaways from the white paper Designing Efficient Production Dashboards by Ben Bausili. The white paper is built upon the experience and expertise of consultants across the globe and is addressing likely areas where you will achieve large improvements with performance in Tableau. For more detailed advice, insights or personal help with your Tableau dashboards, feel free to contact us.
Download the white paper “Designing Efficient Production Dashboards” from Tableau here.
Troubleshooting flow chart that can help simplify the process of where to focus your optimization efforts related to data sources and strains on the Tableau Server resources. The flow chart is from the report, Designing Efficient Production Dashboards, from Tableau and Interworks.