Bob Peers is a data consultant at Inviso by Devoteam
Snowflake has been in the spotlight a lot recently, mainly for their IPO in 2020 that valued the company at over $100 billion. But Snowflake has been around for 9 years now and is a stable and mature product
Alteryx has also recently announced a partnership with Snowflake, but what is Snowflake and how and why would I use it with Alteryx?
What is Snowflake?
Snowflake is what people are calling a Cloud native database.
Most databases in use today started out as locally installed (on-premise as it’s called) applications where your own IT department was responsible for setting up and managing the database. If the database got too large after years of use it was your responsibility to give it more disk space, or if the server it was installed on became overloaded it was again your job to move it to another server.
Microsoft SQL Server, Oracle, MySQL, Postgres and so on have historically always worked like this. Even after they moved to the cloud, as with SQL Server on Azure it’s still not what you could call ‘born in the cloud’.
We have gradually moved our servers, applications and storage into the cloud for many reasons, including:
- No installation or maintenance
- You only pay for what you use
- Instantly and infinitely scalable
- High availability (uptime)
Snowflake was designed with this mind along with being highly optimised for super fast queries and analytics in the cloud. Just sign up, create an account and start using. You can be up and running in 30 minutes.
Gone are the days when you have a team of Database admins optimizing indexes or normalising tables. Snowflake handles all this for you.
One of the greatest aspects of Snowflake is that everything can be configured via the very well designed web interface or through SQL-like commands from a connected client.
This means that you can for example, remotely suspend or resume a Warehouse, change the Warehouse size (from XS to 4XL) or add and edit users.
For example to create a new warehouse is a simple command:
create or replace warehouse inviso_wh with
auto_suspend = 180
auto_resume = true
This warehouse will be created but not started, but it will automatically start as soon as you send your first SQL statement (auto_resume). After the SQL is processed it will automatically suspend after 180 seconds (auto_suspend). This way you only pay for the time you’re actually using the database with a minimal fee for data storage.
To change the warehouse size simply use:
alter warehouse inviso_wh set warehouse_size=medium;
Warehouse and Cloud Service credits, quotas and usage are also readily available so you can easily monitor your expenses or add alerts for high usage.
If you want to manage your Warehouse through the browser, the interface is simple and intuitive, yet powerful to use. You can also see a full history of all commands run against the database, including from external clients for example using the ODBC or Python Connector.
Inevitably database performance is THE most important metric for most analytics users so it’s good to know that the Snowflake SQL Engine is best in class and scales automatically as required.
If you have Gigabytes or Terabytes of data that you need to store and process Snowflake is great. Huge volumes of data can be uploaded either by bulk loading or streaming directly.
This benchmark ran 99 queries on 1TB of data spread over 24 tables, the largest having 4 billions rows! As you can see Snowflake is both fast and good value.
Alteryx + Snowflake = Power + Flexibility
Using Alteryx together with Snowflake you can combine a powerful, easy to use analytics platform with the speed and scalability of Snowflake.
Alteryx users can:
- Use Snowflake as a data source, loading data into Alteryx and analysing.
- Load data into Snowflake and then prep inside Snowflake using In-DB tools
- Use Snowflakes data exchange to easily enrich data
Using In-DB tools allows you to leverage processing power of Snowflake’s SQL query engine which will always be orders of magnitude faster than you have available locally. This way you can avoid transferring large datasheets in and out of Snowflake as all the data stays in Snowflake making huge performance improvements.
Alteryx works with Snowflake using an official ODBC Driver, full details about installing and using are shown on the Snowflake site. Once installed you can use Snowflake as an Input or Output, and the driver supports In-database connections.
The driver supports default warehouse and database but if you wish to change either you can easily do that using a Pre-SQL statement in the input tool as shown here.
Single Sign On
Using the Snowflake driver you can also set up Single Sign On using Google and Okta as an identity provider so users can login to Snowflake and authenticate via the browser. This can also be set to cache the connection to reduce the number of logins required.
In-DB with Snowflake allows use to harness the compute performance of Snowflake with the analytic flexibility of Alteryx without downloading and reuploading gigabytes of data.
If you do need to upload large volumes of data, Alteryx supports bulk uploads using either Amazon S3 as a staging area or Snowflake’s own internal staging (user, table or named stages are all supported). This is a much faster method to upload data compared to using a standard ODBC connection.
So who is this for?
In summary the combination of Alteryx and Snowflake can be used by small and large companies alike.
If you’re a large company with Terabytes of data who needs massive storage along with performance that will scale with your company you can quickly carry out advanced analytics on your data using Alteryx In-DB tools with Snowflake.
At the other end of the scale small companies will love Snowflake and Alteryx for their ease of use. After signing up online and installing the Alteryx Snowflake Starter Kit you can be up and running in 30 minutes. You only pay for the compute time you use with very small storage costs, it’s a cost effective solution.
For all types of companies, whether you have 1000 people in your IT department or none, you’ll appreciate how easy it is to get started with a fully managed, high performance database that integrates seamlessly with Alteryx.