Toan Hoang

Tableau and Python / An Introduction

Today, Python is one of the most sought-after skills in the world of Data Science, and we can leverage its power in our Tableau Data Visualisations. While integration is not entirely out of the box and requires some initial setup, it is not hard to get up and running. In this article, we will introduce Python, show you how to integrate Python with Tableau, and, more importantly, leave you with an example that you can build on.

Python Programming Language

Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python’s design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.

Python has gained a lot of traction in the world of data science and now has a host of fantastic libraries to support your requirements:

Library | Description
NumPy | NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
SciPy | SciPy is a free and open-source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
Pandas | Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term “panel data”, an econometrics term for data sets that include observations over multiple time periods for the same individuals.
StatsModels | Statsmodels is a Python package that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator. It complements SciPy’s stats module.
Scikit-Learn | Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
TensorFlow | TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. It is used for both research and production at Google.
NLTK | The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook.
Keras | Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System), and its primary author and maintainer is François Chollet, a Google engineer. Chollet also is the author of the XCeption deep neural network model.
Theano | Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures.
Gensim | Gensim is an open-source library for unsupervised topic modeling and natural language processing, using modern statistical machine learning. Gensim is implemented in Python and Cython. Gensim is designed to handle large text collections using data streaming and incremental online algorithms, which differentiates it from most other machine learning software packages that target only in-memory processing.

Note: this is a short list of popular libraries that can be leveraged in your Python code.

TabPy

TabPy (the Tableau Python Server) is an external service implementation which expands Tableau’s capabilities by allowing users to execute Python scripts and saved functions via Tableau’s table calculations. TabPy allows Tableau to remotely execute Python code.

It has two components:

  1. A process built on Tornado, which allows for the remote execution of Python code through a set of REST APIs. The code can either be immediately executed or persisted in the server process and exposed as a REST endpoint, to be called later.
  2. A tools library that enables the deployment of such endpoints, based on Python functions.
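
As a rough sketch of the second component, the snippet below deploys a plain Python function as a named endpoint. It assumes TabPy is installed and already running locally on its default port (9004). The function `z_scores`, the helper `deploy_z_scores`, and the endpoint name are made up for this illustration and are not part of TabPy itself.

```python
import statistics


def z_scores(values):
    """Return the z-score of every value in a list (population std. dev.)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values are identical, so no value deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


def deploy_z_scores(url="http://localhost:9004/"):
    """Publish z_scores() on a running TabPy server (URL is an assumption)."""
    # Imported lazily so this file still loads where TabPy is not installed.
    from tabpy.tabpy_tools.client import Client

    client = Client(url)
    # override=True replaces any endpoint already deployed under this name.
    client.deploy("z_scores", z_scores, "Z-score of each value", override=True)
```

Once `deploy_z_scores()` has been run against a live server, a Tableau calculation should be able to call the endpoint by name from inside a script, along the lines of `return tabpy.query('z_scores', _arg1)['response']`.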

Tableau can connect to the TabPy server to execute Python code on the fly and display the results in Tableau visualizations. Users can control the data and parameters being sent to TabPy by interacting with their Tableau worksheets, dashboards, or stories.
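
To make this exchange a little more concrete, here is a minimal sketch of the kind of request a client could send to TabPy's `/evaluate` REST endpoint: the columns travel as `_arg1`, `_arg2`, and so on, next to the script text. The helper name `build_evaluate_request`, the sample script, and the local URL are assumptions for illustration, and the exact payload should be checked against the TabPy repository for your version.

```python
import json
import urllib.request


def build_evaluate_request(script, *columns, url="http://localhost:9004/evaluate"):
    """Build (but do not send) a POST request for a TabPy /evaluate call."""
    # Each column is exposed to the script as _arg1, _arg2, ...
    data = {f"_arg{i}": list(col) for i, col in enumerate(columns, start=1)}
    body = json.dumps({"data": data, "script": script}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical script and values, only to show the shape of the payload.
request = build_evaluate_request(
    "return [a + b for a, b in zip(_arg1, _arg2)]",
    [1, 2, 3],
    [10, 20, 30],
)

# Sending the request needs a running TabPy server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the sketch usable without a server, which is handy for checking what would go over the wire.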

You can read more about TabPy in the official GitHub Repository: https://github.com/tableau/TabPy

Getting Started

Note: I will be showing you how to get up and running on Windows; however, the instructions should not be too different if you are a macOS user.

Notice: This section will be updated due to a new version of TabPy

Now that the introductions are out of the way, let us get started by installing Python and TabPy and getting everything integrated.

Download and Install Python

Download and Install TabPy

If all goes well, you should see the following:

Note: if you get a Windows Security Alert, click on Allow access.

Tableau and TabPy

Now we will try to connect Tableau to the TabPy Server.

If all goes well, you should get a "Successfully connected to the external service." message, and boom, we have installed Python and TabPy and connected Tableau to TabPy, so we are now ready to build a workbook and leverage the power of Python.

Worksheet

We are going to start by opening the Sample Superstore Data Source that is provided with Tableau Desktop. Using this Data Source, we are going to create a Calculated Field to compute Pearson’s Correlation Coefficient (r).

Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for example, age and blood pressure. Pearson’s correlation coefficient (r) is a measure of the strength of the association between the two variables.
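
For readers who want to see the arithmetic behind r, the sketch below computes it in plain Python: the covariance of the two variables divided by the product of their spreads. The function name `pearson_r` and the age and blood pressure figures are made up for illustration only.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of the products of the deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical sample: ages and systolic blood pressure readings.
ages = [25, 35, 45, 55, 65]
pressure = [118, 121, 127, 133, 140]
print(round(pearson_r(ages, pressure), 3))
```

A value close to 1 means the two variables rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear association.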

Let us create a Calculated Field called Pearson Correlation Coefficient:

SCRIPT_REAL("import numpy as np
return np.corrcoef(_arg1,_arg2)[0,1]",
SUM([Sales]),SUM([Profit]))

Things to note:

We will now build our worksheet:

You should now have the following:

Now we will use our Pearson Correlation Coefficient.

You should now end up with the following:

And boom, we are done. As it stands, we have installed Python and TabPy, integrated Tableau with Python, and created a Calculated Field that sends data to Python and returns a value that is rendered.

Summary

In future articles, I will expand on other possible uses of Python and the value that it can bring to your Tableau experience; I will cover topics such as Sentiment Analysis, Identifying Outliers, and Data Science techniques, to name a few.

I hope you all enjoyed this article as much as I enjoyed writing it. Do let me know if you experienced any issues integrating Tableau and Python, and as always, please leave a comment below or reach out to me on Twitter @Tableau_Magic. Till next time, have fun with Tableau.

If you like our work, do consider supporting us on Patreon; in return, we will give you early access to tutorials, exclusive videos, as well as access to current and future courses on Udemy.

