Today, Python is one of the most sought-after skills in the world of Data Science, and we can leverage its power in our Tableau Data Visualisations. While the integration does not work entirely out of the box and requires some initial setup, it is not hard to get up and running. In this article, we will introduce Python, show you how to integrate Python with Tableau and, more importantly, leave you with an example that you can build on.
Python Programming Language
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python’s design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
Python has gained a lot of traction in the world of data science and now has a host of fantastic libraries to support your requirements:
Library | Description |
NumPy | NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. |
SciPy | SciPy is a free and open-source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering. |
Pandas | Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term “panel data”, an econometrics term for data sets that include observations over multiple time periods for the same individuals. |
StatsModels | Statsmodels is a Python package that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator. It complements SciPy’s stats module. |
Scikit-Learn | Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. |
TensorFlow | TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. It is used for both research and production at Google. |
NLTK | The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook. |
Keras | Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System), and its primary author and maintainer is François Chollet, a Google engineer. Chollet also is the author of the XCeption deep neural network model. |
Theano | Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures. |
Gensim | Gensim is an open-source library for unsupervised topic modeling and natural language processing, using modern statistical machine learning. Gensim is implemented in Python and Cython. Gensim is designed to handle large text collections using data streaming and incremental online algorithms, which differentiates it from most other machine learning software packages that target only in-memory processing. |
Note: this is a short list of popular libraries that can be leveraged in your Python code.
TabPy
TabPy (the Tableau Python Server) is an external service implementation which expands Tableau’s capabilities by allowing users to execute Python scripts and saved functions via Tableau’s table calculations. TabPy allows Tableau to remotely execute Python code.
It has two components:
- A process built on Tornado, which allows for the remote execution of Python code through a set of REST APIs. The code can either be immediately executed or persisted in the server process and exposed as a REST endpoint, to be called later.
- A tools library that enables the deployment of such endpoints, based on Python functions.
Tableau can connect to the TabPy server to execute Python code on the fly and display results in Tableau visualizations. Users can control data and parameters being sent to TabPy by interacting with their Tableau worksheets, dashboards or stories.
You can read more about TabPy in the official GitHub Repository: https://github.com/tableau/TabPy
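To make the second component more concrete, here is a minimal sketch of deploying a Python function as a TabPy endpoint. It assumes a recent pip-installed TabPy, where the client lives at tabpy.tabpy_tools.client (older releases shipped a separate tabpy_client package), and a server already running on localhost port 9004. The add function and the endpoint name are purely illustrative.

```python
def add(x, y):
    """Element-wise sum of two equal-length lists, the shape Tableau sends."""
    return [a + b for a, b in zip(x, y)]


def deploy_to_tabpy(url="http://localhost:9004/"):
    """Publish `add` as a named endpoint on a running TabPy server.

    Assumption: a recent TabPy release is installed and listening at `url`.
    """
    # Imported here so the rest of the module still works without TabPy.
    from tabpy.tabpy_tools.client import Client

    client = Client(url)
    # override=True replaces an endpoint of the same name if one exists.
    client.deploy("add", add, "Adds two columns element-wise", override=True)


# The function can be exercised locally before it is deployed.
print(add([1, 2, 3], [10, 20, 30]))
```

Once deployed, such an endpoint would typically be called from a Tableau script with tabpy.query('add', _arg1, _arg2)['response'] instead of sending the full code each time.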
Getting Started
Note: I will be showing you how to get up and running on Windows; however, the instructions should not be too different if you are a macOS user.
Notice: This section will be updated due to a new version of TabPy
Now that the introductions are out of the way, let us get started by installing Python and TabPy and getting everything integrated.
Download and Install Python
- Go to https://www.python.org and download Python 3.7.3.
- Once downloaded, double click and install Python.
- Now we will add Python to our Path.
- Click on the Windows Start button, type sysdm.cpl and press Enter; this will open your System Properties.
- Click on the Advanced tab.
- Click on Environment Variables…
- Under System variables double click on Path; if Path does not exist, click on New…
- In Edit environment variable click on New and type the following: C:\Users\YourUserName\AppData\Local\Programs\Python\Python37-32
- Keep clicking Ok to close all the configuration windows.
- Now we will test that Python has been added to your machine.
- Click on the Windows start button.
- Type CMD and press Enter to open up Command Prompt.
- Now type python --version (two hyphens).
- If you see Python 3.7.3, or whichever version you have installed, Python is working.
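Beyond the version check, it can help to confirm that the interpreter on your Path can also see NumPy, since the example later in this article imports it. The following is a small sketch using only the standard library; the function name check_environment is made up for illustration.

```python
import importlib.util
import sys


def check_environment(required=("numpy",)):
    """Report the interpreter version and any required packages not found."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # find_spec returns None when a top-level package cannot be located.
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    return {"python": version, "missing": missing}


if __name__ == "__main__":
    report = check_environment()
    print("Python", report["python"])
    print("Missing packages:", report["missing"] or "none")
```

If numpy is reported as missing, installing it into the same Python that TabPy uses should resolve import errors in your calculated fields.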
Download and Install TabPy
- Go to the following website https://github.com/tableau/TabPy/releases.
- Click on Source code (zip) to download the TabPy code.
- Once the file is downloaded, unzip the contents.
- Go to the unzipped directory and double click on startup.cmd.
If all goes well, you should see the following:
Note: if you get a Windows Security Alert, click on Allow access.
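If you would like to confirm that the server is reachable without opening Tableau, you can query TabPy's REST API directly. This is a minimal sketch that assumes the server is on localhost port 9004 and exposes the /info endpoint described in the TabPy documentation; the helper names are illustrative.

```python
import json
import urllib.error
import urllib.request


def info_url(host="localhost", port=9004):
    """Build the URL of TabPy's info endpoint (host and port are assumptions)."""
    return f"http://{host}:{port}/info"


def tabpy_info(host="localhost", port=9004, timeout=3):
    """Return the server's info document as a dict, or None if unreachable."""
    try:
        with urllib.request.urlopen(info_url(host, port), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None


if __name__ == "__main__":
    info = tabpy_info()
    print("TabPy is running" if info is not None else "TabPy is not reachable")
```

A result of None usually means the server has not started, is listening on a different port, or is being blocked by a firewall.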
Tableau and TabPy
Now we will try to connect Tableau to the TabPy Server.
- Open Tableau Desktop.
- Go to Help in the application menu.
- Go to Settings and Performance and select Manage External Service Connection.
- In the External Service Connection dialogue window:
- In Select an External Service, select TabPy/External API.
- In Server, type localhost.
- In Port, type 9004; this is the port that TabPy reported when it started up.
- Click Test Connection.
If all goes well, you should get the message Successfully connected to the external service. And boom, we have installed Python and TabPy and connected Tableau to TabPy, so we are now ready to build a workbook and leverage the power of Python.
Worksheet
We are going to start by opening the Sample Superstore Data Source that is provided with Tableau Desktop. Using this Data Source, we are going to create a Calculated Field to perform Pearson’s Correlation Coefficient (r).
Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for example, age and blood pressure. Pearson’s correlation coefficient (r) is a measure of the strength of the association between the two variables.
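To show what the coefficient actually measures, here is a short pure-Python sketch of Pearson's r: the sum of the products of each variable's deviations from its mean, divided by the square root of the product of the summed squared deviations. The function name and the sample numbers are made up for illustration; for the same inputs it should agree with NumPy's corrcoef up to floating-point rounding.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance / spread


# Illustrative values only: a perfectly linear increasing relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.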
Let us create a Calculated Field called Pearson Correlation Coefficient:
SCRIPT_REAL("import numpy as np
return np.corrcoef(_arg1,_arg2)[0,1]",
SUM([Sales]),SUM([Profit]))
Things to note:
- We are importing the NumPy library.
- We are calling the corrcoef function and passing through SUM([Sales]) and SUM([Profit]), which arrive in Python as _arg1 and _arg2.
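When debugging a calculated field, it can be useful to reproduce roughly what Tableau sends to TabPy. The sketch below builds a request for TabPy's /evaluate REST endpoint, which accepts a JSON body with a data object (holding _arg1, _arg2 and so on as lists) and a script string. The URL, the helper names and the sample values are assumptions for illustration, and the response is not shown because it depends on your server and TabPy version.

```python
import json
import urllib.request

# The same Python code that the calculated field passes to SCRIPT_REAL.
SCRIPT = "import numpy as np\nreturn np.corrcoef(_arg1,_arg2)[0,1]"


def build_payload(sales, profit):
    """Package two columns the way the /evaluate endpoint expects them."""
    return {"data": {"_arg1": list(sales), "_arg2": list(profit)}, "script": SCRIPT}


def evaluate(payload, url="http://localhost:9004/evaluate", timeout=10):
    """POST the payload to a running TabPy server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up numbers standing in for SUM([Sales]) and SUM([Profit]) per customer.
    payload = build_payload([120.0, 80.5, 300.0], [20.0, -5.0, 75.0])
    print(json.dumps(payload, indent=2))
```

Calling evaluate(payload) against a running server lets you see errors from the Python side directly, which can be easier to read than the dialogue Tableau shows.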
We will now build our worksheet:
- Change the Mark Type to Circle.
- Drag Category onto Columns.
- Drag Sales onto Columns.
- Drag Profit onto Rows.
- Drag Customer Name onto the Detail Mark.
You should now have the following:
Now we will use our Pearson Correlation Coefficient.
- Drag Pearson Correlation Coefficient onto the Colour Mark.
- You will see an invalid JSON error, but do not worry about that. Please close the dialogue.
- Right-click on this object, go to Compute Using and select Customer Name.
- Click on the Colour Mark.
- Click on Edit Colour and select the Red-Green Diverging palette.
You should now end up with the following:
And boom, we are done. We have installed Python and TabPy, integrated Tableau with Python, and created a Calculated Field that sends data to Python and renders the value that comes back.
Summary
In future articles, I will expand on other possible uses of Python and the value that it can bring to your Tableau experience; I will cover topics such as Sentiment Analysis, Identifying Outliers, and Data Science techniques, to name a few.
I hope you all enjoyed this article as much as I enjoyed writing it. Do let me know if you experienced any issues integrating Tableau and Python and, as always, please leave a comment below or reach out to me on Twitter @Tableau_Magic. Till next time, have fun with Tableau.
Thx Toan
As always, you are very welcome 😀
Thanks!
Looking forward to future TabPy articles
Good morning Toan, the install was fine, but i got an ‘invalid json’ as return.
although the msdos cmd window indicates no error on the python routine.
any idea?
Hi Julio, I will need to have a look at this. I remember reading something about certain Python versions, but I will try to have a look later today and get back to you.
I found this on GitHub… https://github.com/tableau/TabPy/issues/40
nevermind. it was an issue with LOD on the calculation.
cheers and thank you.
Awesome 🙂
Your blog is nice. I believe this will surely help readers who are really in need of this vital piece of information. Thanks for sharing and kindly keep updating.
Thank you Asma, I am looking at writing a second part in the next few months.
Hi Toan, Nice article.
Exploring the possibility of exporting Summary / underlying data using Python from Worksheet in async method.
Let me know how it goes 🙂
Hi Toan,
I am getting a problem. Could you please help if possible.
I have installed Python and downloaded and unzipped TabPy. Everything went well, but while performing the step below, I can’t see ‘In Select an External Service and select TabPy/External API’. I still connected to localhost by entering port 9004, but I am getting the message ‘Successfully connected to Predictive service’.
In the External Service Connection dialogue window:
In Select an External Service and select TabPy/External API.
In Server type localhost.
in Port type 9004, you can see the port that TabPy started on in the image above.
Click Test Connection.
Which Version of Tableau are you using? Can you go to Help -> Settings and Performance -> Manage External Service Connections? There is some wording difference in different versions, but the steps are the same.
Thanks, Toan, for the lesson. I am starting to use TabPy and Python scripts on Tableau.
You're welcome, and enjoy, it is loads of fun.
I followed your method but I have a problem.
When I click Test Connection to connect Python with Tableau, I receive the error message
‘An error occurred while communicating with the external service. Tableau is unable to connect to the service. Verify that the service is running and that you have access privileges.’.
What should I do for this?
thanks for your help.
Hi Nat, I am on vacation at the moment, but please post onto the Tableau Magicians Fb group, someone there may be able to help you.
Okay, thanks.
Enjoy your vacation.
There is an easy way to install and start TabPy now – https://github.com/tableau/TabPy/blob/master/docs/server-install.md.
No need to deal with downloading and unpacking releases, setting up env variables and so on.
Thank you for bringing this to my attention Oleksandr. I will have a read through this and update the tutorial. Kind Regards, Toan
How do you create variables from streaming data? If I have APIs that I can bring the data in from and clean it for analytics, how do I turn that into variables in Tableau? Thanks
If you want to stream data, your best bet might be to think about the Hyper API.
Found the examples and it looks like what I need! Thank you. I have an LSTM time series model that I'm trying to bring into Tableau for the cool charts and all that.
Actually, this may not work because it is for data coming from a database. Mine is coming through an API from sensors. I just want to stream it straight from the API into the dimension and measure fields in Tableau and then run my code in the calculated field. Any ideas on that? Other than that, I am using Firebase for the data, but I don't see it listed on the connectors page to bring it in from there.
It will be challenging, you can build a web data connector for firebase, that should not be too hard; you can then use an extension to refresh the data at specific intervals.
Ya , i m trying to make a shortcut by linking my firebase to Google analytics and bring it in that way since it has a pre built connector already or Google big query.
The challenge will be the time lag if you go that route. Funnily enough, I am looking to build some firebase apps and was thinking about creating a web data connector, I don’t think it would be too hard to build.
I want to pass the python graph (folium or choropleths) to Tableau. Is it possible?
I do not believe that this is possible through the Python API. The idea is that we can pass values for computation. You can get Python charts showing in Tableau if you build an extension.