Toan Hoang

Tableau and R / An Introduction

One of the Tableau questions I am asked most often is whether we can integrate Tableau with the R programming language for statistical computation. The answer is yes, but the conversation then turns to the art of the possible and sometimes, with an inner chuckle, to how we can use ggplot2 output in Tableau dashboards.

In this article, I want to introduce R, RStudio, and Rserve, and show how we can pass information from Tableau to R and render the results. This is intended as an introductory article and will hopefully give you a good starting point for further exploration.

R Programming Language

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which John Chambers, the creator of the S language, is a member. R is named partly after the first names of its first two authors. The R project was conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000.

A list of changes in R releases is maintained in various “news” files at CRAN, but below is a list of milestones:

Release | Date | Description
0.16 | | This is the last alpha version developed primarily by Ihaka and Gentleman. Much of the basic functionality from the “White Book” (see S history) was implemented. The mailing lists commenced on April 1, 1997.
0.49 | 23-Apr-1997 | This is the oldest source release which is currently available on CRAN. CRAN is started on this date, with 3 mirrors that initially hosted 12 packages. Alpha versions of R for Microsoft Windows and the classic Mac OS are made available shortly after this version.
0.60 | 05-Dec-1997 | R becomes an official part of the GNU Project. The code is hosted and maintained on CVS.
0.65.1 | 07-Oct-1999 | First versions of update.packages and install.packages functions for downloading and installing packages from CRAN.
1.0 | 29-Feb-2000 | Considered by its developers stable enough for production use.
1.4 | 19-Dec-2001 | S4 methods are introduced and the first version for Mac OS X is made available soon after.
2.0 | 04-Oct-2004 | Introduced lazy loading, which enables fast loading of data with minimal expense of system memory.
2.1 | 18-Apr-2005 | Support for UTF-8 encoding, and the beginnings of internationalization and localization for different languages.
2.11 | 22-Apr-2010 | Support for Windows 64 bit systems.
2.13 | 14-Apr-2011 | Adding a new compiler function that allows speeding up functions by converting them to byte-code.
2.14 | 31-Oct-2011 | Added mandatory namespaces for packages. Added a new parallel package.
2.15 | 30-Mar-2012 | New load balancing functions. Improved serialisation speed for long vectors.
3.0 | 03-Apr-2013 | Support for numeric index values 2^31 and larger on 64 bit systems.
3.4 | 21-Apr-2017 | Just-in-time compilation (JIT) of functions and loops to byte-code enabled by default.
3.5 | | Packages byte-compiled on installation by default. Compact internal representation of integer sequences. Added a new serialisation format to support compact internal representations.

R Programming Language: https://www.r-project.org
Wikipedia: https://en.wikipedia.org/wiki/R_(programming_language)

R Packages

The capabilities of R are extended through user-created packages, which allow specialised statistical techniques, graphical devices, import/export capabilities, reporting tools etc. A core set of packages is included with the installation of R, with more than 15,000 additional packages (as of September 2018) available at the Comprehensive R Archive Network (CRAN), Bioconductor, Omegahat, GitHub, and other repositories.

At the time of writing, the most popular R packages according to RDocumentation are:

Package | Description
R6 | Creates classes with reference semantics, similar to R’s built-in reference classes. Compared to reference classes, R6 classes are simpler and lighter-weight, and they are not built on S4 classes so they do not require the methods package. These classes allow public and private members, and they support inheritance, even when the classes are defined in different packages.
ggplot2 | A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
devtools | Collection of package development tools.
dplyr | A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
tidyverse | The ‘tidyverse’ is a set of packages that work in harmony because they share common data representations and ‘API’ design. This package is designed to make it easy to install and load multiple ‘tidyverse’ packages in a single step.
readxl | Import excel files into R. Supports ‘.xls’ via the embedded ‘libxls’ C library and ‘.xlsx’ via the embedded ‘RapidXML’ C++ library. This library works on Windows, Mac and Linux without external dependencies.
openssl | Bindings to OpenSSL libssl and libcrypto, plus custom SSH pubkey parsers. Supports RSA, DSA and EC curves P-256, P-384 and P-521. Cryptographic signatures can either be created and verified manually or via x509 certificates.
stringi | Allows for fast, correct, consistent, portable, as well as convenient character string/text processing in every locale and any native encoding.
data.table | Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write.
pkgconfig | Set configuration options on a per-package basis. Options set by a given package only apply to that package, other packages are unaffected.
tidyr | An evolution of ‘reshape2’. It’s designed specifically for data tidying (not general reshaping or aggregating) and works well with ‘dplyr’ data pipelines.
Rcpp | The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of third-party libraries.
readr | The goal of ‘readr’ is to provide a fast and friendly way to read rectangular data (like ‘csv’, ‘tsv’, and ‘fwf’). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
sparklyr | R interface to Apache Spark, a fast and general engine for big data processing. This package supports connecting to local and remote Apache Spark clusters, provides a ‘dplyr’ compatible back-end, and provides an interface to Spark’s built-in machine learning algorithms.
yaml | Implements the ‘libyaml’ ‘YAML’ 1.1 parser and emitter for R.
utf8 | Process and print ‘UTF-8’ encoded international text (Unicode). Input, validate, normalize, encode, format, and display.
glue | An implementation of interpreted string literals, inspired by Python’s Literal String Interpolation and Docstrings and Julia’s Triple-Quoted String Literals.
lubridate | Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects.
reshape2 | Flexibly restructure and aggregate data using just two functions: ‘melt’ and ‘dcast’ (or ‘acast’).
tidyselect | A backend for the selecting functions of the ‘tidyverse’. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other ‘tidyverse’ interfaces for selection.

If you explore R, you will find that there are a lot of interesting packages available; the full list behind the table above can be found at https://www.rdocumentation.org.

Note: Leveraging R within Tableau is awesome, but if you have time, I would highly recommend you learn how to use R itself, as there are lots of extremely cool things you can do with it.

RStudio

RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. RStudio was founded by JJ Allaire, creator of the programming language ColdFusion. Hadley Wickham is the Chief Scientist at RStudio.

RStudio is available in open source and commercial editions. RStudio Desktop runs locally as a regular desktop application, with prepackaged distributions for Windows, macOS, and Linux. RStudio Server and RStudio Server Pro run on a remote Linux server (Debian, Ubuntu, Red Hat Linux, CentOS, openSUSE and SLES) and are accessed through a web browser.

Read more about RStudio here: https://www.rstudio.com

Rserve Package

Rserve acts as a socket server (TCP/IP or local sockets) which allows binary requests to be sent to R. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++ and Java, allowing any application to use facilities of R without the need of linking to R code. Rserve supports remote connection, user authentication and file transfer. A simple R client is included in this package as well.

To integrate Tableau with R, we will make use of the Rserve package.

Read more about Rserve here: https://www.rdocumentation.org/packages/Rserve

Getting Started

As with all our tutorials, let us get started by downloading the required software and testing the connectivity:

With the two key pieces of software installed:
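A minimal sketch of this step in the R console (for example inside RStudio), assuming Rserve is installed from CRAN and started with its default settings, in which case it listens on port 6311 on the local machine:

```r
# Install Rserve from CRAN once (skipped if it is already installed)
if (!requireNamespace("Rserve", quietly = TRUE)) {
  install.packages("Rserve")
}

# Load the package and start the Rserve process with default settings
library(Rserve)
Rserve()
```

With these defaults, the connection details to enter in Tableau would be localhost and port 6311; adjust them if you start Rserve differently.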

If all goes well you have now:

Now let us start Tableau Desktop and connect to the Rserve process:

If all goes well, you will see the message "Successfully connected to External Service", which means you have now integrated Tableau with R. This is cool, but I think we will have more fun if we build a sample Tableau dashboard that leverages this connection to R.

Worksheet

As this is an introductory article (more fun stuff will come later), we will build a very simple dashboard to demonstrate Tableau and R working together: a map of US states, with R performing k-means clustering.

With the Sample – Superstore dataset open:

Now we are going to create a Calculated field called Cluster:

SCRIPT_INT('result <- kmeans(x = data.frame(.arg1,.arg2,.arg3), '+STR([Cluster Size])+')
result$cluster',
SUM([Profit]),SUM([Sales]),SUM([Quantity])
)

Let us dig into this a bit:
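To see what the R side of the Cluster calculated field is doing, here is a rough standalone sketch you can run in R or RStudio. Tableau sends the three aggregates to R as the vectors .arg1, .arg2 and .arg3, and SCRIPT_INT expects one integer back per row. The vectors profit, sales and quantity and the variable cluster_size below are made-up stand-ins for those arguments and for the [Cluster Size] parameter; they are not Superstore data.

```r
# Stand-ins for the vectors Tableau passes as .arg1, .arg2 and .arg3
# (illustrative numbers only, one element per state)
profit   <- c(120, 95, 110, 900, 950, 870, -300, -280, -310)
sales    <- c(1000, 1100, 950, 8000, 8200, 7900, 3000, 3100, 2900)
quantity <- c(10, 12, 9, 80, 85, 78, 40, 42, 38)

# Plays the role of the [Cluster Size] parameter
cluster_size <- 3

# kmeans() picks random starting centers, so fix the seed for a repeatable run
set.seed(1)

# Same call shape as in the calculated field: a data frame of the three
# measures, followed by the number of clusters
result <- kmeans(x = data.frame(profit, sales, quantity), centers = cluster_size)

# One cluster label per input row; this is the value SCRIPT_INT returns to Tableau
result$cluster
```

The cluster numbers themselves are arbitrary labels, and because the starting centers are random, the label a given state receives can differ from one run to the next.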

Now to finish off our worksheet:

You should have something like the following:

And there you have it: a Tableau worksheet that draws each US state in the dataset and uses R to perform k-means clustering based on Profit, Sales and Quantity to assign a color. Feel free to change the number of clusters and observe the results, or better yet, check out the vast library of available R packages and functions.

Summary

In future articles on R, I will go through some really cool things you can do, such as sentiment analysis, identifying outliers, and additional clustering methods, to name a few.

I hope you all enjoyed this article as much as I enjoyed writing it. Do let me know if you experienced any issues integrating Tableau and R, and as always, please leave a comment below or reach out to me on Twitter @Tableau_Magic. Until next time, have fun with Tableau.
