Tableau Server Setup In An AWS Environment

Recently, while working with a client, Cloud Systems Developer Chris Hurley had the opportunity to install Tableau Server from scratch. Tableau is an analytics platform that allows an enterprise to collaboratively build visualisations using data from its data warehouse. This blog post details the considerations when setting up a Tableau server in an AWS environment.

The setup described below is a fairly common AWS architecture for a public-facing app. The client connects to a URL of the following form:

https://www.tableau.environment.region.domain.com

e.g. https://www.tableau.dev.eu-west-1.companyxyz.com

This request goes to Route 53, which resolves the hostname to our Application Load Balancer (ALB). The connection is made over HTTPS using a certificate created in AWS Certificate Manager, which we associate with the ALB. The ALB then forwards the user's request to our back-end Tableau server, and the user authenticates with Tableau via a Cognito user pool. Once authenticated, the connection is complete and the user can create visualisations and interact with Tableau as desired. Tableau itself connects to whatever data sources you have set up. In this diagram, the data sources are represented by RDS and Redshift.

Network and Security Layer

The ALB sits in a public-facing subnet within our VPC. Its security group allows incoming connections from the public internet on port 443 (HTTPS) and on the Tableau admin port. The Tableau instance sits in a private subnet so that it is isolated from the public internet and the associated security threats. Its security group allows inbound traffic on port 80 from the ALB security group. This, too, is standard best practice from a security point of view. The security group also allows administrators to connect to the Tableau instance over port 22 from the VPC CIDR range.

Infrastructure as Code

Sceptre

We used Sceptre to set up all the infrastructure. This way we could manage the setup of the entire stack, from Route 53 all the way to the EC2 instances, in a reliable, repeatable way. Having everything described in code, along with the use of variables and SSM Parameter Store for secrets, made it possible to replicate the entire setup across development, testing, staging and production with a single command.

Tableau install script

Sceptre/CloudFormation allows you to pass in a userdata script as a parameter when creating an EC2 instance. Using this functionality, we were able to keep a setup script for Tableau in the repo and run it automatically whenever a Tableau instance is launched. The script below is the current version at the time of writing.

    
#!/usr/bin/env bash
set -e


# Mount up EBS volume for tableau data
# On m5 instances /dev/xvdb is exposed as /dev/nvme1n1
if ! grep -qs nvme1n1 /proc/mounts; then
  sudo mkfs -t ext4 /dev/xvdb
  sudo mkdir -p /data
  sudo mount /dev/xvdb /data/
fi

# Add the following mount config to fstab (if it isn't already there) to persist mount changes on reboot
# UUID  mount_point  file_system_type  fs_mntops fs_freq  fs_passno 
# We use the UUID as it is more resilient to a system reboot
UUID=$(sudo file -s /dev/nvme1n1 | grep -Po '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}')
if ! grep -q "$UUID" /etc/fstab; then
  sudo sh -c  "echo 'UUID=$UUID       /data ext4    defaults,nofail       0 2' >> /etc/fstab"
fi

if ! yum list installed jq > /dev/null; then
  sudo yum install -y jq
fi

# Get tableau user password from ssm
TABLEAU_USER_PASSWORD=`aws ssm get-parameter --with-decryption --region eu-west-1  --name '/Tableau/UserPassword' | jq -r '.Parameter.Value'`


# Add the tableau unix user
if ! id -u tableau > /dev/null; then
  sudo useradd tableau
  # Pass TABLEAU_USER_PASSWORD to passwd via stdin if it is set, else default to tableau-user
  # The value is quoted so that spaces or glob characters in the password survive intact
  echo "${TABLEAU_USER_PASSWORD:-tableau-user}" | sudo passwd tableau --stdin
fi

# Make directory for tableau setup
mkdir -p ~/tableau
cd ~/tableau

# Start the tableau install
wget -nc https://downloads.tableau.com/esdalt/2018.2.0/tableau-server-2018-2-0.x86_64.rpm
if ! rpm -qa | grep -qw tableau-server; then
  sudo yum install -y tableau-server-2018-2-0.x86_64.rpm
fi

# Install tableau using data mount for storage and tableau user as initial user
export TABLEAU_DATA=/data/tableau
export TABLEAU_SCRIPTS=/opt/tableau/tableau_server/packages/scripts.20182.18.0627.2230
# Test the exit status of `command -v` directly; wrapping it in [ ] is a syntax error for test
if ! command -v tsm >/dev/null 2>&1; then
  echo "Initialising TSM"
  sudo "$TABLEAU_SCRIPTS"/initialize-tsm --accepteula -d "$TABLEAU_DATA" -a tableau
fi

# Access environment variables added by the initialize-tsm script
source /etc/profile.d/tableau_server.sh

# Login to tsm using the tableau unix user
tsm login -u tableau -p "${TABLEAU_USER_PASSWORD:-tableau-user}"

# Activate tableau licence
tsm licenses activate -t
# Use this once we have a key
# tsm licenses activate -k 

# Register tableau
cat > ~/tableau/registration.json < ~/tableau/identity-store.json <

 

The first thing we do in the Bash script is to set -e so that the script will exit immediately if a command exits with a non-zero status. Any console output will be logged to /var/log/cloud-init-output.log, so it is easy to debug your scripts following a launch if the instance does not behave the way you intended.
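
To make the effect of set -e concrete, here is a minimal sketch. The run_demo helper is hypothetical and exists only for illustration; it runs a tiny script in a separate bash process so that the failure does not terminate the calling shell. Note that set -e does not apply to commands tested by if, which is why guards such as `if ! grep ...` in the install script do not abort it.

```shell
#!/usr/bin/env bash
# Demonstrates how `set -e` stops a script at the first failing command.
# run_demo is a hypothetical helper used only for illustration.
run_demo() {
  # Run in a separate bash process so the failure does not kill the caller.
  bash -c '
    set -e
    echo "step 1"
    false            # non-zero exit status, so the script stops here
    echo "step 2"    # never reached
  '
}

# Only "step 1" is printed by the inner script before it aborts.
run_demo || echo "script aborted with status $?"
```

On an instance, the equivalent output for the real userdata script would end up in /var/log/cloud-init-output.log, with the last logged command being the one that failed.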

Tableau Server runs best with at least 50 GB of free disk space, and it's best to use an EBS volume for this amount of storage. The next step is, therefore, to mount the EBS volume that we use for the Tableau data if it's not already mounted. We define the EBS volume in the Sceptre config but still need to mount it here so we can make use of it. We also need to ensure the volume stays mounted after a reboot. We do this by getting the volume's UUID and adding it to the fstab file. Appending to /etc/fstab requires root, and a plain sudo echo would not help because the >> redirection is performed by the calling shell, so we wrap the whole command in sudo sh -c.
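
The fstab step can be sketched in isolation as follows. Both helper names (extract_uuid and fstab_entry) are hypothetical, the sample UUID is made up, and grep -Eo is used here in place of the script's grep -Po simply because it does not require PCRE support. The nofail option lets the instance keep booting even if the volume is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: extract a filesystem UUID from `file -s` style output
# and format the matching /etc/fstab entry.

# Reads text on stdin and prints the first UUID-shaped token found.
extract_uuid() {
  grep -Eo '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | head -n 1
}

# Prints an fstab line: device, mount point, type, options, dump, fsck order.
fstab_entry() {
  local uuid="$1" mount_point="$2"
  printf 'UUID=%s\t%s\text4\tdefaults,nofail\t0 2\n' "$uuid" "$mount_point"
}

# Example with sample text shaped like `file -s` output (the UUID is made up).
sample='/dev/nvme1n1: Linux rev 1.0 ext4 filesystem data, UUID=0a1b2c3d-1111-2222-3333-444455556666 (extents)'
fstab_entry "$(printf '%s\n' "$sample" | extract_uuid)" /data
```

Separating the formatting from the write makes it easy to inspect the line before anything touches /etc/fstab.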

We check that we haven't already done either of these steps before running them, to ensure the script is idempotent, i.e. it can be run repeatedly without trying to do the same thing twice. This helps us in the testing stage and makes the script more reliable in general.
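
The check-before-acting pattern can be captured in a small helper. This is a sketch rather than part of the original script: append_once is a hypothetical name, and the demonstration writes to a temporary file instead of a real system file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if it is not already there,
# so that running it twice leaves the file unchanged.
append_once() {
  local line="$1" file="$2"
  # -F: fixed string, -x: match the whole line, -q: quiet
  if ! grep -Fxq -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Demonstration against a temporary file rather than a real system file.
demo_file="$(mktemp)"
append_once 'UUID=example /data ext4 defaults,nofail 0 2' "$demo_file"
append_once 'UUID=example /data ext4 defaults,nofail 0 2' "$demo_file"
wc -l < "$demo_file"   # the line was written only once
rm -f "$demo_file"
```

Matching the whole line as a fixed string avoids false positives when one entry happens to be a substring of another.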

We then install jq, a handy little package for parsing the JSON responses we get from AWS. Next, we get the Tableau user password that we previously stored in SSM Parameter Store. This gives us a secure private string that is available at any stage of the configuration. The script authenticates against SSM Parameter Store using the role that we assigned to the EC2 instance in Sceptre. If the tableau user doesn't exist yet, we create it and give it the password that we retrieved from Parameter Store. We also provide a default password for testing, in case the secret is not set up. This default should not be used in production, as the password is visible in the repo.
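
One way to make the "not in production" rule explicit is sketched below. The helper names and the DEPLOY_ENV variable are assumptions introduced for illustration, not part of the original setup. The sketch also uses the AWS CLI's own --query and --output text options to extract the value, as an alternative to piping the response through jq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a SecureString from SSM Parameter Store.
# --query and --output text let the AWS CLI extract the value itself,
# which is an alternative to piping the JSON through jq.
fetch_parameter() {
  local name="$1" region="${2:-eu-west-1}"
  aws ssm get-parameter --with-decryption --region "$region" \
    --name "$name" --query 'Parameter.Value' --output text
}

# Hypothetical helper: use the fetched secret, and allow the fallback
# password only outside production. DEPLOY_ENV is an assumed variable name.
resolve_password() {
  local fetched="$1" env="${DEPLOY_ENV:-dev}"
  if [ -n "$fetched" ]; then
    printf '%s\n' "$fetched"
  elif [ "$env" = "production" ]; then
    echo "refusing to use the default password in production" >&2
    return 1
  else
    printf '%s\n' 'tableau-user'
  fi
}
```

A call such as `resolve_password "$(fetch_parameter '/Tableau/UserPassword' || true)"` would then fail loudly in production if the secret is missing, instead of silently falling back to the default.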

Next, we set up the required directories and install Tableau using yum, again checking first whether it's already installed in order to keep the script idempotent. We then configure the required environment variables and initialise Tableau Services Manager (TSM) with the initialize-tsm script. Once TSM is initialised, we log in to it using the tableau user we set up earlier, activate a licence and register Tableau.
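
The script hard-codes the versioned scripts directory, which has to be updated on every Tableau upgrade. A possible alternative is to discover it at run time, as in the sketch below. find_tableau_scripts is a hypothetical helper, and it assumes the packages directory contains entries named scripts.<build>, as the path in the install script suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate the versioned scripts directory instead of
# hard-coding the build number. Assumes the packages directory contains
# entries named scripts.<build>, as in the path used by the install script.
find_tableau_scripts() {
  local packages_dir="${1:-/opt/tableau/tableau_server/packages}"
  local candidate latest=""
  # Globs expand in sorted order; keep the last directory that matches.
  for candidate in "$packages_dir"/scripts.*; do
    if [ -d "$candidate" ]; then
      latest="$candidate"
    fi
  done
  if [ -z "$latest" ]; then
    echo "no scripts.* directory found under $packages_dir" >&2
    return 1
  fi
  printf '%s\n' "$latest"
}
```

With this in place, TABLEAU_SCRIPTS could be set with `TABLEAU_SCRIPTS="$(find_tableau_scripts)"`, and under set -e the script would stop if the package had not been installed.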

The next step is to configure the identity provider. Here we configure Tableau to use a local identity store and connect it to our Cognito user pool over OpenID Connect. This is done using the tsm tool along with some configuration parameters provided in our Sceptre config as environment variables.
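
As a rough sketch of what this step could look like, the block below builds the OpenID Connect discovery URL for a Cognito user pool and passes it to tsm. The function names and all environment variable names (COGNITO_REGION, COGNITO_POOL_ID, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET, TABLEAU_RETURN_URL) are assumptions, not the names used in the original setup, and the tsm options should be checked against the documentation for the installed Tableau version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the OpenID Connect discovery URL of a Cognito
# user pool from its region and pool ID.
cognito_discovery_url() {
  local region="$1" pool_id="$2"
  printf 'https://cognito-idp.%s.amazonaws.com/%s/.well-known/openid-configuration\n' \
    "$region" "$pool_id"
}

# Hypothetical wrapper around the tsm OpenID commands. The variable names
# (COGNITO_REGION, COGNITO_POOL_ID, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET,
# TABLEAU_RETURN_URL) are assumptions, not the names used in the original setup.
configure_openid() {
  tsm authentication openid configure \
    --client-id "$OIDC_CLIENT_ID" \
    --client-secret "$OIDC_CLIENT_SECRET" \
    --config-url "$(cognito_discovery_url "$COGNITO_REGION" "$COGNITO_POOL_ID")" \
    --return-url "$TABLEAU_RETURN_URL"
  tsm authentication openid enable
  # Configuration changes made through tsm take effect once they are applied.
  tsm pending-changes apply
}
```

Nothing is executed until configure_openid is called, so the functions can be defined near the top of the userdata script and invoked after the tsm login step.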

The final step in this script is to set up an administrator user that we can use to log in to the Tableau console and start adding data sources and creating visualisations based on that data.
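
A minimal sketch of this step, assuming tabcmd is on the PATH and the server is listening locally on port 80, might look like the following. create_initial_admin is a hypothetical wrapper, and the username and password it receives would come from your own configuration rather than being hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: create the first Tableau Server administrator with
# tabcmd. Assumes tabcmd is on the PATH and the server listens on port 80.
create_initial_admin() {
  local username="$1" password="$2"
  tabcmd initialuser --server 'localhost:80' \
    --username "$username" --password "$password"
}
```

Called as `create_initial_admin "$ADMIN_USER" "$ADMIN_PASSWORD"` (both variable names are placeholders), this would typically run once, after the server has been started.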

Conclusion/Next Steps

It was great to have the opportunity to set this all up from scratch. It evolved into quite a nice setup but there are always more things we could add in the future.

A lot of the steps in the script could be built into an AMI. That would improve the launch time of the server if it needed to be re-launched frequently.

Because Tableau's high availability setup involves a manual failover step, we were unable to find a solution that provides true high availability. This is something we should revisit in the future for production environments.

I hope this blog post was useful. Let us know in the comments if you have any questions.
