Setting up SAS dates for input into Tableau Desktop

February 4, 2013 | By | 1 Comment

Some techniques for setting up integration between the SAS and Tableau solutions, specifically with respect to date formats.

SAS and Tableau are ‘best of breed’ in their own areas – SAS in the area of Analytics and ‘Analytical Data Preparation’, Tableau in the area of data visualization and interactive dashboarding in an intuitive, drag & drop environment. Consequently, it makes sense to find ways to integrate these technologies to provide an Integrated Information Framework which leverages the strengths of both solutions.

Leveraging SAS capabilities in this way provides a way to ‘rapid prototype’ business reporting requirements, without the costs and delays typically seen when attempting to model emerging business requirements in the Data Warehouse via traditional ETL methods.

In addition, this approach suggests a way to protect the investment in analytical reporting as developed by the SAS team, by providing a platform to publish those reports for easy consumption, plus easy re-formatting and ‘slice & dice’ of these reports in the Tableau environment

Example
Typically, the easiest way to prepare SAS data for consumption in Tableau is to set up an export process in SAS Enterprise Guide:
Tableau_Test_1
Using the SAS Enterprise Guide Export wizard, a SAS dataset can be exported as CSV, as a step in the process. Dataset is exported to a network location. SAS process looks like this, and can be set up to run on a schedule eg monthly.

In this example, we have a number of dates in our SAS dataset:
Tableau_Test_1c
Often, the default date format as set up in SAS is adequate to be imported into, and correctly interpreted by, Tableau.

Where this is not the case, the SAS date format ‘DDMMYYS8.’ can be used:
Tableau_Test_zerob
Which would produce SAS code similar to the following:

PROC SQL;
CREATE TABLE
WORK.QUERY_FOR_POLICY1 AS
SELECT t1.trandate FORMAT=DDMMYYS8.,
t1.polexpdt FORMAT=DDMMYYS8.,
t1.commdate FORMAT=DDMMYYS8.
FROM WORK.QUERY_FOR_POLICY t1
QUIT;

On importing the dataset into Tableau, the dates are correctly interpreted, automatically providing the ability to drill from Year to Quarter to Month:
Tableau_Test_3b
Note: While it is generally easier to prepare the data in the most appropriate format for Tableau using the extensive SAS’ data functions and formatting options, there are also date functions within Tableau. For example, to convert a text (string) field containing dates in the format YYYYMMDD into a date within Tableau, the following could be set up as a calculated field:

Date(right([PERIOD],2) + “/” + mid([PERIOD],5,2) + “/” + left([PERIOD],4))

Summary
In addition to the approach described above, another way to ‘prepare’ SAS data for consumption in Tableau involves using the SAS ‘PROC SQL’ method to output SAS results directly to a relational table. With the SAS/ACCESS interface, database objects can be referenced directly in a DATA step or SAS procedure using the SAS LIBNAME statement. PROC SQL can be used to update, delete or insert data into a relational table, for example via Bulk Load.

Filed in: Data Visualization, SAS, Tableau

Patrick Spedding

About the Author (Author Profile)

Patrick Spedding is Senior Director of BI R&D for Rocket Software, and IBM Champion for IBM Collaboration Solutions. He is also a Non-Executive Director on the Board of Eastside Radio in Sydney, Australia. Prior roles include Director of Product Management for IBM Cognos, Director of Field Marketing for Cognos, Founder of Tableau partner See-Change Solutions, and SAS Solution Manager for BI and Strategy Management. Patrick's qualifications include an MBA degree in Marketing (AIU), Diploma in Management (University of Michigan), BSc (Hons) in Mathematics (Loughborough University, UK), Fellow of the Australian Institute of Management (FAIM), and member of the Australian Institute of Company Directors (AICD). Find Patrick on Google+

Comments (1)

  1. Alan Eldridge

    Hi Patrick,

    Great post – a couple of suggestions about your date formula though. First… if you’re going to work with a date as a string type it’s better to use ISO-8601 format of YYYY-MM-DD. This is locale insensitive so you don’t need to worry about DD/MM/YYYY vs. MM/DD/YYYY. Your formula would then read:

    DATE(LEFT([Period],4)
    + “-“ + MID([Period],5,2)
    + “-“ + RIGHT([Period],2))

    This is an improvement, but string logic is much slower than numeric logic, so it would be even better to work with this as numbers. Convert the [Period] field to be a number instead of a string, then use the following:

    DATEADD(‘DAY’, [YYYYMMDD]%100-1,
    DATEADD(‘MONTH’, INT(([YYYYMMDD]%10000)/100)-1,
    DATEADD(‘YEAR’, INT([YYYYMMDD]/10000)-1900, #1900-01-01#)))

    Note that the performance gains can be remarkable with large data sets. In a test we conducted over a 1 billion record sample, the first calculation took over 4 hours to complete, while the second took about a minute.

    Hope this helps.

    Cheers,
    Alan

Leave a Reply

Trackback URL | RSS Feed for This Entry