Data Visualization with Perl and SVG

G. Wade Johnson

YAPC::NA 2010

SVG

How many people here are familiar with SVG?

Despite its maturity, many people really don't know much about it.

SVG

SVG is an XML-based vector graphics image format
specified by the W3C

Unlike raster graphics, all objects are drawn from high-level descriptions, not individual colored pixels. These objects keep their identities in the completed drawing, which allows effects to be attached to individual objects.

SVG Features

I could spend a long time trying to list all of the individual elements and features of SVG and still not really do them justice. We'll just hit the high points for now.

SVG Maturity

SVG has been implemented in a number of non-browser applications over the last decade. This has allowed programmers, designers, and artists to make use of the technology and help it improve. All of the major browsers except one have implemented significant portions of the specification, and Microsoft has announced that IE 9 will support SVG.

Data Visualization

What do we mean by Data Visualization?

What do people mean by the term?

Data Visualization

Means different things to different people.

Many of you have probably done some form of data visualization. Depending on their field or background, people are likely to prefer particular ways of visualizing data. People like Edward Tufte and Stephen Few have studied and categorized many ways of visualizing data (good and bad) for years.

DV: Business Graphics?

Most business people and students probably think of the standard set of presentation graphics tools when trying to decide how to present data.

DV: Scatter Plots?

Scatter plots are great for data that includes random error. Often you can get an idea of potential curves to fit from looking at the plot itself. People with a science or research background are likely to prefer these.
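The core of generating a scatter plot by hand is mapping data coordinates onto the drawing. The following is a minimal sketch, not code from the talk: the `scatter_svg` name and the sample points are hypothetical, and it assumes only core Perl. The one subtlety is that SVG's y axis points down, so larger data values must map to smaller y coordinates.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min max);

# Map (x, y) data pairs into a $width x $height drawing with a margin.
# SVG's y axis points down, so larger y values get smaller coordinates.
sub scatter_svg {
    my ( $points, $width, $height, $margin ) = @_;
    my @xs = map { $_->[0] } @$points;
    my @ys = map { $_->[1] } @$points;
    my ( $xmin, $ymin ) = ( min(@xs), min(@ys) );

    # Guard against a zero range when all values are equal.
    my $xscale = ( $width - 2 * $margin ) / ( ( max(@xs) - $xmin ) || 1 );
    my $yscale = ( $height - 2 * $margin ) / ( ( max(@ys) - $ymin ) || 1 );

    my $svg = qq{<svg xmlns="http://www.w3.org/2000/svg" }
            . qq{width="$width" height="$height">\n};
    for my $p (@$points) {
        my $cx = $margin + ( $p->[0] - $xmin ) * $xscale;
        my $cy = $height - $margin - ( $p->[1] - $ymin ) * $yscale;
        $svg .= sprintf qq{  <circle cx="%.1f" cy="%.1f" r="3"/>\n}, $cx, $cy;
    }
    return $svg . "</svg>\n";
}

# Hypothetical measurements: roughly linear, with some random error.
my @points = ( [ 1, 2.1 ], [ 2, 3.9 ], [ 3, 6.2 ], [ 4, 7.8 ], [ 5, 10.1 ] );
print scatter_svg( \@points, 200, 150, 20 );
```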

DV: Histograms?

Histograms are similar to bar graphs, but they normally serve to quantize discrete or continuous data into categories (bins) and show how many values fall into each one.
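As a rough illustration of that quantizing step, the sketch below bins a list of values and draws one bar per bin. It is a hand-rolled example in core Perl, not a library API; the sub names, bin boundaries, and sizes are assumptions, and the data is the temperature list from the SVG::Sparkline example later in the talk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);
use POSIX qw(floor);

# Quantize values into $nbins bins of width $bin_width starting at $start.
# floor() (not int()) keeps values just below $start out of bin 0.
sub histogram_counts {
    my ( $values, $start, $bin_width, $nbins ) = @_;
    my @counts = (0) x $nbins;
    for my $v (@$values) {
        my $i = floor( ( $v - $start ) / $bin_width );
        next if $i < 0 || $i >= $nbins;    # outside the binned range
        $counts[$i]++;
    }
    return @counts;
}

# One bar per bin, drawn edge to edge because the bins are contiguous.
sub histogram_svg {
    my ( $counts, $bar_width, $unit ) = @_;
    my $height = $unit * ( max(@$counts) || 1 );
    my $width  = $bar_width * @$counts;
    my $svg = qq{<svg xmlns="http://www.w3.org/2000/svg" }
            . qq{width="$width" height="$height">\n};
    for my $i ( 0 .. $#$counts ) {
        my $h = $counts->[$i] * $unit;
        $svg .= sprintf qq{  <rect x="%d" y="%d" width="%d" height="%d"/>\n},
            $i * $bar_width, $height - $h, $bar_width, $h;
    }
    return $svg . "</svg>\n";
}

# Temperatures in 10-degree bins starting at 40 (assumed bin layout).
my @temps = ( 61, 67, 67, 77, 83, 84, 82, 84, 84, 86, 78, 50, 44, 47,
    54, 76, 72, 78, 78, 80, 80, 82, 81, 77, 82, 74, 77, 64, 72, 75, 71 );
my @counts = histogram_counts( \@temps, 40, 10, 5 );
print histogram_svg( \@counts, 20, 5 );
```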

DV: Radar Graphs?

Radar graphs seem to be falling out of favor, but you still see them every now and then.

DV: Sparklines?

Sparklines are word-sized graphics designed
by Edward Tufte for use within text. They can
display trends, the win/loss record of your
favorite ball team, or any other data
that would benefit from a quick summary.

Sparklines are actually becoming more popular. Some programs support embedding sparklines in documents. You also find a fair number of tools for generating sparklines for the web.
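A line sparkline is little more than one small polyline scaled to fit a word-sized box. The sketch below shows that scaling in core Perl; it is not how any particular sparkline tool works, and the `sparkline_svg` name, the box size, and the stroke settings are assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min max);

# Scale a list of values into a small $width x $height polyline.
# The y axis is flipped because SVG's y coordinates grow downward.
sub sparkline_svg {
    my ( $values, $width, $height ) = @_;
    my ( $lo, $hi ) = ( min(@$values), max(@$values) );
    my $xstep  = @$values > 1 ? $width / ( @$values - 1 ) : 0;
    my $yscale = $height / ( ( $hi - $lo ) || 1 );
    my @points = map {
        my $i = $_;
        sprintf '%.2f,%.2f', $i * $xstep, $height - ( $values->[$i] - $lo ) * $yscale;
    } 0 .. $#$values;
    return qq{<svg xmlns="http://www.w3.org/2000/svg" width="$width" height="$height">}
         . qq{<polyline fill="none" stroke="black" stroke-width="1" points="@points"/>}
         . "</svg>\n";
}

my @temps = ( 61, 67, 67, 77, 83, 84, 82, 84, 84, 86, 78, 50, 44, 47,
    54, 76, 72, 78, 78, 80, 80, 82, 81, 77, 82, 74, 77, 64, 72, 75, 71 );
print sparkline_svg( \@temps, 100, 15 );
```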

Improvement

How can SVG improve on basic charts?

Every graphic you've seen so far has been SVG.

SVG can clearly reproduce any of these chart formats. But it is fair to ask what SVG can do that other formats cannot. After all, if it doesn't provide any new benefits, what's the point of using it?

Scalability

Vector graphics are inherently scalable.

Zoom in or out to almost any degree and vector graphics remain just as sharp. Raster graphics don't scale as well because they consist of a fixed number of pixels.
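One way to see this in the markup is the `viewBox` attribute: it fixes the drawing's own coordinate system, while `width` and `height` only say how large to render it. In the sketch below (hypothetical sub name and chart data, core Perl only), a thumbnail and a poster-sized version contain identical drawing markup; nothing is recomputed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Wrap drawing markup in an <svg> element. The viewBox fixes the drawing's
# own coordinate system (0..100 by 0..60); width and height only tell the
# viewer how large to render it, so no coordinates are recomputed.
sub scaled_svg {
    my ( $content, $width, $height ) = @_;
    return qq{<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 60" }
         . qq{width="$width" height="$height">\n$content\n</svg>\n};
}

my $chart = '<polyline fill="none" stroke="black" '
          . 'points="0,60 25,30 50,40 75,10 100,20"/>';

# Thumbnail and poster-sized versions share identical drawing markup.
my $thumbnail = scaled_svg( $chart, 100, 60 );
my $poster    = scaled_svg( $chart, 1000, 600 );
print $poster;
```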

Interactivity

What if you could interact with the data?

The definition of each object is not a matter of which pixels are which color. Objects are defined explicitly and a new representation can be generated, by the viewer, after any scaling.

As a consequence of the fact that the graphical objects remain in the representation of the image and SVG supports scripting, we have the interesting possibility of providing graphics that the user can interact with.

While this capability is more often used for games or user interface components, it also allows multiple instances of a single graphic to be combined into one. Given the appropriate clues, a user can then interact with the graphic to explore the data in new ways.
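A small sketch of that idea, generated from Perl: several data series share one graphic, each series stays an identifiable `<g>` element, and clicking a legend label runs a few lines of script that show or hide that series. The series names, values, and layout are hypothetical, and the toggle is one simple approach (setting `style.display`), not the technique used in any of the examples shown in the talk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Combine several hypothetical data series (values 0..100) in one graphic.
# Each series keeps its identity as a <g> with an id, and clicking its
# legend label runs a small script that shows or hides that group.
sub interactive_svg {
    my (%series) = @_;
    my $svg = qq{<svg xmlns="http://www.w3.org/2000/svg" width="240" height="110">\n};
    my $row = 0;
    for my $name ( sort keys %series ) {
        my $values = $series{$name};
        my $id     = "series-$name";
        my @pts    = map { my $i = $_; ( $i * 20 ) . ',' . ( 100 - $values->[$i] ) }
                     0 .. $#$values;
        # <title> gives the group a tooltip in most viewers.
        $svg .= qq{  <g id="$id"><title>$name</title>}
              . qq{<polyline fill="none" stroke="black" points="@pts"/></g>\n};
        my $y = 15 + 15 * $row++;
        $svg .= qq{  <text x="180" y="$y" style="cursor:pointer" onclick="}
              . qq{var g = document.getElementById('$id'); }
              . qq{g.style.display = (g.style.display == 'none') ? '' : 'none';">}
              . qq{$name</text>\n};
    }
    return $svg . "</svg>\n";
}

print interactive_svg( alpha => [ 10, 35, 30, 60 ], beta => [ 20, 25, 50, 45 ] );
```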

A Line Graph...Plus

A couple of years ago, I began doing some profiling work on the speed of the script engines in various browsers and SVG viewers. When I wanted to display the result, I naturally turned to SVG as my data visualization tool of choice.

Unfortunately, none of the libraries that I found out there quite displayed the data the way I wanted. A little bit of Perl and I had this output with a small amount of interactivity.

Each data point was generated by running the test multiple times and computing statistics from the results. The markers on the data line encode five pieces of information. The top and bottom of the vertical line show the high and low values. The rectangle runs from one standard deviation above the average to one standard deviation below it. The middle of the box (where the color changes) marks the average value.
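The following is a reconstruction sketch of how such a marker could be computed and drawn, not the original script. It assumes the sample standard deviation (n - 1 denominator), hypothetical colors and scale, and a caller-supplied `$to_y` function that maps a data value to an SVG y coordinate.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Summarize repeated timings of one test: low, high, mean, and the
# sample standard deviation (n - 1 in the denominator).
sub run_stats {
    my @t    = @_;
    my $n    = @t;
    my $mean = sum(@t) / $n;
    my $var  = $n > 1 ? sum( map { ( $_ - $mean )**2 } @t ) / ( $n - 1 ) : 0;
    return { low => min(@t), high => max(@t), mean => $mean, sd => sqrt($var) };
}

# Draw one marker centered on $x. $to_y maps a data value to an SVG y.
# Line: low to high. Upper box: mean to mean + sd. Lower box: mean - sd
# to mean, so the color change marks the average.
sub marker_svg {
    my ( $s, $x, $to_y, $half ) = @_;
    my $top    = $to_y->( $s->{mean} + $s->{sd} );
    my $mid    = $to_y->( $s->{mean} );
    my $bottom = $to_y->( $s->{mean} - $s->{sd} );
    return sprintf qq{<g>\n}
      . qq{  <line x1="%.1f" y1="%.1f" x2="%.1f" y2="%.1f" stroke="black"/>\n}
      . qq{  <rect x="%.1f" y="%.1f" width="%.1f" height="%.1f" fill="steelblue"/>\n}
      . qq{  <rect x="%.1f" y="%.1f" width="%.1f" height="%.1f" fill="orange"/>\n}
      . qq{</g>\n},
      $x, $to_y->( $s->{high} ), $x, $to_y->( $s->{low} ),
      $x - $half, $top, 2 * $half, $mid - $top,
      $x - $half, $mid, 2 * $half, $bottom - $mid;
}

my $stats = run_stats( 12.1, 11.8, 12.6, 13.0, 11.9 );    # hypothetical ms
my $to_y  = sub { 200 - 10 * $_[0] };                     # 10 px per ms
print marker_svg( $stats, 50, $to_y, 4 );
```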

Data Exploration

Jeff Schiller's Web Statistics.

Jeff Schiller built a wonderfully interactive example showing the percentage of users accessing his site with each of the major browsers. By using the interactive features of SVG, Jeff was able to provide a way to explore the data, not just display it.

Map-based Data Visualization

Cartographers display data on maps.

It turns out that one group that has really embraced SVG is the cartography community. Vector graphics are a very good way of representing maps, and displaying data on maps is something that SVG does quite well.
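A common form of map-based display is the choropleth map, where each region is filled with a color chosen from its data value. The sketch below uses equal-width classes and a four-color palette; the region outlines, values, and palette are hypothetical stand-ins (a real map would take its outlines from cartographic data), and this is not how Mappetizer or any other specific tool does it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min max);

# Choose a fill by splitting [$lo, $hi] into equal-width classes,
# one class per palette entry.
sub class_color {
    my ( $value, $lo, $hi, @palette ) = @_;
    my $i = $hi > $lo ? int( ( $value - $lo ) / ( $hi - $lo ) * @palette ) : 0;
    $i = $#palette if $i > $#palette;    # the maximum belongs to the top class
    return $palette[$i];
}

# Hypothetical regions (id => [ outline path, data value ]).
my %regions = (
    north => [ 'M0,0 h160 v50 h-160 z',  12 ],
    south => [ 'M0,50 h100 v50 h-100 z', 47 ],
    east  => [ 'M100,50 h60 v50 h-60 z', 30 ],
);
my @palette = ( '#fee5d9', '#fcae91', '#fb6a4a', '#cb181d' );    # light to dark
my @values  = map { $_->[1] } values %regions;
my ( $lo, $hi ) = ( min(@values), max(@values) );

print qq{<svg xmlns="http://www.w3.org/2000/svg" width="160" height="100">\n};
for my $id ( sort keys %regions ) {
    my ( $d, $value ) = @{ $regions{$id} };
    my $fill = class_color( $value, $lo, $hi, @palette );
    # Each region stays a separate object, so it can carry a tooltip.
    print qq{  <path id="$id" d="$d" fill="$fill" stroke="white">}
        . qq{<title>$id: $value</title></path>\n};
}
print "</svg>\n";
```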

Mappetizer

Ruth Lang provided an example generated with the Mappetizer tool.

This example was posted to the SVG Developers' Mailing List as part of a discussion of interactive controls. It shows one of the directions that many cartographers have followed in displaying map-based data with SVG.

Animation

With SVG we can also display changes in time.

Use scripting or SMIL for graphs that change over time. This can be used either for real-time display or as a way of showing yet another changing variable on a single graphic.
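Here is a minimal SMIL sketch generated from Perl: one bar steps through a hypothetical time series using two `<animate>` elements. Height and y are animated together so the bar stays anchored at its baseline, and `calcMode="discrete"` jumps between samples instead of interpolating. The sub name, sizes, and timing are assumptions for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Animate one bar through a hypothetical time series with SMIL <animate>.
# Height and y change together so the bar stays anchored at the baseline;
# calcMode="discrete" jumps between samples instead of interpolating.
sub animated_bar_svg {
    my ( $series, $seconds_per_step, $baseline ) = @_;
    my $heights = join ';', @$series;
    my $ys      = join ';', map { $baseline - $_ } @$series;
    my $dur     = $seconds_per_step * @$series;
    my ( $h0, $y0 ) = ( $series->[0], $baseline - $series->[0] );
    return <<"SVG";
<svg xmlns="http://www.w3.org/2000/svg" width="60" height="$baseline">
  <rect x="20" y="$y0" width="20" height="$h0" fill="steelblue">
    <animate attributeName="height" values="$heights" dur="${dur}s"
             calcMode="discrete" repeatCount="indefinite"/>
    <animate attributeName="y" values="$ys" dur="${dur}s"
             calcMode="discrete" repeatCount="indefinite"/>
  </rect>
</svg>
SVG
}

print animated_bar_svg( [ 10, 40, 25, 70, 55 ], 1, 100 );
```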

Dynamic Data Displays

The instruments demo was one of my first serious uses of SVG. This is a variation of a tool I used to show data streaming in from an external server.

What About Perl?

So SVG is cool, but what has this got to do with Perl?

We normally think only of the graphics that result from data visualization, but generating the graphics can be a separate issue from displaying them.

Generation vs. Display

SVG generation and SVG display can be separated.

These two events can be separated in both space and time. In the profiling example from before, the measurements, the data processing, the SVG generation, and the display were all performed at different times. There is no inherent reason for them to be connected.

SVG Generation with Perl

SVG is just XML.

Unlike many other graphics formats, SVG is just XML. XML is just (Unicode) text. How many of you have manipulated XML with Perl? Okay. How many of you have not manipulated XML with Perl?

XML-Specific Modules

Since SVG is just XML, libraries that write XML can write SVG. (Assuming they support namespaces.)

SVG is Just (Unicode) Text

Although there is less of a safety net, anything that you can use to output raw text can also be used to write SVG.
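The missing safety net is mostly escaping: no library will turn a `&` or `<` in your data into an entity for you. A minimal sketch using a here-document and a hand-written `xml_escape` helper (both hypothetical names, core Perl only):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# With plain print there is no library to escape markup, so any data
# placed in element text or attribute values must be escaped by hand.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, before other entities add '&'
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# A one-bar chart written as literal text with a here-document.
sub label_svg {
    my ( $label, $value ) = @_;
    my $safe  = xml_escape($label);
    my $width = 10 * $value;
    return <<"SVG";
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="40">
  <rect x="0" y="10" width="$width" height="20" fill="steelblue"/>
  <text x="5" y="25" font-size="12">$safe</text>
</svg>
SVG
}

# SVG is Unicode text, so write it out as UTF-8 to match the declaration.
binmode STDOUT, ':encoding(UTF-8)';
print label_svg( 'R&D <budget>', 12 );
```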

SVG-specific Modules

There are a number of modules that provide a Perl interface for writing SVG. I have some experience with the first three.

SVG Example


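#!/usr/bin/env perl
use strict;
use warnings;
use SVG;

# Create a 200x150 drawing; 'axes' and 'data' are CSS class names,
# styled by a stylesheet that is not shown here.
my $svg = SVG->new( width => 200, height => 150 );

# Vertical and horizontal axes.
$svg->line( class=>'axes',
    x1=>50, y1=> 20, x2=>50, y2=>130 );
$svg->line( class=>'axes',
    x1=>50, y1=> 130, x2=>150, y2=>130 );

# The data as one path: move to the origin, then relative line segments.
$svg->path( class=>'data',
    d=>'M50,130l10,-50l10,-30l10,40'
      . 'l10,-50l10,20l10,30l10,-80'
);

# Serialize the whole document as XML text.
print $svg->xmlify;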

This module basically provides methods that help you to write out the individual XML elements, and provides a small amount of validation and utility methods. Other than that, you need to be familiar with SVG and be relatively comfortable with the mechanics of the format to use this module.

SVG::TT::Graph Example


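#!/usr/bin/env perl
use strict;
use warnings;
use SVG::TT::Graph::Pie;

# Configure the chart: size, layout options, and one label per slice.
my $graph = SVG::TT::Graph::Pie->new({
    'height' => 200, 'width' => 350,
    'compress' => 0, 'expand_greatest' => 1,
    'fields' => [ qw/Jan Feb Mar Apr/ ],
});

# Supply one value per field.
$graph->add_data({ 'data' => [ 50, 60, 53, 58 ], });

# Render the chart as an SVG document.
print $graph->burn;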

Here's a quick example of how to generate a pie chart with the SVG::TT::Graph module. This module is much more focused: you give up complete control over the mechanics in exchange for high-level methods.

SVG::Sparkline Example


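#!/usr/bin/env perl
use strict;
use warnings;
use SVG::Sparkline;

# Temperature readings to plot.
my @temps = (61,67,67,77,83,84,82,84,84,86,78,50,44,47,
    54,76,72,78,78,80,80,82,81,77,82,74,77,64,72,75,71);

# Build a Line sparkline from the values.
my $svg = SVG::Sparkline->new( Line => { values=>\@temps } );

# The sparkline object stringifies to its SVG markup.
print $svg;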

Here's a quick example using the SVG::Sparkline module to create a simple line graph. This module is also much more strongly focused, resulting in less manual work in exchange for less control over the actual output.

Backend Feed Data to Visualize

JavaScript in SVG can call out to a server.

Remember Jeff Schiller's web browser statistics and the instruments demo. Both of these were designed to retrieve data from a server for display in the browser.

Perl Serving Data

Need I say more?

Server-Driven SVG

This version of the instruments demo is driven from a server. Depending on the venue, this is either a CGI script or an HTTP::Daemon standalone server.
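For the CGI variant, the Perl side can be very small: print a header, a blank line, and the current readings. The sketch below is an assumption-laden illustration, not the demo's actual script: it uses JSON (via the core JSON::PP module) as the data format, invents the reading names, and generates random values. The JavaScript in the SVG that polls this URL is not shown.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical gauge readings; a real deployment would read these from
# whatever source is being monitored.
sub current_readings {
    return {
        time        => time(),
        temperature => 60 + int( rand(40) ),
        pressure    => 0 + sprintf( '%.2f', 28 + rand(4) ),
    };
}

# A CGI script replies on STDOUT: header lines, a blank line, the body.
# no-cache keeps the browser from reusing an old reading when the SVG polls.
sub cgi_response {
    my ($data) = @_;
    my $json = JSON::PP->new->canonical->encode($data);
    return "Content-Type: application/json\n"
         . "Cache-Control: no-cache\n\n"
         . $json;
}

print cgi_response( current_readings() );
```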

The Server

A standalone server can be pretty simple.

This is a relatively simple HTTP::Daemon server I wrote to generate random data for the demo to display. Its only real advantage is that it runs easily on my laptop and does not require the network to be available for the demo. The data could actually be coming from anywhere.
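A sketch of what such a standalone server can look like, not the server used in the talk. It assumes the CPAN HTTP::Daemon distribution (loaded with `require` only when the server actually starts), a comma-separated plain-text data format, and a hypothetical `instruments.svg` file served at `/`. Run it as `perl server.pl 8080`; with no port argument it starts nothing.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical random readings, one comma-separated line per request.
sub random_readings {
    my ($count) = @_;
    return join ',', map { int rand 100 } 1 .. $count;
}

# Serve readings at /data and the (hypothetical) SVG page at /.
sub run_server {
    my ($port) = @_;
    require HTTP::Daemon;      # CPAN module, not core Perl
    require HTTP::Response;
    my $d = HTTP::Daemon->new( LocalAddr => 'localhost', LocalPort => $port, ReuseAddr => 1 )
        or die "Cannot listen on port $port: $!";
    print "Listening at ", $d->url, "\n";
    while ( my $c = $d->accept ) {
        while ( my $req = $c->get_request ) {
            if ( $req->method eq 'GET' && $req->uri->path eq '/data' ) {
                # no-cache so each poll from the browser gets fresh data.
                $c->send_response(
                    HTTP::Response->new( 200, 'OK',
                        [ 'Content-Type' => 'text/plain', 'Cache-Control' => 'no-cache' ],
                        random_readings(3) )
                );
            }
            elsif ( $req->method eq 'GET' && $req->uri->path eq '/' ) {
                $c->send_file_response('instruments.svg');
            }
            else {
                $c->send_error(404);
            }
        }
        $c->close;
    }
}

# Usage: perl server.pl PORT
run_server( $ARGV[0] ) if @ARGV;
```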

References: Web

This is a subset of the resources I use for information on SVG. Some are general information, and others give specific advice.

References: Print

The first book is a bit out of date, but it is the one I learned SVG from. The other two represent what I'm trying to learn about data visualization. Both provide examples that are far beyond my current abilities.

Questions?

Post Talk Information

After the main talk, the group asked a lot of questions about SVG. The drum picture was shown as an example of photorealistic pictures. The primer link was added as someone asked about resources for learning SVG.