Visualizing data

Working out when to use what type of graph can be a challenge. A lot depends on what insight you’re looking for, as different graphs reveal different points of interest. First up, let’s run through the three types of visualizations: volumetrics, illustrations, and statistical graphs.

A volumetric shows you just the facts: key numbers, a quick glance at the trends, maybe some analysis of your target. At Dexibit, these are some of our most popular visualizations, because they’re quick and easy for everyone to get their heads around. They tell us exactly what we need to know.

Illustrative visualizations are great for working with unstructured or complex data. Word clouds show popular themes from freeform comments, heat maps show popular areas from location analytics, and choropleths map out where visitors come from. A favorite is the sunburst, which helps analyze trail sequences: our visitors start in the white spot in the middle, and the segments radiating outwards show all the different paths they pick to traverse the venue.

And then we have a ton of traditional statistical graphs which are great for bringing life to quantitative data. A noisy table of totals in a traditional report doesn’t reveal much, so picking the right graph helps us look at data in different ways and get more meaning.

Here are a few common chart types:

Bar charts show data by categories (either vertically or horizontally) – to analyze frequency, rank, and deviation. For example, a bar chart is great to use for visitors by the hour or dwell time distribution. Your bars can be side by side or stacked.

A run chart or line chart plots performance over time to look for trends or patterns. The lines can be overlapped or stacked. For example, a line chart is great to use for exhibition visitation, revenue or member conversion over time.

A control chart looks at how a process changes over time, with a central line for the average, an upper line for the upper control limit and a lower line for the lower control limit, determined from historical data. For example, a control chart is great to use for forecast versus actual residuals. These work best with a larger data set, such as a year.
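To make the three lines concrete, here is a minimal sketch of how control limits might be worked out from historical data. It assumes the common convention of placing the limits three standard deviations either side of the average; the residual figures and the function names are made up for illustration.

```python
from statistics import mean, stdev

def control_limits(history, sigmas=3.0):
    """Return (lower, center, upper) control lines from historical values."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation of the history
    return center - sigmas * spread, center, center + sigmas * spread

def out_of_control(values, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical daily residuals (actual minus forecast visitors)
history = [12, -8, 5, -3, 9, -11, 4, -6, 7, -9]
lower, center, upper = control_limits(history)

# Check some new days against the limits derived from history
new_days = [3, -40, 10]
flagged = out_of_control(new_days, lower, upper)
print(f"limits: {lower:.1f} to {upper:.1f}, flagged: {flagged}")
```

Any point outside the limits is worth a closer look, since it sits further from the average than the historical variation would suggest.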

An area chart is like a line chart with categories emphasized by color for comparison. Areas can be overlapped or stacked. For example, an area chart is great to use for comparing onsite with online visitation.

Histograms are like bar charts with one big difference: they plot a single measure grouped into ranges (or bins), rather than comparing separate categories. They’re used to look at distribution and variation. For example, a histogram is great to use for queue wait times.
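As a rough sketch of what a histogram does behind the scenes, the snippet below groups wait times into bins and counts how many fall into each. The wait times, the five-minute bin width and the function name are all assumptions chosen for illustration.

```python
from collections import Counter

BIN_WIDTH = 5  # minutes per bin (an assumed, adjustable choice)

def histogram(values, bin_width):
    """Count how many values fall in each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical queue wait times in minutes
waits = [2, 3, 7, 8, 8, 12, 4, 6, 21, 9]
bins = histogram(waits, BIN_WIDTH)

# Print a rough text histogram, one row per non-empty bin
for start, count in bins.items():
    print(f"{start:>2} to under {start + BIN_WIDTH:>2} min | {'#' * count}")
```

Note that this sketch skips empty bins; a charting tool would normally draw them as gaps so the shape of the distribution stays honest.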

A scatter plot looks for two-factor correlation by plotting a dot for each pair of values across both variables, to see if a trend is evident and how strong that correlation is. For example, a scatter plot is great to use for rainfall versus visitation. If you see the dots all over the place in your scatter plot, it suggests the two things you’re plotting don’t correlate with each other.
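If you want a number to go with what your eyes see in a scatter plot, one common measure is the Pearson correlation coefficient, which runs from -1 (a perfect downward line) through 0 (no linear relationship) to 1 (a perfect upward line). The sketch below is one way to calculate it; the rainfall and visitation figures are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / sqrt(var_x * var_y)

# Hypothetical daily rainfall (mm) and visitor counts
rainfall_mm = [0, 2, 5, 10, 20]
visitors = [1000, 950, 900, 700, 400]

r = pearson_r(rainfall_mm, visitors)
print(f"r = {r:.2f}")  # close to -1: more rain, fewer visitors in this sample
```

A value near zero matches the "dots all over the place" picture, though it only rules out a straight-line relationship, not every kind of pattern.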

Pie charts or donuts show a ratio: how categories compare as shares of a whole. Some data lovers brand them too basic, but they’re easy to understand for most people. For example, a pie chart could be used to analyze website referral channels.

A good visualization should follow standardized rules so it is easy to understand at a glance and is not misleading:

  • The Y axis is the vertical line and the X axis is the horizontal line (an easy way of remembering this is the rhyme ‘Y to the sky’)
  • Time is usually plotted on the X axis
  • Always try to start your Y axis at 0 and proceed upwards
  • Make sure your graph isn’t ill proportioned
  • Make sure your graph doesn’t contain bias
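To see why starting the Y axis at 0 matters, here is a small worked example with made-up numbers. It compares how much taller one bar looks than another, depending on where the axis starts; the function name is hypothetical.

```python
def apparent_ratio(a, b, baseline=0.0):
    """How many times taller bar b is drawn than bar a when the Y axis starts at baseline."""
    if baseline >= min(a, b):
        raise ValueError("baseline must sit below both values")
    return (b - baseline) / (a - baseline)

# Hypothetical visitor counts for two days
monday, tuesday = 100, 110

honest = apparent_ratio(monday, tuesday)                  # axis starts at 0
truncated = apparent_ratio(monday, tuesday, baseline=95)  # axis starts at 95

print(f"axis from 0:  Tuesday looks {honest:.1f}x Monday")
print(f"axis from 95: Tuesday looks {truncated:.1f}x Monday")
```

With the axis starting at 0, Tuesday's bar is 1.1 times the height of Monday's, which matches the real 10% difference. Start the axis at 95 and the same data draws Tuesday three times as tall.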

If you’re looking at a graph and you want to understand what insight it’s offering you, try asking yourself:

  1. What’s the average of the data I’m looking at? How even is the distribution – is it shaped like a normal bell curve or does it look skewed to one side?
  2. Are there any outliers – data points that are way off the norm?
  3. What’s the range? For example, if you’re looking at average revenue per visit, what are the lowest and highest values for a single day over the course of the year?
  4. How stable is the data over time? Does it jump up and down, is it growing or declining, or is it flat?
  5. What trends are happening? Can we clearly see patterns across a week or season?
  6. What might be influencing the data? Is there a correlation between data sets? If we’re looking at a scatter plot, can we see a strong line, or is it just messy, showing us there’s not a lot of correlation? How strong is that connection? And can we go beyond correlation to show causation, that is, why something’s happening?
  7. Do we need any tension metrics to balance what we’re seeing and make sure we fully understand what’s happening? For example, if people are spending a long time on a website, is it because they’re engaged with the content, or because the website is slow or they can’t find what they’re looking for?
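The first few questions above (average, shape, outliers and range) can also be checked with a few lines of code. This is a minimal sketch with invented revenue figures. It assumes the widely used rule of thumb that a point more than 1.5 times the interquartile range beyond the quartiles counts as an outlier; other definitions are just as valid.

```python
from statistics import mean, median, quantiles

def summarize(values):
    """Return the average, median, range and outliers for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed outlier rule: 1.5 x IQR
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "median": median(values),
        "range": (min(values), max(values)),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical average revenue per visit, one figure per day
revenue_per_visit = [18, 20, 19, 21, 22, 20, 19, 55]
summary = summarize(revenue_per_visit)

for name, value in summary.items():
    print(f"{name}: {value}")
```

When the mean sits well above the median, as it does here, that is a hint the distribution is pulled to one side by a few high values rather than shaped like a symmetrical bell.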

If you’re new to data, it’s easy to get a little overwhelmed looking at a statistical graph. Take a moment to orient yourself with the title, the caption, the axes and then the data, then look into what it’s telling you.