
Visualizing Uncertainty

Seeing how probabilities work

I’ve been struggling to find a way to portray uncertain information, and hope to work out some issues by talking it through with you.

Uncertainty is difficult

Yes, most people have trouble figuring out uncertainty, or probabilities.

And it’s hard to explain the concepts without referring to issues that are even more difficult to understand.

But let’s have a go, shall we?

Rolling dice

Or more specifically, one die (yes, that’s the singular form of dice, but you’ve probably heard that before and remember it now that I’ve mentioned it ;-).

Most people have rolled dice in any number of games. The standard form of a die, a cube, has six sides, numbered one through six. And presuming everything is on the up and up, each of the six values is equally likely to land face up.

One six sided die…

And this six sided die has what we call a “uniform” distribution. That’s specifically a reference to the equal chance of any one of the six results. We can visualize it with a small picture:

Image of 1d6 Distribution
Results of rolling 1 six sided die

That’s “one” result for each of the numbers 1 through 6. If we look at a simple statistic for it, the arithmetic mean (average) is 3.5. To calculate this, we add all six values together and divide by 6 [(1+2+3+4+5+6)/6 = 3.5]. You won’t ever “roll” a 3.5, but it’s still what you might expect – half the values are below, and half above.
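The same arithmetic can be sketched in a few lines of Python. This is only an illustration; the names `faces` and `distribution` are my own, and exact fractions are used so that 1/6 doesn't pick up any rounding error.

```python
from fractions import Fraction

# One fair six-sided die: each face has the same probability, 1/6.
faces = range(1, 7)
distribution = {face: Fraction(1, 6) for face in faces}

# The arithmetic mean is each value weighted by its probability.
mean = sum(face * p for face, p in distribution.items())

print(float(mean))  # 3.5
```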

Two six sided dice

And this remains part of everyday life – people roll two dice all the time. It’s used in games like Craps, Backgammon, and Monopoly.

And the “odds” are different for two dice versus one. If you add the values together, there’s only one way to roll either a two or twelve, and six different ways to roll a seven. People understand that it’s much more likely to roll a seven than a two. And it becomes clearer when we use an illustration:

Image of 2d6 Distribution
Results of rolling two six sided dice and adding the results together

There are a total of 36 different outcomes (six outcomes for the first die, and six more from the second for each: 6 x 6 = 36). The likelihood of rolling a 7 is six times that of a 2, twice that of a 4.

And when we calculate the arithmetic mean, it is not surprising to find 7.
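If you'd rather count than take my word for it, here is a small sketch that lists every pair of faces and tallies the totals. The variable names (`rolls`, `ways`) are just for illustration.

```python
from collections import Counter
from itertools import product

# Enumerate every (first die, second die) pair and count each total.
rolls = list(product(range(1, 7), repeat=2))
ways = Counter(a + b for a, b in rolls)

print(len(rolls))                  # 36
print(ways[2], ways[4], ways[7])   # 1 3 6

# Arithmetic mean of the total over all equally likely pairs.
mean = sum(total * count for total, count in ways.items()) / len(rolls)
print(mean)                        # 7.0
```

Six ways to make a 7 against one way to make a 2 and three ways to make a 4 gives the "six times" and "twice" figures.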

Skewed distributions

But things don’t always turn out as even distributions around some central number. Sometimes they favor one side or the other. We call this “skew”. Consider how likely it is that someone will call to ask questions during your evening meal versus almost any other time of the day…

If we had some weighted dice, we might find that they roll according to the following graph:

Image of a skewed distribution
Results of rolling weighted dice

The graph above lists a total of 60 different outcomes (I’m not implying anything about rolling the dice – these are “rigged” and happen to perform in this manner – call it magic). The likelihood of rolling a 4 is nine times that of a 12, more than twice that of a 9.

If we calculate the arithmetic mean, we find 5.6. And while that’s accurate, it isn’t as descriptive as it was in the earlier cases.

A Box, with Whiskers

Instead of showing a single number to represent this probability distribution, we’ve got to add information. We’d still like to show the “central tendency” of the distribution, but instead of taking the arithmetic mean, we’ll use the “median”. This value is right in the middle of the distribution. Half are above it, half are below.

And then we’d like to define the “middle” of the distribution. If we knew what value was halfway between the median and the lower end, and halfway between the median and the upper end, we could say that half of the time, the result would be between these two values.

Technically, we call these “quartiles”, and we’re using the 25th percentile and 75th percentile to bound this box.

And finally, we want to show the extremes – how low is reasonable, how high?

If our numbers aren’t too spread out (like the one die example), it would be reasonable to simply list them – they happen frequently enough. But when the values seem to go on and on, we’d like to use some “practically reasonable” upper and lower bounds. The specific percentile used for these bounds can vary, but for our purposes we’ll use 2% and 98%. That is, we’ll call the lower bound the 2nd percentile of the data, and the upper the 98th.
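Here is a rough sketch of how these five numbers might be computed for a list of equally likely outcomes. There are several conventions for percentiles of discrete data; the one below (average the two neighbors when the cut falls exactly between them) is an assumption on my part, and the function names `percentile` and `five_numbers` are made up for this example.

```python
from itertools import product

def percentile(outcomes, pct):
    """Return the pct-th percentile (whole number, 0 < pct < 100).

    Assumed convention: find the position pct% of the way through the
    sorted outcomes; if it falls exactly between two values, average them.
    """
    ordered = sorted(outcomes)
    index, remainder = divmod(pct * len(ordered), 100)
    if remainder == 0:
        return (ordered[index - 1] + ordered[index]) / 2
    return ordered[index]

def five_numbers(outcomes):
    """Median, 25%, 75%, 2% and 98%, in that order."""
    return tuple(percentile(outcomes, p) for p in (50, 25, 75, 2, 98))

one_die = list(range(1, 7))
two_dice = [a + b for a, b in product(range(1, 7), repeat=2)]

print(five_numbers(one_die))
print(five_numbers(two_dice))
```

Under this convention one die gives a median of 3.5 with a box from 2 to 5 and whiskers at 1 and 6, and the sum of two dice gives a median of 7 with a box from 5 to 9 and whiskers at 2 and 12.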

Box and Whiskers Data

And if we look back at our probability distributions, we can calculate these five bits of data for each. Here’s a table showing them:

Distribution | Median | 25% | 75% | 2% | 98%
One Six Sided Die | 3.5 | 2 | 5 | 1 | 6
Sum of Two Six Sided Dice | 7 | 5 | 9 | 2 | 12
Rigged Dice | 5 | 3 | 9 | 2 | 11
Table of probability distribution data

But that probably doesn’t help you understand anything about those distributions…

Box and Whiskers Graph

So let’s visualize this data:

Image of Box and Whiskers Plot
Box and Whiskers Plot of two dice rolls

And now you might have an idea of what’s going on. The “regular” roll is likely to add up to seven, and has a good chance of being clustered around it. The rigged roll is likely to add up to five, and is much more likely to be a rather low number than a high one.

And if I asked you to roll dice against me in a competition for the higher number, you’d insist on using the regular dice (and stick me with the rigged ones if you could).

Selling books

So now we’ve got a means of comparing uncertain events.

But things can get more complicated when we deal with real-life issues – how do toxins disperse when released from a pipe or chimney? Can we illustrate the movement of the stock market? What happens when some species grows without any limits to the population?

Or what’s behind something as ordinary as the sale of books?

If a million books are published in a year, how many sell one copy? And is there an explanation for how many sell thousands or hundreds of thousands of copies?

Statistically, researchers have found that items like book sales work on a “power law” basis.

Power Law

The power law describes a phenomenon where the frequency of an event varies as a power of its size: multiply the size by ten, and the frequency drops by a fixed factor. (This is a bit of a simplification, but it’s enough detail for our purposes.)

One million books might sell one copy, but one book sells a million. Here, let’s look at a table:

Number of books | Copies each book sells
1 | 1,000,000
10 | 100,000
100 | 10,000
1,000 | 1,000
10,000 | 100
100,000 | 10
1,000,000 | 1
Table of book sales data

I’m not saying this is exact, but it’s in the ballpark. The highest selling nonfiction hardcover of 2010 sold 2.6 million copies, the 5th 800,000, the 50th 180,000. I’ve seen published figures that a million books were published in 2009, and the average book sells all of 250 copies…

So for our purposes, this power law comes into play often enough, and really skews things up!

Visualizing the power law

So here’s a graph of the table above:

Image of the Power Law on Book Sales
Graph of book sales according to the power law

And that’s not very helpful… Here, let me blow up the bottom left corner so that it’s more easily seen:

Image of Power Law on Book Sales - Close Up
Graph of book sales according to the power law - close up

And that’s still difficult to make much sense of.

Logarithmic Scales

But the interesting detail about items that obey the power law is that when you scale the X and Y axes according to logarithms, the chart becomes quite simple:

Image of Power Law on Book Sales - Log Scale
Graph of book sales according to the power law with log scales

Interesting, right?…
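To see why the log scales tidy things up, here is a small sketch using the seven rows of the book sales table. It takes the base-10 logarithm of both columns and measures the slope between neighboring points. The variable names are mine, and the table is rebuilt from powers of ten rather than read from any real data.

```python
import math

# (number of books, copies each sells), matching the book sales table.
table = [(10 ** k, 10 ** (6 - k)) for k in range(7)]

# On log scales each row becomes a point (log10 books, log10 copies).
points = [(math.log10(books), math.log10(copies)) for books, copies in table]

# Slope between neighboring points; a constant slope means a straight line.
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]
print(slopes)
```

Every slope comes out at (or within rounding of) -1, which is why the curve that hugged the axes turns into a straight line heading down and to the right.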

Further issues

One last matter to deal with – does the Box and Whiskers plot reveal anything useful when we talk about items that behave according to the power law?

Not really…

The arithmetic mean of the data in the table above: 6.3

The median: 1

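The 2%, 25%, and 75%: 1

The 98%: 10 (the one-copy books make up 90% of the titles in the table, so only the top whisker moves off 1)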
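The mean and median can be checked straight from the book sales table. This is a sketch under the assumption that the table is the whole population (1,111,111 titles); the helper `copies_at_position` is a name I made up for the example.

```python
# (number of books, copies each sells), matching the book sales table.
table = [(10 ** k, 10 ** (6 - k)) for k in range(7)]

total_books = sum(books for books, _ in table)
total_copies = sum(books * copies for books, copies in table)
mean = total_copies / total_books

def copies_at_position(rows, position):
    """Copies sold by the book at a 1-based position, lowest seller first."""
    seen = 0
    for books, copies in sorted(rows, key=lambda row: row[1]):
        seen += books
        if seen >= position:
            return copies
    raise ValueError("position is beyond the last book")

# With an odd number of titles, the median is the single middle book.
median = copies_at_position(table, (total_books + 1) // 2)

print(round(mean, 1))  # 6.3
print(median)          # 1
```

Seven million copies spread over 1,111,111 titles gives the 6.3 average, while the middle book sits deep inside the block of a million titles that sell a single copy.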

Statistically, if you publish a book, and have no better chance than anyone else, you’ll sell one copy.

And yet somebody is going to write a best-seller and may sell close to a million copies.

So yes, I’ve helped by explaining one way to compare uncertain events – and given a mechanism to visually compare different events. But I bring this last item up to show that even then it isn’t a slam dunk in every situation.

And it’ll take a bit more to work through the probabilities of a million chimps banging on keyboards…