Saturday, 5 November 2016

"3D" plots

It is very common to perform linear analysis in many dimensions. We experience three (approximately) linear spatial dimensions in our world, so we have very good intuition for linear spaces of up to three dimensions. This makes it tempting to visualise things in three dimensions, giving rise to "3D" plots. Going into the high-impact literature, let's do a Google image search for "nature 3D scatter plot". The first hit is found on http://www.nature.com/app_notes/nmeth/2006/060328/fig_tab/nmeth870_F3.html from 2006:

This is a pretty representative plot, as you will see if you continue to scroll down the image search. The problem is that "3D" scatter plots like this don't actually display three dimensions, which makes them a harder-to-read version of a normal 2D scatter plot with 3D-inspired cosmetics. I don't want to single out this specific figure, publication or journal (I've no idea about the context); I just want to show that these plots are present all the way up into the most prestigious journals. In fact, a Google image search for just "3D scatter plots" yields the same kind of hard-to-read 2D plots. Some are better than others, but the majority are plots like these:

To be fair, these two specific plots are from a question on Stack Overflow and a screenshot from a plotting suite where you can also rotate the figure. The point is that static 2D renders of 3D plots like the ones above do not allow you to read out the third spatial dimension.

Let me elaborate:

The number of dimensions of a scatter plot is how many values you can read out from each dot. A typical scatter plot, as you know, will have two dimensions: the x and y coordinates of the dot.

A typical 2D scatter plot


Sometimes, or even frequently, you want to display more than two numbers for each dot though, and there are ways to do that.
Using point size, point type (discrete), colour and support lines to display a third dimension apart from x and y.
These methods have their advantages and drawbacks, but they all make it possible to read out more information than from the basic 2D scatter plot.

The "3D" scatter plot is a basic 2D scatter plot, but with various 3D-looking graphical effects like reflections on the dots or angled grids in the background. An example I made using a cute little online tool I found called highcharts is this:
A "3D" scatter plot: dot reflections and a 3D grid, but neither allows me to read out more than two numbers from each dot.
We cannot read out the third dimension of the dots, as we cannot see how far into the screen each dot is. So, despite the cosmetics, this is no more than a basic 2D scatter plot where I read the x and y coordinates on the screen. The choice of camera angle decides which direction ends up as the depth dimension that we cannot see, and which directions (perpendicular to the depth) we can see. So we lose as much information as if we had only plotted dimensions 1 and 2, but exactly which information we lose is much less transparent, for both the plotter and the reader.
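
To make the lost depth dimension concrete, here is a minimal sketch in Python. It assumes a simple orthographic camera (real plotting tools usually add perspective, which changes the details but not the conclusion), and the function name `project_to_screen` and the example points are made up for illustration. Two points that differ only along the viewing direction should land on the same spot on the screen:

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _unit(v):
    length = math.sqrt(_dot(v, v))
    return tuple(c / length for c in v)

def project_to_screen(point, view_dir):
    """Orthographic projection (an assumption, see above) of a 3D point.

    Returns the two on-screen coordinates; the component of the point
    along view_dir, i.e. the depth, is thrown away.
    """
    depth_axis = _unit(view_dir)
    # Pick a helper vector that is not parallel to the viewing direction.
    helper = (0.0, 0.0, 1.0) if abs(depth_axis[2]) < 0.9 else (1.0, 0.0, 0.0)
    # Two orthonormal screen axes, both perpendicular to the depth axis.
    right = _unit(_cross(helper, depth_axis))
    up = _cross(depth_axis, right)
    return (_dot(point, right), _dot(point, up))

# Hypothetical camera direction and data point.
view_dir = (1.0, 1.0, 1.0)
a = (1.0, 2.0, 3.0)
# b is a different point: a pushed 5 steps of view_dir "into the screen".
b = tuple(p + 5.0 * v for p, v in zip(a, view_dir))

screen_a = project_to_screen(a, view_dir)
screen_b = project_to_screen(b, view_dir)
print(screen_a)
print(screen_b)  # same screen position as a, up to rounding
```

Changing `view_dir` changes which combination of the three data dimensions is discarded, which is exactly the choice that a static "3D" plot hides from the reader.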

So I am arguing that, unless you have a very good reason for your choice of camera angle, a "3D" scatter plot is a worse way to plot three-dimensional data than just plotting the first and second dimensions, where you at least know what you are looking at and what you are leaving out. In fact, I suggest that from now on we refer to this kind of plot as a "3D" scatter plot, with the quotation marks, as a derogatory term, and reserve the term 3D scatter plot for plots that actually display three values per dot. This can conveniently be done in spoken language with air quotes, and a slightly disgusted facial expression for good measure. The term "3D" plot can be used in general for any plot that displays fewer than three dimensions of data but still uses perspective effects. "3D" pie charts spring to mind. (Credit to @JovMaksimovic for tweeting this one)

This chart shows, uhh... the fraction of colours, umm.. of a circular staircase?

It is always easy (and fun!) to complain, so let's discuss what we should do with three-dimensional data that we want to make a picture from. First, think hard about what we actually want to show with the picture, as we may not need all three dimensions. Do we just want to show that the red dots cluster separately from the blue dots? That is actually a one-dimensional problem, and we should be fine with just finding the direction they separate in (which will be a linear combination of the three dimensions our data is in) and plotting that dimension only, while being honest in the legend that we picked this direction from the three-dimensional space to show separation. Not as sexy a figure? Sorry, the message we are trying to convey isn't either.
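
As a rough sketch of what "finding the direction they separate in" could look like, here is a small Python example. There are several ways to pick such a direction; this one simply takes the difference between the two group means, which is an assumption made for brevity (Fisher's linear discriminant would also account for the spread within each group). The data and the function names are invented for illustration.

```python
import math

def mean_point(points):
    """Component-wise mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def separation_direction(group_a, group_b):
    """Unit vector from the mean of group_a to the mean of group_b.

    This is the simplest possible choice, not the only one.
    """
    mean_a, mean_b = mean_point(group_a), mean_point(group_b)
    diff = tuple(b - a for a, b in zip(mean_a, mean_b))
    length = math.sqrt(sum(c * c for c in diff))
    return tuple(c / length for c in diff)

def project(points, direction):
    """One number per point: its coordinate along the chosen direction."""
    return [sum(p_i * d_i for p_i, d_i in zip(p, direction)) for p in points]

# Made-up three-dimensional data for two groups of dots.
red = [(1.0, 2.0, 0.5), (1.2, 1.8, 0.7), (0.9, 2.1, 0.4)]
blue = [(3.0, 0.5, 2.5), (3.2, 0.4, 2.7), (2.9, 0.6, 2.4)]

direction = separation_direction(red, blue)
red_1d = project(red, direction)
blue_1d = project(blue, direction)

# The direction is a linear combination of the three original axes;
# its components are the weights to report in the figure legend.
print("direction:", direction)
print("red: ", red_1d)
print("blue:", blue_1d)
```

The one-dimensional values in `red_1d` and `blue_1d` are what would go on the single axis of the plot, and printing `direction` gives the reader the linear combination that was used.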

Second, can some of the dimensions be conveniently displayed through colour, size or other means? That can look pretty fancy as well, especially if you match colour scales with other plots in your paper.

But assume that we actually do want to show all three dimensions on equal footing. This is a problem that astronomers have had essentially from the founding of the field, and a good solution they use is this (a kind of support line method):
Possibly the best way to make a 3D scatter plot with the three dimensions on equal footing (from the European Southern Observatory). Note that the plot also uses size (I assume luminosity) and colour (representing, I guess, the colour of the star), so it is actually a 5D scatter plot.

With that I wish you all happy sensible plotting, as well as much enjoyable complaining. :)

3 comments:

  1. Note however there is a typo in the script.
    data1 <- data.frame (X, Y, ) should read
    data1 <- data.frame (X, Y)

    3D Scatter Plot
