Plotting and Graph Terminology
Explaining basic concepts such as axis, data source and data series used for plots.
What do series, data source and axis mean when dealing with plots? If you are going to create a graph or work with a plotting framework in Matlab, Python or Julia, or perhaps a software package such as Excel or Apple’s Numbers, it is useful to know what the different parts of a plot are.
I’ll try to explain it as a pipeline, starting with the data going all the way to the graphics on the screen.
The source data we want to visualize is referred to as the data source. Typically this is organized in a table. In scientific programming environments such as R, Matlab, Python and Julia this is usually referred to as a data frame (pandas provides it in Python, and Matlab calls it a table). A data frame is a table where each column has a label we can refer to it by.
A data frame is different from a matrix, in that a matrix is essentially a table with homogeneous data. For example, every row and column contains floating point numbers. In a data frame, each column can hold a different type of data from the next.
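To make the difference concrete, here is a small sketch in Julia, assuming the DataFrames package is installed. The table name, column names and values are made up purely for illustration.

```julia
using DataFrames

# A matrix holds one element type for every cell.
m = [1.0 2.0; 3.0 4.0]
println(eltype(m))    # Float64

# A data frame can hold a different type in each labeled column.
# These column names and values are hypothetical.
wells = DataFrame(name = ["A-1", "B-2"],
                  depth = [1200.5, 980.0],
                  producing = [true, false])

# Collect the element type of every column.
coltypes = [eltype(col) for col in eachcol(wells)]
println(coltypes)

# A column is referred to by its label.
println(wells.depth)
```

The matrix reports a single element type, while the data frame reports one type per column.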
I will use terminology from Julia, which should be quite similar to other software packages. In a Julia DataFrame each column is a
The data in our data source needs to be mapped into one or more data series to be able to plot anything.
Each color in the different plots below represents a different series. Each series is a data relation you want to plot in one graph (plot). I like to think about each series as a function. A function maps input values to output values. Another way of looking at it is that it maps an x value to a y value. Mathematicians call the set of x values the domain and the set of y values the codomain.
This means a series can’t just be a column in the data source (data array). You need two columns to form a series.
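One way to picture this, as a sketch in plain Julia with made-up numbers: a series pairs each value from an x column with the value at the same position in a y column.

```julia
# Two columns of equal length (hypothetical values).
xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]

# A series pairs each x with the y at the same position.
series = collect(zip(xs, ys))

# Viewed as a function, the series maps an input x to an output y.
lookup = Dict(xs .=> ys)
println(lookup[3])    # 9
```

Neither column alone tells you what to draw; it is the pairing that makes up the series.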
Example From Geology
Let me take an example from the domain I work in which is oil and gas. It is common to send measurement instruments down a well bore and measure different properties of the rock at different depths:
- Electrical resistivity in the rock formation. That is how much the rock resists electrical currents. You might have determined that a rock formation is porous and holds some fluid, but you don’t know if it is water or oil. Rock holding water will conduct electricity better than rock holding oil.
- Velocity of sound waves through the rock. Sending sound waves between two points in a rock formation will take some time. This travel time correlates closely with the density of the rock.
So at each depth we measure multiple values. The depths we measure at make up one column of data. There will be one column for electrical resistivity and one for the velocity of sound. Combining a column of measured data with the depth column gives us a series.
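As a rough sketch in Julia, assuming the DataFrames package and entirely made-up measurements, such a well log could be stored as one table in which both series share the depth column. The names welllog, depth, resistivity and velocity are chosen only for illustration.

```julia
using DataFrames

# Hypothetical well log: one row per measured depth.
welllog = DataFrame(
    depth       = [1000.0, 1010.0, 1020.0, 1030.0],  # meters
    resistivity = [2.1, 2.4, 15.0, 14.2],            # ohm-meters
    velocity    = [3100.0, 3150.0, 2900.0, 2950.0],  # meters per second
)

# Each series combines the shared depth column with one measured column.
resistivity_series = collect(zip(welllog.depth, welllog.resistivity))
velocity_series    = collect(zip(welllog.depth, welllog.velocity))

println(first(resistivity_series))    # (1000.0, 2.1)
```

Three columns give two series here, because the depth column is reused as the input for both.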
Plot Types and Geometries
The plot types define how the data series get visualized. As you can see below there are many different plot types or charts you can use.
Mapping Data to Plot
When you have picked data, defined series and decided what sort of plot to use (geometry), you still need to specify how the data should be mapped to the geometry you see in the plot. In Gadfly each aspect of the geometry which may be mapped to some data is referred to as an aesthetic.
In this example we are mapping columns in the data source named xs and ys to aesthetics in a line geometry called x and y. For other geometries, there would be a different number of aesthetics, with different names. For bar charts, for example, there is also a color aesthetic, and you may choose xmin and xmax aesthetics as inputs to the geometry rather than x.
Here is how I would have plotted a similar graph in Julia using Gadfly.
Create the data:
using DataFrames, Gadfly
xs = 1:2:12
ys = xs.^2
Store it in a data frame:
df = DataFrame(x = xs, y = ys)
Note, we can name the columns anything we like, so this is also valid:
df = DataFrame(foo = xs, bar = ys)
Tell Gadfly that:
- df is our data source.
- The x aesthetic connects to the foo column of the data source.
- The y aesthetic connects to the bar column of the data source.
plot(df, x = :foo, y = :bar, Geom.line)
Note there are lots of other ways to pass data to the plotting function. For example, you don’t have to use a data frame. However, I think this example is a good illustration of what plotting is essentially about.
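For instance, Gadfly also accepts arrays directly as aesthetics. The following is a minimal sketch, assuming the Gadfly package is installed; it should produce the same kind of line plot without building a data frame first.

```julia
using Gadfly

# Same data as before, collected into plain arrays.
xs = collect(1:2:12)
ys = xs.^2

# Bind the arrays straight to the x and y aesthetics.
# No data source argument is passed to plot.
p = plot(x = xs, y = ys, Geom.line)
```

The mapping from data to aesthetics is still there; the arrays simply take the place of the named columns.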
You need some data, and then you need to define how that data is mapped to the geometry of the graph you are plotting. Regardless of the plotting library you use, it will need to offer you some way of describing that connection. The terms aesthetic and geometry, however, do not seem to be as commonly used for plots as data series is. Data frame is also a fairly common term across programming languages and packages. You’ll also encounter it when using the R language for statistical analysis.
Read more about working with data in tables in my DataFrames-focused story.