
Ditch Excel and Use Julia Data Frames

Manipulating and visualizing pizza sales data using Julia DataFrames.jl and Plots.jl

20 min read · Oct 27, 2020

In this story we will look at pizza sales data found here:

https://vincentarelbundock.github.io/Rdatasets/csv/gt/pizzaplace.csv

This kind of data can be manipulated either in a spreadsheet application such as Excel or with data frames, which are popular in languages such as R, Python (Pandas) and Julia (DataFrames.jl).

Loading Data

First, we load the data into Julia and pick a subset of the table's columns (id, name, size and price) to work with:

using DataFrames, CSV

url = "https://vincentarelbundock.github.io/Rdatasets/csv/gt/pizzaplace.csv"
filename = download(url)
all_pizzas = CSV.read(filename, DataFrame)

# Get rid of column with row numbers
all_pizzas = all_pizzas[:, 2:end]

# Pick most interesting columns
pz = select(all_pizzas, :id, :name, :size, :price)
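To try the same two steps without downloading anything, here is a minimal sketch on a small hand-built table. It reuses the four rows from the REPL output below. The leading `rownames` column and the `other` column are hypothetical placeholders. They stand in for the row-number column and for the columns that `select` leaves out.

```julia
using DataFrames

# Hypothetical stand-in for the downloaded table: the four rows shown in the
# REPL output, plus a placeholder row-number column (`rownames`) and a
# placeholder column (`other`) that we do not want to keep.
all_pizzas = DataFrame(
    rownames = 1:4,
    id       = ["2015-000001", "2015-000002", "2015-000002", "2015-000002"],
    name     = ["hawaiian", "classic_dlx", "mexicana", "thai_ckn"],
    size     = ["M", "M", "M", "L"],
    other    = ["a", "b", "c", "d"],
    price    = [13.25, 16.0, 16.0, 20.75],
)

# Drop the first (row-number) column by position: all rows, columns 2 onwards
all_pizzas = all_pizzas[:, 2:end]

# Keep only the named columns, in the order they are listed
pz = select(all_pizzas, :id, :name, :size, :price)

println(first(pz, 2))
println(nrow(pz), " rows, ", ncol(pz), " columns")
```

`select` returns a new data frame, so `all_pizzas` still holds every column if you need the others later.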

We can look at the first few rows in the Julia REPL (Read-Eval-Print Loop) to see what the data looks like:

julia> first(pz, 4)
4×4 DataFrame
│ Row │ id          │ name        │ size   │ price   │
│     │ String      │ String      │ String │ Float64 │
├─────┼─────────────┼─────────────┼────────┼─────────┤
│ 1   │ 2015-000001 │ hawaiian    │ M      │ 13.25   │
│ 2   │ 2015-000002 │ classic_dlx │ M      │ 16.0    │
│ 3   │ 2015-000002 │ mexicana    │ M      │ 16.0    │
│ 4   │ 2015-000002 │ thai_ckn    │ L      │ 20.75   │

julia> nrow(pz)
49574

Here we only looked at the first 4 rows. As `nrow` shows, however, the dataset has almost 50 000 rows, which is not very practical to paste into a spreadsheet. For educational purposes we will therefore work with a smaller subset.

Sampling Data

We are going to pick a random sample of 16 rows from the 49 574 rows we loaded. To do that, we randomly shuffle the row indices 1 to 49 574.

julia> using Random

julia> rows = shuffle(1:nrow(pz))
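`shuffle` returns a different order every time it runs. If you want the same sample on every run, one option is to pass a seeded random number generator. This is an addition of ours, not part of the original walkthrough. The seed `2020` below is arbitrary. `randperm` is shown as an alternative that builds a random permutation of `1:n` directly; it is not guaranteed to produce the same order as `shuffle` for the same seed.

```julia
using Random

n = 49_574   # row count reported by nrow(pz) above

# A generator seeded with a fixed (arbitrary) value yields the same order on every run
rows = shuffle(MersenneTwister(2020), 1:n)

# Alternative: randperm builds a random permutation of 1:n directly
perm = randperm(MersenneTwister(2020), n)

println(rows[1:5])
```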

We can then pick the first 16 rows of these shuffled rows to get 16 random rows from our original data:

julia> sample = pz[rows[1:16]…
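Taking the first entries of a shuffled index vector samples rows without replacement. A permutation contains each index exactly once, so no row can be picked twice. Below is a minimal sketch on a toy table, using DataFrames.jl's `df[row_indices, :]` form of indexing. The table uses two columns from the rows shown earlier. The sample size is 2 instead of 16, and the seed is arbitrary.

```julia
using DataFrames, Random

# Toy table standing in for pz (values taken from the four rows shown earlier)
df = DataFrame(name  = ["hawaiian", "classic_dlx", "mexicana", "thai_ckn"],
               price = [13.25, 16.0, 16.0, 20.75])

k = 2                                          # sample size (16 in the article)
rows = shuffle(MersenneTwister(1), 1:nrow(df)) # random permutation of row indices

# A vector of row numbers plus `:` (all columns) returns a new DataFrame
# containing those rows, in that order.
small = df[rows[1:k], :]
println(small)
```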

Written by Erik Engheim