Ditch Excel and Use Julia Data Frames
Manipulating and visualizing pizza sales data using Julia DataFrames.jl and Plots.jl

In this story we will look at pizza sales data found here:
https://vincentarelbundock.github.io/Rdatasets/csv/gt/pizzaplace.csv
This kind of data can be manipulated in a spreadsheet application such as Excel, or with data frames, which are popular in languages such as R, Python (Pandas) and Julia (DataFrames.jl).
Loading Data
First we will load the data in Julia and pick a subset (id, name, size and price) of columns in the table to work with:
using DataFrames, CSV
url = "https://vincentarelbundock.github.io/Rdatasets/csv/gt/pizzaplace.csv"
filename = download(url)
all_pizzas = CSV.read(filename, DataFrame)
# Get rid of column with row numbers
all_pizzas = all_pizzas[:, 2:end]
# Pick most interesting columns
pz = select(all_pizzas, :id, :name, :size, :price)
We can look at the first few rows to see what this looks like in the Julia REPL (Read-Eval-Print Loop):
julia> first(pz, 4)
4×4 DataFrame
│ Row │ id │ name │ size │ price │
│ │ String │ String │ String │ Float64 │
├─────┼─────────────┼─────────────┼────────┼─────────┤
│ 1 │ 2015-000001 │ hawaiian │ M │ 13.25 │
│ 2 │ 2015-000002 │ classic_dlx │ M │ 16.0 │
│ 3 │ 2015-000002 │ mexicana │ M │ 16.0 │
│ 4 │ 2015-000002 │ thai_ckn │ L │ 20.75 │
julia> nrow(pz)
49574
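If you want to try the column selection without downloading anything, here is a minimal sketch that rebuilds the four rows printed above by hand. The `toy` table and its `rownum` column name are assumptions made up for illustration; the real file may name its row-number column differently and contains more columns than shown here.

```julia
using DataFrames

# The four rows printed above, plus a stand-in row-number column.
# The name :rownum is an assumption for illustration only.
toy = DataFrame(
    rownum = 1:4,
    id     = ["2015-000001", "2015-000002", "2015-000002", "2015-000002"],
    name   = ["hawaiian", "classic_dlx", "mexicana", "thai_ckn"],
    size   = ["M", "M", "M", "L"],
    price  = [13.25, 16.0, 16.0, 20.75],
)

# Drop the first column by position; same result as toy[:, 2:end]
no_rownum = select(toy, Not(1))

# Keep columns by name, as we did for the full dataset
pz_toy = select(no_rownum, :id, :name, :size, :price)
```

Both `toy[:, 2:end]` and `select(toy, Not(1))` return a new data frame, so the original table is left untouched.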
So far we have only looked at the first 4 rows, but as you can see there are almost 50 thousand rows in this dataset, which is not very practical to paste into a spreadsheet. For educational reasons as well, we will pick a smaller subset.
Sampling Data
We are going to pick a random sample of 16 rows from the 49 574 rows we have loaded in. To do that we will randomly shuffle the row indices from 1 to 49 574.
julia> using Random
julia> rows = shuffle(1:nrow(pz))
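Because `shuffle` draws from the global random number generator, you will get different rows on every run. If you want a repeatable shuffle, one option is to pass an explicitly seeded generator. This is a sketch, not what the rest of this story uses: the seed value 2015 is arbitrary, and `n` is hard-coded to the row count reported by `nrow(pz)` so the snippet runs on its own.

```julia
using Random

# Arbitrary seed, chosen only so the shuffle is repeatable
rng = MersenneTwister(2015)

n = 49_574                # row count reported by nrow(pz) above
rows = shuffle(rng, 1:n)  # a random permutation of the indices 1:n

# Each index appears exactly once, so any prefix of `rows`
# is a random sample without replacement
first_indices = rows[1:16]
```

Re-creating `rng` with the same seed and shuffling again gives the same permutation, which is handy when you want others to reproduce your sample.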
We can then pick the first 16 rows of these shuffled rows to get 16 random rows from our original data:
julia> sample = pz[rows[1:16]…