Legos

Author

Krystian Korzec

Published

January 9, 2023

Introduction

This is my attempt to recreate David Robinson’s analysis of the LEGO dataset (TidyTuesday, 2022-09-06).

Load datasets

First, let's read the data!

import pandas as pd

inventories = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-09-06/inventories.csv.gz')
inventory_sets = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-09-06/inventory_sets.csv.gz')
sets = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-09-06/sets.csv.gz')
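
The three URLs above differ only in the file name, so the reads could also be driven from a list of names. This is a minimal sketch, assuming the same repository layout; `BASE_URL`, `tidytuesday_url` and `read_tidytuesday` are hypothetical names introduced here for illustration and are not part of the original analysis.

```python
import pandas as pd

# Common prefix of the three URLs used above
BASE_URL = (
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/"
    "master/data/2022/2022-09-06/"
)


def tidytuesday_url(name):
    """Build the URL of one compressed CSV file (hypothetical helper)."""
    return f"{BASE_URL}{name}.csv.gz"


def read_tidytuesday(names):
    """Download each named file into a dict of data frames (needs network access)."""
    return {name: pd.read_csv(tidytuesday_url(name)) for name in names}
```

Calling `read_tidytuesday(['inventories', 'inventory_sets', 'sets'])` would then return a dictionary with one data frame per name.
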
import os, re

# load all files from a local data directory
data_dir = '../../data-screencasts/lego-data/'
# list all files in the directory
files = os.listdir(data_dir)
# join the directory path with each file name
files_paths = [os.path.join(data_dir, file) for file in files]
# derive data frame names by stripping the '.csv.gz' suffix
names = [re.sub(r'\.csv\.gz$', '', name) for name in files]
# load the data frames into a dictionary keyed by name
lego_datasets = {key: pd.read_csv(file) for (key, file) in zip(names, files_paths)}
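
One caveat with the code above: `os.listdir` returns every entry in the directory, so any stray non-CSV file would also be passed to `pd.read_csv`. The following is a minimal sketch of a stricter variant that only picks up `.csv.gz` files, using `pathlib`. The function name `load_csv_gz_dir` is hypothetical, and the sketch assumes each file is a gzip-compressed CSV that pandas can read with default settings.

```python
from pathlib import Path

import pandas as pd


def load_csv_gz_dir(directory):
    """Read every *.csv.gz file in `directory` into a dict keyed by dataset name."""
    datasets = {}
    for path in sorted(Path(directory).glob("*.csv.gz")):
        # 'sets.csv.gz' -> 'sets'
        name = path.name[: -len(".csv.gz")]
        # pandas infers gzip compression from the '.gz' extension
        datasets[name] = pd.read_csv(path)
    return datasets
```

With the directory used above, `load_csv_gz_dir('../../data-screencasts/lego-data/')` should produce a dictionary similar to `lego_datasets`.
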
inventories.value_counts('set_num')
set_num
657-2        16
659-2        12
666-1        11
266-2        11
264-2        11
             ..
6397-1        1
6396384-1     1
6396-1        1
6395-1        1
vwkit-1       1
Name: count, Length: 32348, dtype: int64
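
The counts above show that a single `set_num` can appear in several rows of `inventories`. To illustrate what `value_counts` is doing, here is a small sketch on made-up rows. The `toy` data frame and its values are assumptions for illustration only, not taken from the dataset, and the sketch uses the Series form `toy['set_num'].value_counts()`, which should give the same per-set row counts for a single column.

```python
import pandas as pd

# Made-up rows for illustration only: 'A-1' appears three times, 'B-1' once
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "set_num": ["A-1", "A-1", "B-1", "A-1"],
})

# Count rows per set_num, most frequent first
counts = toy["set_num"].value_counts()

# Keep only the sets that appear more than once
repeated = counts[counts > 1]
print(repeated)
```

The same filter applied to the real counts would list only the sets that have more than one inventory row.
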
from IPython.display import Markdown
from tabulate import tabulate

table = inventory_sets.head()

Markdown(tabulate(
  table,
  headers="keys"
))
    inventory_id  set_num    quantity
 0            35  75911-1           1
 1            35  75912-1           1
 2            39  75048-1           1
 3            39  75053-1           1
 4            50  4515-1            1