We can use for loops, if statements, and dicts to group data.
from pprint import pprint

cars = [
    {"model": "Yaris", "make": "Toyota", "color": "red"},
    {"model": "Auris", "make": "Toyota", "color": "red"},
    {"model": "Camry", "make": "Toyota", "color": "green"},
    {"model": "Prius", "make": "Toyota", "color": "yellow"},
    {"model": "Civic", "make": "Honda", "color": "red"},
    {"model": "Model 3", "make": "Tesla", "color": "red"}
]

# Map each make to the list of cars that share it.
cars_by_make = {}
for car in cars:
    make = car['make']
    if make in cars_by_make:
        # Make already seen: add this car to its list.
        cars_by_make[make].append(car)
    else:
        # First car of this make: start a new list.
        cars_by_make[make] = [car]

pprint(cars_by_make)
This should output:
{'Honda': [{'color': 'red', 'make': 'Honda', 'model': 'Civic'}],
 'Tesla': [{'color': 'red', 'make': 'Tesla', 'model': 'Model 3'}],
 'Toyota': [{'color': 'red', 'make': 'Toyota', 'model': 'Yaris'},
            {'color': 'red', 'make': 'Toyota', 'model': 'Auris'},
            {'color': 'green', 'make': 'Toyota', 'model': 'Camry'},
            {'color': 'yellow', 'make': 'Toyota', 'model': 'Prius'}]}
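The if/else check above can also be avoided. As a minimal sketch of one alternative (not the approach used in this lesson), the standard library's collections.defaultdict creates the empty list automatically the first time a make is seen:

```python
from collections import defaultdict

cars = [
    {"model": "Yaris", "make": "Toyota", "color": "red"},
    {"model": "Auris", "make": "Toyota", "color": "red"},
    {"model": "Camry", "make": "Toyota", "color": "green"},
    {"model": "Prius", "make": "Toyota", "color": "yellow"},
    {"model": "Civic", "make": "Honda", "color": "red"},
    {"model": "Model 3", "make": "Tesla", "color": "red"},
]

# defaultdict(list) returns a fresh empty list for any missing key,
# so every car can be appended without checking for the key first.
cars_by_make = defaultdict(list)
for car in cars:
    cars_by_make[car["make"]].append(car)

# Convert back to a plain dict before printing or serializing.
cars_by_make = dict(cars_by_make)
print(sorted(cars_by_make))
```

The result holds the same groups as the loop with the explicit if/else; only the bookkeeping differs.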
# The same pattern can count the cars for each make instead of collecting them.
number_of_cars_by_make = {}
for car in cars:
    make = car['make']
    if make in number_of_cars_by_make:
        number_of_cars_by_make[make] += 1
    else:
        number_of_cars_by_make[make] = 1

pprint(number_of_cars_by_make)
This should output:
{'Honda': 1, 'Tesla': 1, 'Toyota': 4}
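For counting specifically, the standard library also offers collections.Counter. The following is a short sketch of that alternative, shown for comparison rather than as the lesson's own method:

```python
from collections import Counter

cars = [
    {"model": "Yaris", "make": "Toyota", "color": "red"},
    {"model": "Auris", "make": "Toyota", "color": "red"},
    {"model": "Camry", "make": "Toyota", "color": "green"},
    {"model": "Prius", "make": "Toyota", "color": "yellow"},
    {"model": "Civic", "make": "Honda", "color": "red"},
    {"model": "Model 3", "make": "Tesla", "color": "red"},
]

# Counter tallies how many times each make appears in the sequence.
number_of_cars_by_make = Counter(car["make"] for car in cars)

print(dict(number_of_cars_by_make))
# {'Toyota': 4, 'Honda': 1, 'Tesla': 1}
```

Unlike pprint, a plain print keeps the keys in the order they were first seen, which is why Toyota comes first here.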
If you're manipulating tabular data in Python, it may be a good idea to use the Pandas library. It provides an abstraction called a "DataFrame" (you may be familiar with this if you've used other statistical programming languages like R). A DataFrame is essentially a spreadsheet table represented in Python.
If you use Python's Pandas library for data manipulation and analysis instead, the code for the above assignment would look like this: https://gist.github.com/AlJohri/59c9762845519f999eb28fe45276f4c1
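To give a rough idea of what grouping looks like with a DataFrame, here is a minimal sketch. It is written independently of the linked gist and may differ from it; it assumes pandas is installed and reuses the cars list from earlier.

```python
import pandas as pd

cars = [
    {"model": "Yaris", "make": "Toyota", "color": "red"},
    {"model": "Auris", "make": "Toyota", "color": "red"},
    {"model": "Camry", "make": "Toyota", "color": "green"},
    {"model": "Prius", "make": "Toyota", "color": "yellow"},
    {"model": "Civic", "make": "Honda", "color": "red"},
    {"model": "Model 3", "make": "Tesla", "color": "red"},
]

# Each dict becomes one row; the dict keys become the column names.
df = pd.DataFrame(cars)

# Count the rows in each "make" group.
number_of_cars_by_make = df.groupby("make").size().to_dict()

# Turn each group's rows back into a list of plain dicts.
cars_by_make = {
    make: group.to_dict(orient="records")
    for make, group in df.groupby("make")
}

print(number_of_cars_by_make)
```

Here groupby takes over the work of the for loop and the if/else check, and the result is converted back to ordinary dicts only so it can be compared with the earlier output.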
- Read vegtables.csv into a variable called vegtables.
- Group vegtables by color as a variable vegtables_by_color.
- Output vegtables_by_color into a JSON file called vegtables_by_color.json.
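One possible way to approach these steps is sketched below, using only the standard library. The function name group_vegtables_by_color is hypothetical, and the sketch assumes that vegtables.csv has a header row containing a "color" column; adjust the column name if your file differs.

```python
import csv
import json


def group_vegtables_by_color(csv_path="vegtables.csv",
                             json_path="vegtables_by_color.json"):
    """Read a CSV, group its rows by the "color" column, write the result as JSON."""
    # csv.DictReader yields one dict per row, keyed by the header line.
    with open(csv_path, newline="") as f:
        vegtables = list(csv.DictReader(f))

    # Same grouping pattern as cars_by_make; setdefault creates the list
    # the first time a color is seen. Assumes a "color" column exists.
    vegtables_by_color = {}
    for vegtable in vegtables:
        vegtables_by_color.setdefault(vegtable["color"], []).append(vegtable)

    # json.dump serializes the dict of lists straight to the output file.
    with open(json_path, "w") as f:
        json.dump(vegtables_by_color, f, indent=2)

    return vegtables_by_color
```

Calling group_vegtables_by_color() from the directory that contains vegtables.csv should write vegtables_by_color.json next to it.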