This project contains a script that converts the Aichi Prefecture traffic accident data into Parquet format. As a compressed, columnar format, Parquet offers significant performance advantages for analytical queries over row-oriented text formats such as CSV (see the benchmark below).
The source data is non-public and was provided by the Aichi Prefectural Police. To use this script, set the environment variable `DATA_DIR` to the directory containing the source data and `OUTPUT_DIR` to the directory where the output will be written. If these environment variables are not set, the script defaults to the `internal` and `data` directories, respectively.
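If the conversion script is driven from an R session, the variables can be set before running it. The following is a minimal, illustrative sketch: the variable names come from this README, while the paths are placeholders.

```r
# Illustrative only: point the script at the source and output directories.
# DATA_DIR and OUTPUT_DIR are the variables documented above; the paths
# below are placeholders, not real locations.
Sys.setenv(
  DATA_DIR = "path/to/source-data",
  OUTPUT_DIR = "path/to/output"
)
```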
After converting the source data to Parquet format, you can read the data using various tools and programming languages that support Parquet. Below are examples in R and Python:
```r
library(arrow)
library(sf)

file_name <- "traffic-accidents-2021.parquet"
data_frame <- read_parquet(file_name)

# Promote the longitude/latitude columns to point geometries (WGS 84)
traffic_accidents <- st_as_sf(
  data_frame,
  coords = c("longitude", "latitude"),
  crs = 4326
)
```
```python
import pandas as pd
import geopandas as gpd

file_name = "traffic-accidents-2021.parquet"
data_frame = pd.read_parquet(file_name)

# Build point geometries from the longitude/latitude columns (WGS 84)
traffic_accidents = gpd.GeoDataFrame(
    data_frame,
    geometry=gpd.points_from_xy(data_frame.longitude, data_frame.latitude),
    crs="EPSG:4326",
)
```
The benchmark compares the read speed of CSV and Parquet files using a dataset of traffic accidents recorded in 2021. The dataset, originally an Excel file (26,314 KB), was converted to CSV and Parquet for comparison.
| Format | Size on Disk (KB) | Average Read Time (ms) | Standard Deviation (ms) |
|---|---|---|---|
| CSV | 93,736 | 725.68 | 73.30 |
| Parquet | 1,767 | 74.77 | 24.64 |
The benchmarks were conducted in R using the `microbenchmark` package, with results averaged over 100 runs (a sketch of the benchmark code follows the list below). Tests were performed on a machine with the following specifications:
- Processor: 11th Gen Intel Core i7-11370H @ 3.30GHz
- Memory: 32.0 GB RAM
- Storage: SSD
- Operating System: Windows
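The measurement would have looked roughly like the following. This is a reconstruction under stated assumptions, not the project's actual benchmark script: the file names are guesses based on the example above, and `readr::read_csv()` is assumed for the CSV side.

```r
library(microbenchmark)
library(arrow)
library(readr)

# Reconstruction of the benchmark, not the original script.
# File names are assumed; readr::read_csv() is one plausible CSV reader.
results <- microbenchmark(
  csv = read_csv("traffic-accidents-2021.csv"),
  parquet = read_parquet("traffic-accidents-2021.parquet"),
  times = 100
)
summary(results)
```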
This project is licensed under the MIT License - see the LICENSE file for details.