Visualizing Big Datasets: Tools, Pitfalls, Experimental Example

A talk about data visualization in science. Based on my experience with the ratCAVE project and on approaches suggested for Python, I created this talk for my fellow MSNE students. It covers the main problems with using scatter plots for big, convolved data and explains how to address them.

Summary:

What should we keep in mind when working with big datasets? In the case of scatter plots, 3 hyperparameters:

overplotting - avoid obscuring the data
saturation - check how many overlapping points it takes to saturate the intensity
undersampling - taking a subset might not be an answer
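
The saturation point can be estimated before plotting. The sketch below assumes the renderer uses standard "over" alpha compositing, so that n identical markers with opacity alpha stacked on one spot give a combined opacity of 1 - (1 - alpha)^n. The function names and the 0.99 threshold are hypothetical choices for illustration, not taken from the talk.

```python
import math


def opacity_after_overlap(alpha: float, n_points: int) -> float:
    """Combined opacity of n_points identical markers drawn on the same spot.

    Assumes standard "over" compositing: each marker lets a fraction
    (1 - alpha) of what is underneath show through.
    """
    return 1.0 - (1.0 - alpha) ** n_points


def points_to_saturate(alpha: float, threshold: float = 0.99) -> int:
    """Smallest number of overlapping markers whose combined opacity reaches threshold."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    # Solve 1 - (1 - alpha)**n >= threshold for n.
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - alpha))


if __name__ == "__main__":
    for alpha in (0.5, 0.1, 0.01):
        print(f"alpha={alpha}: saturates after {points_to_saturate(alpha)} overlapping points")
```

With alpha=0.5, seven overlapping points already exceed 99% opacity, so any region denser than that looks the same. Lowering alpha raises the limit, but it also makes isolated points fainter, which is why alpha has to be tuned to the dataset.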

Alternatively, you can work with heatmaps, remembering to address the following problems (1 hyperparameter):

undersaturation
pick the color map in accordance to the
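
One way to see the heatmap problem is to compare how per-bin counts map to intensity. The sketch below uses NumPy on synthetic data (a dense cluster on a sparse background, invented here for illustration) and compares linear scaling with logarithmic scaling. The function names, bin count and data are assumptions, and this is not the code used in the talk.

```python
import numpy as np


def bin_points(x, y, bins=50, extent=((-1.0, 1.0), (-1.0, 1.0))):
    """Aggregate points into a 2D grid of per-bin counts (the heatmap's raw data)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=extent)
    return counts


def linear_intensity(counts):
    """Map counts to [0, 1] proportionally to the densest bin."""
    peak = counts.max()
    return counts / peak if peak > 0 else counts.astype(float)


def log_intensity(counts):
    """Map counts to [0, 1] on a log scale so sparse bins stay visible."""
    scaled = np.log1p(counts)
    peak = scaled.max()
    return scaled / peak if peak > 0 else scaled


# Synthetic example data: a tight, dense cluster on top of a sparse background.
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.05, size=(100_000, 2))
background = rng.uniform(-1.0, 1.0, size=(10_000, 2))
points = np.vstack([cluster, background])

counts = bin_points(points[:, 0], points[:, 1])
occupied = counts > 0  # ignore empty bins when summarizing intensity

linear_median = np.median(linear_intensity(counts)[occupied])
log_median = np.median(log_intensity(counts)[occupied])
print("median intensity of occupied bins (linear):", linear_median)
print("median intensity of occupied bins (log):   ", log_median)
```

With linear scaling, the few very dense bins set the top of the range, so a typical background bin lands close to zero intensity and disappears. The log scale compresses the range and keeps the background visible. datashader offers comparable choices through the `how` argument of `datashader.transfer_functions.shade` (for example `'linear'`, `'log'` and `'eq_hist'`).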

The talk explains how to get from left to right: an interpretable visualization of datasets.

Presented on 01.06.2018 at the retreat for Master of Science in Neuroengineering students.

Installing

To run the Jupyter notebook as slides, I used:

RISE

The talk was based on the use of:

pandas
seaborn
datashader

Acknowledgments

Nicholas A. Del Grosso - for supervision and inspiration for this talk
Mohammad Bashiri - for feedback
