Do you work with data in spreadsheets? This JHU Data Services workshop will teach you how to process and standardize your tabular data efficiently and reproducibly using OpenRefine. OpenRefine is a free, open-source tool with a graphical user interface (GUI) for cleaning and organizing data – no coding required! The bulk of this 2.5-hour workshop will be a hands-on tutorial in which you process a dataset in OpenRefine.
After taking this workshop, participants will:
- Understand the importance of processing and standardizing data
- Be able to carry out several transformations in OpenRefine to standardize data for further analysis
- Leave with a test project that can be used to practice further data analysis or learn advanced features of OpenRefine, such as working with APIs
Website: dataservices.library.jhu.edu/
Contact us: dataservices@jhu.edu
JHU Data Services, part of the Johns Hopkins University Sheridan Libraries, helps the JHU community find, use, visualize, manage, and share data. We offer live webinars and self-paced online trainings on computational research and coding, GIS, data management, data visualization, and more. See all of our training topics on our website.
This repository contains materials for one of our live webinars open to JHU students, faculty, and staff. Please contact us with any questions.
As of March 2020, Data Services workshops are being held virtually on Zoom. See our calendar to register for upcoming workshops.
Before the class, be sure to download and install OpenRefine from the OpenRefine website. OpenRefine runs locally on your computer but displays its interface in a web browser; we recommend Google Chrome or Microsoft Edge.
- Data: This folder contains raw data files to be used during hands-on activities in the workshop:
  - workshop_data_nuforc.csv: user-entered data about UFO sightings, from the National UFO Reporting Center (NUFORC), nuforc.org
- In-ClassScripts: This folder contains additional files you will need for the workshop:
  - OpenRefine_WorkshopGuide.docx
- PresentationMaterials: This folder contains PowerPoint slides and other presentation materials used in the workshop
- Resources: This folder contains cheatsheets to assist you during the workshop and links to external sources for you to continue your learning
If you have taken the live webinar for this class, please take this survey: https://www.surveymonkey.com/r/openrefine
The presentation materials are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0), attributable to Data Services, Johns Hopkins University.
See the LICENSE file for code licensing and reuse information.
The images, external resources, and cheat sheets shared in this repository may have other licenses and terms of use.
Please cite this material as:
Johns Hopkins University Data Services. [Date of workshop]. Data Cleaning in OpenRefine. https://github.com/jhu-data-services/data-cleaning-openrefine