Exam metadata generation and ingest for DSpace

The Old Exams Repository is maintained by the University of Toronto Libraries. It contains the 3 most recent years of exams.

System Requirements


Installation

Clone or download the scripts to a local directory. Ensure you have the prerequisite software installed before running the scripts.

You must run step1.py before running step2.py. More details about usage and workflow are given below.


Usage

  1. python step1.py /directory_path_to_pdf_exams/ campus[A, B or C]

  2. python step2.py /directory_path_to_pdf_exams/


Workflow

1. Scanning & Filenaming

  • Exams are scanned into PDF files and named according to the file naming convention.
  • Each PDF file must contain the course code, month and year.
  • DSpace Dublin Core metadata are generated based on each PDF's filename.

Example: for Campus C, use "au" for August and "ap" for April so that these two months can be told apart.

detailed exam file naming convention found here
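
As a rough illustration of how the course code, month and year might be read from a file name, the sketch below assumes a hypothetical pattern of `<coursecode>_<monthcode><year>.pdf`. The real pattern is defined by the naming convention, not by this example. The function name `parse_exam_filename`, the regular expression, and the "de" month code are assumptions made for illustration; only "ap" and "au" come from this README.

```python
import re
from pathlib import Path

# Two-letter month codes. Only "ap" and "au" are documented in the README;
# "de" is a hypothetical extra entry added for illustration.
MONTH_CODES = {"ap": "April", "au": "August", "de": "December"}

# Assumed (hypothetical) pattern: <course code>_<month code><4-digit year>.pdf
FILENAME_RE = re.compile(
    r"^(?P<course>[A-Za-z0-9]+)_(?P<month>[a-z]{2})(?P<year>\d{4})\.pdf$"
)


def parse_exam_filename(path):
    """Return (course_code, month_name, year) or raise ValueError."""
    name = Path(path).name
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised exam file name: {name}")
    month = MONTH_CODES.get(match.group("month"))
    if month is None:
        raise ValueError(f"unknown month code in: {name}")
    return match.group("course").upper(), month, int(match.group("year"))
```

Rejecting unknown month codes outright, rather than guessing, is one way to catch ambiguous names early, before any metadata is generated from them.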

2. Generate metadata

  • Once exams are received in PDF format from campuses A, B or C, file metadata is generated
  • Dublin Core metadata is generated from the file names using Beautiful Soup
  • The script also uses a CSV file of departmental codes per campus for mapping

sample generated metadata file found here
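
The metadata step can be sketched roughly as follows. This is not the code in step1.py: it uses the standard library's `xml.etree.ElementTree` instead of Beautiful Soup so that it runs without extra packages. The CSV column names (`code`, `department`), the three-letter prefix lookup, and the choice of Dublin Core fields are all assumptions. The `dublin_core` / `dcvalue` layout follows the DSpace Simple Archive Format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_department_map(csv_text):
    """Map a course-code prefix to a department name.

    Assumes hypothetical CSV columns named "code" and "department".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"]: row["department"] for row in reader}


def build_dublin_core(course, month, year, departments):
    """Return a dublin_core.xml document as a string."""
    root = ET.Element("dublin_core")

    def add(element, qualifier, text):
        node = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        node.text = text

    # The fields below are an illustrative selection, not the script's actual set.
    add("title", "none", f"{course} {month} {year} exam")
    add("date", "issued", str(year))
    department = departments.get(course[:3])  # assumed 3-letter prefix lookup
    if department:
        add("subject", "none", department)
    return ET.tostring(root, encoding="unicode")
```

For example, `build_dublin_core("ABC100", "April", 2019, load_department_map("code,department\nABC,Example Studies\n"))` returns one `dublin_core` element containing three `dcvalue` children.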

3. DSpace Simple Archive

  • The step2.py script packages the PDFs and metadata into DSpace simple archives for ingest
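
A minimal sketch of the packaging step, assuming the standard DSpace Simple Archive Format layout: one subdirectory per item, holding the PDF, a `dublin_core.xml` file, and a `contents` file that lists the bitstream file names. The function name `build_simple_archive`, the `item_NNN` directory naming, and the input shape are illustrative assumptions, not taken from step2.py.

```python
import shutil
from pathlib import Path


def build_simple_archive(items, archive_root):
    """Write one Simple Archive Format item directory per exam.

    items: iterable of (pdf_path, dublin_core_xml_string) pairs (assumed shape).
    Returns the list of item directories that were created.
    """
    archive_root = Path(archive_root)
    archive_root.mkdir(parents=True, exist_ok=True)
    created = []
    for index, (pdf_path, dc_xml) in enumerate(items):
        pdf_path = Path(pdf_path)
        item_dir = archive_root / f"item_{index:03d}"  # hypothetical naming
        item_dir.mkdir(exist_ok=True)
        # Copy the bitstream next to its metadata.
        shutil.copy2(pdf_path, item_dir / pdf_path.name)
        (item_dir / "dublin_core.xml").write_text(dc_xml, encoding="utf-8")
        # "contents" lists one bitstream file name per line.
        (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
        created.append(item_dir)
    return created
```

Keeping each exam in its own item directory means a single malformed item can be removed from the archive without rebuilding the rest.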

4. Batch Import Into TSpace

  • DSpace simple archives are imported into their respective collections via batch import
  • Collections older than 3 years are removed

License

DSpace Simple Archives Importer is licensed under the Apache License 2.0.