RecordSearch

Current version: v1.1.1

This repository contains Jupyter notebooks to work with data from the National Archives of Australia's RecordSearch database.

RecordSearch is the online collection database of the National Archives of Australia. Based on the series system, RecordSearch provides rich, contextual information about series, items, agencies, and functions.

Unfortunately RecordSearch doesn't provide access to machine-readable data through an API, so we have to resort to screen scraping. The notebooks here make use of the RecordSearch Data Scraper.
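
To give a sense of what screen scraping involves, here is a minimal sketch that pulls label/value pairs out of an HTML table using only Python's standard library. The HTML snippet, its field names, and the `LabelValueParser` class are invented for illustration. They are not RecordSearch's real markup, and the RecordSearch Data Scraper's own interface is likely to differ.

```python
from html.parser import HTMLParser

# Hypothetical HTML, loosely resembling a two-column details table.
# This is NOT RecordSearch's real markup.
SAMPLE_HTML = """
<table>
  <tr><td class="label">Title</td><td class="value">Example file</td></tr>
  <tr><td class="label">Series number</td><td class="value">A1</td></tr>
  <tr><td class="label">Access status</td><td class="value">Open</td></tr>
</table>
"""


class LabelValueParser(HTMLParser):
    """Collect the text of each two-cell table row as a label/value pair."""

    def __init__(self):
        super().__init__()
        self.records = {}
        self._cells = []  # text of the cells in the current row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td":
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and len(self._cells) == 2:
            label, value = (cell.strip() for cell in self._cells)
            self.records[label] = value


parser = LabelValueParser()
parser.feed(SAMPLE_HTML)
print(parser.records)
```

Because a scraper depends on the page's markup, it can break whenever the site's layout changes, which is one reason to rely on a maintained scraper rather than writing your own.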

See the RecordSearch section of the GLAM Workbench for more details.

Notebook topics

Harvesting data

Harvest items from a search in RecordSearch – save the results of an item search in RecordSearch as a downloadable dataset; you can also save images and PDFs from digitised files
Harvest files with the access status of 'closed' – find out what we're not allowed to see by harvesting details of 'closed' files
Harvest recently digitised files from RecordSearch – save details of files digitised in the past month
Harvest details of all series in RecordSearch – get details of all series registered in RecordSearch; also generates a summary dataset with the total number of items digitised, described, and in each access category
Harvesting functions from the RecordSearch interface – extract information from the RecordSearch interface about the hierarchy of functions it uses to describe the work of government agencies
Harvest agencies associated with all functions – loops through the list of functions saving details of the agencies associated with each
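
The last two harvests involve walking a hierarchy and then looping over every entry in it. As a rough sketch of that pattern, the following flattens a nested structure of functions into a list of terms with their depth. The sample data, the placeholder terms, and the `term` and `narrower` keys are assumptions made for illustration, not the format the notebooks actually produce.

```python
# Hypothetical hierarchy: each function has a term and a list of narrower functions.
SAMPLE_FUNCTIONS = [
    {
        "term": "FUNCTION A",
        "narrower": [
            {"term": "FUNCTION A.1", "narrower": []},
            {"term": "FUNCTION A.2", "narrower": []},
        ],
    },
    {
        "term": "FUNCTION B",
        "narrower": [{"term": "FUNCTION B.1", "narrower": []}],
    },
]


def flatten_functions(functions, depth=0):
    """Yield (depth, term) for every function, parents before children."""
    for function in functions:
        yield depth, function["term"]
        yield from flatten_functions(function["narrower"], depth + 1)


flat = list(flatten_functions(SAMPLE_FUNCTIONS))
for depth, term in flat:
    # Indent each term to show its position in the hierarchy.
    print("  " * depth + term)
```

Once the hierarchy is flattened, looping through every function to look up its agencies becomes a simple iteration over `flat`.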

Analysing data

Exploring harvested series data, 2021 – generates some basic statistics from the harvest of series data
Exploring harvested series data, 2022 – generates some basic statistics from the harvest of series data in 2022 and compares the results to the previous year
Summary of records digitised in the previous week – run this notebook to analyse the most recent dataset of recently digitised files, summarising the results by series
How many of the functions are actually used? – looks at the harvest of functions to see how many are actually in use
Who's responsible? – pick a function to see which agencies have been responsible for it over time
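
Summarising harvested files by series comes down to counting records per series. A minimal sketch, assuming each harvested record is a dictionary with a `series` key (the key names and sample records are hypothetical, and the notebooks may well use a different approach):

```python
from collections import Counter

# Hypothetical harvested records; the 'series' and 'title' keys are assumptions.
SAMPLE_FILES = [
    {"series": "A1", "title": "File one"},
    {"series": "B2", "title": "File two"},
    {"series": "A1", "title": "File three"},
    {"series": "A1", "title": "File four"},
    {"series": "C3", "title": "File five"},
]


def count_by_series(files):
    """Return a Counter mapping each series to its number of files."""
    return Counter(file["series"] for file in files)


totals = count_by_series(SAMPLE_FILES)
# most_common() sorts from the largest count to the smallest.
for series, count in totals.most_common():
    print(f"{series}: {count}")
```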

Useful tools

DIY Redaction Art Collages – generates a random sample of ASIO redactions and packs them into one big image
Download the contents of a digitised file – get a digitised file as a folder full of images
Get a list of agencies associated with a function – pick a function and create a downloadable list of agencies responsible for it
DFAT Cable Finder – helps you find numbered cables created by DFAT

Data downloads

Summary data about all series in RecordSearch, May 2021 (15mb CSV) – contains basic descriptive information about all the series currently registered on RecordSearch (May 2021) as well as the total number of items described, digitised, and in each access category.
Summary data about all series in RecordSearch, April 2022 (15mb CSV) – contains basic descriptive information about all the series currently registered on RecordSearch (April 2022) as well as the total number of items described, digitised, and in each access category.
Recently digitised files (CSV) – contains details of files digitised between 25 February and 26 March 2021. For an ongoing record of digitised files see this repository, which creates weekly snapshots.
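
Once downloaded, the CSV files can be explored with a few lines of Python. The sketch below works out the share of described items that have been digitised in each series. The column names (`identifier`, `described_total`, `digitised_total`) and the sample rows are assumptions for illustration, so check the header row of the real file before adapting it.

```python
import csv
import io

# Hypothetical sample; the real CSV's column names may differ.
SAMPLE_CSV = """identifier,described_total,digitised_total
SERIES-1,200,50
SERIES-2,10,10
SERIES-3,0,0
"""


def digitised_share(rows):
    """Return {identifier: fraction digitised}, skipping series with no described items."""
    shares = {}
    for row in rows:
        described = int(row["described_total"])
        if described == 0:
            continue  # avoid dividing by zero
        shares[row["identifier"]] = int(row["digitised_total"]) / described
    return shares


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = digitised_share(rows)
for identifier, share in shares.items():
    print(f"{identifier}: {share:.0%} digitised")
```

To read a downloaded file instead of the inline sample, replace `io.StringIO(SAMPLE_CSV)` with a file opened using `open(path, newline="")`.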

Run these notebooks

There are a number of different ways to use these notebooks. Binder is quickest and easiest, but it doesn't save your data. I've listed the options below from easiest to most complicated (requiring more technical knowledge).

Using Binder

Click on the button above to launch the notebooks in this repository using the Binder service (it might take a little while to load). This is a free service, but note that sessions will close if you stop using the notebooks, and no data will be saved. Make sure you download any changed notebooks or harvested data that you want to save.

See the Using Binder section of the GLAM Workbench for more details.

Using Reclaim Cloud

Reclaim Cloud is a paid hosting service, aimed particularly at supporting digital scholarship in the humanities. Unlike Binder, the environments you create on Reclaim Cloud will save your data – even if you switch them off! To run this repository on Reclaim Cloud for the first time:

Create a Reclaim Cloud account and log in.
Click on the button above to start the installation process.
A dialogue box will ask you to set a password. This is used to limit access to your Jupyter installation.
Sit back and wait for the installation to complete!
Once the installation is finished click on the 'Open in Browser' button of your newly created environment (note that you might need to wait a few minutes before everything is ready).

See the Using Reclaim Cloud section of the GLAM Workbench for more details.

Using the Nectar Research Cloud

The Nectar Research Cloud (part of the Australian Research Data Commons) provides cloud computing services to researchers in Australian and New Zealand universities. Any university-affiliated researcher can log on to Nectar and receive up to 6 months of free cloud computing time. And if you need more, you can apply for a specific project allocation.

The GLAM Workbench is available in the Nectar Cloud as a pre-configured application. This means you can get it up and going without worrying about the technical infrastructure – just fill in a few details and you're away! To create an instance of this repository in the Nectar Cloud:

Log in to the Nectar Dashboard using your university credentials.
From the Dashboard choose Applications -> Browse Local.
Enter 'GLAM' in the filter box and hit Enter. You should see the GLAM Workbench application.
Click on the GLAM Workbench application's Quick Deploy button.
Step through the various configuration options. Some options are only available if you have a dedicated project allocation.
When asked to select a GLAM Workbench repository, choose 'Recordsearch' from the dropdown.
Complete the configuration and deploy your GLAM Workbench instance.
The URL to access your instance will be displayed once it's ready. Click on the URL!

See Using Nectar for more details.

Using Docker

You can use Docker to run a pre-built computing environment on your own computer. It will set up everything you need to run the notebooks in this repository. This is free, but requires more technical knowledge – you'll have to install Docker on your computer, and be able to use the command line.

Install Docker Desktop.
Create a new directory for this repository and open it from the command line.

From the command line, run the following command:

docker run -p 8888:8888 --name recordsearch -v "$PWD":/home/jovyan/work quay.io/glamworkbench/recordsearch repo2docker-entrypoint jupyter lab --ip 0.0.0.0 --NotebookApp.token='' --LabApp.default_url='/lab/tree/index.ipynb'

It will take a while to download and configure the Docker image. Once it's ready you'll see a message saying that Jupyter Notebook is running.
Point your web browser to http://127.0.0.1:8888
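
The `"$PWD"` in the command above is shell syntax and may not expand in every terminal. As an untested sketch of one alternative, the following builds the same command as an argument list in Python, filling in the current directory explicitly. The `build_docker_command` helper is hypothetical; the image name, port, and options are copied from the command above.

```python
from pathlib import Path

IMAGE = "quay.io/glamworkbench/recordsearch"


def build_docker_command(workdir):
    """Return the docker run command as a list of arguments."""
    return [
        "docker", "run",
        "-p", "8888:8888",
        "--name", "recordsearch",
        # Mount the chosen directory into the container's work folder.
        "-v", f"{Path(workdir).resolve()}:/home/jovyan/work",
        IMAGE,
        "repo2docker-entrypoint",
        "jupyter", "lab",
        "--ip", "0.0.0.0",
        "--NotebookApp.token=",  # empty token, as in the shell command
        "--LabApp.default_url=/lab/tree/index.ipynb",
    ]


command = build_docker_command(Path.cwd())
print(" ".join(command))

# To actually start the container (requires Docker to be installed):
# import subprocess
# subprocess.run(command, check=True)
```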

See the Using Docker section of the GLAM Workbench for more details.

Setting up on your own computer

If you know your way around the command line and are comfortable installing software, you might want to set up your own computer to run these notebooks.

Assuming you have recent versions of Python and Git installed, the steps might be something like:

Create a virtual environment, e.g.: python -m venv recordsearch
Open the new directory: cd recordsearch
Activate the environment: source bin/activate
Clone the repository: git clone https://github.com/GLAM-Workbench/recordsearch.git notebooks
Open the new notebooks directory: cd notebooks
Install the necessary Python packages: pip install -r requirements.txt
Run Jupyter: jupyter lab
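
Before starting, it can help to confirm that the prerequisites are in place. This small sketch checks the running Python version and looks for Git on the PATH. The `check_prerequisites` helper is hypothetical, and the minimum version of 3.8 is an assumption, since the steps above only ask for a "recent" Python.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # assumption: the steps above only say "recent"


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a dict saying whether Python is new enough and Git is on the PATH."""
    return {
        "python": sys.version_info[:2] >= min_python,
        "git": shutil.which("git") is not None,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing or too old'}")
```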

See the GLAM Workbench for more details.

Cite as

See the GLAM Workbench or Zenodo for up-to-date citation details.

This repository is part of the GLAM Workbench.
If you think this project is worthwhile, you might like to sponsor me on GitHub.
