# DHOxSS - Text to Tech

Materials for the Text to Tech workshop at the Digital Humanities Oxford Summer School by Kaspar von Beelen, Mariona Coll Ardanuy and Federico Nanni.

## Google Colab

The workshop will mostly rely on Google Colab for the hands-on activities.

### Day 1

- Welcome slides
- Intro to Python (a) Open In Colab
- Intro to Python (b) Open In Colab
- Intro to Python (c) Open In Colab
- Functions Open In Colab

### Day 2

- Opening Files Open In Colab
- Basic Text Processing Open In Colab
- Regular Expressions Open In Colab
- Lists, sets and tuples Open In Colab
- Dictionaries and JSON Open In Colab
- Text Processing Exercises Open In Colab
- Data Structures Exercises Open In Colab

### Day 3

- Libraries Open In Colab
- Working with tabular data Open In Colab
- Exercises on Information Retrieval Open In Colab

### Day 4

- Introduction to Machine Learning for NLP: slides
- Intro to NLP (1) Open In Colab
- Intro to NLP (2) Open In Colab
- Intro to NLP (3) Open In Colab
- Intro to NLP (4) Open In Colab
- Introduction to Language Modelling: slides
- Word embeddings (1) Open In Colab
- Word embeddings (2) Open In Colab

### Day 5

- Introduction to Foundation Models and Transfer Learning: slides
- Transformers for NLP Open In Colab
- Introduction to Generative AI: slides
- Poking LLMs with HuggingFace Open In Colab
- Using local LLMs Open In Colab

## Local installation

The entire course runs on Google Colab, but if you want to set up the notebooks locally on your machine, follow the instructions below. However, bear in mind that some of the tools might not work well on older laptops (especially from Day 4 onwards).

- Install Anaconda
- Download the content of this repository and unzip it
- Open Anaconda Navigator
- From Anaconda, create an environment called `py39`
- Install JupyterLab in the environment
- Launch JupyterLab
- Open a terminal in JupyterLab
- Run the following in the terminal, step by step:
  - Activate the environment: `conda activate py39`
  - Update pip: `pip install --upgrade pip`
  - Change directory using the `cd` command until you are in the course folder, then run: `pip install -r requirements.txt`
  - Add the environment to Jupyter (following the instructions here) or by running `ipython kernel install --user --name=py39`. Then you can start using the notebooks: select `py39` as the kernel (restart JupyterLab if the correct kernel does not show)
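The terminal steps above can be sketched as a single command sequence. This is a minimal setup recipe, assuming Anaconda is already installed and the repository has been unzipped to a folder named `text-to-tech` (a placeholder; adjust the path to wherever you unzipped it):

```shell
# Create and activate the py39 environment (Python 3.9)
conda create -n py39 python=3.9 -y
conda activate py39

# Update pip and install the course dependencies
pip install --upgrade pip
cd text-to-tech                      # path to the unzipped course folder
pip install -r requirements.txt

# Install JupyterLab, register the environment as a Jupyter kernel, and launch
pip install jupyterlab ipykernel
ipython kernel install --user --name=py39
jupyter lab
```

Once JupyterLab opens, select `py39` as the kernel for each notebook.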

You can find more detailed instructions here.

## Data

Datasets used:

- The Living Machines atypical animacy dataset, freely available here.
- MuSe: The Musical Sentiment Dataset Muse
- A historical dataset on popular baby names in the United States from 1880 onwards. Available here.
- A sample of British Library 19th Century Books collected from here.
- A sample of British Newspapers articles, digitized by Heritage Made Digital.

Background reading (optional):

Advanced reading list (optional):

## Other Resources

This course builds upon many previous resources. Apart from the ones above:

Resources mentioned during the workshop: slides