Jeffrey Lu, Ivan Rodriguez. 2023.
DOI: https://doi.org/10.48550/arXiv.2304.01046
|--- dataset
|    |--- raw_data
|    |--- cleaned
|--- preprocess
|    |--- preprocess.py
|    |--- run_cleaning.py
|--- models
|--- tuner.py
|--- README.md
Text data is minimally cleaned before being reshaped and tokenized appropriately for the various pretrained models.
Data is either mixed or unmixed. Mixed data preserves the ordering of answers in the raw data. Unmixed data minimally reorders answers such that the correct answer is always presented last. Both mixed and unmixed data are generated for each configuration.
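To illustrate the reordering, here is a minimal sketch of what the unmixed transformation does. The function and argument names are hypothetical; the actual implementation is part of the preprocessing code in this repository.

```python
def to_unmixed(answers, label):
    """Move the correct answer to the last position while preserving the
    relative order of the other answers (illustrative sketch only)."""
    correct = answers[label]
    distractors = [a for i, a in enumerate(answers) if i != label]
    return distractors + [correct], len(distractors)

# Example: the correct answer sits at index 1 in the raw (mixed) ordering.
reordered, new_label = to_unmixed(["A", "B", "C", "D"], 1)
assert reordered == ["A", "C", "D", "B"] and new_label == 3
```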
Cleaning can be completed by executing the following command:
python preprocess/run_cleaning.py
Final preprocessing is completed by the trainer.
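For context, this final step typically pairs the context and question with each candidate answer before tokenization. The sketch below uses the Hugging Face AutoTokenizer; the pairing scheme, field names, and maximum length are assumptions rather than details taken from the trainer.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")

def encode_example(context, question, answers, max_length=256):
    # Pair the shared context + question with every candidate answer so the
    # model sees one encoded sequence per answer choice.
    firsts = [f"{context} {question}"] * len(answers)
    return tokenizer(firsts, answers,
                     truncation=True, padding="max_length",
                     max_length=max_length, return_tensors="np")
```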
Our pretrained models are from Hugging Face Transformers. We used the following models:
- ALBERT (albert-base-v2, albert-xxlarge-v2)
- RoBERTa (roberta-large)
- BERT (bert-base-uncased)
- DistilBERT (distilbert-base-uncased)
Note that BERT training was completed in Google Colaboratory and is therefore not covered in this repository. BERT training and tuning can easily be added by importing and configuring the BERT model from Hugging Face in the same way the other models are imported and configured.
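As a sketch of that addition, assuming the TensorFlow classes from Hugging Face Transformers (the actual model-building code in this repository may differ):

```python
from transformers import AutoTokenizer, TFAutoModel

MODEL_NAME = "bert-base-uncased"  # same checkpoint name listed above

# Load the tokenizer and backbone exactly as for the other checkpoints.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
backbone = TFAutoModel.from_pretrained(MODEL_NAME)

# A task-specific head (e.g., a dense scoring layer over the pooled output)
# would then be attached, mirroring the configuration used for ALBERT,
# RoBERTa, and DistilBERT.
```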
Our training was completed on the TPU configuration available in Google Colab (TPU v2-8) and on the TPU v3-8 VMs available through the Google TPU Research Cloud (TRC) program. This codebase can be adjusted to run on non-TPU machines with minimal changes.
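One common way to make that adjustment, sketched here under the assumption of a TensorFlow training loop (which this repository may or may not use), is to fall back to the default distribution strategy when no TPU is reachable:

```python
import tensorflow as tf

def get_strategy():
    """Return a TPUStrategy if a TPU is reachable, else the default
    CPU/GPU strategy."""
    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    except (ValueError, tf.errors.NotFoundError):
        return tf.distribute.get_strategy()

# Build and compile the model under strategy.scope() so variables are
# placed on whichever devices are available.
strategy = get_strategy()
```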
Our results were extracted directly from the tuner. The tuner can be run using the following command, replacing items in brackets with the appropriate choices:
python tuner.py [baseline|polytuplet] [0|1|2] [mixed|unmixed]
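For example, to run the polytuplet tuner on mixed data, choosing 0 for the numeric argument:
python tuner.py polytuplet 0 mixed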