
Comparing changes

base repository: Green-Wood/CoMER (base: master)
head repository: revidee/CoMER (compare: master)

Able to merge. These branches can be automatically merged.
  • 7 commits
  • 23 files changed
  • 2 contributors

Commits on Oct 20, 2022

  1. Bump pytorch to 1.12, pytorch-lightning to 1.7.7, many changes due to package bumps - untested on gpus as of now

     Marc Ahlers committed Oct 20, 2022 · 3389ed1

Commits on Oct 25, 2022

  1. Add support for DDP Strategy in new config format

     Marc Ahlers committed Oct 25, 2022 · 8f5e44b

Commits on Oct 30, 2022

  1. refactor: add typings and splitted crohme datamodule code into multiple files in preparation to support HME100K

     Marc Ahlers committed Oct 30, 2022 · 2b16585

  2. feat: Add splitting CROHME into labeled / unlabeled sets. Move batch creation from the datamodule to the batch module. Move & refactor default supervised datamodule into the variants module.

     Marc Ahlers committed Oct 30, 2022 · 9a08a92

Commits on May 16, 2023

  1. 6bf9446

Commits on May 20, 2023

  1. Update README.md typo

     revidee authored May 20, 2023 · f36283f

Commits on Aug 4, 2023

  1. Update README.md

     revidee authored Aug 4, 2023 · ea5a611
README.md: 63 changes (49 additions, 14 deletions)
@@ -1,11 +1,40 @@
<div align="center">

# CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
## With Semi-Supervised Learning Methods and HME100K Dataset support

Original Paper

[![arXiv](https://img.shields.io/badge/arXiv-2207.04410-b31b1b.svg)](https://arxiv.org/abs/2207.04410)

Master's thesis introducing the additional methods

[Patrec @ TU Dortmund](https://patrec.cs.tu-dortmund.de/) - [Thesis](http://web.patrec.cs.tu-dortmund.de/pubs/theses/ma_ahlers.pdf)
</div>

This repo extends the official implementation of [CoMER](https://github.com/Green-Wood/CoMER) and adds:

- Support for PyTorch 1.13 and PyTorch Lightning 1.9
- Improved Beam-Search with pruning methods from [Freitag et al. (Beam Search Strategies for Neural Machine Translation)](https://arxiv.org/abs/1702.01806)
  - With the addition of Constant Pruning
  - Improves inference speed on the CROHME 2019 training data roughly 7-fold (see the sketch after this list)
- Self-Training Methods like [FixMatch](https://arxiv.org/abs/2001.07685)
- Calibration methods via learnable [Temperature Scaling](https://arxiv.org/abs/1706.04599) and [LogitNorm](https://arxiv.org/abs/2205.09310) (see the calibration sketch below)
- Multiple new Confidence Measures for further improving Calibration
- [RandAug](https://arxiv.org/abs/1909.13719) with two augmentation lists, the modified version being better suited for long formulae
- Support for different vocabularies
- Support for synthetic Pre-Training with a generated NTCIR12 MathIR Dataset
- Support for [HME100K Dataset](https://arxiv.org/abs/2203.01601)
- A partial-labeling heuristic to replace a hard threshold while filtering generated pseudo-labels
- Multi-GPU Evaluation support
- Evaluation with Augmentations
- Tools & scripts to visualize the data, test the implementation, and benchmark the modified beam-search
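
As a rough illustration of the pruning idea above: the following is a minimal, repo-independent sketch of relative-threshold pruning in the spirit of Freitag et al.; the helper name, signature, and default margin are illustrative assumptions, not this fork's actual API, and the Constant Pruning variant itself lives in the branches listed below.

```python
import torch

def prune_hypotheses(scores: torch.Tensor, beam_size: int,
                     margin: float = 6.0) -> torch.Tensor:
    """Illustrative relative-threshold pruning (Freitag et al., 2017).

    `scores` holds the log-probabilities of all candidate hypotheses at one
    decoding step. Keep at most `beam_size` candidates, and additionally drop
    any candidate whose score falls more than `margin` below the current best,
    so hopeless branches are never expanded further.
    """
    top_scores, top_idx = scores.topk(min(beam_size, scores.numel()))
    keep = top_scores >= top_scores[0] - margin  # topk sorts descending
    return top_idx[keep]
```

Because low-scoring hypotheses are discarded early, the decoder expands far fewer nodes per step, which is where the roughly 7-fold speed-up quoted above comes from.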

The features are split across the following branches:
- [feature/ssl](https://github.com/revidee/CoMER/tree/feature/ssl), no helpers (visualization), no HME100K support
- [feature/ssl_hme](https://github.com/revidee/CoMER/tree/feature/ssl_hme), no helpers (visualization)
- [feature/ssl_helpers](https://github.com/revidee/CoMER/tree/feature/ssl_helpers), no HME100K support
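
For the calibration bullet above, here is a minimal sketch of post-hoc temperature scaling in the style of Guo et al. (2017); `TemperatureScaler` and `fit_temperature` are hypothetical names for illustration, not the classes used in this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureScaler(nn.Module):
    """A single learnable temperature that divides the logits before softmax."""

    def __init__(self) -> None:
        super().__init__()
        self.log_t = nn.Parameter(torch.zeros(1))  # T = exp(log_t) stays positive

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return logits / self.log_t.exp()

def fit_temperature(scaler: TemperatureScaler,
                    logits: torch.Tensor,    # [n, num_classes] held-out logits
                    targets: torch.Tensor) -> None:  # [n] class indices
    """Fit the temperature on a held-out set; the model weights stay frozen."""
    opt = torch.optim.LBFGS([scaler.log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(scaler(logits), targets)
        loss.backward()
        return loss

    opt.step(closure)
```

Temperature scaling only rescales confidences after training and leaves the argmax prediction unchanged; LogitNorm instead constrains the logit norm during training. Either way, the goal is confidence scores trustworthy enough for pseudo-label filtering.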

## Project structure
```bash
├── README.md
@@ -34,22 +63,23 @@
## Install dependencies
```bash
cd CoMER
# install project
# python >= 3.7 required. Tested with 3.7 & 3.10
conda create -y -n CoMER python=3.7
conda activate CoMER
conda install pytorch=1.8.1 torchvision=0.2.2 cudatoolkit=11.1 pillow=8.4.0 -c pytorch -c nvidia
# training dependency
conda install pytorch-lightning=1.4.9 torchmetrics=0.6.0 -c conda-forge
# install pytorch >= 1.8 & torchvision >= 0.2 with cudatoolkit / rocm.
conda install pytorch=1.8.1 torchvision=0.2.2 cudatoolkit=11.1 -c pytorch -c nvidia
pip install -e .
# evaluating dependency
conda install pandoc=1.19.2.1 -c conda-forge
pip install -e .

```

## Training
Next, navigate to CoMER folder and run `train.py`. It may take **7~8** hours on **4** NVIDIA 2080Ti gpus using ddp.
```bash
# train CoMER(Fusion) model using 4 gpus and ddp
python train.py --config config.yaml
# train CoMER(Fusion) model using 2 gpus and ddp
python train.py -c config.yaml fit
```

You may change the `config.yaml` file to train different models
@@ -71,17 +101,22 @@ cross_coverage: true
self_coverage: true
```
For single gpu user, you may change the `config.yaml` file to
For _single_ `gpu` usage, you may edit the `config.yaml`:
```yaml
gpus: 1
# gpus: 4
# accelerator: ddp
accelerator: 'gpu'
devices: 0
```

For _single_ `cpu` user, you may edit the `config.yaml`:
```yaml
accelerator: 'cpu'
# devices: 0
```

## Evaluation
Metrics used in validation during the training process are not accurate.

For accurate metrics reported in the paper, please use tools officially provided by CROHME 2019 oganizer:
For accurate metrics reported in the paper, please use tools officially provided by CROHME 2019 organizer:

A trained CoMER(Fusion) weight checkpoint has been saved in `lightning_logs/version_0`

@@ -96,4 +131,4 @@ unzip -q data.zip
# evaluate model in lightning_logs/version_0 on all CROHME test sets
# results will be printed in the screen and saved to lightning_logs/version_0 folder
bash eval_all.sh 0
```
comer/datamodule/__init__.py: 12 changes (1 addition, 11 deletions)
@@ -1,11 +1 @@
from .datamodule import Batch, CROHMEDatamodule
from .vocab import vocab

vocab_size = len(vocab)

__all__ = [
"CROHMEDatamodule",
"vocab",
"Batch",
"vocab_size",
]
from comer.datamodule.crohme.variants.supervised import CROHMESupvervisedDatamodule
comer/datamodule/crohme/__init__.py: 18 changes (18 additions, 0 deletions)
@@ -0,0 +1,18 @@
from .entry import DataEntry, extract_data_entries
from .batch import Batch, BatchTuple, build_dataset
from .dataset import CROHMEDataset
from .variants.supervised import CROHMESupvervisedDatamodule
from .vocab import vocab

vocab_size = len(vocab)


__all__ = [
"CROHMESupvervisedDatamodule",
"CROHMEDataset",
"Batch",
"BatchTuple",
"build_dataset",
"vocab",
"vocab_size",
]
comer/datamodule/crohme/batch.py: 185 changes (185 additions, 0 deletions)
@@ -0,0 +1,185 @@
from dataclasses import dataclass
from typing import List, Tuple, Callable, Any
from zipfile import ZipFile

import numpy as np
import torch
from torch import FloatTensor, LongTensor

from .entry import extract_data_entries, DataEntry
from .vocab import vocab


@dataclass
class Batch:
img_bases: List[str] # [b,]
imgs: FloatTensor # [b, 1, H, W]
mask: LongTensor # [b, H, W]
indices: List[List[int]] # [b, l]

def __len__(self) -> int:
return len(self.img_bases)

def to(self, device) -> "Batch":
return Batch(
img_bases=self.img_bases,
imgs=self.imgs.to(device),
mask=self.mask.to(device),
indices=self.indices,
)


# A BatchTuple represents a single batch which contains 3 lists of equal length (batch-len)
# [file_names, images, labels]
BatchTuple = Tuple[List[str], List[np.ndarray], List[List[str]]]


# Creates a Batch of (potentially) annotated images which pads & masks the images, s.t. they fit into a single tensor.
def create_batch_from_lists(file_names: List[str], images: List[np.ndarray], labels: List[List[str]]) -> Batch:
    assert len(file_names) == len(images) == len(labels)
labels_as_word_indices = [vocab.words2indices(x) for x in labels]

    # Note: by this point the images have passed through CROHMEDataset's transform
    # and arrive as CHW tensors, so the tensor .size() accessor is available.
    heights_x = [s.size(1) for s in images]
    widths_x = [s.size(2) for s in images]

n_samples = len(images)
max_height_x = max(heights_x)
max_width_x = max(widths_x)

x = torch.zeros(n_samples, 1, max_height_x, max_width_x)
x_mask = torch.ones(n_samples, max_height_x, max_width_x, dtype=torch.bool)
for idx, img in enumerate(images):
x[idx, :, : heights_x[idx], : widths_x[idx]] = img
x_mask[idx, : heights_x[idx], : widths_x[idx]] = 0

return Batch(file_names, x, x_mask, labels_as_word_indices)


# change according to your GPU memory
MAX_SIZE = 32e4


def build_batch_split_from_entries(
data: np.ndarray[Any, np.dtype[DataEntry]],
batch_size: int,
batch_imagesize: int = MAX_SIZE,
maxlen: int = 200,
max_imagesize: int = MAX_SIZE,
unlabeled_factor: int = 0,
) -> Tuple[List[BatchTuple], List[BatchTuple]]:
total_len = len(data)

random_idx_order = np.arange(total_len, dtype=int)
np.random.shuffle(random_idx_order)

    if unlabeled_factor < 0:
        unlabeled_factor = 0

    # Keep 1 / (unlabeled_factor + 1) of the shuffled entries as labeled data;
    # e.g. unlabeled_factor = 3 keeps 25% labeled and moves 75% to the unlabeled split.
    labeled_end = total_len // (unlabeled_factor + 1)

return (
# labeled batches
build_batches_from_samples(
data[random_idx_order[:labeled_end]],
batch_size,
batch_imagesize,
maxlen,
max_imagesize
),
# unlabeled batches
build_batches_from_samples(
data[random_idx_order[labeled_end:]],
batch_size,
batch_imagesize,
maxlen,
max_imagesize
),
)


def build_batches_from_samples(
data: np.ndarray[Any, np.dtype[DataEntry]],
batch_size: int,
batch_imagesize: int = MAX_SIZE,
maxlen: int = 200,
max_imagesize: int = MAX_SIZE
) -> List[BatchTuple]:
if data.shape[0] == 0:
return list()
next_batch_file_names: List[str] = []
next_batch_images: List[np.ndarray] = []
next_batch_labels: List[List[str]] = []

total_fname_batches: List[List[str]] = []
total_feature_batches: List[List[np.ndarray]] = []
total_label_batches: List[List[List[str]]] = []

biggest_image_size = 0
get_entry_image_pixels: Callable[[DataEntry], int] = lambda x: x.image.size[0] * x.image.size[1]

# Sort the data entries via numpy by total pixel count and use the sorted indices to create a sorted array-view.
data_sorted = data[
np.argsort(
np.vectorize(get_entry_image_pixels)(data)
)
]

i = 0

for entry in data_sorted:
size = get_entry_image_pixels(entry)
image_arr = np.array(entry.image)
        if size > biggest_image_size:
            biggest_image_size = size
        # Worst-case memory estimate: every image in the batch is padded to the
        # largest one seen so far, hence (samples in batch) * (biggest image size).
        batch_image_size = biggest_image_size * (i + 1)
if len(entry.label) > maxlen:
print("label", i, "length bigger than", maxlen, "ignore")
elif size > max_imagesize:
print(
f"image: {entry.file_name} size: {image_arr.shape[0]} x {image_arr.shape[1]} = {size} bigger than {max_imagesize}, ignore"
)
else:
if batch_image_size > batch_imagesize or i == batch_size:
# a batch is full, add it to the "batch"-list and reset the current batch with the new entry.
total_fname_batches.append(next_batch_file_names)
total_feature_batches.append(next_batch_images)
total_label_batches.append(next_batch_labels)
# reset current batch
i = 0
biggest_image_size = size
next_batch_file_names = []
next_batch_images = []
next_batch_labels = []
# add the entry to the current batch
next_batch_file_names.append(entry.file_name)
next_batch_images.append(image_arr)
next_batch_labels.append(entry.label)
i += 1

# add last batch if it isn't empty
if len(next_batch_file_names) > 0:
total_fname_batches.append(next_batch_file_names)
total_feature_batches.append(next_batch_images)
total_label_batches.append(next_batch_labels)

print("total ", len(total_feature_batches), "batch data loaded")
return list(
# Zips batches into a 3-Tuple Tuple[ List[str] , List[np.ndarray], List[List[str]] ]
# Per batch: file_names, images , labels
zip(
total_fname_batches,
total_feature_batches,
total_label_batches
)
)



def build_dataset(
archive: ZipFile,
folder: str,
batch_size: int,
unlabeled_factor: int = 0,
) -> Tuple[List[BatchTuple], List[BatchTuple]]:
return build_batch_split_from_entries(extract_data_entries(archive, folder), batch_size,
unlabeled_factor=unlabeled_factor)
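
A usage sketch for the module above, assuming the `data.zip` archive from the README; the `"train"` folder name and the parameter values are assumptions for illustration:

```python
from zipfile import ZipFile

from comer.datamodule.crohme import build_dataset

# Assumed archive/folder names; adjust to the actual CROHME data package.
with ZipFile("data.zip") as archive:
    labeled, unlabeled = build_dataset(archive, "train", batch_size=8,
                                       unlabeled_factor=3)

# unlabeled_factor=3 keeps 1 / (3 + 1) = 25% of the shuffled entries in the
# labeled split; the remaining 75% feed the self-training pipeline.
print(f"{len(labeled)} labeled batches, {len(unlabeled)} unlabeled batches")
```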
comer/datamodule/dataset.py → comer/datamodule/crohme/dataset.py: 15 changes (10 additions, 5 deletions)
@@ -1,7 +1,10 @@
from typing import List

import torchvision.transforms as tr
from torch.utils.data.dataset import Dataset

from .transforms import ScaleAugmentation, ScaleToLimitRange
from comer.datamodule.crohme import BatchTuple
from comer.datamodule.utils.transforms import ScaleAugmentation, ScaleToLimitRange

K_MIN = 0.7
K_MAX = 1.4
@@ -13,7 +16,9 @@


class CROHMEDataset(Dataset):
def __init__(self, ds, is_train: bool, scale_aug: bool) -> None:
ds: List[BatchTuple]

def __init__(self, ds: List[BatchTuple], is_train: bool, scale_aug: bool) -> None:
super().__init__()
self.ds = ds

@@ -28,11 +33,11 @@ def __init__(self, ds, is_train: bool, scale_aug: bool) -> None:
self.transform = tr.Compose(trans_list)

def __getitem__(self, idx):
fname, img, caption = self.ds[idx]
file_names, images, labels = self.ds[idx]

img = [self.transform(im) for im in img]
images = [self.transform(im) for im in images]

return fname, img, caption
return file_names, images, labels

def __len__(self):
return len(self.ds)
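
Since each `__getitem__` already returns a complete `BatchTuple`, one plausible way to consume the dataset (an assumption, not necessarily this repo's actual wiring) is to disable automatic batching on the loader:

```python
from torch.utils.data import DataLoader

# `labeled` is a List[BatchTuple], e.g. from build_dataset above.
dataset = CROHMEDataset(labeled, is_train=True, scale_aug=True)
loader = DataLoader(dataset, batch_size=None, shuffle=True)  # one pre-built batch per step
```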
File renamed without changes.