Preprocessing ERE #16

zhou6140919 · 2023-05-07T01:02:08Z

I ran the script preprocessing/process_ere.py and found that the number of sentences in train.w1.oneie.json (12977) does not match the number reported in the paper (14736). Unsurprisingly, I also cannot reproduce the F1 score on the ERE-EN dataset.
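For anyone who wants to compare counts the same way, here is a minimal sketch of how the sentence count can be checked. It assumes the output file holds one JSON object per line, one per sentence; that layout is my assumption, not something confirmed in this thread. The function names are hypothetical.

```python
import json


def count_json_lines(lines):
    """Count non-empty lines, checking that each one parses as JSON.

    Assumption: one JSON object per line corresponds to one sentence.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        json.loads(line)  # raises ValueError if the line is malformed
        count += 1
    return count


def count_sentences(path):
    """Count the sentences stored in a JSON-lines file at `path`."""
    with open(path, encoding="utf-8") as f:
        return count_json_lines(f)
```

Calling count_sentences("train.w1.oneie.json") on the generated file should give the number to compare against the paper.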

So I looked into the script: at line 1336 it ignores all the data in the 'normal' dataset. However, when I changed the pattern to os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt'), an error occurs at the line entity.char_offsets_to_token_offsets(tokens), though only for a few docs. Ignoring all of these errors, I got 18895, which still does not match.
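To make the failing docs easier to report, the change can be sketched roughly as below. The glob pattern is the one quoted above; process_doc is a hypothetical stand-in for whatever the script does per document, and recording failures instead of silently ignoring them is my own suggestion, not the script's behavior.

```python
import glob
import os


def find_documents(input_dir):
    """Return all .txt files under source/cmptxt/<subdir>/, sorted by path."""
    pattern = os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt')
    return sorted(glob.glob(pattern))


def process_all(paths, process_doc):
    """Apply process_doc (hypothetical per-document function) to each path.

    Failures are recorded with the path and the error, so the affected
    documents can be listed instead of being dropped without a trace.
    """
    results, failures = [], []
    for path in paths:
        try:
            results.append(process_doc(path))
        except Exception as exc:  # broad on purpose: this is a diagnostic pass
            failures.append((path, repr(exc)))
    return results, failures
```

Printing the failures list after a run shows exactly which documents trigger the offset error, which may help narrow down where the count diverges.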

ej0cl6 · 2023-11-16T21:06:27Z

Note that package versions sometimes matter as well, so please check whether your packages match ours.
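One way to check this is a small sketch that prints the installed versions so they can be compared by hand with the versions the repository pins. The package names to pass in are whatever the repository's requirements list; none are assumed here, and the function name is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions
```

Passing the names from the requirements file and comparing the result line by line should reveal any mismatch.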
