Skip to content

Latest commit

 

History

History
75 lines (58 loc) · 1.63 KB

README.org

File metadata and controls

75 lines (58 loc) · 1.63 KB

Some experiments with UD vs ERG

the data

To profiles from ERG where used:

Matrix

MRS sentences are almost identical to the MATRIX with 7 differences:

% diff matrix-en.sent mrs-en.sent
20c20
< Cats bark.
---
> Cats go.
22c22
< Some bark.
---
> Some went.
53c53
< Chased dogs bark.
---
> Chased dogs go.
55c55
< That the cat chases Browne is old.
---
> That the cat chases Browne is obvious.
62,64c62,64
< Browne's barks.
< Twenty three dogs bark.
< Two hundred twenty dogs bark.
---
> Browne's goes.
> Twenty three dogs go.
> Two hundred twenty dogs go.
79c79
< Abrams promised Browne to bark.
---
> Abrams promised Browne to go.
97c97
< The cats found a way to bark.
---
> The cats found a way to go.

CSLI

http://svn.delph-in.net/erg/trunk/tsdb/gold/csli/

The CSLI dataset https://w3.ual.es/~nperdu/hpsuite.htm (where the sentences are grouped by syntatic constructions). This page is replicated here in csli.txt. Note that some sentences are not grammatically correct (marked with *).

Translations was done using IBM Language Translator Service https://www.ibm.com/watson/services/language-translator/

conllu files

all processed with udpipe using the portuguese-bosque-ud-2.5-191206.udpipe and english-ewt-ud-2.5-191206.udpipe models.

udpipe --tokenize --tag --parse MODEL FILE.sent > FILE.conllu

plans

Some of these sentences were copied to

https://tatoeba.org/eng/sentences_lists/show/166576