Reformatting the lexicon files from usas to tsv #1

Merged on Nov 15, 2021 (59 commits)
Changes from 1 commit
2aaa4fc
License and some text around file format
apmoore1 Oct 20, 2021
51b46a9
Added more headers
apmoore1 Oct 20, 2021
a82465c
Added formatting
apmoore1 Oct 20, 2021
d2b3d28
Single Word Lexicon file format
apmoore1 Oct 20, 2021
26c00a1
MWE file format
apmoore1 Oct 20, 2021
158e589
Welsh changed to TSV and Python script for testing lexicon file format
apmoore1 Oct 20, 2021
52f4afa
Changed , to .
apmoore1 Oct 20, 2021
84ff9db
Chinese MWE and single semantic lexicon in new format
apmoore1 Oct 20, 2021
ce89235
French single semantic lexicon in new format
apmoore1 Oct 20, 2021
9c199bd
Stated the changes made so far
apmoore1 Oct 20, 2021
57ad3e4
Arabic single semantic lexicon in new format
apmoore1 Oct 20, 2021
0d288fc
Czech single semantic lexicon in new format
apmoore1 Oct 20, 2021
10145d7
Dutch single semantic lexicon in new format
apmoore1 Oct 20, 2021
032b247
Finish single semantic lexicon in new format
apmoore1 Oct 21, 2021
0d1e146
Italian single and MWE semantic lexicon in new format
apmoore1 Oct 21, 2021
ecfe827
Malay single semantic lexicon in new format
apmoore1 Oct 21, 2021
863f698
Portuguese single and MWE semantic lexicon in new format
apmoore1 Oct 21, 2021
f0e07a1
Changed MWE file names
apmoore1 Oct 21, 2021
3e0fb1a
Russain single and MWE semantic lexicon in new format
apmoore1 Oct 21, 2021
700ed26
Spainsh single and MWE semantic lexicon in new format
apmoore1 Oct 21, 2021
f551d4c
Swedish single and MWE semantic lexicon in new format
apmoore1 Oct 21, 2021
8bb98f5
Urdu single semantic lexicon in new format
apmoore1 Oct 21, 2021
1dcb3cb
Swedish single and MWE semantic lexicon in new format
apmoore1 Oct 21, 2021
bd9232c
Added header names
apmoore1 Oct 21, 2021
4b86c36
Correct line removed and original line kept
apmoore1 Nov 5, 2021
146dece
Updated changelog
apmoore1 Nov 5, 2021
cdfb438
Made the test_collection.py script stricter
apmoore1 Nov 6, 2021
6133b74
Resolved tab space issue
apmoore1 Nov 6, 2021
1b4e53c
Added APA citation format
apmoore1 Nov 9, 2021
d1eaf07
Script that tests the format of all of the lexicon files
apmoore1 Nov 9, 2021
fdae9b6
Whether spelling mistake corrected
apmoore1 Nov 9, 2021
3efaa9b
Added POS tag intj
apmoore1 Nov 9, 2021
01b0be6
Lexicon resource statistics
apmoore1 Nov 10, 2021
9c32a0f
Language resources reformatted and contain BCP 47 language codes
apmoore1 Nov 10, 2021
e782604
New meta data format explained
apmoore1 Nov 11, 2021
06a9521
Updated with new lexicon resource meta data
apmoore1 Nov 11, 2021
2cc04c1
Added BCP 47 code to header of meta data statistics table
apmoore1 Nov 11, 2021
9746657
The changlog line removed was fixed with commit 3efaa9b
apmoore1 Nov 11, 2021
3ca5b56
Removed the token column as it was a duplicate of the lemma column
apmoore1 Nov 11, 2021
65aaf9b
Removed the token column as it was a duplicate of the lemma column
apmoore1 Nov 11, 2021
9322a48
Removed id column
apmoore1 Nov 11, 2021
e16fd12
Renamed column from unknown to feature
apmoore1 Nov 11, 2021
7b35d95
Contributing guidelines
apmoore1 Nov 12, 2021
d4381dd
Arabic text file
apmoore1 Nov 12, 2021
56f573d
All text versions of the lexicons
apmoore1 Nov 12, 2021
b8f3e83
Removed tabs that are not used in the header row
apmoore1 Nov 15, 2021
b55ffd8
Text to TSV file conversion
apmoore1 Nov 15, 2021
7e2fccd
Environment name issue resolved
apmoore1 Nov 15, 2021
f5206d7
Converted TXT to TSV
github-actions[bot] Nov 15, 2021
f788e49
Corrected removal mistake
apmoore1 Nov 15, 2021
3ef2f05
Converted TXT to TSV
github-actions[bot] Nov 15, 2021
91ff639
Formatting
apmoore1 Nov 15, 2021
c9e3ab0
All major changes
apmoore1 Nov 15, 2021
21ac563
Merge branch 'master' of github.com:apmoore1/Multilingual-USAS
apmoore1 Nov 15, 2021
b5ad6b9
stop it from failing when nothing to commit
apmoore1 Nov 15, 2021
307e2f4
Corrected if statement syntax
apmoore1 Nov 15, 2021
8dc65f6
Add CI badge
apmoore1 Nov 15, 2021
85b8579
Automatic lexicon statistics
apmoore1 Nov 15, 2021
daf017b
Updated lexicon statistics table
github-actions[bot] Nov 15, 2021
Updated changelog
apmoore1 committed Nov 5, 2021
commit 146decefd9df3922731050d9294cf9d14c779e08
2 changes: 1 addition & 1 deletion Changelog.md
@@ -5,7 +5,7 @@ MWE = Multi Word Expression
1. Changed the file extension of all semantic lexicon files from `.usas` to `.tsv` and added the relevant header names e.g. `lemma` and `semantic_tags` for single word lexicons and `mwe_template` and `semantic_tags` for MWE lexicons in accordance with the Lexicon File Format section within the [README.md](./README.md)
2. Added a [License file](./LICENSE) which contains the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This file was added so that on GitHub under the About section in the right hand corner there is a link that says `View License` which will direct users to this [License file](./LICENSE).
3. Reformated the [README.md](./README.md) so that it contains more structure. New content is added around File formats of single and MWE lexicon files.
-4. Within the [MWE lexicon for Chinese](./Chinese/mwe-chi.tsv). Line 119 and 101 was removed as it contained a MWE template but no USAS tags. The MWE templates were `平顶_noun 女帽_noun` and `一_num 法郎_msr N3.1/I1` respectively.
+4. Within the [MWE lexicon for Chinese](./Chinese/mwe-chi.tsv). Line 119 and 101 was removed as it contained a MWE template but no USAS tags. The MWE templates were `平顶_noun 女帽_noun` and `一_num 水盆_noun O2/O1.2` respectively.
5. Within the [single lexicon for Dutch](./Dutch/semantic_lexicon_dut.tsv). Line 218 we have added a tab so that the POS entry is now blank/None as no POS information existed, without adding this extra tab the TSV file would not be valid. The changed meant on line 218 it went from `alstublieft E4.2+ E2+ X7+ Z4` to `alstublieft E4.2+ E2+ X7+ Z4`
6. For the [Malay single lexicon](./Malay/semantic_laexicon_ms.tsv), I tested to see if the first column was a repeat of the second column using the [test_token_is_equal_to_lemma.py python script](./test_token_is_equal_to_lemma.py) and it was. The first, `token`, and the second column, `lemma`, both represented the `lemma` field. In addition the [Malay single lexicon](./Malay/semantic_laexicon_ms.tsv) also contained a `POS` column that only contained the word `POS` therefore this column was also removed.
7. Created a scripts section within the [README.md](./README.md), the scripts section contains explanations on what the new python scripts do and how to use them.
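
Items 5 and 6 of the changelog describe two checks: that every row of a TSV lexicon has the same number of tab-separated fields as the header, and that one column is a straight repeat of another. The following is a minimal Python sketch of those two checks, not the repository's actual scripts. The function names, the `pos` column name, and the sample rows are assumptions made for illustration only.

```python
import csv
import io


def rows_with_wrong_field_count(tsv_text: str) -> list[int]:
    """Return 1-indexed line numbers whose field count differs from the header's."""
    lines = tsv_text.splitlines()
    expected = len(lines[0].split("\t"))
    return [
        line_number
        for line_number, line in enumerate(lines[1:], start=2)
        if len(line.split("\t")) != expected
    ]


def column_is_duplicate(tsv_text: str, first: str, second: str) -> bool:
    """Return True if every row holds the same value in both named columns."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return all(row[first] == row[second] for row in reader)


# Hypothetical three-column lexicon. Line 2 is missing the tab that marks an
# empty POS field; line 3 has it, so the POS entry is present but blank.
missing_tab = (
    "lemma\tpos\tsemantic_tags\n"
    "alstublieft\tE4.2+ E2+ X7+ Z4\n"
    "alstublieft\t\tE4.2+ E2+ X7+ Z4\n"
)
invalid_lines = rows_with_wrong_field_count(missing_tab)

# Hypothetical lexicon whose `token` column repeats the `lemma` column.
repeated = "token\tlemma\tsemantic_tags\nfoo\tfoo\tZ99\nbar\tbar\tZ99\n"
token_repeats_lemma = column_is_duplicate(repeated, "token", "lemma")
```

Here `invalid_lines` flags only the row without the extra tab, which is the situation item 5 fixes, and `token_repeats_lemma` is true when a column can be dropped as in item 6.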