Commit d4083fa

deploy: fa4ef11
1 parent c5e8f26 commit d4083fa

25 files changed, with 754 additions and 25 deletions

.buildinfo

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
11
# Sphinx build info version 1
22
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3-
config: 9d1f7b70fec6bd8c24ba447faf1f9688
3+
config: 027b59af90ed4610bb3f89fdddac8264
44
tags: 645f666f9bcd5a90fca523b33c5a78b7
Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
1+
{
2+
"nbformat": 4,
3+
"nbformat_minor": 0,
4+
"metadata": {
5+
"colab": {
6+
"provenance": [],
7+
"collapsed_sections": [
8+
"7874DDJAneOV"
9+
]
10+
},
11+
"kernelspec": {
12+
"name": "python3",
13+
"display_name": "Python 3"
14+
},
15+
"language_info": {
16+
"name": "python"
17+
}
18+
},
19+
"cells": [
20+
{
21+
"cell_type": "markdown",
22+
"source": [
23+
"# Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach\n",
24+
"\n",
25+
"Vipasha Bansal\n",
26+
"\n",
27+
"ACL2024 SRW\n",
28+
"\n",
29+
"Abstract:\n",
30+
"\n",
31+
"> Deep semantic representations are useful for many NLU tasks (Droganova and Zeman, 2019; Schuster and Manning, 2016). Manual annotation to build these representations is time-consuming, and so automatic approaches are preferred (Droganova and Zeman, 2019; Bender et al. 2015). This paper demonstrates how rich semantic representations can be automatically derived for Thai Serial Verb Constructions (SVCs), where the semantic relationship between component verbs is not immediately clear from the surface forms. I present the first fully-implemented, unified analysis for Thai SVCs, deriving appropriate semantic representations (MRS; Copestake et al. 2005) from syntactic features, implemented within a DELPH-IN computational grammar (Slayden 2009). This analysis increases verified coverage of SVCs by 73% and decreases ambiguity by 46%.\n",
32+
"\n",
33+
"GitHub: https://github.com/VipashaB94/ThaiGrammar\n",
34+
"\n",
35+
"Paper: [Wait]\n",
36+
"\n",
37+
"\n",
38+
"\n",
39+
"---\n",
40+
"\n",
41+
"\n",
42+
"\n",
43+
"This notebook shows how to derive semantic representations (MRS) for Thai sentences using the grammar from this work.\n",
44+
"\n",
45+
"This notebook was created by Wannaphong Phatthiyaphaibun, PyThaiNLP."
46+
],
47+
"metadata": {
48+
"id": "tG27M8_sn0xy"
49+
}
50+
},
51+
{
52+
"cell_type": "markdown",
53+
"source": [
54+
"## Install\n",
55+
"\n",
56+
"Get the latest ACE release from http://sweaglesw.org/linguistics/ace/"
57+
],
58+
"metadata": {
59+
"id": "7874DDJAneOV"
60+
}
61+
},
62+
{
63+
"cell_type": "code",
64+
"execution_count": 1,
65+
"metadata": {
66+
"colab": {
67+
"base_uri": "https://localhost:8080/"
68+
},
69+
"id": "N4yiJEb-ms6w",
70+
"outputId": "261e926a-eacd-4649-cbee-f9e61476179a"
71+
},
72+
"outputs": [
73+
{
74+
"output_type": "stream",
75+
"name": "stdout",
76+
"text": [
77+
"--2024-08-12 12:50:35-- http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz\n",
78+
"Resolving sweaglesw.org (sweaglesw.org)... 216.129.123.154, 2001:1868:a100:105:beae:c5ff:fe24:d767\n",
79+
"Connecting to sweaglesw.org (sweaglesw.org)|216.129.123.154|:80... connected.\n",
80+
"HTTP request sent, awaiting response... 200 OK\n",
81+
"Length: 2526613 (2.4M) [application/x-gzip]\n",
82+
"Saving to: ‘ace-0.9.34-x86-64.tar.gz’\n",
83+
"\n",
84+
"ace-0.9.34-x86-64.t 100%[===================>] 2.41M 4.37MB/s in 0.6s \n",
85+
"\n",
86+
"2024-08-12 12:50:36 (4.37 MB/s) - ‘ace-0.9.34-x86-64.tar.gz’ saved [2526613/2526613]\n",
87+
"\n",
88+
"ace-0.9.34/\n",
89+
"ace-0.9.34/LICENSE\n",
90+
"ace-0.9.34/post/\n",
91+
"ace-0.9.34/post/english-postagger.hmm\n",
92+
"ace-0.9.34/erg-files/\n",
93+
"ace-0.9.34/erg-files/config.tdl\n",
94+
"ace-0.9.34/erg-files/ace-erg-qc.txt\n",
95+
"ace-0.9.34/RELEASE-NOTES\n",
96+
"ace-0.9.34/ace\n",
97+
"ace-0.9.34/doc/\n",
98+
"ace-0.9.34/doc/config.wiki\n",
99+
"ace-0.9.34/doc/options.wiki\n",
100+
"Cloning into 'ThaiGrammar'...\n",
101+
"remote: Enumerating objects: 199, done.\u001b[K\n",
102+
"remote: Counting objects: 100% (199/199), done.\u001b[K\n",
103+
"remote: Compressing objects: 100% (160/160), done.\u001b[K\n",
104+
"remote: Total 199 (delta 66), reused 0 (delta 0), pack-reused 0\u001b[K\n",
105+
"Receiving objects: 100% (199/199), 1.39 MiB | 2.53 MiB/s, done.\n",
106+
"Resolving deltas: 100% (66/66), done.\n",
107+
" Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
108+
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m186.8/186.8 kB\u001b[0m \u001b[31m2.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
109+
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.4/43.4 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
110+
"\u001b[?25h Building wheel for progress (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
111+
]
112+
}
113+
],
114+
"source": [
115+
"!wget http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz\n",
116+
"!mkdir run_ace\n",
117+
"!tar -xvzf ace-0.9.34-x86-64.tar.gz -C run_ace\n",
118+
"!git clone https://github.com/VipashaB94/ThaiGrammar.git\n",
119+
"!pip install -q pydelphin"
120+
]
121+
},
122+
{
123+
"cell_type": "markdown",
124+
"source": [
125+
"## Usage\n",
126+
"\n",
127+
"We use ACE, driven from Python through PyDelphin, to compile the grammar and parse sentences."
128+
],
129+
"metadata": {
130+
"id": "aXjtb9ftnsF1"
131+
}
132+
},
133+
{
134+
"cell_type": "code",
135+
"source": [
136+
"from delphin import ace"
137+
],
138+
"metadata": {
139+
"id": "fvE8146VnJKP"
140+
},
141+
"execution_count": 2,
142+
"outputs": []
143+
},
144+
{
145+
"cell_type": "code",
146+
"source": [
147+
"ace.compile('./ThaiGrammar/thaigrammar/ace/config.tdl', 'thai.dat', executable=\"./run_ace/ace-0.9.34/ace\")"
148+
],
149+
"metadata": {
150+
"id": "-6opwCGVnTYH"
151+
},
152+
"execution_count": 3,
153+
"outputs": []
154+
},
155+
{
156+
"cell_type": "code",
157+
"source": [
158+
"response = ace.parse('thai.dat', 'สุรี ไป ซื้อ หนังสือ', executable=\"./run_ace/ace-0.9.34/ace\")\n",
159+
"response['results']"
160+
],
161+
"metadata": {
162+
"colab": {
163+
"base_uri": "https://localhost:8080/"
164+
},
165+
"id": "wlEN8Q-jnWQN",
166+
"outputId": "ff7c4502-df3a-4f5c-fab0-23a745d87b85"
167+
},
168+
"execution_count": 4,
169+
"outputs": [
170+
{
171+
"output_type": "execute_result",
172+
"data": {
173+
"text/plain": [
174+
"[{'result-id': 0,\n",
175+
" 'derivation': '(328 subj-head 0.000000 0 4 (322 bare-np 0.000000 0 1 (5 สุรี_33142 0.000000 0 1 (\"สุรี\"))) (327 head-comp 0.000000 1 4 (324 drop-obj 0.000000 1 2 (323 deic-purpose-trans-svc-lex 0.000000 1 2 (6 ไป_4158 0.000000 1 2 (\"ไป\")))) (326 head-comp 0.000000 2 4 (7 ซื้อ_4236 0.000000 2 3 (\"ซื้อ\")) (325 bare-np 0.000000 3 4 (8 หนังสือ_4404 0.000000 3 4 (\"หนังสือ\"))))))',\n",
176+
" 'mrs': '[ LTOP: h0 INDEX: e2 [ e SF: prop ] RELS: < [ named_rel<-1:-1> LBL: h4 CARG: \"สุรี\" ARG0: x3 ] [ \"exist_q_rel\"<-1:-1> LBL: h6 ARG0: x3 RSTR: h7 BODY: h8 ] [ \"_go_v_1_rel\"<-1:-1> LBL: h1 ARG0: e9 ARG1: x3 ARG2: x10 [ x COG-ST: type-id ] ] [ \"purpose_rel\"<-1:-1> LBL: h1 ARG0: e2 ARG1: e9 ARG2: e11 ] [ \"_buy_v_1_rel\"<-1:-1> LBL: h1 ARG0: e11 ARG1: x3 ARG2: x12 [ x PERS: 3 ] ] [ \"_book_n_1_rel\"<-1:-1> LBL: h13 ARG0: x12 ] [ \"exist_q_rel\"<-1:-1> LBL: h14 ARG0: x12 RSTR: h15 BODY: h16 ] > HCONS: < h0 qeq h1 h7 qeq h4 > ICONS: < > ]',\n",
177+
" 'tree': '(\"S\" (\"NP\" (\"N\" (\"สุรี\"))) (\"VP\" (\"V\" (\"V-M\" (\"V\" (\"ไป\")))) (\"VP\" (\"V\" (\"ซื้อ\")) (\"NP\" (\"N\" (\"หนังสือ\"))))))',\n",
178+
" 'flags': [(':ascore', 0.0), (':probability', 1.0)]}]"
179+
]
180+
},
181+
"metadata": {},
182+
"execution_count": 4
183+
}
184+
]
185+
},
186+
{
187+
"cell_type": "code",
188+
"source": [
189+
"response = ace.parse('thai.dat', 'ผม จะ เป็น คน ดี', executable=\"./run_ace/ace-0.9.34/ace\")\n",
190+
"response['results']"
191+
],
192+
"metadata": {
193+
"colab": {
194+
"base_uri": "https://localhost:8080/"
195+
},
196+
"id": "BD8ZQezQnbCM",
197+
"outputId": "a521789f-8e8a-419b-be14-6877ec1d1677"
198+
},
199+
"execution_count": 5,
200+
"outputs": [
201+
{
202+
"output_type": "execute_result",
203+
"data": {
204+
"text/plain": [
205+
"[{'result-id': 0,\n",
206+
" 'derivation': '(603 subj-head 0.000000 0 5 (598 bare-np 0.000000 0 1 (6 ผม_4375 0.000000 0 1 (\"ผม\"))) (602 head-comp 0.000000 1 5 (7 จะ_33089 0.000000 1 2 (\"จะ\")) (601 head-comp 0.000000 2 5 (8 เป็น_33088 0.000000 2 3 (\"เป็น\")) (600 bare-np 0.000000 3 5 (599 head-adj-int 0.000000 3 5 (12 คน_4133 0.000000 3 4 (\"คน\")) (13 ดี_4290 0.000000 4 5 (\"ดี\")))))))',\n",
207+
" 'mrs': '[ LTOP: h0 INDEX: e2 [ e TENSE: fut SF: prop ] RELS: < [ \"pron_rel\"<-1:-1> LBL: h4 ARG0: x3 [ x PERS: 1 NUM: sg GEND: m SPECI: + ] ] [ \"exist_q_rel\"<-1:-1> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ] [ \"_be_v_id_rel\"<-1:-1> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 [ x PERS: 3 ] ] [ \"_person_n_1_rel\"<-1:-1> LBL: h9 ARG0: x8 ] [ \"_good_a_1_rel\"<-1:-1> LBL: h9 ARG0: e10 ARG1: x8 ] [ \"exist_q_rel\"<-1:-1> LBL: h11 ARG0: x8 RSTR: h12 BODY: h13 ] > HCONS: < h0 qeq h1 h6 qeq h4 h12 qeq h9 > ICONS: < > ]',\n",
208+
" 'tree': '(\"S\" (\"NP\" (\"N\" (\"ผม\"))) (\"VP\" (\"V\" (\"จะ\")) (\"VP\" (\"V\" (\"เป็น\")) (\"NP\" (\"N\" (\"N\" (\"คน\")) (\"ADJ\" (\"ดี\")))))))',\n",
209+
" 'flags': [(':ascore', 0.0), (':probability', 1.0)]}]"
210+
]
211+
},
212+
"metadata": {},
213+
"execution_count": 5
214+
}
215+
]
216+
}
217+
]
218+
}
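
Reading the first parse result in the notebook above: the serial verb construction is encoded as a "purpose_rel" predication whose ARG1 and ARG2 point at the events of "_go_v_1_rel" and "_buy_v_1_rel". The sketch below makes that link explicit using only the Python standard library. It is an illustration, not part of the notebook: the regular expressions and the helper name `elementary_predications` are assumptions made here, and they only cover the flat MRS strings shown in this diff. PyDelphin's own MRS codecs would likely be the more robust route for real use.

```python
import re

# The 'mrs' string from the first parse result shown in the notebook output.
MRS = (
    '[ LTOP: h0 INDEX: e2 [ e SF: prop ] RELS: < '
    '[ named_rel<-1:-1> LBL: h4 CARG: "สุรี" ARG0: x3 ] '
    '[ "exist_q_rel"<-1:-1> LBL: h6 ARG0: x3 RSTR: h7 BODY: h8 ] '
    '[ "_go_v_1_rel"<-1:-1> LBL: h1 ARG0: e9 ARG1: x3 ARG2: x10 [ x COG-ST: type-id ] ] '
    '[ "purpose_rel"<-1:-1> LBL: h1 ARG0: e2 ARG1: e9 ARG2: e11 ] '
    '[ "_buy_v_1_rel"<-1:-1> LBL: h1 ARG0: e11 ARG1: x3 ARG2: x12 [ x PERS: 3 ] ] '
    '[ "_book_n_1_rel"<-1:-1> LBL: h13 ARG0: x12 ] '
    '[ "exist_q_rel"<-1:-1> LBL: h14 ARG0: x12 RSTR: h15 BODY: h16 ] > '
    'HCONS: < h0 qeq h1 h7 qeq h4 > ICONS: < > ]'
)

# Hypothetical patterns: a predication starts with '[ pred<from:to>',
# and its numbered arguments look like 'ARG1: x3'.
EP_START = re.compile(r'\[\s*"?([^\s"<\[\]]+)"?<-?\d+:-?\d+>')
ROLE = re.compile(r'\b(ARG\d):\s*(\S+)')


def elementary_predications(mrs):
    """Return (predicate, {role: variable}) pairs in surface order."""
    starts = list(EP_START.finditer(mrs))
    eps = []
    for i, match in enumerate(starts):
        # The body of one predication runs up to the start of the next one.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(mrs)
        roles = dict(ROLE.findall(mrs[match.end():end]))
        eps.append((match.group(1), roles))
    return eps


eps = elementary_predications(MRS)

# Map each intrinsic variable (ARG0) to its predicate, skipping quantifiers,
# which share ARG0 with the noun they bind.
by_arg0 = {
    roles["ARG0"]: pred for pred, roles in eps if not pred.endswith("_q_rel")
}

# Resolve the two events connected by each purpose relation.
links = [
    (by_arg0[roles["ARG1"]], by_arg0[roles["ARG2"]])
    for pred, roles in eps
    if pred == "purpose_rel"
]
print(links)
```

For the sentence parsed above, `links` pairs the going event with the buying event, which is the relationship between the two verbs that the surface string alone does not make explicit.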

_static/documentation_options.js

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
var DOCUMENTATION_OPTIONS = {
22
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
3-
VERSION: 'thai2plot-24-g4d48db9',
3+
VERSION: 'thai2plot-25-gfa4ef11',
44
LANGUAGE: 'en',
55
COLLAPSE_INDEX: false,
66
BUILDER: 'html',

genindex.html

Lines changed: 2 additions & 1 deletion
@@ -3,7 +3,7 @@
33
<head>
44
<meta charset="utf-8" />
55
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
6-
<title>Index &mdash; pythainlp-tutorials thai2plot-24-g4d48db9 documentation</title>
6+
<title>Index &mdash; pythainlp-tutorials thai2plot-25-gfa4ef11 documentation</title>
77
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
88
<link rel="stylesheet" type="text/css" href="_static/css/theme.css" />
99
<link rel="stylesheet" type="text/css" href="_static/style.css" />
@@ -58,6 +58,7 @@
5858
<li class="toctree-l1"><a class="reference internal" href="notebooks/spaCy_PyThaiNLP_demo.html">spaCy-PyThaiNLP</a></li>
5959
<li class="toctree-l1"><a class="reference internal" href="notebooks/text_classification.html">Wongnai Review Classification</a></li>
6060
<li class="toctree-l1"><a class="reference internal" href="notebooks/text_generation.html">Thai Wiki Language Model for Text Generation</a></li>
61+
<li class="toctree-l1"><a class="reference internal" href="notebooks/thai_semantic_representations.html">Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach</a></li>
6162
<li class="toctree-l1"><a class="reference internal" href="notebooks/thai_wav2vec2_onnx.html">Thai Wav2vec2 model to ONNX model</a></li>
6263
<li class="toctree-l1"><a class="reference internal" href="notebooks/wangchanberta_getting_started_aireseach.html">WangchanBERTa: Getting Started Notebook</a></li>
6364
<li class="toctree-l1"><a class="reference internal" href="notebooks/word2vec_examples.html">Thai2Vec Embeddings Examples</a></li>

index.html

Lines changed: 3 additions & 1 deletion
@@ -4,7 +4,7 @@
44
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
55

66
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
7-
<title>Welcome to PyThaiNLP Tutorials &mdash; pythainlp-tutorials thai2plot-24-g4d48db9 documentation</title>
7+
<title>Welcome to PyThaiNLP Tutorials &mdash; pythainlp-tutorials thai2plot-25-gfa4ef11 documentation</title>
88
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
99
<link rel="stylesheet" type="text/css" href="_static/css/theme.css" />
1010
<link rel="stylesheet" type="text/css" href="_static/style.css" />
@@ -60,6 +60,7 @@
6060
<li class="toctree-l1"><a class="reference internal" href="notebooks/spaCy_PyThaiNLP_demo.html">spaCy-PyThaiNLP</a></li>
6161
<li class="toctree-l1"><a class="reference internal" href="notebooks/text_classification.html">Wongnai Review Classification</a></li>
6262
<li class="toctree-l1"><a class="reference internal" href="notebooks/text_generation.html">Thai Wiki Language Model for Text Generation</a></li>
63+
<li class="toctree-l1"><a class="reference internal" href="notebooks/thai_semantic_representations.html">Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach</a></li>
6364
<li class="toctree-l1"><a class="reference internal" href="notebooks/thai_wav2vec2_onnx.html">Thai Wav2vec2 model to ONNX model</a></li>
6465
<li class="toctree-l1"><a class="reference internal" href="notebooks/wangchanberta_getting_started_aireseach.html">WangchanBERTa: Getting Started Notebook</a></li>
6566
<li class="toctree-l1"><a class="reference internal" href="notebooks/word2vec_examples.html">Thai2Vec Embeddings Examples</a></li>
@@ -107,6 +108,7 @@ <h1>Welcome to PyThaiNLP Tutorials<a class="headerlink" href="#welcome-to-pythai
107108
<li class="toctree-l1"><a class="reference internal" href="notebooks/spaCy_PyThaiNLP_demo.html">spaCy-PyThaiNLP</a></li>
108109
<li class="toctree-l1"><a class="reference internal" href="notebooks/text_classification.html">Wongnai Review Classification</a></li>
109110
<li class="toctree-l1"><a class="reference internal" href="notebooks/text_generation.html">Thai Wiki Language Model for Text Generation</a></li>
111+
<li class="toctree-l1"><a class="reference internal" href="notebooks/thai_semantic_representations.html">Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach</a></li>
110112
<li class="toctree-l1"><a class="reference internal" href="notebooks/thai_wav2vec2_onnx.html">Thai Wav2vec2 model to ONNX model</a></li>
111113
<li class="toctree-l1"><a class="reference internal" href="notebooks/wangchanberta_getting_started_aireseach.html">WangchanBERTa: Getting Started Notebook</a></li>
112114
<li class="toctree-l1"><a class="reference internal" href="notebooks/word2vec_examples.html">Thai2Vec Embeddings Examples</a></li>

notebooks/Han-Coref.html

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,7 @@
44
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
55

66
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
7-
<title>🪿 Han-Coref: Thai Coreference resolution by PyThaiNLP &mdash; pythainlp-tutorials thai2plot-24-g4d48db9 documentation</title>
7+
<title>🪿 Han-Coref: Thai Coreference resolution by PyThaiNLP &mdash; pythainlp-tutorials thai2plot-25-gfa4ef11 documentation</title>
88
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
99
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css" />
1010
<link rel="stylesheet" type="text/css" href="../_static/nbsphinx-code-cells.css" />
@@ -64,6 +64,7 @@
6464
<li class="toctree-l1"><a class="reference internal" href="spaCy_PyThaiNLP_demo.html">spaCy-PyThaiNLP</a></li>
6565
<li class="toctree-l1"><a class="reference internal" href="text_classification.html">Wongnai Review Classification</a></li>
6666
<li class="toctree-l1"><a class="reference internal" href="text_generation.html">Thai Wiki Language Model for Text Generation</a></li>
67+
<li class="toctree-l1"><a class="reference internal" href="thai_semantic_representations.html">Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach</a></li>
6768
<li class="toctree-l1"><a class="reference internal" href="thai_wav2vec2_onnx.html">Thai Wav2vec2 model to ONNX model</a></li>
6869
<li class="toctree-l1"><a class="reference internal" href="wangchanberta_getting_started_aireseach.html">WangchanBERTa: Getting Started Notebook</a></li>
6970
<li class="toctree-l1"><a class="reference internal" href="word2vec_examples.html">Thai2Vec Embeddings Examples</a></li>

notebooks/Thai_Dependency_Parser.html

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,7 @@
44
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
55

66
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
7-
<title>Thai Dependency Parser &mdash; pythainlp-tutorials thai2plot-24-g4d48db9 documentation</title>
7+
<title>Thai Dependency Parser &mdash; pythainlp-tutorials thai2plot-25-gfa4ef11 documentation</title>
88
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
99
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css" />
1010
<link rel="stylesheet" type="text/css" href="../_static/nbsphinx-code-cells.css" />
@@ -64,6 +64,7 @@
6464
<li class="toctree-l1"><a class="reference internal" href="spaCy_PyThaiNLP_demo.html">spaCy-PyThaiNLP</a></li>
6565
<li class="toctree-l1"><a class="reference internal" href="text_classification.html">Wongnai Review Classification</a></li>
6666
<li class="toctree-l1"><a class="reference internal" href="text_generation.html">Thai Wiki Language Model for Text Generation</a></li>
67+
<li class="toctree-l1"><a class="reference internal" href="thai_semantic_representations.html">Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach</a></li>
6768
<li class="toctree-l1"><a class="reference internal" href="thai_wav2vec2_onnx.html">Thai Wav2vec2 model to ONNX model</a></li>
6869
<li class="toctree-l1"><a class="reference internal" href="wangchanberta_getting_started_aireseach.html">WangchanBERTa: Getting Started Notebook</a></li>
6970
<li class="toctree-l1"><a class="reference internal" href="word2vec_examples.html">Thai2Vec Embeddings Examples</a></li>
