Commit

Update Freqlist materials with the R code and graphic.
gederajeg committed Jul 22, 2024
1 parent 0a7dfb7 commit bf48557
Showing 6 changed files with 94 additions and 91 deletions.
4 changes: 2 additions & 2 deletions 01-freqlist.html
@@ -483,7 +483,7 @@ <h2>Two basic types of frequency</h2>
<p>absolute vs.&nbsp;relative</p>
</blockquote>
<ul>
<li><p>absolut frequency:</p>
<li><p>absolute frequency:</p>
<ul>
<li>real, observed freq. of an item in the (sub)corpus</li>
</ul></li>
@@ -539,7 +539,7 @@ <h2>Two basic types of frequency</h2>
<div class="tab-content">
<div id="tabset-1-1">
<p>Important to know how to compute relative frequency!</p>
<p>Sometimes (most of the time?), the corpus-software tool we use <strong>cannot</strong> do want we want.</p>
<p>Sometimes (most of the time?), the corpus-software tool we use <strong>cannot</strong> do what we want.</p>
</div>
<div id="tabset-1-2">
<div class="cell">
Binary file modified 01-freqlist.pdf
Binary file not shown.
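
Note on the slide text touched above: it stresses knowing how to compute relative frequency by hand, since a corpus tool may not report it. A minimal sketch of that calculation follows, assuming the usual normalisation (absolute frequency divided by corpus size, scaled to a base such as one million words). It is written in Python rather than the R used in the repository, and the function name and all counts are hypothetical, chosen only for illustration.

```python
def relative_frequency(absolute_freq, corpus_size, per=1_000_000):
    """Normalise an absolute (observed) frequency to a rate per `per` tokens.

    `absolute_freq` is the raw count of an item in a (sub)corpus and
    `corpus_size` is the total number of tokens in that (sub)corpus.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return absolute_freq * per / corpus_size


# Hypothetical counts, not taken from any real corpus:
# an item occurs 150 times in a 1,200,000-token subcorpus
# and 100 times in a 500,000-token subcorpus.
freq_a = relative_frequency(150, 1_200_000)
freq_b = relative_frequency(100, 500_000)

print(f"Subcorpus A: {freq_a} per million words")
print(f"Subcorpus B: {freq_b} per million words")
```

With these invented numbers the item has the higher absolute frequency in subcorpus A but the higher relative frequency in subcorpus B, which is why absolute counts from subcorpora of different sizes should not be compared directly.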
4 changes: 2 additions & 2 deletions 01-freqlist.qmd
@@ -134,7 +134,7 @@ The estuary of the Bak Blau lake, on the Enggano island, Indonesia.

> absolute vs. relative
- absolut frequency:
- absolute frequency:

- real, observed freq. of an item in the (sub)corpus

@@ -197,7 +197,7 @@ $$

Important to know how to compute relative frequency!

Sometimes (most of the time?), the corpus-software tool we use **cannot** do want we want.
Sometimes (most of the time?), the corpus-software tool we use **cannot** do what we want.

### Examples

103 changes: 52 additions & 51 deletions 01-practice.html
@@ -72,16 +72,16 @@ <h2 id="toc-title">Table of contents</h2>
<li><a href="#first-approach-the-basic-find-all-approach-then-filtering" id="toc-first-approach-the-basic-find-all-approach-then-filtering" class="nav-link" data-scroll-target="#first-approach-the-basic-find-all-approach-then-filtering"><span class="header-section-number">1.1</span> First approach: The basic, find all approach then filtering</a></li>
<li><a href="#second-approach-your-turn-the-basic-starting-with-feature" id="toc-second-approach-your-turn-the-basic-starting-with-feature" class="nav-link" data-scroll-target="#second-approach-your-turn-the-basic-starting-with-feature"><span class="header-section-number">1.2</span> Second approach (your turn): The basic, <code>Starting with</code> feature</a></li>
</ul></li>
<li><a href="#practice-2-windshield-vs.-windscreen" id="toc-practice-2-windshield-vs.-windscreen" class="nav-link" data-scroll-target="#practice-2-windshield-vs.-windscreen"><span class="header-section-number">2</span> Practice 2: <em>windshield</em> vs.&nbsp;<em>windscreen</em></a>
<li><a href="#practice-2-words-of-certain-word-classpart-of-speech-containing-certain-affixes" id="toc-practice-2-words-of-certain-word-classpart-of-speech-containing-certain-affixes" class="nav-link" data-scroll-target="#practice-2-words-of-certain-word-classpart-of-speech-containing-certain-affixes"><span class="header-section-number">2</span> Practice 2: Words of certain word-class/part-of-speech containing certain affixes</a>
<ul class="collapse">
<li><a href="#task" id="toc-task" class="nav-link" data-scroll-target="#task"><span class="header-section-number">2.1</span> Task</a></li>
<li><a href="#operationalisation" id="toc-operationalisation" class="nav-link" data-scroll-target="#operationalisation"><span class="header-section-number">2.2</span> Operationalisation:</a></li>
<li><a href="#results" id="toc-results" class="nav-link" data-scroll-target="#results"><span class="header-section-number">2.3</span> Results</a></li>
<li><a href="#requirement" id="toc-requirement" class="nav-link" data-scroll-target="#requirement"><span class="header-section-number">2.1</span> Requirement</a></li>
<li><a href="#task" id="toc-task" class="nav-link" data-scroll-target="#task"><span class="header-section-number">2.2</span> Task</a></li>
</ul></li>
<li><a href="#practice-3-words-of-certain-word-classpart-of-speech-containing-certain-affixes" id="toc-practice-3-words-of-certain-word-classpart-of-speech-containing-certain-affixes" class="nav-link" data-scroll-target="#practice-3-words-of-certain-word-classpart-of-speech-containing-certain-affixes"><span class="header-section-number">3</span> Practice 3: Words of certain word-class/part-of-speech containing certain affixes</a>
<li><a href="#practice-3-windshield-vs.-windscreen" id="toc-practice-3-windshield-vs.-windscreen" class="nav-link" data-scroll-target="#practice-3-windshield-vs.-windscreen"><span class="header-section-number">3</span> Practice 3: <em>windshield</em> vs.&nbsp;<em>windscreen</em></a>
<ul class="collapse">
<li><a href="#requirement" id="toc-requirement" class="nav-link" data-scroll-target="#requirement"><span class="header-section-number">3.1</span> Requirement</a></li>
<li><a href="#task-1" id="toc-task-1" class="nav-link" data-scroll-target="#task-1"><span class="header-section-number">3.2</span> Task</a></li>
<li><a href="#task-1" id="toc-task-1" class="nav-link" data-scroll-target="#task-1"><span class="header-section-number">3.1</span> Task</a></li>
<li><a href="#operationalisation" id="toc-operationalisation" class="nav-link" data-scroll-target="#operationalisation"><span class="header-section-number">3.2</span> Operationalisation:</a></li>
<li><a href="#results" id="toc-results" class="nav-link" data-scroll-target="#results"><span class="header-section-number">3.3</span> Results</a></li>
</ul></li>
<li><a href="#practice-4-frequency-of-lexical-verb-tags-in-american-ame-vs.-british-bre-english-for-the-press-editorial-ame-bre-vs.-press-reportage-ame-bre" id="toc-practice-4-frequency-of-lexical-verb-tags-in-american-ame-vs.-british-bre-english-for-the-press-editorial-ame-bre-vs.-press-reportage-ame-bre" class="nav-link" data-scroll-target="#practice-4-frequency-of-lexical-verb-tags-in-american-ame-vs.-british-bre-english-for-the-press-editorial-ame-bre-vs.-press-reportage-ame-bre"><span class="header-section-number">4</span> Practice 4: Frequency of lexical-verb tags in American (AmE) vs.&nbsp;British (BrE) English for the Press: Editorial (AmE &amp; BrE) vs.&nbsp;Press: Reportage (AmE &amp; BrE)</a>
<ul class="collapse">
@@ -221,10 +221,46 @@ <h3 data-number="1.2" class="anchored" data-anchor-id="second-approach-your-turn
</ol>
</section>
</section>
<section id="practice-2-windshield-vs.-windscreen" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="practice-2-windshield-vs.-windscreen"><span class="header-section-number">2</span> Practice 2: <em>windshield</em> vs.&nbsp;<em>windscreen</em></h2>
<section id="task" class="level3" data-number="2.1">
<h3 data-number="2.1" class="anchored" data-anchor-id="task"><span class="header-section-number">2.1</span> Task</h3>
<section id="practice-2-words-of-certain-word-classpart-of-speech-containing-certain-affixes" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="practice-2-words-of-certain-word-classpart-of-speech-containing-certain-affixes"><span class="header-section-number">2</span> Practice 2: Words of certain word-class/part-of-speech containing certain affixes</h2>
<p>We are still in the <code>BASIC</code> tab. We will explore the productivity (the number of different items/type frequency) of English adjectives containing suffixes. We focus on suffixes meaning ‘having a resemblance of’, particularly comparing -<em>esque</em> (which has a more specialised meaning of ‘in the style of ~’) and -<em>ish</em> <span class="citation" data-cites="bauer2022">(see <a href="#ref-bauer2022" role="doc-biblioref">Bauer 2022: 55</a>)</span>.</p>
<section id="requirement" class="level3" data-number="2.1">
<h3 data-number="2.1" class="anchored" data-anchor-id="requirement"><span class="header-section-number">2.1</span> Requirement</h3>
<ul>
<li><p>IMPORTANT: This involves results from two searches: one for each suffix</p></li>
<li><p>Later, in the output interface, look at the upper left corner to find basic quantitative information:</p>
<ul>
<li><p>the number of items (i.e., the type frequency)</p></li>
<li><p>the total frequency of these items (i.e., the token frequency especially of adjectives having these suffixes)</p></li>
</ul></li>
</ul>
</section>
<section id="task" class="level3" data-number="2.2">
<h3 data-number="2.2" class="anchored" data-anchor-id="task"><span class="header-section-number">2.2</span> Task</h3>
<ul>
<li><p>Conceptual aspect:</p>
<ul>
<li>try to intuite which suffix would be more productive, in terms of their type and token frequency, in the corpus we use (later check this intuition with the results).</li>
</ul></li>
<li><p>Operationalisation:</p>
<ul>
<li><p>How would you devise a targeted search using just the <code>BASIC</code> feature to retrieve <strong>only adjectives</strong> ending with these suffixes? 🤔</p></li>
<li><p>REMEMBER: you need to run two searches for each suffix</p></li>
</ul></li>
<li><p>Results:</p>
<ul>
<li><p>How many items are there for -<em>esque</em> and what is the total frequency of these -<em>esque</em> adjectives? (Answer key: <a href="https://ske.li/15k" class="uri">https://ske.li/15k</a>)</p></li>
<li><p>How many items are there for -<em>ish</em> and what is the total frequency of these -<em>ish</em> adjectives? (Answer key: <a href="https://ske.li/15l" class="uri">https://ske.li/15l</a>)</p></li>
<li><p>Which suffix is more productive (in terms of the total number of items) in this <code>Brown Family (CLAWS + TreeTagger tags)</code> corpus? Is your intuition supported by the data?</p></li>
</ul></li>
</ul>
</section>
</section>
<section id="practice-3-windshield-vs.-windscreen" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="practice-3-windshield-vs.-windscreen"><span class="header-section-number">3</span> Practice 3: <em>windshield</em> vs.&nbsp;<em>windscreen</em></h2>
<p>We are now in the <code>ADVANCED</code> tab of Wordlist.</p>
<section id="task-1" class="level3" data-number="3.1">
<h3 data-number="3.1" class="anchored" data-anchor-id="task-1"><span class="header-section-number">3.1</span> Task</h3>
<ul>
<li><p>Conceptual aspect:</p>
<ul>
@@ -233,8 +269,8 @@ <h3 data-number="2.1" class="anchored" data-anchor-id="task"><span class="header
</ul></li>
</ul>
</section>
<section id="operationalisation" class="level3" data-number="2.2">
<h3 data-number="2.2" class="anchored" data-anchor-id="operationalisation"><span class="header-section-number">2.2</span> Operationalisation:</h3>
<section id="operationalisation" class="level3" data-number="3.2">
<h3 data-number="3.2" class="anchored" data-anchor-id="operationalisation"><span class="header-section-number">3.2</span> Operationalisation:</h3>
<ul>
<li><p>try to check their frequency in ALL Brown Family first (think about how you would devise the search)</p></li>
<li><p>then, check their frequency in the combined region corpus.</p>
@@ -245,8 +281,8 @@ <h3 data-number="2.2" class="anchored" data-anchor-id="operationalisation"><span
</ul></li>
</ul>
</section>
<section id="results" class="level3" data-number="2.3">
<h3 data-number="2.3" class="anchored" data-anchor-id="results"><span class="header-section-number">2.3</span> Results</h3>
<section id="results" class="level3" data-number="3.3">
<h3 data-number="3.3" class="anchored" data-anchor-id="results"><span class="header-section-number">3.3</span> Results</h3>
<ul>
<li><p>Overall frequency in the WHOLE BROWN FAMILY of</p>
<ul>
@@ -265,41 +301,6 @@ <h3 data-number="2.3" class="anchored" data-anchor-id="results"><span class="hea
<p>This practice is inspired from Stefanowitsch <span class="citation" data-cites="stefanowitsch2020">(<a href="#ref-stefanowitsch2020" role="doc-biblioref">2020</a>)</span>.</p>
</section>
</section>
<section id="practice-3-words-of-certain-word-classpart-of-speech-containing-certain-affixes" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="practice-3-words-of-certain-word-classpart-of-speech-containing-certain-affixes"><span class="header-section-number">3</span> Practice 3: Words of certain word-class/part-of-speech containing certain affixes</h2>
<p>We are still in the <code>BASIC</code> tab. We will explore the productivity (the number of different items/type frequency) of English adjectives containing suffixes. We focus on suffixes meaning ‘having a resemblance of’, particularly comparing -<em>esque</em> (which has a more specialised meaning of ‘in the style of ~’) and -<em>ish</em> <span class="citation" data-cites="bauer2022">(see <a href="#ref-bauer2022" role="doc-biblioref">Bauer 2022: 55</a>)</span>.</p>
<section id="requirement" class="level3" data-number="3.1">
<h3 data-number="3.1" class="anchored" data-anchor-id="requirement"><span class="header-section-number">3.1</span> Requirement</h3>
<ul>
<li><p>IMPORTANT: This involves results from two searches: one for each suffix</p></li>
<li><p>Later, in the output interface, look at the upper left corner to find basic quantitative information:</p>
<ul>
<li><p>the number of items (i.e., the type frequency)</p></li>
<li><p>the total frequency of these items (i.e., the token frequency especially of adjectives having these suffixes)</p></li>
</ul></li>
</ul>
</section>
<section id="task-1" class="level3" data-number="3.2">
<h3 data-number="3.2" class="anchored" data-anchor-id="task-1"><span class="header-section-number">3.2</span> Task</h3>
<ul>
<li><p>Conceptual aspect:</p>
<ul>
<li>try to intuite which suffix would be more productive, in terms of their type and token frequency, in the corpus we use (later check this intuition with the results).</li>
</ul></li>
<li><p>Operationalisation:</p>
<ul>
<li><p>How would you devise a targeted search using just the <code>BASIC</code> feature to retrieve <strong>only adjectives</strong> ending with these suffixes? 🤔</p></li>
<li><p>REMEMBER: you need to run two searches for each suffix</p></li>
</ul></li>
<li><p>Results:</p>
<ul>
<li><p>How many items are there for -<em>esque</em> and what is the total frequency of these -<em>esque</em> adjectives? (Answer key: <a href="https://ske.li/15k" class="uri">https://ske.li/15k</a>)</p></li>
<li><p>How many items are there for -<em>ish</em> and what is the total frequency of these -<em>ish</em> adjectives? (Answer key: <a href="https://ske.li/15l" class="uri">https://ske.li/15l</a>)</p></li>
<li><p>Which suffix is more productive (in terms of the total number of items) in this <code>Brown Family (CLAWS + TreeTagger tags)</code> corpus? Is your intuition supported by the data?</p></li>
</ul></li>
</ul>
</section>
</section>
<section id="practice-4-frequency-of-lexical-verb-tags-in-american-ame-vs.-british-bre-english-for-the-press-editorial-ame-bre-vs.-press-reportage-ame-bre" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="practice-4-frequency-of-lexical-verb-tags-in-american-ame-vs.-british-bre-english-for-the-press-editorial-ame-bre-vs.-press-reportage-ame-bre"><span class="header-section-number">4</span> Practice 4: Frequency of lexical-verb tags in American (AmE) vs.&nbsp;British (BrE) English for the Press: Editorial (AmE &amp; BrE) vs.&nbsp;Press: Reportage (AmE &amp; BrE)</h2>
<section id="requirements" class="level3" data-number="4.1">
@@ -343,7 +344,7 @@ <h3 data-number="4.2" class="anchored" data-anchor-id="task-2"><span class="head
<li><p>In the <code>View options</code>, check the Frequency per million words to see the relative frequency</p></li>
<li><p>Answer key: <a href="https://ske.li/15m" class="uri">https://ske.li/15m</a></p></li>
<li><p>There are several list of verb tags, focus on comparing a given tag between Editorial and Reportage (within variety) and between the same sub-genre across variety.</p></li>
<li><p>For more insightful analysese, we need to further process these datasets in Excel or in statistical programming language such as R.</p>
<li><p>For more insightful analyses, we need to further process these datasets in Excel or in statistical programming language such as R.</p>
<ul>
<li>Most of the time, corpus tools like Sketch Engine or AntConc are just part of the means (e.g., providing raw data) to an end. For example, we might want to directly visualise the distribution (i.e., relative frequency) of selected tags by genres between varieties as shown in <a href="#fig-lexverb-tag" class="quarto-xref">Figure&nbsp;4</a>. This requires processing the output of corpus tool further that these two corpus tools cannot do. The graph in <a href="#fig-lexverb-tag" class="quarto-xref">Figure&nbsp;4</a> is produced in R (<a href="https://github.com/complexico/dipscorling2024/blob/main/01-freqlist-code.R">code file</a>) based on the <a href="https://github.com/complexico/dipscorling2024/tree/main/results">data in the repository here</a> (see the .csv files starting with <code>lex-verb-...</code>).</li>
</ul></li>
Binary file modified 01-practice.pdf
Binary file not shown.
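
Note on Practice 2 in the file above: the exercise asks for the type frequency (number of different items) and the token frequency (total occurrences) of adjectives ending in -esque and -ish. As a rough sketch of what those two figures mean, the snippet below counts them from a small frequency list. It is Python rather than the R used in the repository. The function name is hypothetical and the word counts are invented, not results from the Brown Family corpus. The list is assumed to contain adjectives only, because a plain suffix match cannot tell "foolish" from a verb such as "finish".

```python
def suffix_productivity(freq_list, suffix):
    """Return (type_frequency, token_frequency) for words ending in `suffix`.

    `freq_list` maps each word to its absolute frequency and is assumed
    to have been filtered to a single part of speech beforehand.
    """
    matches = {word: freq for word, freq in freq_list.items()
               if word.endswith(suffix)}
    type_freq = len(matches)            # number of different items
    token_freq = sum(matches.values())  # total occurrences of those items
    return type_freq, token_freq


# Invented adjective frequency list, for illustration only.
toy_adjectives = {
    "childish": 4,
    "foolish": 7,
    "greenish": 2,
    "picturesque": 5,
    "grotesque": 6,
}

for suffix in ("esque", "ish"):
    types, tokens = suffix_productivity(toy_adjectives, suffix)
    print(f"-{suffix}: {types} types, {tokens} tokens")
```

Comparing the type counts of the two suffixes corresponds to the productivity question in the task, while the token counts correspond to the total frequency shown in the tool's output.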