Merged
Binary file modified docs/build/doctrees/environment.pickle
Binary file modified docs/build/doctrees/examples.doctree
Binary file modified docs/build/doctrees/feature_builder.doctree
Binary file modified docs/build/doctrees/features/temporal_features.doctree
Binary file modified docs/build/doctrees/utils/calculate_chat_level_features.doctree
Binary file modified docs/build/doctrees/utils/check_embeddings.doctree
Binary file modified docs/build/doctrees/utils/preprocess.doctree
10 changes: 8 additions & 2 deletions docs/build/html/_sources/examples.rst.txt
@@ -91,10 +91,10 @@ Now we are ready to call the FeatureBuilder on our data. All we need to do is de
speaker_id_col = "speaker_nickname",
message_col = "message",
timestamp_col = "timestamp",
grouping_keys = ["batch_num", "round_num"],
grouping_keys = ["batch_num", "round_num"], # NOTE: This example demonstrates grouping. Use conversation_id_col if you have a single conversation identifier.
vector_directory = "./vector_data/",
output_file_base = "jury_output",
turns = True
turns = True # NOTE: This defaults to False. Decide whether you want to combine successive 'utterances' by the same person as a 'turn.'
)
jury_feature_builder.featurize()
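To make the two `NOTE` comments in this example more concrete, here is a minimal pure-Python sketch of what combining successive utterances into turns could look like when conversations are delimited by `grouping_keys`. The helper `collapse_turns`, the list-of-dicts input, and joining merged messages with a single space are all assumptions for illustration; this is not the FeatureBuilder's internal implementation.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def collapse_turns(
    rows: List[Dict],
    grouping_keys: Sequence[str],
    speaker_col: str = "speaker_nickname",
    message_col: str = "message",
) -> List[Dict]:
    """Merge consecutive messages by the same speaker within one conversation.

    Hypothetical helper: `rows` must already be sorted in chat order.
    A conversation is identified by the tuple of its grouping-key values.
    """

    def turn_key(row: Dict):
        # Same conversation AND same speaker -> part of the same turn.
        return (tuple(row[k] for k in grouping_keys), row[speaker_col])

    turns = []
    for _, group in groupby(rows, key=turn_key):
        group = list(group)
        merged = dict(group[0])  # keep the first message's metadata
        merged[message_col] = " ".join(r[message_col] for r in group)
        turns.append(merged)
    return turns


rows = [
    {"batch_num": 1, "round_num": 1, "speaker_nickname": "A", "message": "hi"},
    {"batch_num": 1, "round_num": 1, "speaker_nickname": "A", "message": "there"},
    {"batch_num": 1, "round_num": 1, "speaker_nickname": "B", "message": "hello"},
    {"batch_num": 1, "round_num": 2, "speaker_nickname": "B", "message": "new round"},
]
for turn in collapse_turns(rows, ["batch_num", "round_num"]):
    print(turn["speaker_nickname"], "->", turn["message"])
```

This prints `A -> hi there`, `B -> hello`, and `B -> new round`: speaker B's two messages stay separate because they fall in different rounds, and therefore in different conversations under these grouping keys.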

@@ -219,6 +219,12 @@ Regenerating Vector Cache

* By default, **we assume that if your output file has the same name, the underlying vectors are the same**. If this isn't true, set **regenerate_vectors = True** to clear the cache and regenerate the RoBERTa and SBERT outputs.
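The caching rule above can be sketched as follows, assuming (hypothetically) one cache file per `output_file_base` inside `vector_directory`. The JSON format, the `_vectors.json` file name, and the `cached_vectors` helper are illustrative only; the library's real cache layout may differ.

```python
import json
import os
import tempfile
from typing import Callable, List

Vectors = List[List[float]]


def cached_vectors(
    vector_directory: str,
    output_file_base: str,
    compute: Callable[[], Vectors],
    regenerate_vectors: bool = False,
) -> Vectors:
    """Return cached vectors keyed only by output name, or recompute them.

    Hypothetical sketch of the "same output name => same vectors" rule.
    """
    cache_path = os.path.join(vector_directory, f"{output_file_base}_vectors.json")
    if not regenerate_vectors and os.path.exists(cache_path):
        # The cache is trusted purely because the name matches; the data is not re-checked.
        with open(cache_path) as f:
            return json.load(f)
    vectors = compute()
    os.makedirs(vector_directory, exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(vectors, f)
    return vectors


with tempfile.TemporaryDirectory() as tmp:
    first = cached_vectors(tmp, "jury_output", lambda: [[0.1, 0.2]])
    stale = cached_vectors(tmp, "jury_output", lambda: [[0.9, 0.9]])  # old cache is returned
    fresh = cached_vectors(tmp, "jury_output", lambda: [[0.9, 0.9]], regenerate_vectors=True)
    print(first, stale, fresh)
```

The second call shows the pitfall the bullet warns about: new data under an old output name silently reuses stale vectors until the cache is regenerated.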


Generating Vectors using GPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, we use the CPU to generate sentence vectors and cached RoBERTa sentiments. To override this default and use a GPU when one is available (which speeds up vector computation), set ``use_gpu`` to True.
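A common way to implement "use a GPU when available" in a PyTorch-based pipeline is sketched below. Whether this library selects its device this way is an assumption; `pick_device` is a hypothetical helper that falls back to the CPU when `use_gpu` is False, PyTorch is not installed, or no CUDA device is visible.

```python
def pick_device(use_gpu: bool = False) -> str:
    """Return "cuda" only if the caller opted in AND a CUDA GPU is usable.

    Hypothetical helper; assumes a PyTorch backend for RoBERTa/SBERT.
    """
    if not use_gpu:
        return "cpu"  # the documented default
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so nothing can run on a GPU
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())              # cpu
print(pick_device(use_gpu=True))  # "cuda" if a CUDA GPU is visible, otherwise "cpu"
```

Libraries such as sentence-transformers accept a device string like this through a `device` argument, so opting in costs nothing on machines without a GPU.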


Custom Features
~~~~~~~~~~~~~~~~~

8 changes: 6 additions & 2 deletions docs/build/html/examples.html
@@ -158,10 +158,10 @@ <h3>Configuring the FeatureBuilder<a class="headerlink" href="#configuring-the-f
<span class="n">speaker_id_col</span> <span class="o">=</span> <span class="s2">&quot;speaker_nickname&quot;</span><span class="p">,</span>
<span class="n">message_col</span> <span class="o">=</span> <span class="s2">&quot;message&quot;</span><span class="p">,</span>
<span class="n">timestamp_col</span> <span class="o">=</span> <span class="s2">&quot;timestamp&quot;</span><span class="p">,</span>
<span class="n">grouping_keys</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;batch_num&quot;</span><span class="p">,</span> <span class="s2">&quot;round_num&quot;</span><span class="p">],</span>
<span class="n">grouping_keys</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;batch_num&quot;</span><span class="p">,</span> <span class="s2">&quot;round_num&quot;</span><span class="p">],</span> <span class="c1"># NOTE: This example demonstrates grouping. Use conversation_id_col if you have a single conversation identifier.</span>
<span class="n">vector_directory</span> <span class="o">=</span> <span class="s2">&quot;./vector_data/&quot;</span><span class="p">,</span>
<span class="n">output_file_base</span> <span class="o">=</span> <span class="s2">&quot;jury_output&quot;</span><span class="p">,</span>
<span class="n">turns</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">turns</span> <span class="o">=</span> <span class="kc">True</span> <span class="c1"># NOTE: This defaults to False. Decide whether you want to combine successive &#39;utterances&#39; by the same person as a &#39;turn.&#39;</span>
<span class="p">)</span>
<span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">featurize</span><span class="p">()</span>
</pre></div>
@@ -302,6 +302,10 @@ <h5>Regenerating Vector Cache<a class="headerlink" href="#regenerating-vector-ca
</li>
</ul>
</section>
<section id="generating-vectors-using-gpu">
<h5>Generating Vectors using GPU<a class="headerlink" href="#generating-vectors-using-gpu" title="Link to this heading"></a></h5>
<p>By default, we use the CPU to generate sentence vectors and cached RoBERTa sentiments. To override this default and use a GPU when one is available (which speeds up vector computation), set <code class="docutils literal notranslate"><span class="pre">use_gpu</span></code> to True.</p>
</section>
<section id="custom-features">
<h5>Custom Features<a class="headerlink" href="#custom-features" title="Link to this heading"></a></h5>
<ul>
25 changes: 24 additions & 1 deletion docs/build/html/feature_builder.html

16 changes: 0 additions & 16 deletions docs/build/html/features/temporal_features.html
@@ -60,7 +60,6 @@
<li class="toctree-l3"><a class="reference internal" href="politeness_features.html">politeness_features module</a></li>
<li class="toctree-l3"><a class="reference internal" href="hedge.html">hedge module</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">temporal_features module</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#features.temporal_features.coerce_to_date_or_number"><code class="docutils literal notranslate"><span class="pre">coerce_to_date_or_number()</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#features.temporal_features.get_time_diff"><code class="docutils literal notranslate"><span class="pre">get_time_diff()</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#features.temporal_features.get_time_diff_startend"><code class="docutils literal notranslate"><span class="pre">get_time_diff_startend()</span></code></a></li>
</ul>
@@ -111,21 +110,6 @@

<section id="module-features.temporal_features">
<span id="temporal-features-module"></span><h1>temporal_features module<a class="headerlink" href="#module-features.temporal_features" title="Link to this heading"></a></h1>
<dl class="py function">
<dt class="sig sig-object py" id="features.temporal_features.coerce_to_date_or_number">
<span class="sig-prename descclassname"><span class="pre">features.temporal_features.</span></span><span class="sig-name descname"><span class="pre">coerce_to_date_or_number</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#features.temporal_features.coerce_to_date_or_number" title="Link to this definition"></a></dt>
<dd><p>Helper function in which we check that the timestamp column contains either a datetime value or a number
that can be interpreted as a time elapsed; otherwise, sets it equal to none.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>value</strong> – The value to check; type can be anything</p>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>Either the value itself (if it is a valid timestamp value) or None otherwise</p>
</dd>
</dl>
</dd></dl>
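The docstring of `coerce_to_date_or_number`, whose documentation this change removes, describes a simple filter. The sketch below is one reading of that docstring, not the deleted code: it keeps `datetime` objects (including `pandas.Timestamp`, which subclasses `datetime`) and real numbers, and returns `None` for everything else. Rejecting booleans and NaN, and not attempting to parse strings, are assumptions of this sketch.

```python
import math
from datetime import datetime
from numbers import Real
from typing import Any, Optional, Union

TimeValue = Union[datetime, Real]


def coerce_to_date_or_number(value: Any) -> Optional[TimeValue]:
    """Return `value` if it is a usable timestamp, else None (sketch)."""
    if isinstance(value, datetime):
        return value  # an absolute point in time
    if isinstance(value, Real) and not isinstance(value, bool):
        # A number is read as time elapsed; NaN carries no usable time.
        return None if math.isnan(value) else value
    return None  # strings, None, lists, etc. are not coerced here


print(coerce_to_date_or_number(12.5))    # 12.5
print(coerce_to_date_or_number("noon"))  # None
```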

<dl class="py function">
<dt class="sig sig-object py" id="features.temporal_features.get_time_diff">
<span class="sig-prename descclassname"><span class="pre">features.temporal_features.</span></span><span class="sig-name descname"><span class="pre">get_time_diff</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">on_column</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">conversation_id_col</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">timestamp_unit</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#features.temporal_features.get_time_diff" title="Link to this definition"></a></dt>
17 changes: 11 additions & 6 deletions docs/build/html/genindex.html
@@ -96,6 +96,7 @@ <h1 id="index">Index</h1>
| <a href="#S"><strong>S</strong></a>
| <a href="#T"><strong>T</strong></a>
| <a href="#U"><strong>U</strong></a>
| <a href="#V"><strong>V</strong></a>
| <a href="#W"><strong>W</strong></a>

</div>
@@ -106,8 +107,6 @@ <h2 id="A">A</h2>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="utils/preprocess.html#utils.preprocess.assert_key_columns_present">assert_key_columns_present() (in module utils.preprocess)</a>
</li>
<li><a href="utils/assign_chunk_nums.html#utils.assign_chunk_nums.assign_chunk_nums">assign_chunk_nums() (in module utils.assign_chunk_nums)</a>
</li>
</ul></td>
@@ -169,8 +168,6 @@ <h2 id="C">C</h2>
<li><a href="features/readability.html#features.readability.classify_text_dalechall">classify_text_dalechall() (in module features.readability)</a>
</li>
<li><a href="features/politeness_v2_helper.html#features.politeness_v2_helper.clean_text">clean_text() (in module features.politeness_v2_helper)</a>
</li>
<li><a href="features/temporal_features.html#features.temporal_features.coerce_to_date_or_number">coerce_to_date_or_number() (in module features.temporal_features)</a>
</li>
<li><a href="features/politeness_v2_helper.html#features.politeness_v2_helper.commit_data">commit_data() (in module features.politeness_v2_helper)</a>
</li>
@@ -186,10 +183,10 @@ <h2 id="C">C</h2>
</li>
<li><a href="features/politeness_v2_helper.html#features.politeness_v2_helper.conjection_seperator">conjection_seperator() (in module features.politeness_v2_helper)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="features/word_mimicry.html#features.word_mimicry.Content_mimicry_score">Content_mimicry_score() (in module features.word_mimicry)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="features/word_mimicry.html#features.word_mimicry.Content_mimicry_score_per_conv">Content_mimicry_score_per_conv() (in module features.word_mimicry)</a>
</li>
<li><a href="feature_builder.html#feature_builder.FeatureBuilder.conv_level_features">conv_level_features() (feature_builder.FeatureBuilder method)</a>
@@ -932,6 +929,14 @@ <h2 id="U">U</h2>
</ul></td>
</tr></table>

<h2 id="V">V</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="feature_builder.html#feature_builder.FeatureBuilder.verify_timestamp_format">verify_timestamp_format() (feature_builder.FeatureBuilder method)</a>
</li>
</ul></td>
</tr></table>

<h2 id="W">W</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
Binary file modified docs/build/html/objects.inv
2 changes: 1 addition & 1 deletion docs/build/html/searchindex.js

3 changes: 1 addition & 2 deletions docs/build/html/utils/calculate_chat_level_features.html
@@ -348,8 +348,7 @@
<dd><p>Calculate features relevant to the timestamps of each chat.</p>
<p>This function calculates and appends the following temporal feature to the chat data:
- Time difference between messages sent</p>
<p>It checks whether the ‘timestamp’ column is available. If not, it tries to calculate
using ‘timestamp_start’ and ‘timestamp_end’ columns.</p>
<p>It assumes the ‘timestamp’ column is available, which is checked in feature_builder.py.</p>
<dl class="field-list simple">
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>None</p>
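The temporal feature described in this docstring, the time difference between consecutive messages, can be sketched in plain Python. Like the updated docstring, it assumes every row already has a valid `timestamp` value; it also assumes rows arrive in chat order. The `time_diffs` helper, its list-of-dicts input, and the `conversation_num` column name are hypothetical and simpler than the documented `get_time_diff(df, on_column, conversation_id_col, timestamp_unit)`.

```python
from typing import Any, Dict, List, Optional


def time_diffs(
    rows: List[Dict[str, Any]],
    conversation_id_col: str,
    timestamp_col: str = "timestamp",
) -> List[Optional[Any]]:
    """Time since the previous message in the same conversation (sketch).

    The first message of each conversation gets None. Numeric timestamps
    yield numbers; datetime timestamps yield timedeltas.
    """
    last_seen: Dict[Any, Any] = {}
    diffs: List[Optional[Any]] = []
    for row in rows:
        conv, ts = row[conversation_id_col], row[timestamp_col]
        diffs.append(ts - last_seen[conv] if conv in last_seen else None)
        last_seen[conv] = ts
    return diffs


chat = [
    {"conversation_num": "a", "timestamp": 0},
    {"conversation_num": "a", "timestamp": 5},
    {"conversation_num": "b", "timestamp": 2},
    {"conversation_num": "a", "timestamp": 9},
]
print(time_diffs(chat, "conversation_num"))  # [None, 5, None, 4]
```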