deploy: 60055ad

thedataquarry · Feb 9, 2024 · ea86c69
1 parent 7f72516
commit ea86c69

Showing 17 changed files with 196 additions and 205 deletions.
diff --git a/index.html b/index.html
@@ -198,14 +198,11 @@ <h2 id="whats-covered-in-this-book"><a class="header" href="#whats-covered-in-th
 <tr><td>Mock data generation</td><td>File-handling</td><td>RNG, sampling</td></tr>
 <tr><td>Age grouping</td><td>File-handling</td><td>enums</td></tr>
 <tr><td>Datetime parsing</td><td>File-handling</td><td>chrono, lifetimes</td></tr>
-<tr><td>Preprocessing data for NLP</td><td>Parallelism</td><td>rayon, parallelism</td></tr>
+<tr><td>Extract pronouns from text</td><td>Parallelism</td><td>rayon, parallelism</td></tr>
 <tr><td>Polars datetimes</td><td>DataFrames</td><td>datetimes</td></tr>
 <tr><td>Polars EDA</td><td>DataFrames</td><td>TBD</td></tr>
 <tr><td>Postgres</td><td>Databases</td><td>async, sqlx, tokio</td></tr>
-<tr><td>DuckDB</td><td>Databases</td><td>arrow, in-memory DB</td></tr>
 <tr><td>Meilisearch</td><td>Databases</td><td>async, async-std, clap</td></tr>
-<tr><td>Qdrant</td><td>Databases</td><td>async, tokio, gRPC</td></tr>
-<tr><td>KùzuDB</td><td>Databases</td><td>async, graph</td></tr>
 <tr><td>REST API to Postgres</td><td>APIs</td><td>axum, async, tokio</td></tr>
 <tr><td>REST API to local LLM</td><td>APIs</td><td>axum, LLMs</td></tr>
 <tr><td>PyO3 mock data generation</td><td>Unification</td><td>TBD</td></tr>

diff --git a/introduction/_index.html b/introduction/_index.html
@@ -198,14 +198,11 @@ <h2 id="whats-covered-in-this-book"><a class="header" href="#whats-covered-in-th
 <tr><td>Mock data generation</td><td>File-handling</td><td>RNG, sampling</td></tr>
 <tr><td>Age grouping</td><td>File-handling</td><td>enums</td></tr>
 <tr><td>Datetime parsing</td><td>File-handling</td><td>chrono, lifetimes</td></tr>
-<tr><td>Preprocessing data for NLP</td><td>Parallelism</td><td>rayon, parallelism</td></tr>
+<tr><td>Extract pronouns from text</td><td>Parallelism</td><td>rayon, parallelism</td></tr>
 <tr><td>Polars datetimes</td><td>DataFrames</td><td>datetimes</td></tr>
 <tr><td>Polars EDA</td><td>DataFrames</td><td>TBD</td></tr>
 <tr><td>Postgres</td><td>Databases</td><td>async, sqlx, tokio</td></tr>
-<tr><td>DuckDB</td><td>Databases</td><td>arrow, in-memory DB</td></tr>
 <tr><td>Meilisearch</td><td>Databases</td><td>async, async-std, clap</td></tr>
-<tr><td>Qdrant</td><td>Databases</td><td>async, tokio, gRPC</td></tr>
-<tr><td>KùzuDB</td><td>Databases</td><td>async, graph</td></tr>
 <tr><td>REST API to Postgres</td><td>APIs</td><td>axum, async, tokio</td></tr>
 <tr><td>REST API to local LLM</td><td>APIs</td><td>axum, LLMs</td></tr>
 <tr><td>PyO3 mock data generation</td><td>Unification</td><td>TBD</td></tr>

diff --git a/introduction/why-rust.html b/introduction/why-rust.html
@@ -185,7 +185,7 @@ <h1 id="why-use-rust-with-python"><a class="header" href="#why-use-rust-with-pyt
 <p>It's possible to write relatively high-performance code in Python these days by leveraging its rich ecosystem of libraries (which are typically wrappers around C/C++/Cython runtimes). However, performance and concurrency are <em>not</em> Python's strong suits, so performance-critical code still has to be implemented in lower-level languages. For many Python developers, using languages like C, C++ and Cython is a daunting prospect.</p>
 <p><a href="https://www.rust-lang.org/">Rust</a> is a statically typed, compiled programming language that's known for its relatively steep learning curve. Its design philosophy is centered around three core functions: performance, safety, and fearless concurrency. It offers a modern, high-level syntax and a rich type system that makes it possible to write code that runs really fast without the need for manual memory management, eliminating entire classes of bugs.</p>
 <p>Although it's possible to write all sorts of complex tools and applications in Rust, it's not the best option for <em>every</em> situation. In cases like research and prototyping, where speed of iteration is important, Rust's strict compiler can slow down development, and Python is still the better choice.</p>
-<p>We believe that Python 🐍 and Rust 🦀 form a near-perfect pair to address either side of the so-called &quot;two-world problem&quot;, explained below.</p>
+<p>We believe that Python 🐍 and Rust 🦀 form a near-perfect pair to address either side of the so-called "two-world problem", explained below.</p>
 <h2 id="the-two-world-problem"><a class="header" href="#the-two-world-problem">The two-world problem</a></h2>
 <p>The programming world often finds itself divided in two: those who prefer high-level, dynamically typed languages, and those who prefer low-level, statically typed languages.</p>
 <p>Many high-level languages are interpreted (i.e., they execute each line as it's read, sequentially). These languages are generally easier to learn because they abstract away the details of memory management, allowing for rapid prototyping and development.</p>
@@ -194,7 +194,7 @@ <h2 id="the-two-world-problem"><a class="header" href="#the-two-world-problem">T
 <p><img src="/image/two-world-problem.png" alt="" /></p>
 <p>The image above is a figurative representation of two distributions of people, typically disparate individuals from either background (with the languages listed in no specific order).</p>
 <h2 id="has-the-two-world-problem-been-solved-before"><a class="header" href="#has-the-two-world-problem-been-solved-before">Has the two-world problem been solved before?</a></h2>
-<p>A lot of readers will have heard of Julia, a dynamically typed, just-in-time (JIT) compiled alternative to Python that is often touted as a &quot;high-level language with the performance of C&quot;. While Julia is no doubt a great language, its popularity is largely limited to the scientific community, and its library ecosystem and user community haven't yet matured to the extent that Python's have. As such, the &quot;two-language problem&quot; that Julia <a href="https://julialang.org/blog/2012/02/why-we-created-julia/">attempts to solve</a> is still largely unsolved.</p>
+<p>A lot of readers will have heard of Julia, a dynamically typed, just-in-time (JIT) compiled alternative to Python that is often touted as a "high-level language with the performance of C". While Julia is no doubt a great language, its popularity is largely limited to the scientific community, and its library ecosystem and user community haven't yet matured to the extent that Python's have. As such, the "two-language problem" that Julia <a href="https://julialang.org/blog/2012/02/why-we-created-julia/">attempts to solve</a> is still largely unsolved.</p>
 <p>Other languages like Mojo explain in <a href="https://docs.modular.com/mojo/why-mojo.html">their vision</a> how they aim to solve the two-world problem by providing a single unified language (acting like a superset of Python) that can be compiled to run on any hardware. However, Mojo is still very much in its infancy as a language and hasn't gained widespread adoption, and its user community is non-existent.</p>
 <h2 id="rust-and-pyo3"><a class="header" href="#rust-and-pyo3">Rust and PyO3</a></h2>
 <p>The most interesting aspect of PyO3 in combination with Rust is that they offer a new way to think about the two-world problem. Rather than trying to <em>solve</em> the problem by creating a new language that offers the best of many worlds, Rust and PyO3 <em>embrace</em> the problem by allowing a developer to move between the worlds and choose the best tool for each part of a larger task.</p>

diff --git a/pieces/hello-world.html b/pieces/hello-world.html
@@ -185,15 +185,15 @@ <h1 id="hello-world"><a class="header" href="#hello-world">Hello world!</a></h1>
 <p>Navigate to the <code>pieces/hello_world</code> directory in the <a href="https://github.com/thedataquarry/rustinpieces/tree/main/pieces/hello_world">repo</a> to get started.</p>
 <h2 id="python"><a class="header" href="#python">Python</a></h2>
 <p>The file <code>main.py</code> has just one line of code:</p>
-<pre><code class="language-python">print(&quot;Hello, world!&quot;)
+<pre><code class="language-python">print("Hello, world!")
 </code></pre>
 <p>The program is run as follows:</p>
 <pre><code class="language-bash">python main.py
 </code></pre>
 <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 <p>The file <code>main.rs</code> has just three lines of code:</p>
 <pre><code class="language-rs">fn main() {
-    println!(&quot;Hello, world!&quot;);
+    println!("Hello, world!");
 }
 </code></pre>
 <p>The program is run via <code>cargo</code>:</p>

diff --git a/pieces/intro/dicts_vs_hashmaps.html b/pieces/intro/dicts_vs_hashmaps.html
@@ -189,27 +189,27 @@ <h2 id="python"><a class="header" href="#python">Python</a></h2>
 <p>Consider the below function in Python, where we define a dictionary of processors and their
 corresponding market names.</p>
 <pre><code class="language-py">    processors = {
-        &quot;13900KS&quot;: &quot;Intel Core i9&quot;,
-        &quot;13700K&quot;: &quot;Intel Core i7&quot;,
-        &quot;13600K&quot;: &quot;Intel Core i5&quot;,
-        &quot;1800X&quot;: &quot;AMD Ryzen 7&quot;,
-        &quot;1600X&quot;: &quot;AMD Ryzen 5&quot;,
-        &quot;1300X&quot;: &quot;AMD Ryzen 3&quot;,
+        "13900KS": "Intel Core i9",
+        "13700K": "Intel Core i7",
+        "13600K": "Intel Core i5",
+        "1800X": "AMD Ryzen 7",
+        "1600X": "AMD Ryzen 5",
+        "1300X": "AMD Ryzen 3",
     }
 
     # Check for presence of value
-    is_item_in_dict = &quot;AMD Ryzen 3&quot; in processors.values()
-    print(f'Is &quot;AMD Ryzen 3&quot; in the dict of processors?: {is_item_in_dict}')
+    is_item_in_dict = "AMD Ryzen 3" in processors.values()
+    print(f'Is "AMD Ryzen 3" in the dict of processors?: {is_item_in_dict}')
     # Lookup by key
-    key = &quot;13900KS&quot;
+    key = "13900KS"
     lookup_by_key = processors[key]
-    print(f'Key &quot;{key}&quot; has the value &quot;{lookup_by_key}&quot;')
+    print(f'Key "{key}" has the value "{lookup_by_key}"')
 </code></pre>
 <p>The first portion checks for the presence of a value in the dictionary, while the second portion
 looks up the value by key.</p>
 <p>Running the above function via <code>main.py</code> gives us the following output:</p>
-<pre><code class="language-bash">Is &quot;AMD Ryzen 3&quot; in the dict of processors?: True
-Key &quot;13900KS&quot; has the value &quot;Intel Core i9&quot;
+<pre><code class="language-bash">Is "AMD Ryzen 3" in the dict of processors?: True
+Key "13900KS" has the value "Intel Core i9"
 </code></pre>
 <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 <p>The Rust function below defines a hashmap of processors and their
@@ -218,25 +218,25 @@ <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 
 fn run8() {
     let mut processors = HashMap::new();
-    processors.insert(&quot;13900KS&quot;, &quot;Intel Core i9&quot;);
-    processors.insert(&quot;13700K&quot;, &quot;Intel Core i7&quot;);
-    processors.insert(&quot;13600K&quot;, &quot;Intel Core i5&quot;);
-    processors.insert(&quot;1800X&quot;, &quot;AMD Ryzen 7&quot;);
-    processors.insert(&quot;1600X&quot;, &quot;AMD Ryzen 5&quot;);
-    processors.insert(&quot;1300X&quot;, &quot;AMD Ryzen 3&quot;);
+    processors.insert("13900KS", "Intel Core i9");
+    processors.insert("13700K", "Intel Core i7");
+    processors.insert("13600K", "Intel Core i5");
+    processors.insert("1800X", "AMD Ryzen 7");
+    processors.insert("1600X", "AMD Ryzen 5");
+    processors.insert("1300X", "AMD Ryzen 3");
 
     // Check for presence of value
-    let value = &quot;AMD Ryzen 3&quot;;
+    let value = "AMD Ryzen 3";
     let mut values = processors.values();
     println!(
-        &quot;Is \&quot;AMD Ryzen 3\&quot; in the hashmap of processors?: {}&quot;,
+        "Is \"AMD Ryzen 3\" in the hashmap of processors?: {}",
         values.any(|v| v == &amp;value)
     );
     // Lookup by key
-    let key = &quot;13900KS&quot;;
+    let key = "13900KS";
     let lookup_by_key = processors.get(key);
     println!(
-        &quot;Key \&quot;{}\&quot; has the value \&quot;{}\&quot;&quot;,
+        "Key \"{}\" has the value \"{}\"",
         key,
         lookup_by_key.unwrap()
     );
@@ -245,8 +245,8 @@ <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 <p>Just like in the Python version, the first portion checks for the presence of a value in the
 hashmap, while the second portion looks up the value by key.</p>
 <p>Running the function via <code>main.rs</code> gives us the same output as in Python:</p>
-<pre><code class="language-bash">Is &quot;AMD Ryzen 3&quot; in the hashmap of processors?: true
-Key &quot;13900KS&quot; has the value &quot;Intel Core i9&quot;
+<pre><code class="language-bash">Is "AMD Ryzen 3" in the hashmap of processors?: true
+Key "13900KS" has the value "Intel Core i9"
 </code></pre>
 <h2 id="takeaways"><a class="header" href="#takeaways">Takeaways</a></h2>
 <p>Python and Rust contain collections that store key-value pairs for fast lookups. A key difference is
@@ -255,18 +255,18 @@ <h2 id="takeaways"><a class="header" href="#takeaways">Takeaways</a></h2>
 <p>In Python, this <code>dict</code> is perfectly valid:</p>
 <pre><code class="language-py"># You can have a dict with keys of different types
 example = {
-    &quot;a&quot;: 1,
+    "a": 1,
     1: 2
 }
 </code></pre>
 <p>In Rust, the compiler enforces that all keys share one type and all values share one type, both
 inferred from the first entry inserted.</p>
 <pre><code class="language-rs">let mut example = HashMap::new();
-example.insert(&quot;a&quot;, 1);
+example.insert("a", 1);
 // This fails to compile because the first entry fixed the key type as &amp;str
 example.insert(1, 2);
 // This is valid
-example.insert(&quot;b&quot;, 2);
+example.insert("b", 2);
 </code></pre>
 
                     </main>
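
Note on the lookups in the dicts vs. hashmaps diff above: both `processors[key]` in Python and `processors.get(key)` followed by `unwrap()` in Rust assume the key is present. The following is a minimal Python sketch of a lookup that tolerates a missing key, reusing the same data. The key "9999X" and the fallback string are made up purely for illustration.

```python
# Same processor data as in the dict example above.
processors = {
    "13900KS": "Intel Core i9",
    "13700K": "Intel Core i7",
    "13600K": "Intel Core i5",
    "1800X": "AMD Ryzen 7",
    "1600X": "AMD Ryzen 5",
    "1300X": "AMD Ryzen 3",
}

# Indexing with [] raises KeyError when the key is absent.
try:
    processors["9999X"]  # hypothetical key, not in the dict
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

# dict.get returns None (or a supplied default) instead of raising.
found = processors.get("13900KS")
not_found = processors.get("9999X")
with_default = processors.get("9999X", "unknown processor")

print(missing_key_raised, found, not_found, with_default)
```

Here `dict.get` plays roughly the role that handling the `Option` returned by `HashMap::get` plays in Rust: the caller decides what a missing key means instead of letting the program fail.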

diff --git a/pieces/intro/enumerate.html b/pieces/intro/enumerate.html
@@ -188,9 +188,9 @@ <h2 id="python"><a class="header" href="#python">Python</a></h2>
 <code>Person</code> class with a name and an age attribute.</p>
 <p>We can instantiate a list of <code>Person</code> objects and iterate over them using <code>enumerate</code>.</p>
 <pre><code class="language-py">def run2() -&gt; None:
-    persons = [Person(&quot;James&quot;, 33), Person(&quot;Salima&quot;, 31)]
+    persons = [Person("James", 33), Person("Salima", 31)]
     for i, person in enumerate(persons):
-        print(f&quot;Person {i}: {str(person)}&quot;)
+        print(f"Person {i}: {str(person)}")
 </code></pre>
 <p>Running the above function via <code>main.py</code> gives us the same output as in Rust:</p>
 <pre><code class="language-bash">Person 0: James is 33 years old
@@ -211,9 +211,9 @@ <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 <p>For most purposes, vectors in Rust perform the same function as Python lists.
 Unlike a Python list, a vector in Rust can only contain objects of the same type, in this case, <code>Person</code>.</p>
 <pre><code class="language-rs">fn run2() {
-    let persons = vec![Person::new(&quot;James&quot;, 33), Person::new(&quot;Salima&quot;, 31)];
+    let persons = vec![Person::new("James", 33), Person::new("Salima", 31)];
     for (i, p) in persons.iter().enumerate() {
-        println!(&quot;Person {}: {}&quot;, i, p)
+        println!("Person {}: {}", i, p)
     }
 }
 </code></pre>
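
Note on the enumerate diff above: the hunks do not show the `Person` class itself. The following is a self-contained Python sketch, assuming a `Person` whose `__str__` produces the "James is 33 years old" format seen in the output. That class definition is an assumption for illustration, not the one from the repo.

```python
class Person:
    """Assumed stand-in for the Person class used in the example above."""

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def __str__(self) -> str:
        return f"{self.name} is {self.age} years old"


persons = [Person("James", 33), Person("Salima", 31)]

# enumerate yields (index, item) pairs, counting from 0 by default.
lines = [f"Person {i}: {person}" for i, person in enumerate(persons)]

# The optional start argument shifts the counter without touching the items.
lines_from_one = [f"Person {i}: {person}" for i, person in enumerate(persons, start=1)]

for line in lines:
    print(line)
```

Rust's `iter().enumerate()` also counts from zero and yields `(index, item)` pairs in the same order.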

diff --git a/pieces/intro/lambdas_vs_closures.html b/pieces/intro/lambdas_vs_closures.html
@@ -190,10 +190,10 @@ <h2 id="python"><a class="header" href="#python">Python</a></h2>
 <p>In the following example, we use the <code>sorted</code> function to sort a list of <code>Person</code> objects by their
 age.</p>
 <pre><code class="language-py">def run5() -&gt; None:
-    persons = [Person(&quot;Aiko&quot;, 41), Person(&quot;Rohan&quot;, 18)]
+    persons = [Person("Aiko", 41), Person("Rohan", 18)]
     sorted_by_age = sorted(persons, key=lambda person: person.age)
     youngest_person = sorted_by_age[0]
-    print(f&quot;{youngest_person.name} is the youngest person at {youngest_person.age} years old&quot;)
+    print(f"{youngest_person.name} is the youngest person at {youngest_person.age} years old")
 </code></pre>
 <p>The <code>sorted</code> function takes an optional <code>key</code> argument, which is a function that is called on each
 item in the list to determine the value to sort by. In this case, we use a lambda to return the
@@ -206,12 +206,12 @@ <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 <p>In the following example, we use the <code>sort_by_key</code> method to sort a vector of <code>Person</code> objects by
 their age.</p>
 <pre><code class="language-rs">fn run5() {
-    let mut persons = vec![Person::new(&quot;Aiko&quot;, 41), Person::new(&quot;Rohan&quot;, 18)];
+    let mut persons = vec![Person::new("Aiko", 41), Person::new("Rohan", 18)];
     // Sort by age
     persons.sort_by_key(|p| p.age);
     let youngest_person = persons.first().unwrap();
     println!(
-        &quot;{} is the youngest person at {} years old&quot;,
+        "{} is the youngest person at {} years old",
         youngest_person.name, youngest_person.age
     );
 </code></pre>
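
Note on the lambdas vs. closures diff above: Python's `sorted` returns a new list, whereas Rust's `sort_by_key` sorts the vector in place, which is closer to Python's `list.sort`. The following is a small Python sketch of that difference, assuming a simple dataclass `Person` (a stand-in, not the repo's definition).

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Assumed stand-in for the Person class used in the example above."""

    name: str
    age: int


persons = [Person("Aiko", 41), Person("Rohan", 18)]

# sorted() builds a new list; the original order is left untouched.
sorted_by_age = sorted(persons, key=lambda p: p.age)
original_first = persons[0].name

# list.sort() reorders the list in place, like Rust's sort_by_key.
persons.sort(key=lambda p: p.age)

# When only the youngest person is needed, min() with a key avoids sorting.
youngest = min(persons, key=lambda p: p.age)

print(f"{youngest.name} is the youngest person at {youngest.age} years old")
```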

diff --git a/pieces/intro/list_comprehensions_vs_map.html b/pieces/intro/list_comprehensions_vs_map.html
@@ -189,14 +189,14 @@ <h2 id="python"><a class="header" href="#python">Python</a></h2>
 <p>Consider the following function in which we print a message depending on which persons from
 a list of <code>Person</code> objects are born after the year 1995, based on their current age.</p>
 <pre><code class="language-py">def run7() -&gt; None:
-    &quot;&quot;&quot;
+    """
     1. List comprehensions
-    &quot;&quot;&quot;
-    persons = [Person(&quot;Issa&quot;, 39), Person(&quot;Ibrahim&quot;, 26)]
+    """
+    persons = [Person("Issa", 39), Person("Ibrahim", 26)]
     persons_born_after_1995 = [
         (person.name, person.age) for person in persons if approx_year_of_birth(person) &gt; 1995
     ]
-    print(f&quot;Persons born after 1995: {persons_born_after_1995}&quot;)
+    print(f"Persons born after 1995: {persons_born_after_1995}")
 </code></pre>
 <p>The list comprehension in the above function essentially does the following:</p>
 <ol>
@@ -211,13 +211,13 @@ <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 <p>We can define the below function in Rust, where we print a message depending on which persons from
 a vector of <code>Person</code> objects are born after the year 1995, based on their current age.</p>
 <pre><code class="language-rs">fn run7() {
-    let persons = vec![Person::new(&quot;Issa&quot;, 39), Person::new(&quot;Ibrahim&quot;, 26)];
+    let persons = vec![Person::new("Issa", 39), Person::new("Ibrahim", 26)];
     let result = persons
         .into_iter()
         .filter(|p| approx_year_of_birth(p) &gt; 1995)
         .map(|p| (p.name, p.age))
         .collect::&lt;Vec&lt;(String, u8)&gt;&gt;();
-    println!(&quot;Persons born after 1995: {:?}&quot;, result)
+    println!("Persons born after 1995: {:?}", result)
 </code></pre>
 <p>The <code>filter</code> and <code>map</code> methods in the function above essentially do the following:</p>
 <ol>
@@ -227,7 +227,7 @@ <h2 id="rust"><a class="header" href="#rust">Rust</a></h2>
 <li>Collect the tuples into a vector of type <code>Vec&lt;(String, u8)&gt;</code></li>
 </ol>
 <p>Running the function via <code>main.rs</code> gives us the same output as in Python:</p>
-<pre><code class="language-bash">Persons born after 1995: [(&quot;Ibrahim&quot;, 26)]
+<pre><code class="language-bash">Persons born after 1995: [("Ibrahim", 26)]
 </code></pre>
 <p>The Rust version is a little more verbose than the Python version, but it's still quite readable.</p>
 <h2 id="takeaways"><a class="header" href="#takeaways">Takeaways</a></h2>
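
Note on the list comprehension diff above: Python can also express the same filter-then-map pipeline with its built-in `filter` and `map`, which mirrors the Rust iterator chain more directly. The sketch below assumes a hypothetical `approx_year_of_birth` helper with a fixed reference year of 2024 so the result is reproducible. The helper in the repo is not shown in this diff and may differ.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Assumed stand-in for the Person class used in the example above."""

    name: str
    age: int


def approx_year_of_birth(person: Person, current_year: int = 2024) -> int:
    # Hypothetical helper: the fixed current_year is an assumption for illustration.
    return current_year - person.age


persons = [Person("Issa", 39), Person("Ibrahim", 26)]

# List comprehension: filter and transform in a single expression.
via_comprehension = [
    (p.name, p.age) for p in persons if approx_year_of_birth(p) > 1995
]

# Equivalent pipeline with filter and map, closer in shape to the Rust chain.
via_filter_map = list(
    map(
        lambda p: (p.name, p.age),
        filter(lambda p: approx_year_of_birth(p) > 1995, persons),
    )
)

print(f"Persons born after 1995: {via_comprehension}")
```

Both forms produce the same list. The comprehension is usually considered the more idiomatic choice in Python.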