Commit

managing input data Chapter is ready for review
souzatharsis committed Jan 6, 2025
1 parent 7b58ab6 commit d60424c
Showing 82 changed files with 11,883 additions and 7,098 deletions.
8 changes: 2 additions & 6 deletions Makefile
@@ -10,13 +10,9 @@ clean:
poetry run jupyter-book clean tamingllms/


convert:
convert-to-markdown:
poetry run jupyter nbconvert --to markdown $(file)

d2:
d2 -t 1 --sketch tamingllms/_static/safety/design.d2 tamingllms/_static/safety/design.svg


convert-latex:
jupyter nbconvert tamingllms/notebooks/structured_output.ipynb --to latex
d2 -t 1 --sketch $(file) $(output)

2 changes: 1 addition & 1 deletion README.md
@@ -17,7 +17,7 @@ Abstract: **The current discourse around Large Language Models (LLMs) tends to f
| About the Book | | | [html](https://www.tamingllms.com/markdown/intro.html) | N/A | *Ready for Review* |
| Chapter 1: The Evals Gap | [pdf](https://www.dropbox.com/scl/fi/voyhpqp0glkhijopyev71/DRAFT_Chapter-1-The-Evals-Gap.pdf?rlkey=ehzf6g4ngsssuoe471on8itu4&st=zqv98w2n&dl=0) | [podcast](https://tamingllm.substack.com/p/chapter-1-podcast-the-evals-gap) | [html](https://www.tamingllms.com/notebooks/evals.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/evals.ipynb) | *Ready for Review* |
| Chapter 2: Structured Output| [pdf](https://www.dropbox.com/scl/fi/x3a84bm1ewcfemj4p7b5p/DRAFT_Chapter-2-Structured-Output.pdf?rlkey=zysw6mat7har133rs7am7bb8n&st=4ns4ak24&dl=0) | podcast | [html](https://www.tamingllms.com/notebooks/structured_output.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/structured_output.ipynb) | *Ready for Review* |
| Chapter 3: Managing Input Data | | | [html](https://www.tamingllms.com/notebooks/input.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/input.ipynb) | WIP |
| Chapter 3: Managing Input Data | | | [html](https://www.tamingllms.com/notebooks/input.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/input.ipynb) | *Ready for Review* |
| Chapter 4: Safety | | | [html](https://www.tamingllms.com/notebooks/safety.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/safety.ipynb) | *Ready for Review* |
| Chapter 5: Preference-Based Alignment | | | [html](https://www.tamingllms.com/notebooks/alignment.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/alignment.ipynb) | *Ready for Review* |
| Chapter 6: Local LLMs in Practice | | | [html](https://www.tamingllms.com/notebooks/local.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/local.ipynb) | *Ready for Review* |
5 changes: 5 additions & 0 deletions TESTIMONIALS.md
@@ -0,0 +1,5 @@
> "I clicked on the link to quickly read the comparison result before going to bed. Ended up reading almost the whole website. Great resource, which covers everything I’ve learned in the past year and much more! Thank you!"
-- Julien Nahum, Founder of NotionForms, Ex-SDE at AWS

> "This is amazing content, thank you so much for sharing!!!"
-- Didier Lopes, Founder of OpenBB
Binary file added meta2.png
894 changes: 894 additions & 0 deletions meta2.svg
1,392 changes: 1,391 additions & 1 deletion poetry.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -49,6 +49,8 @@ markitdown = "^0.0.1a3"
docling = "^2.14.0"
python-levenshtein = "^0.26.1"
sphinx-math-dollar = "^1.2.1"
chromadb = "^0.6.1"
sentence-transformers = "^3.3.1"


[build-system]
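The new `chromadb` and `sentence-transformers` dependencies back the retrieval examples in the input-data chapter. As a rough illustration of the embed-and-retrieve pattern those libraries enable, here is a dependency-free sketch that substitutes a toy bag-of-words embedding for a real sentence-transformer model and a plain list scan for a vector store; the `embed`, `cosine`, and `retrieve` helpers are hypothetical illustrations, not the chapter's actual code:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a sentence-transformers model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query, as a vector store would.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "quantization reduces model memory",
    "evals measure model quality",
    "chunking splits long documents",
]
print(retrieve("how much memory does the model need", docs, k=1))
```

A real setup would swap `embed` for `SentenceTransformer(...).encode` and `retrieve` for a Chroma collection query, but the ranking logic is the same.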
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/intro.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/preface.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/toc.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/alignment.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/cost.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/input.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/local.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/safety.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
Binary file not shown.
Binary file added tamingllms/_build/html/_images/LC.png
118 changes: 118 additions & 0 deletions tamingllms/_build/html/_images/embedding.svg
4 changes: 4 additions & 0 deletions tamingllms/_build/html/_images/incontext.svg
Binary file added tamingllms/_build/html/_images/llm_judge.png
879 changes: 0 additions & 879 deletions tamingllms/_build/html/_images/llm_judge.svg

This file was deleted.

Binary file added tamingllms/_build/html/_images/meta2.png
882 changes: 0 additions & 882 deletions tamingllms/_build/html/_images/meta2.svg

This file was deleted.

4 changes: 4 additions & 0 deletions tamingllms/_build/html/_images/rag.svg
Binary file added tamingllms/_build/html/_images/similarity.png
12 changes: 8 additions & 4 deletions tamingllms/_build/html/_sources/markdown/intro.md
@@ -35,11 +35,15 @@ Throughout this book, we'll tackle the following (non-exhaustive) list of critic

3. **Testing Complexity**: Traditional software testing methodologies break down when dealing with non-deterministic and generative systems, requiring new approaches.

4. **Safety and Alignment**: LLMs can generate harmful, biased, or inappropriate content, requiring robust safeguards and monitoring systems to ensure safe deployment.
4. **Safety**: LLMs can generate harmful, biased, or inappropriate content, requiring robust safeguards and monitoring systems to ensure safe deployment.

5. **Vendor Lock-in**: Cloud-based LLM providers can create significant dependencies and lock-in through their proprietary APIs and infrastructure, making it difficult to switch providers or self-host solutions.
5. **Alignment**: LLMs are next-token prediction models, which means they are not aligned with the user's preferences by default.

6. **Cost Optimization**: The computational and financial costs of operating LLM-based systems can quickly become prohibitive without careful management and optimization.
6. **Vendor Lock-in**: Cloud-based LLM providers can create significant dependencies and lock-in through their proprietary APIs and infrastructure, making it difficult to switch providers or self-host solutions.

7. **Cost Optimization**: The computational and financial costs of operating LLM-based systems can quickly become prohibitive without careful management and optimization.

We conclude with a discussion on the future of LLMs and the challenges that will arise as we move forward.


## A Practical Approach
@@ -171,7 +175,7 @@ Now that your environment is set up, let's begin our exploration of LLM challeng

## About the Author

Tharsis Souza (Ph.D. Computer Science, UCL University of London) is a computer scientist and product leader specializing in AI-based products. He is a Lecturer at Columbia University's Master of Science program in Applied Analytics, (*incoming*) Head of Product, Equities at Citadel, and former Senior VP at Two Sigma Investments. He mentors under-represented students & working professionals to help create a more diverse global AI1 ecosystem.
Tharsis Souza (Ph.D. Computer Science, UCL University of London) is a computer scientist and product leader specializing in AI-based products. He is a Lecturer at Columbia University's Master of Science program in Applied Analytics, (*incoming*) Head of Product, Equities at Citadel, and former Senior VP at Two Sigma Investments. He mentors under-represented students & working professionals to help create a more diverse global AI ecosystem.

With over 15 years of experience delivering technology products across startups and Fortune 500 companies, he is also an author of numerous scholarly publications and a frequent speaker at academic and business conferences. Grounded in an academic background and drawing on practical experience building and scaling products powered by language models at early-stage startups and major institutions, as well as contributions to open source projects, he brings a unique perspective on bridging the gap between LLMs' promised potential and their practical implementation challenges to enable the next generation of AI-powered products.

12 changes: 11 additions & 1 deletion tamingllms/_build/html/_sources/markdown/toc.md
@@ -43,4 +43,14 @@ Abstract: *The current discourse around Large Language Models (LLMs) tends to fo

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC-BY--NC--SA-4.0-lightgrey.svg
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC--BY--NC--SA--4.0-lightgrey.svg

```
@misc{tharsistpsouza2024tamingllms,
author = {Tharsis T. P. Souza},
title = {Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software},
year = {2024},
journal = {GitHub repository},
url = {https://github.com/souzatharsis/tamingLLMs}
}
```
4 changes: 3 additions & 1 deletion tamingllms/_build/html/_sources/notebooks/cost.ipynb
@@ -315,7 +315,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Quantization is a powerful technique for reducing the memory footprint of LLMs. This can be exemplified by the case of LLaMa 3.3 70B as quantized by {cite}`unsloth2024llama3` [^unsloth]. The model's memory requirements vary significantly based on the quantization level used as demonstrated in {numref}`quantized`.\n",
"Quantization[^visual-quantization] is a powerful technique for reducing the memory footprint of LLMs. This can be exemplified by the case of LLaMa 3.3 70B as quantized by {cite}`unsloth2024llama3` [^unsloth]. The model's memory requirements vary significantly based on the quantization level used as demonstrated in {numref}`quantized`.\n",
"\n",
"[^visual-quantization]: Maarten Grootendorst provides the best visual guide for model quantization {cite}`grootendorst2024quantization`.\n",
"\n",
"[^unsloth]: Unsloth runs a business of making LLMs fine-tuning streamlined. Check them out at [unsloth.ai](https://unsloth.ai).\n",
"\n",
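The quantization passage above reduces to simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by eight. A back-of-the-envelope sketch (weight-only; real deployments add activations, KV cache, and quantization overhead such as scales and zero-points, so treat these as lower bounds):

```python
def approx_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in decimal GB.

    Ignores activations, KV cache, and quantization metadata overhead.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{approx_weight_memory_gb(70, bits):.0f} GB")
```

This matches the intuition in the passage: halving the bit width halves the weight footprint, which is why 4-bit quantization makes a 70B model feasible on far smaller hardware than fp16.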
6 changes: 3 additions & 3 deletions tamingllms/_build/html/_sources/notebooks/evals.ipynb
@@ -853,7 +853,7 @@
"4. **Run Evaluations**: Use the judge model to score outputs. Consider using a large and/or more capable model as a judge to provide more nuanced assessments.\n",
"5. **Aggregate and Analyze Results**: Interpret scores to refine applications.\n",
"\n",
"```{figure} ../_static/evals/llm_judge.svg\n",
"```{figure} ../_static/evals/llm_judge.png\n",
"---\n",
"name: llm_judge\n",
"alt: Conceptual Overview\n",
@@ -1187,11 +1187,11 @@
"\n",
"An alternative to the above approaches is to use humans to directly evaluate the LLM-judges themselves. A notable example of this is [Judge Arena](https://judgearena.com/) {cite}`judgearena2024`, which is a platform that allows users to vote on which AI model made the better evaluation. Under this approach, the performance of the LLM evaluator is given by the (blind) evaluation of humans who perform the voting on randomly generated pairs of LLM judges as depicted in {numref}`meta2`. Only after submitting a vote, users can see which models were actually doing the judging.\n",
"\n",
"```{figure} ../_static/evals/meta2.svg\n",
"```{figure} ../_static/evals/meta2.png\n",
"---\n",
"name: meta2\n",
"alt: Human-in-the-loop meta evaluation Conceptual Overview\n",
"scale: 60%\n",
"scale: 75%\n",
"align: center\n",
"---\n",
"Human-in-the-loop Meta Evaluation.\n",
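The judge workflow described in the evals diff (step 4: use a judge model to score outputs; step 5: aggregate and analyze the scores) can be sketched in a few lines. The `judge` function below is a hypothetical word-overlap heuristic standing in for a real LLM API call, and the sample data is invented for illustration:

```python
from statistics import mean

def judge(prompt: str, output: str) -> int:
    # Hypothetical stand-in for an LLM judge: score 1-5 by a crude
    # heuristic (how many prompt words the output mentions). A real
    # judge would be a capable model prompted with a scoring rubric.
    overlap = set(prompt.lower().split()) & set(output.lower().split())
    return min(5, 1 + len(overlap))

def evaluate(samples: list[tuple[str, str]]) -> dict:
    # Step 4: run the judge over each (prompt, output) pair.
    scores = [judge(p, o) for p, o in samples]
    # Step 5: aggregate scores for analysis.
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}

samples = [
    ("summarize the report", "the report says revenue grew"),
    ("translate to french", "bonjour"),
]
print(evaluate(samples))
```

Swapping the heuristic for an API call to a larger model gives the nuanced assessments the chapter recommends, while the aggregation step stays unchanged.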
