[IN-19] Semantic Chunking Implementation (#4)
* Save

* First section

* Draft

* Tidy up

* Update text

* Update based on review comments
osw282 authored Sep 9, 2024
1 parent 7cb99a9 commit f8219be
Showing 8 changed files with 6,215 additions and 118 deletions.
8 changes: 5 additions & 3 deletions .pre-commit-config.yaml
@@ -13,6 +13,7 @@ repos:
- id: trailing-whitespace
files: "\\.(py|txt|yaml|json|md|toml|lock|cfg|html|sh|js|yml)$"
- id: end-of-file-fixer
+exclude: data/
- id: check-added-large-files
args: ["--maxkb=1000"]
- id: check-case-conflict
@@ -26,20 +27,21 @@ repos:

- repo: https://github.com/charliermarsh/ruff-pre-commit
# Ruff version.
-rev: "v0.6.2"
+rev: "v0.5.6"
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix, "--config=pyproject.toml"] # enable autofix

- repo: https://github.com/pre-commit/mirrors-mypy
-rev: v1.11.2
+rev: v1.11.1
hooks:
- id: mypy
args: ["--config-file=pyproject.toml"]
exclude: ^tests/
additional_dependencies: [types-requests]

- repo: https://github.com/crate-ci/typos
-rev: v1.24.1
+rev: v1.23.6
hooks:
- id: typos
args: [--config=pyproject.toml]
3,209 changes: 3,095 additions & 114 deletions poetry.lock

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion pyproject.toml
@@ -8,11 +8,15 @@ readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10, <3.13"
+langchain-cohere = "^0.2.3"
+scikit-learn = "^1.5.1"
+matplotlib = "^3.9.2"

[tool.poetry.group.dev.dependencies]
pytest = "^7.4.3"
pytest-cov = "^4.1.0"
licensecheck = "^2024.1.2"
+ipykernel = "^6.29.5"


[build-system]
@@ -93,7 +97,7 @@ max-complexity = 5

# typos configuration
[tool.typos.files]
-extend-exclude=[".gitignore", "LICENSE", ".*",]
+extend-exclude=[".gitignore", "LICENSE", ".*", "**/*.ipynb"]

[tool.typos.default.extend-words]
center = "center"
Binary file added rag_semantic_chunking/assets/ideal_threshold.png
Empty file.
19 changes: 19 additions & 0 deletions rag_semantic_chunking/data/random_facts.txt
@@ -0,0 +1,19 @@
Deep in the lush forests of Madagascar, the elusive aye-aye uses its long, thin middle finger to tap on trees, listening for the echoes that indicate the presence of grubs. This nocturnal primate, often mistaken for a rodent, defies conventional evolutionary paths, showcasing nature's unpredictable creativity.

Quantum computing, leveraging the principles of superposition and entanglement, promises to revolutionize industries by solving problems currently deemed intractable for classical computers. As researchers push the boundaries of quantum mechanics, the potential applications range from cryptography to drug discovery, heralding a new era of technological advancement.

In the remote, frozen expanses of Antarctica, emperor penguins huddle together in massive colonies to survive the brutal winter. These birds endure some of the most extreme conditions on Earth, relying on their collective warmth and energy to protect themselves from the piercing cold and relentless winds.

In the fast-paced world of digital finance, blockchain technology emerges as a disruptive force, reshaping how we think about money and transactions. At its core, this decentralized system challenges traditional banking structures, enabling peer-to-peer exchanges without intermediaries. As Bitcoin and other cryptocurrencies gain traction, the debate intensifies over the implications for global economies, regulation, and the balance between privacy and transparency in an increasingly interconnected world.

In the serene ambiance of a traditional Japanese tea house, the ancient practice of Ikebana unfolds with deliberate grace. This centuries-old art form, deeply intertwined with Zen philosophy, transcends mere decoration, offering a spiritual journey that honors the fleeting beauty of nature. Each carefully placed stem and petal symbolizes the impermanence of life, capturing a moment of tranquility in an ever-changing world.

Amid the cacophony of life in the Amazon rainforest, an intricate web of existence thrives under the dense canopy. From the symbiotic relationships between plants and insects to the jaguars stealthily prowling through the underbrush, this rainforest is a battleground of survival and adaptation. Beyond its biological wonders, the Amazon is a critical regulator of the Earth’s atmosphere, absorbing vast amounts of carbon dioxide and releasing oxygen, a process vital for life on the planet. Its destruction, driven by deforestation and exploitation, threatens to disrupt not just a habitat, but the very equilibrium of our global climate.

The ancient city of Petra, carved into the rose-red cliffs of Jordan, stands as a testament to the architectural and engineering prowess of the Nabataeans. This UNESCO World Heritage site, with its intricate facades and hidden tombs, continues to captivate archaeologists and tourists alike.

In a groundbreaking study, scientists have successfully edited the genes of a human embryo using CRISPR technology, sparking ethical debates worldwide. This advancement in genetic engineering holds the promise of eradicating hereditary diseases, but it also raises profound questions about the future of humanity.

The art of parkour, originating in France, involves navigating urban environments through running, jumping, and climbing with fluidity and precision. Practitioners, known as traceurs, view the cityscape as a playground, transforming mundane architecture into opportunities for creative movement.

The Moai statues of Easter Island, with their enigmatic expressions and colossal size, have puzzled historians for centuries. These monolithic figures, carved by the Rapa Nui people, are believed to represent ancestral spirits, yet their exact purpose and method of transportation remain shrouded in mystery.
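
The code that consumes random_facts.txt is not shown in this view. As a rough illustration of why a file of topic-shifting paragraphs is useful test data for a semantic chunker, here is a minimal sketch, not the commit's implementation. It splits text into sentences, measures the cosine distance between neighbouring sentences, and starts a new chunk wherever that distance exceeds a threshold. Plain bag-of-words vectors stand in for real embeddings (the commit adds langchain-cohere, but how it is used is not visible here), and every function name and the `threshold` default are assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(sentence: str) -> Counter:
    """Stand-in for an embedding: lower-cased word counts (an assumption)."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; returns 1.0 when either vector is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def semantic_chunks(text: str, threshold: float = 0.9) -> list[str]:
    """Group consecutive sentences; break where neighbours are too dissimilar.

    `threshold` is a hypothetical tuning knob, not a value from the commit.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    chunks: list[str] = []
    current = [sentences[0]]
    for prev, nxt, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev, nxt) > threshold:
            # Large jump between neighbours: close the current chunk.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = (
        "Penguins huddle in the cold. Penguins huddle against the wind. "
        "Quantum computers use qubits. Quantum computers use entanglement."
    )
    for chunk in semantic_chunks(sample):
        print(chunk)
```

With real prose such as the paragraphs above, word overlap is a weak signal, so a fixed threshold like this would need tuning or replacing with one derived from the observed distance distribution; the sketch only shows the shape of the technique.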