Commit 1c8c579

2 parents 5939e3c + e785e4b commit 1c8c579

2 files changed: +5 -5 lines changed
README.md

Lines changed: 3 additions & 3 deletions
@@ -48,7 +48,7 @@ text = "Your text to be split..."
 chunks = TextChunker.split(text)
 ```

-This will chunk up your text using the default parameters - a chunk size of `1000`, chunk overlap of `200`, format of :`plaintext` and using the `RecursiveChunk` strategy.
+This will chunk up your text using the default parameters - a chunk size of `1000`, chunk overlap of `200`, format of `:plaintext` and using the `RecursiveChunk` strategy.

 The split method returns `Chunks` of your text. These chunks include the start and end bytes of each chunk.

@@ -66,7 +66,7 @@ If you wish to adjust these parameters, configuration can optionally be passed v

 - `chunk_size` - The approximate target chunk size, as measured per code points. This means that both `a` and `👻` count as one. Chunks will not exceed this maximum, but may sometimes be smaller. **Important note** This means that graphemes *may* be split. For example, `👩‍🚒` may be split into `👩,🚒` or not depending on the split boundary.
 - `chunk_overlap` - The contextual overlap between chunks, as measured per code point. Overlap is *not* guaranteed; again this should be treated as a maximum. The size of an individual overlap will depend on the semantics of the text being split.
-- `format` (informs separator selection). Because we are trying to preserve meaning between the chunks, the format of the text we are splitting is important. It's important to split newlines in plain text; it's important to split `###` headings in markdown.
+- `format` - What informs separator selection. Because we are trying to preserve meaning between the chunks, the format of the text we are splitting is important. It's important to split newlines in plain text; it's important to split `###` headings in markdown.

 ```elixir
 text = """
@@ -102,7 +102,7 @@ iex(10)> TextChunker.split(text)
 ]

 text = "This is a sample text. It will be split into properly-sized chunks using the TextChunker library."
-opts = [chunk_size: 50, chunk_overlap: 5, format: :plaintext, strategy: &TextChunker.Strategies.RecursiveChunk.split/2]
+opts = [chunk_size: 50, chunk_overlap: 5, format: :plaintext, strategy: TextChunker.Strategies.RecursiveChunk]

 iex(10)> TextChunker.split(text, opts)

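The `chunk_size` description in the README hunk above counts code points rather than graphemes. A minimal sketch of that distinction, using only Elixir's standard `String` and `Enum` modules (the variable names are illustrative and not part of TextChunker):

```elixir
# The firefighter emoji is built from three code points:
# WOMAN (U+1F469), ZERO WIDTH JOINER (U+200D), FIRE ENGINE (U+1F692).
firefighter = "\u{1F469}\u{200D}\u{1F692}"

codepoints = String.codepoints(firefighter)
IO.inspect(length(codepoints), label: "code points")

# A boundary that falls between code points separates the joined emoji,
# which is the grapheme split the README warns about.
{left, right} = Enum.split(codepoints, 1)
IO.inspect({Enum.join(left), Enum.join(right)}, label: "split after first code point")
```

Whether a real chunk boundary lands inside such a sequence depends on the text and the configured `chunk_size`; the sketch only shows why it is possible.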
lib/text_chunker.ex

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ defmodule TextChunker do
   @default_opts [
     chunk_size: 2000,
     chunk_overlap: 200,
-    strategy: &RecursiveChunk.split/2,
+    strategy: RecursiveChunk,
     format: :plaintext
   ]

@@ -43,6 +43,6 @@ defmodule TextChunker do
   def split(text, opts \\ []) do
     opts = Keyword.merge(@default_opts, opts)

-    opts[:strategy].(text, opts)
+    opts[:strategy].split(text, opts)
   end
 end
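
The change above switches `:strategy` from a function capture to a module on which `split/2` is called. A minimal sketch of that dispatch pattern follows. `Sketch.FixedSizeStrategy` and `Sketch.Chunker` are hypothetical names invented for illustration, and the fixed-size splitting is a stand-in, not the library's `RecursiveChunk` logic:

```elixir
defmodule Sketch.FixedSizeStrategy do
  # Hypothetical strategy: cuts the text every `chunk_size` code points.
  # Any module exposing split/2 could be passed in its place.
  def split(text, opts) do
    size = Keyword.fetch!(opts, :chunk_size)

    text
    |> String.codepoints()
    |> Enum.chunk_every(size)
    |> Enum.map(&Enum.join/1)
  end
end

defmodule Sketch.Chunker do
  # Defaults mirror the shape of @default_opts in the diff; the strategy
  # is a module name rather than a captured function.
  @default_opts [
    chunk_size: 2000,
    chunk_overlap: 200,
    strategy: Sketch.FixedSizeStrategy,
    format: :plaintext
  ]

  def split(text, opts \\ []) do
    opts = Keyword.merge(@default_opts, opts)

    # Dynamic dispatch: call split/2 on whichever module is configured.
    opts[:strategy].split(text, opts)
  end
end

IO.inspect(Sketch.Chunker.split("abcdef", chunk_size: 4))
```

Under this pattern a caller passes `strategy: SomeModule` instead of `strategy: &SomeModule.split/2`; a function capture would no longer fit the call site, since it expects a module.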