|
22 | 22 | * `txt`: `--path` is path to txt |
23 | 23 | * `url`: `--path` must be a valid http(s) link |
24 | 24 | * `anki`: must be set: `--anki_profile`. Optional: `--anki_deck`, |
25 | | - `--anki_notetype`, `--anki_mode`. See in loader specific arguments |
| 25 | + `--anki_notetype`, `--anki_fields`. See in loader specific arguments |
26 | 26 | below for details. |
27 | 27 | * `string`: no other parameters needed, will provide a field where |
28 | 28 | you must type or paste the string |
|
45 | 45 |
|
46 | 46 | --- |
47 | 47 |
|
48 | | -* `--modelname`: str, default `"openrouter/anthropic/claude-3.5-sonnet"` |
| 48 | +* `--modelname`: str, default `"openrouter/anthropic/claude-3.5-sonnet:beta"` |
49 | 49 | * Keep in mind that given that the default backend used is litellm |
50 | 50 | the part of modelname before the slash (/) is the backend name (also called provider). |
51 | 51 | If the backend is 'testing/' then a fake LLM will be used |
|
110 | 110 | if contains `hyde` but modelname contains `testing` then `hyde` will |
111 | 111 | be removed. |
112 | 112 |
|
113 | | -* `--query_eval_modelname`: str, default `"openrouter/anthropic/claude-3.5-sonnet"` |
| 113 | +* `--query_eval_modelname`: str, default `"openrouter/anthropic/claude-3.5-sonnet:beta"` |
114 | 114 | * Cheaper and quicker model than modelname. Used for intermediate |
115 | 115 | steps in the RAG, not used in other tasks. |
116 | 116 | If the value is not part of the model list of litellm, will use |
|
179 | 179 | can be used for example to send notification on your phone |
180 | 180 | using ntfy.sh to get summaries. |
181 | 181 |
|
182 | | -* `--chat_memory`: bool, default `True` |
183 | | - * if True, will remember the messages across a given chat exchange. |
| 182 | +* `--memoryless`: bool, default `False` |
| 183 | + * if False, will remember the messages across a given chat exchange. |
184 | 184 | Disabled if using a testing model. |
185 | 185 |
|
186 | 186 | * `--disable_llm_cache`: bool, default `False` |
|
220 | 220 | * `--import_mode`: bool, default `False` |
221 | 221 | * if True, will return the answer from query instead of printing it |
222 | 222 |
|
| 223 | +* `--disable_md_printing`: bool, default `True` |
| 224 | + * if True, instead of using rich to display some information, default to simpler colored prints. |
| 225 | + |
223 | 226 | * `--cli_kwargs`: dict, optional |
224 | 227 | * Any remaining keyword argument will be parsed as a loader |
224 | 227 | specific argument ([see below](#loader-specific-arguments)). |
|
243 | 246 | e.g. `science::physics::freshman_year::lesson1` |
244 | 247 | * `--anki_notetype`: str |
245 | 248 | * If it's part of the card's notetype, that notetype will be kept. |
246 | | - Case insensitive. |
247 | | - |
| 249 | + Case insensitive. Note that suspended cards are always ignored. |
248 | 250 | * `--anki_fields`: List[str] |
249 | 251 | * List of fields to keep |
250 | | -* `--anki_mode`: str |
251 | | - * any of `window`, `concatenate`, `singlecard`: (or _ separated |
252 | | - value like `concatenate_window`). By default `singlecard` |
253 | | - is used. |
254 | | - * Modes: |
255 | | - * `singlecard`: 1 document is 1 anki card. |
256 | | - * `window`: 1 documents is 5 anki note, overlapping (so |
257 | | - 10 anki notes will result in 5 documents) |
258 | | - * `concatenate`: 1 document is all anki notes concatenated as a |
259 | | - single wall of text then split like any long document. |
260 | | - |
261 | | - Whichever you choose, you can later filter out documents by metadata |
262 | | - filtering over the `anki_mode` key. |
263 | 252 |
|
264 | 253 | * `--audio_backend`: str |
265 | 254 | * either 'whisper' or 'deepgram' to transcribe audio. |
|
381 | 370 | BeautifulSoup. Useful to decode html stored in .js files. |
382 | 371 | Do tell me if you want more of this. |
383 | 372 |
|
384 | | -* `--min_lang_prob`: float, default `0.5` |
| 373 | +* `--docheck_min_lang_prob`: float, default `0.5` |
385 | 374 | * float between 0 and 1 that sets the threshold under which to |
386 | 375 | consider a document invalid if the estimation of |
387 | 376 | fasttext's langdetect of any language is below that value. |
388 | 377 | For example, setting it to 0.9 means that only documents that |
389 | 378 | fasttext thinks have at least 90% probability of being a |
390 | 379 | language are valid. |
| 380 | +* `--docheck_min_token`: int, default `50` |
| 381 | + * if we find fewer than that many tokens in a document, crash. |
| 382 | +* `--docheck_max_token`: int, default `1_000_000` |
| 383 | + * if we find more than that many tokens in a document, crash. |
| 384 | +* `--docheck_max_lines`: int, default `100_000` |
| 385 | + * if we find more than that many lines in a document, crash. |
391 | 386 |
|
392 | 387 | * `--source_tag`: str, default `None` |
393 | 388 | * a string that will be added to the document metadata at the |
|
401 | 396 | # Runtime flags |
402 | 397 |
|
403 | 398 | * `DOCTOOLS_TYPECHECKING` |
404 | | - * Setting for runtime type checking. Default value is `disabled`. |
405 | | - * Possible values: |
406 | | - * `disabled`: disable typechecking |
407 | | - * `warn`: print a red warning if a typechecking fails |
408 | | - * `crash`: crash if a typechecking fails in any function |
| 399 | + * Setting for runtime type checking. Default value is `warn`. |
| 400 | + Types are checked using [beartype](https://beartype.readthedocs.io/en/latest/), so this shouldn't slow down the runtime. Possible values: |
| 401 | + * `disabled`: disable type checking. |
| 402 | + * `warn`: print a red warning if a type check fails. |
| 403 | + * `crash`: crash if a type check fails in any function. |