The snippet generator example suggests that the offsets produced by `snippet.highlighted()` can be used to slice the text of the corresponding document. However, these ranges are relative to the snippet's fragment, not the document. If the snippet comes from a later portion of the document, so that the fragment itself starts at a non-zero offset, slicing the document text with these ranges does not retrieve the correct text for highlighting:
# %%
from tantivy import (
    Document,
    Index,
    SchemaBuilder,
    SnippetGenerator,
)

doc_schema = SchemaBuilder().add_text_field("text", stored=True).build()
index = Index(doc_schema)
writer = index.writer()

doc_1 = Document()
doc_1.add_text("text", "Teach a man to fish and he will eat for the rest of his life.")
_ = writer.add_document(doc_1)

doc_2 = Document()
doc_2.add_text(
    "text",
"""He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week. It made the boy sad to see the old man come in each day with his skiff empty and he always went down to help him carry either the coiled lines or the gaff and harpoon and the sail that was furled around the mast. The sail was patched with flour sacks and, furled, it looked like the flag of permanent defeat.The old man was thin and gaunt with deep wrinkles in the back of his neck. The brown blotches of the benevolent skin cancer the sun brings from its reflection on the tropic sea were on his cheeks. The blotches ran well down the sides of his face and his hands had the deep-creased scars from handling heavy fish on the cords. But none of these scars were fresh. They were as old as erosions in a fishless desert.""",
)
_ = writer.add_document(doc_2)
_ = writer.commit()
_ = writer.wait_merging_threads()
index.reload()


def search(query_string: str) -> None:
    query = index.parse_query(query_string, ["text"])
    searcher = index.searcher()
    doc_results = searcher.search(query, limit=10).hits
    snippet_generator = SnippetGenerator.create(searcher, query, doc_schema, "text")
    for _, doc_address in doc_results:
        doc = searcher.doc(doc_address)
        doc_text = doc.get_first("text")
        if not doc_text:
            raise ValueError("Doc text not found")
        snippet = snippet_generator.snippet_from_doc(doc)
        print("Snippet HTML: ", snippet.to_html())
        # These ranges index into the snippet fragment, not into doc_text.
        for snippet_range in snippet.highlighted():
            print("Highlighted: ", doc_text[snippet_range.start : snippet_range.end])


search("fish")
"""
Snippet HTML: Teach a man to <b>fish</b> and he will eat for the rest of his life
Highlighted: fish
Snippet HTML: He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a <b>fish</b>. In the first forty days a
Highlighted: fish
"""

search("heavy fish")
"""
Snippet HTML: the tropic sea were on his cheeks. The blotches ran well down the sides of his face and his hands had the deep-creased scars from handling <b>heavy</b> <b>fish</b>
Highlighted: orty 
Highlighted: ays 
Snippet HTML: Teach a man to <b>fish</b> and he will eat for the rest of his life
Highlighted: fish
"""
For reference, the source implementation of `to_html` treats these offsets as relative to the snippet's fragment, not the document text: https://docs.rs/tantivy/latest/src/tantivy/snippet/mod.rs.html#149
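One possible workaround, sketched below in plain Python, is to find where the fragment starts in the document text and shift each range by that offset. This is only an illustration, not a confirmed fix. The helper name `fragment_ranges_to_doc_ranges` is hypothetical, and it assumes the fragment text is available as a string (the Rust `Snippet` exposes `fragment()`; whether the Python bindings do is an assumption here). It also assumes the offsets can be used as Python string indices, which should hold for ASCII text but may not hold if the offsets are UTF-8 byte positions.

```python
from typing import List, Tuple


def fragment_ranges_to_doc_ranges(
    doc_text: str,
    fragment: str,
    ranges: List[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Shift fragment-relative (start, end) ranges so they index into doc_text.

    Hypothetical helper: uses the first occurrence of the fragment in the
    document, so a fragment that appears more than once may be mapped to
    the wrong position.
    """
    base = doc_text.find(fragment)
    if base < 0:
        raise ValueError("Fragment not found in document text")
    return [(base + start, base + end) for start, end in ranges]


# Stand-in data mirroring the "heavy fish" case: the fragment starts
# partway through the document, so its ranges cannot slice doc_text directly.
doc_text = "His hands had the deep-creased scars from handling heavy fish on the cords."
fragment = "scars from handling heavy fish"
fragment_ranges = [(20, 25), (26, 30)]  # "heavy" and "fish" within the fragment

doc_ranges = fragment_ranges_to_doc_ranges(doc_text, fragment, fragment_ranges)
highlighted = [doc_text[start:end] for start, end in doc_ranges]
print(highlighted)  # ['heavy', 'fish']
```

Searching for the fragment is a stopgap. If the snippet exposed the fragment's starting offset within the document, the shift could be applied without any string search.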