Commit

Added tests for text
Ansh5461 committed Sep 1, 2023
1 parent bc70e9a commit 6e85a91
Showing 3 changed files with 57 additions and 2 deletions.
6 changes: 4 additions & 2 deletions querent/ingestors/texts/text_ingestor.py
@@ -23,8 +23,8 @@ async def create(

 class TextIngestor(BaseIngestor):
     def __init__(self, processors: List[AsyncProcessor]):
+        super().__init__(IngestorBackend.TEXT)
         self.processors = processors
-        super.__init__(IngestorBackend.TEXT)

     async def ingest(
         self, poll_function: AsyncGenerator[CollectedBytes, None]
@@ -56,6 +56,7 @@ async def ingest(
                 yield text

         except Exception as e:
+            print(e)
             yield []

     async def extract_and_process_text(
async def extract_and_process_text(
Expand All @@ -64,8 +65,9 @@ async def extract_and_process_text(
text = await self.extract_text_from_file(collected_bytes)
return await self.process_data(text=text)

async def extract_text_from_file(collected_bytes: CollectedBytes) -> str:
async def extract_text_from_file(self, collected_bytes: CollectedBytes) -> str:
text = collected_bytes.data.decode("utf-8")
print(text)
return text

async def process_data(self, text: str) -> List[str]:
Expand Down
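
The two signature-level changes in text_ingestor.py can be reproduced in isolation. The sketch below is only an illustration: `Base`, `Broken`, and `Fixed` are hypothetical stand-ins, not classes from querent. It assumes the point of the change is that `super` without parentheses refers to the built-in type rather than a proxy for the parent class, and that a method declared without `self` cannot accept an extra argument when called on an instance.

```python
import asyncio


class Base:
    """Hypothetical stand-in for a base ingestor class."""

    def __init__(self, backend: str):
        self.backend = backend


class Broken(Base):
    def __init__(self):
        # `super` without parentheses is the built-in type itself, so this
        # call raises TypeError and never reaches Base.__init__.
        super.__init__("text")

    # No `self` parameter: on an instance, the instance is bound to `data`,
    # so passing the bytes explicitly is one positional argument too many.
    async def extract(data: bytes) -> str:
        return data.decode("utf-8")


class Fixed(Base):
    def __init__(self):
        # `super()` returns a proxy that dispatches to Base.__init__.
        super().__init__("text")

    async def extract(self, data: bytes) -> str:
        return data.decode("utf-8")
```

Constructing `Broken()` raises `TypeError`, as does calling `extract(b"...")` on a `Broken` instance, while `Fixed()` initialises normally and `asyncio.run(Fixed().extract(b"hi"))` returns the decoded string.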
12 changes: 12 additions & 0 deletions tests/data/text/asyncgenerator.txt
@@ -0,0 +1,12 @@
Asynchronous generator functions were introduced in Python 3.6 by PEP 525. They are much like regular asynchronous
functions, except that they contain the yield keyword in the function body. That in turn makes them much like regular
generators, except that you can also use the await keyword inside them.

Calling an asynchronous generator function returns an asynchronous generator object, in contrast to calling a regular
asynchronous function, which returns a coroutine object.
Because the asynchronous generator is, no surprise, asynchronous, you are allowed to use the await keyword inside
it.

You can use this, for example, to send HTTP requests from the asynchronous generator and yield the responses.

Besides asynchronous iterables, you can use asynchronous generators with the async for loop as well.
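
The behaviour described in the fixture text can be sketched in a few lines. This is an illustration only and is not part of the commit: `fetch` is a hypothetical awaitable standing in for an HTTP request, and `responses` and `collect` are made-up names.

```python
import asyncio
from typing import AsyncGenerator, List


async def fetch(n: int) -> str:
    """Hypothetical awaitable standing in for an HTTP request."""
    await asyncio.sleep(0)
    return f"response {n}"


async def responses(count: int) -> AsyncGenerator[str, None]:
    # `yield` inside `async def` makes this an asynchronous generator
    # function; `await` is allowed in its body.
    for n in range(count):
        yield await fetch(n)


async def collect(count: int) -> List[str]:
    # Asynchronous generators are consumed with `async for`.
    return [item async for item in responses(count)]
```

Calling `responses(3)` returns an asynchronous generator object without running any of the body, whereas calling `fetch(0)` returns a coroutine object that must be awaited.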
41 changes: 41 additions & 0 deletions tests/test_text_ingestor.py
@@ -0,0 +1,41 @@
import asyncio
from pathlib import Path
from querent.collectors.fs.fs_collector import FSCollectorFactory
from querent.config.collector_config import FSCollectorConfig
from querent.common.uri import Uri
from querent.ingestors.ingestor_manager import IngestorFactoryManager
import pytest


@pytest.mark.asyncio
async def test_collect_and_ingest_txt():
    # Set up the collector
    collector_factory = FSCollectorFactory()
    uri = Uri("file://" + str(Path("./tests/data/text/").resolve()))
    config = FSCollectorConfig(root_path=uri.path)
    collector = collector_factory.resolve(uri, config)

    # Set up the ingestor
    ingestor_factory_manager = IngestorFactoryManager()
    ingestor_factory = await ingestor_factory_manager.get_factory(
        "txt"
    )  # Notice the use of await here
    ingestor = await ingestor_factory.create("txt", [])

    # Collect and ingest the text file
    ingested_call = ingestor.ingest(collector.poll())

    async def poll_and_print():
        # Count empty results; the ingestor yields [] when it hits an error.
        counter = 0
        async for ingested in ingested_call:
            assert ingested is not None
            if len(ingested) == 0:
                counter += 1
        assert counter == 0

    await poll_and_print()  # Notice the use of await here


if __name__ == "__main__":
    asyncio.run(test_collect_and_ingest_txt())
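
The counting pattern in the test can be tried without querent installed. The sketch below is a simplified model under stated assumptions: `poll`, `ingest`, and `count_empty` are hypothetical stand-ins, and the only behaviour taken from the diff is that the ingestor decodes bytes as UTF-8 and yields an empty list when an exception occurs.

```python
import asyncio
from typing import AsyncGenerator, List, Union


async def poll(chunks: List[bytes]) -> AsyncGenerator[bytes, None]:
    """Hypothetical stand-in for a collector's poll() generator."""
    for chunk in chunks:
        yield chunk


async def ingest(
    source: AsyncGenerator[bytes, None]
) -> AsyncGenerator[Union[str, list], None]:
    """Hypothetical stand-in for an ingestor: yields [] when decoding fails."""
    try:
        async for chunk in source:
            yield chunk.decode("utf-8")
    except Exception:
        yield []


async def count_empty(chunks: List[bytes]) -> int:
    # Mirrors the test: every empty result signals a failed ingestion.
    empty = 0
    async for item in ingest(poll(chunks)):
        if len(item) == 0:
            empty += 1
    return empty
```

With valid UTF-8 input the count stays at zero, which is what the test asserts; a chunk such as `b"\xff\xfe"` fails to decode and produces one empty result.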
