Trallie (“Transfer Learning for Information Extraction”) boosts IE for search over textual asset descriptions by doing away with costly human annotation, instead leveraging the ability of LLMs to follow natural-language guidelines, understand label semantics, and manipulate natural language as fluently as they manipulate code.
Problem: Natural-language descriptions of assets and resources are here to stay, both as legacy data and as flexible catch-alls. Clustering and categorizing them to support structured search queries traditionally requires information extraction (IE), with partial solutions offered by RAG and dense-embedding matching. This is often bottlenecked by costly human annotation, if only to provide few-shot examples of the categories.
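To make the dense-embedding matching baseline concrete, here is a minimal sketch: descriptions and a query are compared by cosine similarity in embedding space. The corpus, the 4-d vectors, and the `cosine` helper are illustrative stand-ins; a real system would embed the texts with a sentence encoder.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up embeddings for two asset descriptions (a real system would
# compute these with a sentence encoder).
corpus = {
    "satellite imagery of coastal erosion": np.array([0.9, 0.1, 0.0, 0.2]),
    "minutes of the 2021 budget meeting":   np.array([0.0, 0.8, 0.5, 0.1]),
}

# Hypothetical embedding of the query "coastline photos".
query_vec = np.array([0.85, 0.15, 0.05, 0.25])

# Rank descriptions by similarity to the query and keep the best match.
best = max(corpus, key=lambda desc: cosine(query_vec, corpus[desc]))
print(best)  # → satellite imagery of coastal erosion
```

The limitation Trallie targets is visible here: embedding matching ranks whole descriptions, but cannot by itself pull structured fields out of them.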
Ambition: Trallie brings the transfer learning and world understanding afforded by LLMs to make information extraction agile. We deliver multilingual, IE-fine-tuned checkpoints of several open model architectures and, for reproducibility, our full fine-tuning recipe, including prompt templates.
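As a sketch of what a prompt-template-driven, zero-shot IE step might look like, the snippet below renders a schema description into a prompt and parses a model reply. `PROMPT_TEMPLATE`, `build_prompt`, the schema, and the simulated reply are hypothetical illustrations, not the project's actual recipe.

```python
import json

# Hypothetical zero-shot IE template: the target fields are described
# in natural language, with no labeled examples required.
PROMPT_TEMPLATE = """You are an information-extraction assistant.
Extract the following fields from the asset description.
Fields: {fields}
Return a JSON object with exactly those keys; use null for missing values.

Description: {description}
JSON:"""

def build_prompt(description: str, schema: dict) -> str:
    """Render the IE prompt for one asset description."""
    fields = "; ".join(f"{k}: {v}" for k, v in schema.items())
    return PROMPT_TEMPLATE.format(fields=fields, description=description)

def parse_response(raw: str, schema: dict) -> dict:
    """Parse the model's JSON reply, keeping only the schema keys."""
    data = json.loads(raw)
    return {k: data.get(k) for k in schema}

schema = {
    "title": "short title of the asset",
    "language": "ISO code of the description language",
    "year": "publication year, if stated",
}
prompt = build_prompt("A 2019 aerial photo archive of the Po river delta.", schema)

# In production the prompt would go to a fine-tuned checkpoint; here we
# simulate the model reply to keep the sketch self-contained.
fake_reply = '{"title": "Po river delta aerial photo archive", "language": "en", "year": 2019}'
print(parse_response(fake_reply, schema))
```

Because the schema is plain natural language, swapping in a new asset catalogue only requires editing the field descriptions, which is the agility the approach aims for.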
Impact: Transfer learning and natural-language input make Trallie relevant to legacy and low-resource scenarios: improved discoverability of hidden asset collections, greater plurality of sources through easier access to search tools, and improved trust and privacy.
Team: At Pi School, our experience in rapid AI prototyping, acquired over more than 100 AI projects, gives us an advantage in exploiting the rapidly moving state of the art.