feat: develop advanced input serialization #20

@DelusionalSimon

Description

Initially we wanted to use serialized JSON as input for the LLM calls, but found that even heavily pruned JSON files were far too large to fit in the context window of the selected model. See commit 497378f for the function that was used before it was deleted.

To improve the stability of proChariot, we should look into alternative serialization methods, or ways to structure a JSON or other serialized data file, that keep the input small enough to feed to the LLM.
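As a rough illustration of how much the structure alone can matter, here is a minimal sketch comparing one-object-per-row JSON against a layout that states the keys once. The field names and rows are hypothetical and not taken from proChariot, and character count is used only as a crude stand-in for token count.

```python
import json

# Hypothetical annotation rows; the field names are illustrative only.
rows = [
    {"contig": "contig_1", "start": 100, "end": 950, "strand": "+", "product": "DNA polymerase"},
    {"contig": "contig_1", "start": 1200, "end": 1800, "strand": "-", "product": "hypothetical protein"},
    {"contig": "contig_2", "start": 50, "end": 400, "strand": "+", "product": "transposase"},
]


def as_records(rows):
    """One JSON object per row: every key is repeated for every entry."""
    return json.dumps(rows)


def as_columnar(rows):
    """Keys stated once, rows as plain arrays, no optional whitespace."""
    fields = list(rows[0])
    payload = {"fields": fields, "rows": [[r[f] for f in fields] for r in rows]}
    return json.dumps(payload, separators=(",", ":"))


records = as_records(rows)
columnar = as_columnar(rows)

# Character length is only a proxy; real limits are measured in tokens.
print(len(records), len(columnar))
```

The saving grows with the number of rows, since the per-row key overhead disappears. Whether the LLM reads the columnar form as reliably as the verbose one is a separate question that would need testing.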

Tasks

  • Find the minimal set of data to keep
  • Is batching by contig (or by another structure) worth it?
  • Explore different possible data structures
    • Each row as an entry
    • Data prefixes
    • Alternatives to JSON
    • Check size against a maximum
  • Test thoroughly and weigh against the .tsv approach
    • Use both large and small genomes to make sure the size stays low enough
  • Update system prompt to improve fidelity
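
Several of the tasks above (batching by contig, data prefixes, checking size against a maximum) could be combined along these lines. This is only a sketch under assumptions: the row layout, the `>` prefix line, the tab-separated body, and the `MAX_CHARS` budget are all hypothetical, and a real limit would be counted in tokens with the model's own tokenizer.

```python
from collections import defaultdict

MAX_CHARS = 200  # hypothetical budget; a real limit would be in tokens

# Hypothetical rows: (contig, start, end, strand, product)
rows = [
    ("contig_1", 100, 950, "+", "DNA polymerase"),
    ("contig_1", 1200, 1800, "-", "hypothetical protein"),
    ("contig_2", 50, 400, "+", "transposase"),
]


def encode_by_contig(rows):
    """Group rows by contig so the contig name is written once as a prefix line."""
    grouped = defaultdict(list)
    for contig, start, end, strand, product in rows:
        grouped[contig].append(f"{start}\t{end}\t{strand}\t{product}")
    return {c: f">{c}\n" + "\n".join(lines) for c, lines in grouped.items()}


def batch(blocks, max_chars=MAX_CHARS):
    """Greedily pack contig blocks into batches no longer than max_chars."""
    batches, current = [], ""
    for block in blocks:
        if len(block) > max_chars:
            # A single oversized contig would need to be split further.
            raise ValueError("single contig exceeds the size budget")
        candidate = block if not current else current + "\n" + block
        if len(candidate) > max_chars:
            batches.append(current)
            current = block
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


blocks = list(encode_by_contig(rows).values())
for i, chunk in enumerate(batch(blocks, max_chars=80)):
    print(f"batch {i}: {len(chunk)} chars")
```

Batching keeps each call under the budget at the cost of the model never seeing the whole genome at once, so it would need to be weighed against the .tsv approach on both large and small genomes.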

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
