
Embedding Model Classifier #474

@CodedNil

Description

  • I have checked the existing issues to avoid duplicates
  • I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue

Is your feature request related to a problem? Please describe

The classifier works fairly well, but it misses a lot of content: the Unknown category makes up a significant portion of the total.

Describe the solution you'd like

LLM embedding models have recently become very lightweight and powerful. Could a locally run, open-source model be applied to the torrent's name plus other metadata, outputting a similarity score against category queries such as "tv series", "porn", etc., with the highest-scoring category chosen?
For example, a lot of software is missed because the classifier only looks for executables (.exe files, etc.). If the files are zipped, it misses them, even though an embedding model could easily tell from a title like "Adobe Photoshop" that it's software.
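A minimal sketch of the proposed approach, in Python for illustration. It assumes a hypothetical `embed` function that maps text to a vector. In practice this would call a locally run embedding model, but here `toy_embed` and its hand-picked `TOY_WORD_VECTORS` are made up purely so the example runs. Each category is described by a query string, the torrent text is embedded, and the category with the highest cosine similarity wins. The `min_score` fallback to "unknown" is an added assumption, not part of the proposal.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(
    text: str,
    category_queries: Dict[str, str],
    embed: Callable[[str], Vector],
    min_score: float = 0.0,
) -> str:
    """Return the category whose query embedding is most similar to `text`.

    Falls back to "unknown" when no category scores above `min_score`
    (an assumed safeguard, not part of the original proposal).
    """
    text_vec = embed(text)
    scores = {
        category: cosine_similarity(text_vec, embed(query))
        for category, query in category_queries.items()
    }
    best = max(scores, key=lambda c: scores[c])
    return best if scores[best] > min_score else "unknown"


# Hypothetical stand-in for a real embedding model: hand-picked 2-D vectors
# of [software-ness, tv-ness]. A real model learns such relationships itself.
TOY_WORD_VECTORS = {
    "software": (1.0, 0.0),
    "adobe": (0.9, 0.1),
    "photoshop": (1.0, 0.0),
    "tv": (0.0, 1.0),
    "series": (0.1, 0.9),
    "s01e01": (0.0, 1.0),
}


def toy_embed(text: str) -> List[float]:
    """Sum the vectors of known words; unknown words contribute nothing."""
    total = [0.0, 0.0]
    for word in text.lower().replace(".", " ").split():
        vec = TOY_WORD_VECTORS.get(word)
        if vec is not None:
            total[0] += vec[0]
            total[1] += vec[1]
    return total


CATEGORY_QUERIES = {"software": "software", "tv_show": "tv series"}

print(classify("Adobe Photoshop 2024 x64.zip", CATEGORY_QUERIES, toy_embed))  # software
print(classify("Some.Show.S01E01.1080p", CATEGORY_QUERIES, toy_embed))  # tv_show
```

In a real setup the category query embeddings would only need to be computed once and cached; the sketch recomputes them on every call for brevity.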
This wouldn't replace the existing classifier, only the portions that rely on matching a list of strings.
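One possible arrangement of that split, sketched under assumptions: `EXTENSION_RULES` and `rule_based_category` are hypothetical stand-ins for the rules the existing classifier would keep, and `embedding_classifier` is any function mapping text to a category (for example, one built on embedding similarity). The embedding model is only consulted when the kept rules don't decide.

```python
from typing import Callable, List, Optional

# Hypothetical extension rules standing in for the parts of the existing
# classifier that would be kept as-is.
EXTENSION_RULES = {
    ".exe": "software",
    ".msi": "software",
    ".mkv": "video",
    ".epub": "ebook",
}


def rule_based_category(file_names: List[str]) -> Optional[str]:
    """Return a category from file extensions, or None if nothing matches."""
    for name in file_names:
        for ext, category in EXTENSION_RULES.items():
            if name.lower().endswith(ext):
                return category
    return None


def hybrid_classify(
    torrent_name: str,
    file_names: List[str],
    embedding_classifier: Callable[[str], str],
) -> str:
    """Try the kept rules first; otherwise ask the embedding-based classifier."""
    category = rule_based_category(file_names)
    if category is not None:
        return category
    # The torrent name plus file names form the text given to the model.
    return embedding_classifier(" ".join([torrent_name, *file_names]))


# Usage with a hypothetical stub in place of a real embedding classifier.
def stub_embedding_classifier(text: str) -> str:
    return "software" if "photoshop" in text.lower() else "unknown"


print(hybrid_classify("Some Installer", ["setup.exe"], stub_embedding_classifier))  # software (rule)
print(hybrid_classify("Adobe Photoshop 2024", ["Photoshop.zip"], stub_embedding_classifier))  # software (model)
```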

Describe alternatives you've considered

The classifier could always be improved over time, for example by adding more keywords, but that would also result in more false positives.

Labels: enhancement (New feature or request)
