Contributor

@IgorSusmelj commented Jan 9, 2026

What has changed and why?

  • Backend part of "Allow filtering for data without labels" (#374)
  • Builds on "Igor lig 7501 refactor only" (#383)
  • Introduces a shared AnnotationFilter (annotation_label_ids + include_no_annotations) to centralize annotation filtering
  • Applies the shared filter across image, frame, and video count resolvers (video uses frame annotations)
  • Keeps “No annotations” counts in count endpoints and updates related backend tests
  • Updates schema to expose the new filter parameter where applicable
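
As a rough illustration of the shared filter described above, here is a minimal in-memory sketch. It is not the code in this PR: the real filter operates on SQL queries, and the description does not state how the two fields are combined. The sketch assumes they are OR-combined (a sample matches if it carries any selected label, or if it has no annotations and `include_no_annotations` is set). `AnnotationFilterSketch` and `matches` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class AnnotationFilterSketch:
    """Hypothetical stand-in for the shared AnnotationFilter."""

    annotation_label_ids: list[UUID] | None = None
    include_no_annotations: bool = False

    def matches(self, sample_label_ids: set[UUID]) -> bool:
        """Return True if a sample with these label IDs passes the filter."""
        # Assumption: with neither condition set, nothing is filtered out.
        if self.annotation_label_ids is None and not self.include_no_annotations:
            return True
        has_selected_label = bool(
            self.annotation_label_ids
            and sample_label_ids.intersection(self.annotation_label_ids)
        )
        is_unannotated = self.include_no_annotations and not sample_label_ids
        # Assumption: the two conditions are OR-combined.
        return has_selected_label or is_unannotated


cat, dog = uuid4(), uuid4()
# Sample name -> set of label IDs attached to that sample.
samples = {"a": {cat}, "b": {dog}, "c": set()}

flt = AnnotationFilterSketch(annotation_label_ids=[cat], include_no_annotations=True)
selected = sorted(name for name, labels in samples.items() if flt.matches(labels))
```

Under these assumptions, `selected` contains the sample labeled `cat` and the unannotated sample, but not the one labeled only `dog`.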

Note that this PR already introduces the "No annotations" class, which will also be shipped to the frontend and made available for filtering. However, this PR does NOT yet implement the frontend logic.

How has it been tested?

  • Updated tests

Did you update CHANGELOG.md?

  • Yes
  • Not needed (internal change)

@IgorSusmelj
Contributor Author

/review


/** Total number of unique labels in the dataset */
- totalLabels: 71,
+ totalLabels: 72,
Contributor Author

Had to increase this since we now also have "No Annotations" as a label.


# Define the path to the dataset directory
- dataset_path = env.path("EXAMPLES_DATASET_PATH", "/path/to/your/dataset")
+ dataset_path = env.path("DATASET_PATH", "../../../../yolo-data-example/dataset.yaml")
Contributor

Please undo the changes to this file. Use a local .env file instead.
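
A stdlib-only sketch of the suggested approach, assuming the variable is supplied by the environment (for example, loaded from a local, untracked .env file) so that the committed default stays untouched. The helper name `dataset_path_from_env` is hypothetical; the example script itself uses `env.path`.

```python
import os
from pathlib import Path


def dataset_path_from_env(default: str = "/path/to/your/dataset") -> Path:
    """Read the dataset path from the environment, falling back to the default."""
    return Path(os.environ.get("EXAMPLES_DATASET_PATH", default))


# For demonstration only: in practice the value would come from the local .env file.
os.environ["EXAMPLES_DATASET_PATH"] = "/tmp/my-local-dataset"
dataset_path = dataset_path_from_env()
```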

"""Returns the number of samples in the collection without annotations."""
total_no_annotations_query = (
    select(func.count())
    .select_from(ImageTable)
Contributor

This select should be from SampleTable.

Contributor

But wait, that is strange. We use images everywhere in this file, and the filters include width and height filters. Is the count_annotations_by_collection function specific to images?
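
To make the question concrete, here is a small sqlite3 sketch of a "no annotations" count taken from a generic sample table rather than an image table. The tables and columns (`sample`, `annotation`, `collection_id`) are simplified assumptions for illustration, not the project's actual schema; only `parent_sample_id` mirrors a name visible in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, collection_id INTEGER);
    CREATE TABLE annotation (
        annotation_id INTEGER PRIMARY KEY,
        parent_sample_id INTEGER REFERENCES sample(sample_id)
    );
    INSERT INTO sample VALUES (1, 10), (2, 10), (3, 10), (4, 20);
    INSERT INTO annotation VALUES (1, 1), (2, 1), (3, 3);
    """
)


def count_samples_without_annotations(conn, collection_id):
    """Count samples in a collection that have no annotation rows."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM sample
        WHERE collection_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM annotation
              WHERE annotation.parent_sample_id = sample.sample_id
          )
        """,
        (collection_id,),
    ).fetchone()
    return row[0]


no_annotation_count = count_samples_without_annotations(conn, collection_id=10)
```

Counting from the sample table keeps the query independent of image-only columns such as width and height; whether that fits this function depends on the answer to the question above.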


def _resolve_annotation_label_ids(
    session: Session, annotation_label_names: list[str] | None
) -> list[UUID] | None:
Contributor

Why do we allow None? Isn't an empty list enough?
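
One possible reason to keep both, shown as a hypothetical sketch: `None` can mean "no label filter was requested", while an empty list can mean "a filter was requested but it selects no labels". Whether the PR relies on this distinction is an assumption here; `resolve_label_ids` and the `name_to_id` mapping are illustrative stand-ins for the session-based lookup.

```python
from __future__ import annotations

from uuid import UUID, uuid4


def resolve_label_ids(
    name_to_id: dict[str, UUID], label_names: list[str] | None
) -> list[UUID] | None:
    """Map label names to IDs, keeping None ("no filter") distinct from []."""
    if label_names is None:
        # No filter requested: the caller should skip label filtering entirely.
        return None
    # Filter requested: unknown names are dropped, so the result may be empty.
    return [name_to_id[name] for name in label_names if name in name_to_id]


name_to_id = {"cat": uuid4(), "dog": uuid4()}

no_filter = resolve_label_ids(name_to_id, None)
empty_filter = resolve_label_ids(name_to_id, [])
cat_only = resolve_label_ids(name_to_id, ["cat", "unknown"])
```

If callers never need to tell these two cases apart, collapsing to a plain list would simplify the signature, as the comment suggests.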

Comment on lines +30 to +31
annotation_label_ids: list[UUID] | None = None
include_unannotated_samples: bool | None = None
Contributor

Please add a docstring that explains what each field means and how the conditions are combined.

    include_unannotated_samples: bool | None,
    preserve_empty_label_ids: bool = False,
) -> AnnotationFilter | None:
    """Build an AnnotationFilter from raw filter values."""
Contributor

What is the meaning of preserve_empty_label_ids? Please add a docstring describing the args.

def apply_to_samples(self, query: QueryType, sample_id_column: Any) -> QueryType:
    """Apply annotation filters using the provided sample ID column."""
    if self.annotation_label_ids is None:
        return query
Contributor

So include_unannotated_samples=False is ignored if annotation_label_ids is None?
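
The concern can be reproduced with a simplified stand-in that mirrors only the early return shown in the excerpt and works on an in-memory dict instead of a SQL query. Everything past the early return, in particular the OR-combination, is an assumption for illustration; `FilterStandIn` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FilterStandIn:
    """Hypothetical stand-in mirroring the early return in apply_to_samples."""

    annotation_label_ids: list[str] | None = None
    include_unannotated_samples: bool | None = None

    def apply_to_samples(self, samples: dict[str, set[str]]) -> list[str]:
        if self.annotation_label_ids is None:
            # Early return: include_unannotated_samples is never consulted.
            return sorted(samples)
        wanted = set(self.annotation_label_ids)
        # Assumption: label match OR (unannotated and flag set).
        return sorted(
            sample_id
            for sample_id, labels in samples.items()
            if labels & wanted or (self.include_unannotated_samples and not labels)
        )


# Sample ID -> label names; "b" has no annotations.
samples = {"a": {"cat"}, "b": set()}

# Both calls return every sample, including the unannotated one.
unset = FilterStandIn().apply_to_samples(samples)
explicit_false = FilterStandIn(include_unannotated_samples=False).apply_to_samples(samples)
```

In this stand-in, an explicit `include_unannotated_samples=False` behaves the same as leaving the flag unset whenever `annotation_label_ids` is `None`, which is the behavior the question points at.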

from lightly_studio.type_definitions import QueryType


class AnnotationFilter(BaseModel):
Contributor

Please move it to a separate file.

"""
# TODO(Igor, 01/2026): Use _CountFilters as the input argument to simplify this API.
total_counts = _get_total_counts(session=session, collection_id=collection_id)
filtered_label_ids = _resolve_annotation_label_ids(
Contributor

I think storing the IDs was the wrong suggestion; names are more natural since the request comes from the user. Can you revert it? Or, more practically, add a TODO and do it in a follow-up?

select(AnnotationBaseTable.parent_sample_id).select_from(AnnotationBaseTable).distinct()
)

if self.annotation_label_ids is not None:
Contributor

This is always True - we exit early above for the opposite condition. Let's remove the if.

.distinct()
)

if self.annotation_label_ids is not None:
Contributor

Same comment as above - always True.
