As a user, I want to query for documents where a specific search field exists in the document #406

@jordanpadams

Description

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Data User

💪 Motivation

...so that I can find products that contain (or do not contain) a specific field in the document.

📖 Additional Details

The exists operator allows users to query for documents based on the presence (or absence) of specific fields in the document.

Syntax:

  • Check if exact field exists: pds:Target/pds:name exists
  • Check if field does NOT exist: not (pds:Investigation/pds:stop_date_time exists)
  • Regex pattern matching: "pds:Target.*" exists - matches any field starting with pds:Target
  • Complex regex patterns: ".*Bounding_Coordinates.*" exists
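
To make the four forms above concrete, here is a minimal Python sketch that classifies a single exists expression. It is not the project's ANTLR4 grammar: the names `ExistsExpr` and `parse_exists` are hypothetical, the regular expressions are an approximation, and the lowercase `exists` and `not` keywords simply follow the examples above.

```python
import re
from typing import NamedTuple


class ExistsExpr(NamedTuple):
    operand: str    # field name, or the regex pattern without its quotes
    is_regex: bool  # True when the operand was a quoted string
    negated: bool   # True when the expression was wrapped in not ( ... )


# not ( <inner> ) wrapper. Hypothetical approximation of the real grammar.
_NOT = re.compile(r'^\s*not\s*\(\s*(?P<inner>.*?)\s*\)\s*$')
# Either "<pattern>" exists or <field> exists.
_EXISTS = re.compile(
    r'^\s*(?:"(?P<pattern>[^"]*)"|(?P<field>[^\s"()]+))\s+exists\s*$'
)


def parse_exists(text: str) -> ExistsExpr:
    """Classify one exists expression, optionally wrapped in not ( ... )."""
    wrapped = _NOT.match(text)
    inner = wrapped.group("inner") if wrapped else text
    match = _EXISTS.match(inner)
    if match is None:
        raise ValueError(f"not an exists expression: {text!r}")
    pattern = match.group("pattern")
    if pattern is not None:
        return ExistsExpr(pattern, True, wrapped is not None)
    return ExistsExpr(match.group("field"), False, wrapped is not None)
```

For example, `parse_exists('not (pds:Investigation/pds:stop_date_time exists)')` yields an unquoted field name with `negated=True`, while `parse_exists('"pds:Target.*" exists')` yields a regex operand.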

Implementation Notes:

  • For exact field names (unquoted), the query checks for that specific field in the document
  • For quoted strings, the value is treated as a Java regex pattern and matched against all known field names from the OpenSearch mapping
  • When using regex patterns, all matching field names are retrieved and an existence check is created for each
  • Returns an error if the regex pattern matches no known fields
  • exists returns true only if the field is present AND has a non-null/non-empty value
  • For multi-valued fields, exists returns true if at least one value is present
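
The resolution steps in these notes can be sketched roughly as follows. This is an illustration in Python, not the Java implementation: `resolve_fields` and `exists_query` are hypothetical names, Python's `re.fullmatch` stands in for Java regex matching (the two dialects differ in places, and whole-name matching is an assumption drawn from the "starting with" description of `"pds:Target.*"`), and combining several fields with a `bool`/`should` clause is one plausible way to express "any matched field exists".

```python
import re
from typing import Iterable


def resolve_fields(operand: str, is_regex: bool, known_fields: Iterable[str]) -> list[str]:
    """Return the field names an exists check should cover."""
    if not is_regex:
        # Unquoted operand: check exactly that field.
        return [operand]
    pattern = re.compile(operand)
    # Assumption: the pattern must match the whole field name.
    matched = [name for name in known_fields if pattern.fullmatch(name)]
    if not matched:
        raise ValueError(f"regex {operand!r} matches no known fields")
    return matched


def exists_query(fields: list[str]) -> dict:
    """Build an OpenSearch query body: one exists clause per field, OR-ed together."""
    clauses = [{"exists": {"field": name}} for name in fields]
    if len(clauses) == 1:
        return clauses[0]
    return {"bool": {"should": clauses, "minimum_should_match": 1}}
```

With known fields `pds:Target/pds:name` and `pds:Target/pds:type`, the pattern `pds:Target.*` resolves to both names and produces a two-clause `should` query.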

Use Case Examples:

  • Find all products that have bounding box coordinates: cart:Bounding_Coordinates/cart:north_bounding_coordinate exists
  • Find investigations that are still ongoing (no stop date): not (pds:Investigation/pds:stop_date_time exists)
  • Find products with any target field using regex: "pds:Target.*" exists
  • Find products missing author information: not (pds:Citation_Information/pds:author_list exists)
  • Complex pattern matching: ".*Coordinates.*" exists finds all fields containing "Coordinates"
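
Because these expressions contain quotes, colons, slashes and parentheses, they need URL encoding when sent over HTTP. A minimal Python sketch, assuming the expression is passed in a query parameter named `q` and using a placeholder base URL (both are assumptions, not taken from this issue):

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the base URL of your registry API deployment.
BASE_URL = "https://registry.example.org/products"


def build_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Return a request URL with the query expression percent-encoded."""
    # Assumption: the API reads the expression from a parameter named "q".
    return f"{base_url}?{urlencode({'q': query})}"
```

Calling `build_search_url('not (pds:Citation_Information/pds:author_list exists)')` returns a URL that can be passed to any HTTP client.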

Related: See #402 for related query operator issues.

Acceptance Criteria

Given I am querying the registry API
When I use field_name exists with an exact field name in my query
Then the API returns only documents where the specified field exists and has a non-null/non-empty value

Given I am querying the registry API
When I use not (field_name exists) in my query
Then the API returns only documents where the specified field does not exist or is null/empty

Given I am querying for a multi-valued field
When I use field_name exists and at least one value is present
Then the API returns that document in the results
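
The three criteria above can be read as a single predicate over a document. The Python sketch below models that reading on plain dictionaries. The helper names are hypothetical, and treating an empty string as "empty" is an interpretation of "non-null/non-empty" that the search backend's own exists semantics may not share.

```python
from typing import Any, Mapping


def has_value(value: Any) -> bool:
    """True if the value is neither null nor empty."""
    if value is None:
        return False
    if isinstance(value, str):
        # Assumption: an empty string counts as empty.
        return value != ""
    if isinstance(value, (list, tuple)):
        # Multi-valued field: at least one usable value is enough.
        return any(has_value(item) for item in value)
    return True


def field_exists(document: Mapping[str, Any], field: str) -> bool:
    """Model of `field exists`; negate the result for `not (field exists)`."""
    return has_value(document.get(field))
```

A document with `{"pds:Target/pds:name": [None, "Mars"]}` satisfies `pds:Target/pds:name exists`, while one where the field is missing, `None`, or `[]` satisfies the negated form.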

Given I want to check for fields matching a regex pattern
When I use "regex_pattern" exists with a quoted string
Then the API:

  • Retrieves all field names from the OpenSearch mapping
  • Matches the regex against all known field names
  • Returns documents where any matched field exists with a non-null/non-empty value
  • Returns an error if the regex matches no known field names

Given I combine exists with other query operators
When I use queries like pds:Target/pds:name exists and pds:Target/pds:type eq "Planet"
Then the API correctly applies both conditions and returns matching documents
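
As a rough illustration of how such a combined query could map onto OpenSearch query DSL, the Python sketch below composes plain dictionaries. The helper names are hypothetical, and translating `eq` to a `term` clause is an assumption; this issue does not say how the API translates comparison operators.

```python
def exists(field: str) -> dict:
    """OpenSearch exists clause for a single field."""
    return {"exists": {"field": field}}


def equals(field: str, value: str) -> dict:
    # Assumption: eq is modelled as an exact term match.
    return {"term": {field: value}}


def negate(clause: dict) -> dict:
    """Model of not ( ... ): documents that do not match the clause."""
    return {"bool": {"must_not": [clause]}}


def all_of(*clauses: dict) -> dict:
    """Model of `and`: every clause must match."""
    return {"bool": {"must": list(clauses)}}


# pds:Target/pds:name exists and pds:Target/pds:type eq "Planet"
combined = all_of(
    exists("pds:Target/pds:name"),
    equals("pds:Target/pds:type", "Planet"),
)
```

Negation composes the same way: `all_of(negate(exists("pds:Investigation/pds:stop_date_time")), equals("pds:Target/pds:type", "Planet"))` restricts the result to planets whose investigation has no stop date.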

⚙️ Engineering Details

Implementation completed in PR #700:

  • Updated ANTLR4 lexer grammar (Search.g4) to support postfix EXISTS keyword with both FIELD and STRINGVAL tokens
  • Updated Antlr4SearchListener to handle existence checks:
    • Exact field name matching for unquoted field names
    • Java regex pattern matching for quoted strings
    • Retrieves mapping from OpenSearch via ProductsController.productPropertiesList() for regex matching
    • Generates OpenSearch ExistsQuery for each matched field
  • Proper handling of NOT for negation
  • Error handling when regex matches no fields
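
For readers unfamiliar with the mapping step, the Python sketch below shows one way to collect known field names from a raw OpenSearch mapping response of the shape `{index: {"mappings": {"properties": {...}}}}`. It is only an illustration: how `ProductsController.productPropertiesList()` obtains its list is not shown in this issue, and the function names and the dot-joined naming of nested object fields are assumptions.

```python
from typing import Any, Mapping


def field_names(properties: Mapping[str, Any], prefix: str = "") -> list[str]:
    """Flatten a mapping `properties` block into full field names."""
    names = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        nested = spec.get("properties")
        if nested:
            # Object field: descend and join sub-field names with dots.
            names.extend(field_names(nested, prefix=f"{path}."))
        else:
            names.append(path)
    return names


def mapping_field_names(mapping_response: Mapping[str, Any]) -> list[str]:
    """Collect the distinct field names across all indices in the response."""
    names = set()
    for index_body in mapping_response.values():
        properties = index_body.get("mappings", {}).get("properties", {})
        names.update(field_names(properties))
    return sorted(names)
```

The resulting list is what a quoted regex pattern would be matched against before one exists clause is generated per matching name.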
