
Audrey/add recency decay #290

Open
wants to merge 3 commits into
base: rc/2024-10

aulorbe
Collaborator

@aulorbe aulorbe commented Sep 30, 2024

Problem

This is a quality-of-life (QoL) feature I have proactively built into our upcoming RC, on the client side. We currently have no utilities that let customers customize their /rerank results downstream of the API, which I anticipate will cause friction for heavy users of /rerank.

Solution

Reranking results by recency is a popular and useful feature for all of our clients to have. This PR can serve as the POC for integrating this feature into our other clients.

Overview

  • A developer hits the /rerank endpoint and gets results back
  • This same dev wants to tweak the endpoint's results before sending them to their application's user -- this dev wants the results to have some notion of "recency"

Use cases

  • Let's say you're an e-commerce developer and you want to boost the ranking of results based on seasonality. You can make sure that a search for "best football t-shirts" returns not only the most relevant results, but also the newest items in your shop (ones you just uploaded because it's fall). You can crank up the recency decay so that older items are pushed farther down the results list than newer items, even if both are equally relevant to the query.

Toggles available to the user (passed in via options):

  • decay (true/false): Whether recency decay is applied to the reranked results.

  • decayWeight (default 0.5): The magnitude of the decay's impact on document scores.

    • Increasing this value:
      • Effect: Decay has a stronger impact on document scores; older docs are heavily penalized.
      • Use case: You want to more strongly prioritize recency.
    • Decreasing this value:
      • Effect: Decay has a weaker impact on document scores; older documents have a better chance at retaining their original score/ranking.
      • Use case: You want to prioritize recency less.
  • decayThreshold (default 30 days): Time period (in days) over which the decay ramps up to its full effect. If a document is within the threshold, the decay scales with the document's age. If it is older than the threshold, the document is treated as fully decayed (normalized decay of 1).

    • Increasing this value:
      • Effect: Recency decay is more gradual; documents remain relevant for a longer time.
      • Use case: When freshness/recency is less important (e.g. product reviews).
    • Decreasing this value:
      • Effect: Recency decay is more abrupt; documents lose relevance faster.
      • Use case: When freshness/recency is more important (e.g. news articles).
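
To make the interaction between decayWeight and decayThreshold concrete, here is a minimal sketch of an additive decay. The PR description does not spell out the formula, so this assumes the new score is `score - decayWeight * normalizedDecay`, where the normalized decay grows linearly with age and is capped at 1 beyond the threshold. The names `RankedDoc`, `DecayOptions`, `normalizedDecay`, and `applyRecencyDecay` are hypothetical and are not the client's actual API.

```typescript
// Hypothetical shapes for illustration only; not the client's real types.
interface RankedDoc {
  id: string;
  score: number; // relevance score returned by /rerank
  timestamp: Date; // when the document was created or last updated
}

interface DecayOptions {
  decay: boolean;
  decayWeight?: number; // default 0.5
  decayThreshold?: number; // in days, default 30
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Normalized decay in [0, 1]: scales linearly with age inside the threshold
// and is capped at 1 (fully decayed) once the document is older than it.
function normalizedDecay(ageDays: number, thresholdDays: number): number {
  return Math.min(Math.max(ageDays / thresholdDays, 0), 1);
}

// Additive decay (assumed formula): newScore = score - decayWeight * normalizedDecay.
// Returns the documents sorted by their adjusted score, highest first.
function applyRecencyDecay(
  docs: RankedDoc[],
  options: DecayOptions,
  now: Date = new Date()
): RankedDoc[] {
  if (!options.decay) return docs;
  const weight = options.decayWeight ?? 0.5;
  const threshold = options.decayThreshold ?? 30;
  return docs
    .map((doc) => {
      const ageDays = (now.getTime() - doc.timestamp.getTime()) / MS_PER_DAY;
      return { ...doc, score: doc.score - weight * normalizedDecay(ageDays, threshold) };
    })
    .sort((a, b) => b.score - a.score);
}

// Two equally relevant documents: one is 3 days old, the other about 4 months old.
const now = new Date("2024-09-30T00:00:00Z");
const docs: RankedDoc[] = [
  { id: "old", score: 0.9, timestamp: new Date("2024-06-01T00:00:00Z") },
  { id: "new", score: 0.9, timestamp: new Date("2024-09-27T00:00:00Z") },
];
const reranked = applyRecencyDecay(docs, { decay: true }, now);
console.log(reranked.map((d) => d.id)); // the newer document is listed first
```

With the defaults, the older document is past the 30-day threshold and loses the full decayWeight, while the 3-day-old document loses only a tenth of it, so the newer one ranks first despite identical relevance.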

Future toggles

It'd be ideal in the future to allow the user to choose between common recency decay functions other than additive, e.g. multiplicative, exponential, and log.
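
As a rough illustration of how these alternatives might differ, the sketch below puts one possible formula for each behind a common signature. None of these formulas come from this PR; they are assumptions chosen to show the general shape of each family, and the names `DecayKind`, `DecayFn`, and `decayFns` are hypothetical.

```typescript
type DecayKind = "additive" | "multiplicative" | "exponential" | "logarithmic";

// (score, age in days, decayWeight, decayThreshold in days) => adjusted score
type DecayFn = (score: number, ageDays: number, weight: number, thresholdDays: number) => number;

const clamp01 = (x: number): number => Math.min(Math.max(x, 0), 1);

// Illustrative formulas only; a real implementation may define these differently.
const decayFns: Record<DecayKind, DecayFn> = {
  // Subtracts a penalty that grows linearly with age and is capped at `weight`.
  additive: (s, age, w, t) => s - w * clamp01(age / t),
  // Scales the score down by up to a fraction `weight` of its value.
  multiplicative: (s, age, w, t) => s * (1 - w * clamp01(age / t)),
  // Smooth decay with no hard cutoff at the threshold.
  exponential: (s, age, w, t) => s * Math.exp((-w * Math.max(age, 0)) / t),
  // Decays quickly at first, then flattens out for very old documents.
  logarithmic: (s, age, w, t) => s / (1 + w * Math.log1p(Math.max(age, 0) / t)),
};

// Compare the adjusted score of a 15-day-old document across all four functions.
const kinds: DecayKind[] = ["additive", "multiplicative", "exponential", "logarithmic"];
for (const kind of kinds) {
  console.log(kind, decayFns[kind](0.8, 15, 0.5, 30).toFixed(3));
}
```

One design difference worth noting: the additive form can push a low score below zero, whereas the other three keep a non-negative score non-negative.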

User requirements

The user must pass in documents that contain a timestamp field that is a stringified timestamp, to the second, e.g. "2010-08-10 00:03:21"
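
A string such as "2010-08-10 00:03:21" is not in the ISO 8601 format that JavaScript's `Date` constructor is guaranteed to parse, so parsing it explicitly is safer than relying on engine-specific behavior. The sketch below shows one way to do that. `parseTimestamp` is a hypothetical helper, not part of this PR, and treating the value as UTC is an assumption, since the description does not say which time zone applies.

```typescript
// Hypothetical helper: parses "YYYY-MM-DD HH:mm:ss" and interprets it as UTC (an assumption).
function parseTimestamp(value: string): Date {
  const match = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(value);
  if (!match) {
    throw new Error(`Unsupported timestamp format: "${value}"`);
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC expects a zero-based month, so August (8) becomes 7.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

const parsed = parseTimestamp("2010-08-10 00:03:21");
console.log(parsed.toISOString()); // 2010-08-10T00:03:21.000Z
```

Rejecting unrecognized formats up front avoids silently computing a document age from an invalid date.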

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

New integration and unit tests pass.

Still need to do:

  • README updates
  • Example in the /rerank docstring of how to pass in options containing the recency settings

Further reading:

