Releases: opensanctions/yente

v3.8.4

27 Feb 16:26

This is a maintenance release which addresses a potential vulnerability in orjson. It does not change any scoring behaviour.

Full Changelog: v3.8.3...v3.8.4

v3.8.3

05 Feb 16:06

This release includes two changes to the match API:

  • Fix a bug where custom datasets much smaller than the OpenSanctions data were not scored correctly in search results, so their entities were not returned even when they were a good match for the query.
  • Fix the phonetics matcher to cut off results where the raw (Levenshtein) edit distance between the proposed match and the query exceeds a threshold.
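
The edit-distance cutoff described above can be illustrated with a minimal sketch. This is not the yente implementation: the function names and the threshold value are hypothetical, since the release notes do not state the actual threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


# Hypothetical threshold; the real value is not given in the release notes.
MAX_EDIT_DISTANCE = 3


def within_cutoff(query: str, candidate: str, max_distance: int = MAX_EDIT_DISTANCE) -> bool:
    """Keep a phonetic candidate only if its raw edit distance is small enough."""
    return levenshtein(query.lower(), candidate.lower()) <= max_distance
```

A phonetic encoding can map quite different spellings to the same code, so a raw-distance check like this is one way to discard candidates that sound alike but are spelled very differently.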

Full Changelog: v3.8.2...v3.8.3

v3.8.2

16 Jan 16:42

This release makes functional changes in response to user feedback, in particular the following:

  • Indexer stability: the indexer process was struggling with interrupted downloads of source data (error: "Payload not completed"), in part due to the growth of our database. We've switched to a different HTTP client library and added support for HTTP/2 binary streams to make this process more stable. We've also disabled the option to run multiple indexing jobs at the same time.
  • Overly broad phonetic search results: phonetic search generated an abnormally large number of match candidates, which also led to missed matches. We've further restricted how phonetic search works to reduce false positives.
  • Default data update checks (YENTE_CRONTAB) are now conducted every two hours.
  • Improved handling of exceptions from the search index.
  • Introduced a new index_stale boolean flag in /catalog for monitoring purposes.
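
For monitoring, a check against the index_stale flag might look like the sketch below. It assumes that index_stale is a top-level boolean in the /catalog JSON response; the helper name and the sample payload are hypothetical and not taken from the yente documentation.

```python
import json
from typing import Any, Dict


def is_index_stale(catalog: Dict[str, Any]) -> bool:
    """Return True if the catalog payload reports a stale index.

    Assumption: `index_stale` is a top-level boolean in the /catalog JSON.
    A missing key is treated as "not stale".
    """
    return bool(catalog.get("index_stale", False))


# Hand-written sample payload for illustration, not real API output.
sample = json.loads('{"datasets": [], "index_stale": true}')
if is_index_stale(sample):
    print("ALERT: search index is stale")
```

In practice the payload would come from an HTTP GET on the /catalog endpoint of your own instance, and the alert would feed whatever monitoring system you already use.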

v3.8.0

04 Dec 09:13

This release brings a number of improvements:

  • Updated the nomenklatura matching model (logic-v1), which now matches SWIFT BIC codes and better handles names that are tokenized differently ("Jean-Paul Sartre" == "JeanPaul Sartre").
  • logic-v1 is now the default algorithm for the match API.
  • The match API now supports a topics argument that can be used to match only entities with a particular topic tag (e.g. role.pep, sanction).
  • The /catalog endpoint now carries freshness data, giving the index_version for each dataset, and listing an array of all current and outdated datasets in the index.
  • Various dependency upgrades.
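
As a rough sketch of how the topics argument could be used, the snippet below builds a match request restricted to two topic tags. The base URL, the helper name, the repeated-parameter encoding of topics, and the request body shape are assumptions made for illustration; check the API documentation of your deployment before relying on them.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # hypothetical local yente instance


def build_match_request(
    dataset: str, name: str, topics: List[str]
) -> Tuple[str, Dict[str, Any]]:
    """Build the URL and JSON body for a match query limited to some topics."""
    # Assumption: `topics` is passed as a repeated query parameter.
    params = [("algorithm", "logic-v1")] + [("topics", topic) for topic in topics]
    url = f"{BASE_URL}/match/{dataset}?{urlencode(params)}"
    # Assumption: queries are keyed by an arbitrary ID and carry a schema
    # plus a properties mapping of lists.
    body = {"queries": {"q1": {"schema": "Person", "properties": {"name": [name]}}}}
    return url, body


url, body = build_match_request("default", "Jean-Paul Sartre", ["role.pep", "sanction"])
print(url)
```

The returned URL and body can then be sent as a POST request with any HTTP client.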

v3.7.3

09 Oct 13:16
  • Improvements to matching of company names
  • Disable phonetic matching on names that do not use a Western-style alphabet
  • Fix a race condition in the indexer which can delete the active index

Full Changelog: v3.7.2...v3.7.3

v3.7.2

05 Oct 10:10

This release focuses on improving the scoring quality of the matcher system. Four areas in particular have seen work:

  • Improvements to the candidate generation system, which finds possible matches using ElasticSearch. Candidate generation is the step before result scoring: it pre-selects possible matches from the OpenSanctions database. It has been reworked to assign higher scores to literal name matches and to weight the individual terms in a company or person name in more detail (in particular, giving less weight to company type information).
  • We've made the logic-v1 matching implementations for Jaro-Winkler and Metaphone more precise in their ratings: they now score close matches higher and invalid candidates lower.
  • We've introduced a method to assign custom weights to the features in the logic-v1 algorithm, allowing API users to fine-tune the scoring system to their needs. More information: https://www.opensanctions.org/docs/api/scoring/#tuning
  • We've re-introduced the Jaro-Winkler and Soundex implementations from yente 3.6.1 and frozen them in place, providing stability to any adopters.

What's Changed

  • Add schema facet and option to specify which facets are included in the response by @jbothma in #332
  • Bump jellyfish from 1.0.0 to 1.0.1 by @dependabot in #333
  • Bump elasticsearch[async] from 8.9.0 to 8.10.0 by @dependabot in #334
  • Bump fastapi from 0.103.1 to 0.103.2 by @dependabot in #336

Full Changelog: v3.7.0...v3.7.2

v3.7.0

18 Sep 08:50

Full Changelog: v3.6.2...v3.7.0

v3.6.2

13 Sep 12:27

This is mainly a maintenance release that updates software components. It introduces two new features:

  • The changed_since query parameter on both the /match and /search endpoints constrains results to only entities which have changed since the given ISO timestamp.
  • The API now has CORS access enabled, which is used by the OpenRefine reconciliation API.
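
A short sketch of how a client might build a /search URL using changed_since follows. The helper name, the base URL, the q parameter, and the second-precision UTC timestamp format are assumptions; the release notes only say that an ISO timestamp is expected.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def search_url(base_url: str, dataset: str, query: str, changed_since: datetime) -> str:
    """Build a search URL limited to entities changed since a given time."""
    # Normalise to UTC and format as an ISO 8601 timestamp without offset.
    # Pass a timezone-aware datetime; naive values are read as local time.
    stamp = changed_since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    params = {"q": query, "changed_since": stamp}
    return f"{base_url}/search/{dataset}?{urlencode(params)}"


since = datetime(2023, 9, 1, tzinfo=timezone.utc)
print(search_url("http://localhost:8000", "default", "acme", since))
```

Storing the time of the last successful run and passing it as changed_since is one way to re-screen only entities that were updated in the meantime.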

Full Changelog: v3.6.1...v3.6.2

v3.6.1

08 Aug 14:04

This version includes a lot of small changes based on customer feedback. In particular:

  • Introduce an exclude_dataset query parameter to /match and /search to remove a single dataset from results.
  • Make the maximum result count of /match configurable via the server variable YENTE_MAX_MATCHES.
  • The index freshness check now tests if the new index has the given alias assigned, not just if it exists. This should handle partial indexing more gracefully.
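
To illustrate the idea of a server-side cap configured through YENTE_MAX_MATCHES, here is a minimal sketch. The fallback value and both function names are hypothetical, and the way yente itself applies the setting may differ.

```python
import os
from typing import Mapping

# Hypothetical fallback; the real default is not stated in the release notes.
DEFAULT_MAX_MATCHES = 500


def max_matches(env: Mapping[str, str] = os.environ) -> int:
    """Read the configured cap, falling back to a default when unset."""
    raw = env.get("YENTE_MAX_MATCHES")
    if raw is None:
        return DEFAULT_MAX_MATCHES
    return int(raw)


def effective_limit(requested: int, env: Mapping[str, str] = os.environ) -> int:
    """Clamp a client-requested result count to the configured maximum."""
    return max(0, min(requested, max_matches(env)))
```

Passing the environment as a mapping keeps the functions easy to test without modifying the real process environment.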

Full Changelog: v3.6.0...v3.6.1

v3.6.0

24 Jul 09:14

This release includes improved metadata handling for datasets, introduces some new entity types in the followthemoney data model and allows for less performance-heavy matching queries using the fuzzy flag. In detail:

  • We've introduced several new entity types in the followthemoney data model which will be used to provide more detailed information regarding politically exposed persons. We advise all users to update the API now so that the new entity types will be reflected correctly.
  • Using the /match API on a very large dataset can cause heavy load on the ElasticSearch index because of the Levenshtein-based fuzzy matching it uses. In this version, we've introduced a fuzzy= query parameter, which lets users disable that functionality. Please note that this doesn't affect the scores generated by the API, but it may lead to lower recall on very specific queries.

Full Changelog: v3.5.0...v3.6.0