From 694a2163966241e7ff07eb0b28822082fcc0ed1a Mon Sep 17 00:00:00 2001
From: lemon24
Date: Mon, 24 Jun 2024 10:50:56 +0300
Subject: [PATCH] Dev notes for #340.

---
 docs/dev.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/dev.rst b/docs/dev.rst
index c05f58e7..8f801658 100644
--- a/docs/dev.rst
+++ b/docs/dev.rst
@@ -721,6 +721,8 @@ Duplicate entries
 Duplicate entries are mainly handled by the :mod:`reader.entry_dedupe` plugin.
 
 * Using MinHash to speed up similarity checks (maybe): https://gist.github.com/lemon24/b9af5ade919713406bda9603847d32e5
+* Discussion of unifying "on-line" dedupe (after an entry is added/updated),
+  and "on-demand" dedupe (backfill): :issue:`340`.
 
 However, it is also possible for a feed to have two entries with the same id
 – yes, even though in most (if not all) formats,