
Reporting indexing events to SDR based on solr responses #1366

Merged: 2 commits, Mar 1, 2024

Conversation

thatbudakguy
Member

@thatbudakguy thatbudakguy commented Feb 27, 2024

Closes #1321

This refactors the SolrBetterJsonWriter so that there's a data structure that connects the desired actions (add/update vs. delete) to each document/ID. We then use that data to report the status of each record after communicating with Solr, so that we can dispatch one message per druid indicating whether the addition or deletion succeeded.

The calls that currently do this in the indexer config are removed in favor of doing it in the writer, since it's more accurate to report based on the actual Solr response.
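The mapping described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual SolrBetterJsonWriter code: it assumes each context is a plain hash with `:action`, `:druid`, and `:doc` keys, whereas the real writer works with Traject contexts.

```ruby
# Hypothetical sketch of a batch that pairs each document/ID with its
# desired Solr action, so the writer can report per-druid outcomes
# after the Solr call. Names and structure are illustrative only.
class Batch
  def initialize(contexts)
    @contexts = contexts
  end

  # One [action, druid, data] tuple per record: :delete carries just
  # the doc id, :add carries the full document hash. Memoized so the
  # same array serves both JSON generation and reporting.
  def actions
    @actions ||= @contexts.map do |ctx|
      if ctx[:action] == :delete
        [:delete, ctx[:druid], ctx[:doc]['id']]
      else
        [:add, ctx[:druid], ctx[:doc]]
      end
    end
  end

  # After Solr responds, build one report message per druid.
  # Non-SDR content has no druid and is skipped.
  def report_messages(success:)
    actions.filter_map do |action, druid, _data|
      next unless druid
      "#{action} #{success ? 'succeeded' : 'failed'} for druid:#{druid}"
    end
  end
end
```

With a structure like this, the writer can send the batch to Solr and then dispatch one SDR event per druid based on the actual response, instead of reporting optimistically from the indexer config.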

@thatbudakguy thatbudakguy changed the title report via traject writer Reporting indexing events to SDR based on solr responses Feb 27, 2024
@thatbudakguy thatbudakguy marked this pull request as ready for review February 27, 2024 23:00
t0 = context.clipboard[:benchmark_start_time]
t1 = Time.now

logger.debug('geo_config.rb') { "Processed #{context.output_hash['id']} (#{t1 - t0}s)" }
SdrEvents.report_indexing_success(record.druid, target: settings['purl_fetcher.target'])
Member Author

moved to writer

@@ -170,7 +170,6 @@ def geoserver_url(record)
# delete records from the index
context.output_hash['id'] = ["stanford-#{druid}"]

SdrEvents.report_indexing_deleted(druid, target: settings['purl_fetcher.target'])
Member Author

moved to writer

Comment on lines +117 to +118
# Collection of Traject contexts to be sent to solr
class Batch
Contributor

I'm having trouble understanding what the responsibility of this class is. It seems like it does several things, which makes me wonder if it should be several classes.

Member Author

I initially extracted it because I needed an underlying data structure that mapped desired actions to documents. That data structure could then be used to generate the JSON to solr as well as to report about the status of the actions. I could instead extract the reporting-related stuff to methods directly on the writer that take a Batch, I guess?

Contributor

Yeah, it feels to me like the reporting stuff should be separate if possible.

Member Author

Moved it to the SdrEvents class.

# and data is either the doc id or the full doc hash. Druid is empty for
# non-SDR content.
def actions
@contexts.map do |context|
Contributor

Should this be memoized? It's called in generate_json and then again on the same data in report_success, right?

Member Author

Yeah, that's a good idea

Member Author

done
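The memoization agreed on above uses Ruby's standard `||=` idiom: the first call computes and caches the result, and later calls (e.g. from both generate_json and report_success) reuse the same array. A minimal illustration, with a call counter added purely to make the caching visible:

```ruby
# Illustrative example of ||= memoization (not the actual PR code).
class Example
  def initialize
    @calls = 0
  end

  # Computed once; subsequent calls return the cached @actions array.
  def actions
    @actions ||= begin
      @calls += 1
      [[:add, 'doc1'], [:delete, 'doc2']]
    end
  end

  attr_reader :calls
end
```

One caveat with `||=`: if the computation could legitimately return `nil` or `false`, it would be recomputed on every call, so a `defined?(@actions)` guard is safer in that case. Here the result is always an array, so `||=` is fine.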

Comment on lines 150 to 152
"delete: #{JSON.generate(data)}"
when :add
"add: #{JSON.generate(doc: data)}"
Contributor

I know this isn't your code, but these keys should be json strings. https://solr.apache.org/guide/8_0/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands
So: JSON.generate(add: { doc: data })

Member Author

I wondered about this! Much more intuitive to just build it as a regular hash instead of concat-ing a string. I'll change it

Contributor

Unfortunately, it's Solr's own specific flavor of JSON which allows duplicate keys, so I think you'll still have to concat some strings to do it.

Member Author

added quotes
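The constraint discussed above is that Solr's JSON update syntax permits repeated `"add"`/`"delete"` keys in one object, which standard JSON generators refuse to emit, so each command string is generated separately (with properly quoted keys) and joined by hand. A hedged sketch of that approach, assuming `actions` is a list of `[action, data]` pairs as in the surrounding discussion:

```ruby
require 'json'

# Sketch of building a Solr JSON update body. Because Solr accepts
# duplicate "add"/"delete" keys in a single object, we can't build one
# Ruby hash and serialize it; instead each command is serialized on its
# own and the pieces are concatenated into the enclosing object.
def solr_update_body(actions)
  commands = actions.map do |action, data|
    case action
    when :delete
      "\"delete\": #{JSON.generate(data)}"
    when :add
      "\"add\": #{JSON.generate(doc: data)}"
    end
  end
  "{#{commands.join(',')}}"
end
```

For example, `solr_update_body([[:add, { 'id' => 'stanford-ab123cd4567' }], [:delete, 'stanford-xy987zw6543']])` yields a single object containing both an `"add"` and a `"delete"` command.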

@thatbudakguy thatbudakguy force-pushed the report-via-traject-writer branch from 6f400ca to f48aac7 on February 29, 2024 23:18
@thatbudakguy thatbudakguy requested a review from jcoyne February 29, 2024 23:19
@jcoyne jcoyne merged commit c5e47a4 into main Mar 1, 2024
2 checks passed
@jcoyne jcoyne deleted the report-via-traject-writer branch March 1, 2024 15:25
@cbeer
Member

cbeer commented Mar 2, 2024

This broke indexing from FOLIO. See #1383.

I've reverted it in #1384


Successfully merging this pull request may close these issues.

Report indexing to SDR based on traject writer