Reporting indexing events to SDR based on solr responses #1366
Conversation
t0 = context.clipboard[:benchmark_start_time]
t1 = Time.now

logger.debug('geo_config.rb') { "Processed #{context.output_hash['id']} (#{t1 - t0}s)" }
SdrEvents.report_indexing_success(record.druid, target: settings['purl_fetcher.target'])
moved to writer
@@ -170,7 +170,6 @@ def geoserver_url(record)
# delete records from the index
context.output_hash['id'] = ["stanford-#{druid}"]

SdrEvents.report_indexing_deleted(druid, target: settings['purl_fetcher.target'])
moved to writer
# Collection of Traject contexts to be sent to Solr
class Batch
I'm having trouble understanding what the responsibility of this class is. It seems like it does several things, which makes me wonder if it should be several classes.
I initially extracted it because I needed an underlying data structure that mapped desired actions to documents. That data structure could then be used to generate the JSON for Solr as well as to report on the status of the actions. I could instead extract the reporting-related stuff into methods directly on the writer that take a Batch, I guess?
Yeah, it feels to me like the reporting stuff should be separate if possible.
Moved it to the SdrEvents class.
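For context, the Batch under discussion could be as small as a plain data structure that pairs each document with its desired action. This is a hypothetical sketch, not the project's actual code; the constructor argument shape and field names are assumptions:

```ruby
# Hypothetical sketch of the Batch discussed above: a plain data structure
# mapping the desired action (:add or :delete) to each document, which both
# the Solr JSON generation and the later status reporting can consume.
class Batch
  def initialize(contexts)
    @contexts = contexts
  end

  # Returns [action, druid, data] tuples; druid is nil for non-SDR content.
  def actions
    @contexts.map do |context|
      if context[:delete]
        [:delete, context[:druid], context[:id]]
      else
        [:add, context[:druid], context[:doc]]
      end
    end
  end
end
```

Keeping the structure this thin is what makes it tempting to move the reporting logic elsewhere, as suggested above.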
# and data is either the doc id or the full doc hash. Druid is empty for
# non-SDR content.
def actions
  @contexts.map do |context|
Should this be memoized? It's called in generate_json and then again on the same data in report_success, right?
Yeah, that's a good idea
done
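The memoization agreed on here is typically just Ruby's `||=` idiom, so both callers share one computed array. A minimal illustrative sketch (the class body is assumed, not the project's actual code):

```ruby
# Minimal sketch of memoizing #actions with ||= so that generate_json and
# report_success reuse the same computed array instead of remapping.
class Batch
  def initialize(contexts)
    @contexts = contexts
  end

  def actions
    # Computed on the first call; later calls return the same cached array.
    @actions ||= @contexts.map { |context| [:add, context['id']] }
  end
end
```

One caveat of `||=` is that it recomputes if the result is ever nil or false, which can't happen here since `map` always returns an array.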
"delete: #{JSON.generate(data)}"
when :add
  "add: #{JSON.generate(doc: data)}"
I know this isn't your code, but these keys should be JSON strings: https://solr.apache.org/guide/8_0/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands
So: JSON.generate(add: { doc: data })
I wondered about this! It's much more intuitive to build it as a regular hash instead of concatenating strings. I'll change it.
Unfortunately, it's Solr's own specific flavor of JSON which allows duplicate keys, so I think you'll still have to concat some strings to do it.
added quotes
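To illustrate the constraint raised above: Solr's JSON update syntax permits repeated "add"/"delete" keys in a single object, which standard JSON generators cannot emit, so each command fragment is built as a string with a quoted key and the fragments are joined by hand. A hedged sketch of what "added quotes" could look like (the helper name and exact formatting are assumptions):

```ruby
require 'json'

# Sketch: build Solr's update JSON, which allows duplicate "add"/"delete"
# keys in one object, by generating each fragment with a quoted key and
# concatenating, since a Ruby Hash cannot hold duplicate keys.
def solr_update_json(actions)
  fragments = actions.map do |action, data|
    case action
    when :delete
      "\"delete\": #{JSON.generate(data)}"
    when :add
      "\"add\": #{JSON.generate(doc: data)}"
    end
  end
  "{#{fragments.join(',')}}"
end
```

The quoted keys make each fragment valid JSON on its own, while the manual join is what allows the duplicate-key structure Solr expects.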
Force-pushed from 6f400ca to f48aac7.
Closes #1321
This refactors the SolrBetterJsonWriter so that there's a data structure connecting the desired action (add/update vs. delete) to each document/ID. We then use that data to report the status of each record after communicating with Solr, so that we can dispatch one message per druid indicating whether the addition/deletion succeeded.
The calls that currently do this in the indexer config are removed in favor of doing it in the writer, since reporting based on the actual Solr response is more accurate.
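The per-druid dispatch described here can be sketched roughly as follows. The two reporting method names come from the diff above; the events object is injected purely for illustration, and everything else (method name, argument shapes, the success check) is an assumption rather than the actual implementation:

```ruby
# Hedged sketch of the writer-side reporting: after the Solr response comes
# back, dispatch one event per druid indicating what happened. Druid is
# empty for non-SDR content, so those records are skipped.
def report_batch(actions, solr_ok:, target:, events:)
  actions.each do |action, druid|
    next if druid.nil? || !solr_ok

    case action
    when :delete
      events.report_indexing_deleted(druid, target: target)
    when :add
      events.report_indexing_success(druid, target: target)
    end
  end
end
```

Reporting here, rather than in the indexer config, means a record that Solr rejects never gets a spurious success event.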