Ere Maijala edited this page Oct 11, 2022 · 12 revisions

Command Line Reference

N.B. This reference is for the 1.x version of RecordManager. Newer versions include a command reference with the ./console command as the entry point. While command names and some parameter names have changed, they still resemble the old functions.

All the command line programs should report their activities and end with a summary. If a program ends abruptly without an error or success message, check the PHP error log for errors, or rerun the program with the --verbose parameter for more detailed output on its activity.

Configuration parameters can be changed on the fly with the --config switch, e.g. php manage.php --func=updatesolr --config.Solr.update_url=http://localhost:8983/solr/update

Import

The import program can be used to import files containing metadata records into the RecordManager database.

Command to run: php import.php --file=filename --source=xyz ...

Parameters:

Name Description
file Name of the file to import
source The data source id (section name or idPrefix in datasources.ini)
verbose Print verbose output (for debugging purposes)
lockfile If specified, RecordManager will use the lock file to avoid running multiple tasks in parallel. Useful when an import task is scheduled regularly, and there's no certainty that the previous task is completed before the next one is scheduled.
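
As a rough illustration of the parameters above, the following sketch imports every XML file in a directory into a single data source, one file at a time. The helper name import_dir and the .xml file extension are assumptions made for this example, and the function has to be called from the RecordManager directory so that php import.php resolves. It also assumes that import.php exits with a non-zero status on failure.

```shell
# Sketch: import every *.xml file in a directory into one data source.
# import_dir is a hypothetical helper, not part of RecordManager.
import_dir() {
    dir=$1    # directory containing the files to import
    src=$2    # data source id from datasources.ini
    for file in "$dir"/*.xml; do
        # Skip the unexpanded pattern when the directory has no matches
        [ -e "$file" ] || continue
        # Stop at the first failed import (assumes a non-zero exit status)
        php import.php --file="$file" --source="$src" || return 1
    done
}
```

It could then be called as e.g. import_dir /data/incoming xyz, where both the directory and the source id are placeholders.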

Harvest

The harvest program can be used to run harvesting from OAI-PMH data sources.

Command to run: php harvest.php --source=xyz ...

Parameters:

Name Description
source The data source id (section name or idPrefix in datasources.ini) or * if harvesting is to be run for all data sources that have url defined. You can also enter multiple sources by separating them with a comma.
from Optional parameter to override stored start date of harvesting
until Optional parameter to override stored end date of harvesting
verbose Print verbose output (for debugging purposes)
override Optional parameter to start harvesting using a specific resumptionToken. Note that resumptionTokens might have a limited lifetime. Must not be used with --source=* parameter.
reharvest[=date] Perform a reharvest that marks as deleted all records that are not received during the harvesting. Also implies --all. Useful when an OAI-PMH data source or harvesting settings have changed so that a different set of records is available. You can also specify the date and time to be used as the baseline instead of the moment harvesting started, e.g. if harvesting was previously interrupted.
lockfile If specified, RecordManager will use the lock file to avoid running multiple tasks in parallel. Useful when e.g. a harvesting task is scheduled regularly, and there's no certainty that the previous task is completed before the next one is scheduled.
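
The parameters above can be combined as in the following sketch, which harvests several data sources in one run and optionally overrides the stored start date. The helper name harvest_sources and the lock file name harvest.lock are assumptions, as is the YYYY-MM-DD format for --from, which this page does not specify for harvesting.

```shell
# Sketch: harvest several data sources in one run, guarded by a lock file.
# harvest_sources is a hypothetical helper; harvest.lock is an assumed name.
harvest_sources() {
    from=$1    # start date overriding the stored one, or "" to keep it
    shift
    # Join the remaining arguments with commas: lib1 lib2 -> lib1,lib2
    sources=$(IFS=,; echo "$*")
    if [ -n "$from" ]; then
        php harvest.php --source="$sources" --from="$from" --lockfile=harvest.lock
    else
        php harvest.php --source="$sources" --lockfile=harvest.lock
    fi
}
```

For example, harvest_sources "" lib1 lib2 would harvest both sources from their stored dates, while harvest_sources 2022-10-01 lib1 would harvest lib1 from the given date. The source ids are placeholders.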

Management

The management program can be used to run renormalization, deduplication and Solr update process.

Command to run: php manage.php --func=... [function-specific parameters, see below]

Functions (using the --func parameter):

Name Description
renormalize Run normalization again for the original data
deduplicate Run deduplication for records that are marked for processing by harvest, import, renormalization or deletion
updatesolr Update the Solr index with any records that don't need further processing
dump Display the contents of a single record
markdeleted Mark all records from the given data source deleted. This is a soft delete that can be gracefully handled by a subsequent updatesolr.
deletesource Delete all records of the given data source from the database. Read Deleting a data source before using this function.
deletesolr Delete all records of the given data source from the Solr index. Use only if deduplication is not used for the data source. Otherwise see Deleting a data source.
optimizesolr Optimize the Solr index. Do not use unless you know what you're doing (see Solr Documentation).
count Count the number of distinct occurrences of a string in a Solr field. Specify the field to count with the --field parameter. You can also specify --mapped so that the counting process runs the records through any mapping procedures.
checkdedup Check dedup records and verify they are correct (no dangling references etc.)
comparesolr Like updatesolr, but instead of updating the index, compare the records to be updated to what already is in the index and report all differences. Useful for verifying changes when a record driver, mapping or transformation has been changed.
dumpsolr Like updatesolr, but dumps the Solr update requests to files instead of sending them to Solr. Use with --dumpprefix to define a file name prefix and/or directory for the dump files (see below). This can be useful e.g. when testing different Solr configurations and the exact same index content is needed. See below for an example command to upload the dump files to Solr.
purgedeleted Purge deleted records from the database. Deleted records are normally kept in the database so that deletions can be sent to Solr properly. A large number of them may accumulate during operation, so purging them may improve performance. Make sure that any Solr index is up to date before purging deleted records.
markdedup Mark all records for deduplication. Just like deduplicate with --all parameter, but doesn't actually start the deduplication process. Useful for situations where deduplication is scheduled to run regularly and some records just need to be queued for deduplication.
markforupdate Mark records of the given data source to be indexed in Solr (updates the timestamp of the records). Also --single can be used to mark a single record.
checksolr Check the Solr index for orphaned records that no longer exist in RecordManager or are marked deleted.

Parameters:

Name Description
source The data source id (section name or idPrefix in datasources.ini) if processing is to be run for a single data source only (mandatory for deletesource, deletesolr, markdeleted and markforupdate). Multiple data sources can be specified as a comma-separated list. updatesolr also supports exclusion rules prefixed with a minus sign. Exclusion rules may be data source names or regular expressions. Regular expressions are enclosed in slashes, e.g. --source=-/^lib1.*/
all deduplicate: Run deduplication for all records regardless of whether they are marked as waiting for processing.
all markdedup: Mark all records for deduplication.
all updatesolr: Import all records regardless of last update date
from updatesolr: override the stored date of last Solr index update (any format that strtotime understands or yyyy-mm-dd to avoid any ambiguity). Note: record timestamps are stored in UTC time zone.
single deduplicate, updatesolr, dump, markforupdate: Process only a single record, useful for testing. Include prefix in the ID (e.g. --single=samplesource.123)
verbose Print verbose output (for debugging purposes)
nocommit updatesolr: Do not ask Solr to commit changes at the end (useful especially for debugging purposes as commit takes some time)
field count: Field to analyze with the count function
mapped count: If specified, any mappings are applied to the records before counting
force deletesolr: force data source deletion even when deduplication is enabled
config.section.name=value Temporarily override a configuration setting
lockfile If specified, RecordManager will use the lock file to avoid running multiple tasks in parallel. Useful when e.g. index updates are scheduled regularly, and there's no certainty that the previous update is completed before the next one is scheduled.
comparelog comparesolr: A log file to use when comparing records with the ones in Solr
dumpprefix dumpsolr: File name prefix (may include a path) to be used when dumping Solr update requests to files. E.g. --dumpprefix=dump/solr would create files dump/solr-1.json, dump/solr-2.json etc.
daystokeep purgedeleted: How many last days of deleted records to keep (just to be safe)
dateperserver updatesolr: Track last Solr update timestamp per server url. Useful if the same RecordManager instance is used to update multiple Solr indexes.
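
Since deduplicate processes records marked by harvest or import, and updatesolr indexes records that need no further processing, a common sequence is to run them in that order. The sketch below chains the two. The helper name update_cycle and the lock file name manage.lock are assumptions, and it assumes manage.php exits with a non-zero status on failure.

```shell
# Sketch: deduplicate, then update the Solr index.
# update_cycle is a hypothetical helper; manage.lock is an assumed name.
update_cycle() {
    # Deduplicate records marked for processing; stop here on failure
    php manage.php --func=deduplicate --lockfile=manage.lock || return 1
    # Index records that need no further processing; any extra arguments
    # (e.g. --source=... or --nocommit) are passed on to updatesolr
    php manage.php --func=updatesolr --lockfile=manage.lock "$@"
}
```

For example, update_cycle --source=lib1 --nocommit would deduplicate all marked records and then update only the placeholder source lib1 without asking Solr to commit.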

An example command for uploading dump files created with --func=dumpsolr --dumpprefix=dump/solr (8 parallel requests by default, change the -P parameter to modify):

ls -1tr dump/solr*.json | xargs -P 8 -I"{}" sh -c 'echo "{}"; curl "http://localhost:8983/solr/biblio/update" -H "Content-Type: application/json" --data-binary @"{}"'

Remember also to perform a commit as necessary to make the changes visible:

curl 'http://localhost:8983/solr/biblio/update?softCommit=true'
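
When upload errors should stop the process before the commit, a sequential variant may be easier to reason about than parallel requests. The following is a minimal sketch of that idea. The helper name upload_dumps is an assumption, and the files are sent in glob order rather than by modification time.

```shell
# Sketch: upload dump files one at a time and commit only if all succeed.
# upload_dumps is a hypothetical helper, not part of RecordManager.
upload_dumps() {
    prefix=$1    # same value that was given to --dumpprefix
    url=${2:-http://localhost:8983/solr/biblio/update}
    for file in "$prefix"*.json; do
        # Skip the unexpanded pattern when there are no dump files
        [ -e "$file" ] || continue
        echo "$file"
        # --fail makes curl return a non-zero status on HTTP errors
        curl --fail --silent --show-error "$url" \
            -H "Content-Type: application/json" --data-binary @"$file" || return 1
    done
    # Make the changes visible once every file has been sent
    curl --fail --silent --show-error "${url}?softCommit=true"
}
```

Calling upload_dumps dump/solr would send the files created with --dumpprefix=dump/solr to the default URL used in the example above.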

Export

The export program can be used to export data from the RecordManager database.

Command to run: php export.php --file=filename

Parameters:

Name Description
file=... File to write the records to. Any existing file will be overwritten. Use --file=- together with --quiet to write the records to stdout, e.g. for piping them to further processing.
deleted=... File where IDs of any deleted records are written (one per line)
from=... Date since last export so that only changes are exported (any format that PHP's strtotime function understands, but "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss" is recommended).
quiet Disable progress reports and any other messages
skip=... Number of records to skip for each exported record. Can be used to create a subset of records from the data source.
source=... A data source id (section name or idPrefix in datasources.ini) that can be specified to export only a single data source or multiple sources (separated with a comma)
single=... Export only a single record. Useful for testing. Include prefix in the ID (e.g. --single=samplesource.123)
xpath=... Export only records matching an XPath expression to e.g. dump records containing a specific field. See examples below.
verbose Print verbose output (for debugging purposes)
sortdedup If specified, export file is sorted by dedup id. Helpful when e.g. checking the results of deduplication.
dedupid=... Controls whether dedup id's are added to the records. deduped = Add dedup id's to records that have duplicates. always = Always add dedup id's to the records (for non-duplicate records this is the prefixed record id to keep it unique). Otherwise dedup id's are not added to the records.
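
As an illustration of exporting to stdout, the sketch below exports the changes of one data source since a given date and compresses them on the fly. The helper name export_changes and its arguments are assumptions made for this example.

```shell
# Sketch: export changes since a given date for one data source and
# compress the result. export_changes is a hypothetical helper.
export_changes() {
    src=$1      # data source id from datasources.ini
    since=$2    # e.g. 2022-10-01 (YYYY-MM-DD is a recommended format)
    out=$3      # compressed output file
    # --file=- writes records to stdout; --quiet keeps progress
    # messages out of the exported data
    php export.php --file=- --quiet --source="$src" --from="$since" \
        | gzip > "$out"
}
```

A call such as export_changes lib1 2022-10-01 lib1-changes.xml.gz uses placeholder values throughout. Note that the exit status of the pipeline is that of gzip, so a failed export is not detected by this sketch.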

xpath Parameter Examples

To export MARC records that have a 740 field:

php export.php --file=- --xpath="//datafield[@tag='740']"

To match the field contents:

php export.php --file=- --xpath="//datafield[@tag='740']/subfield[@code='a' and text()='In the land of the crane']"
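
XPath expressions contain quotes and brackets that the shell also treats specially, so it helps to check what is actually passed to the program. The following sketch, which assumes a POSIX shell with default globbing settings, prints the argument as the shell delivers it with and without surrounding double quotes.

```shell
# Without quotes around the whole argument, the shell removes the
# single quotes before the program ever sees them.
unquoted=$(printf '%s' --xpath=//datafield[@tag='740'])
# Double quotes around the whole argument keep the inner quotes intact.
quoted=$(printf '%s' "--xpath=//datafield[@tag='740']")
echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted form arrives as [@tag=740] without the inner quotes, and the brackets may additionally be expanded as a file name pattern in some shells, so wrapping the whole --xpath argument in double quotes is the safer choice.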

Datasources

The datasources program is meant for accessing and manipulating datasources.ini programmatically, though it's currently a work in progress and doesn't support many features yet.

Command to run: php datasources.php --search=...

Parameters:

Name Description
search=... Search for something in datasources.ini and return list of data source id's as a result. Allows one to list e.g. all sources using a given format (--search='format=lido'). Note that the comparison is done on the raw data, i.e. there must be no white space or quotes between the settings and their values.