A set of functions to handle data using the Python pymarc library :
- marc_utils_4.py, using version 4.2.2 of the library
- marc_utils_5.py, using version 5.2.0
Changes in marc_utils_5.py compared to marc_utils_4.py :
- __sort_subfields() : parameter curr_subf was renamed to subf_list
- force_indicators() :
  - Parameter indicators (list of strings, defaulted to [" ", " "]) was replaced by two parameters, ind1 & ind2, both str defaulted to None
  - Now, if their value is set to None (default behaviour), the field keeps its current indicator value (before, the default behaviour was to write a blank, so this function could not be used to keep the current field indicators)
- edit_specific_repeatable_subfield_content_with_regexp() returns a list of pymarc.field.Subfield instead of a list of strings following the old subfield managing system (code 1, value 1, code 2, value 2, etc.)
- Added edit_repeatable_subf_content_with_regexp_for_tag(), which is the same as edit_specific_repeatable_subfield_content_with_regexp() except that it takes a record and a tag as arguments and edits all fields with that tag, like all functions except the two using regular expressions
- replace_specific_repeatable_subfield_content_not_matching_regexp() returns a list of pymarc.field.Subfield instead of a list of strings following the old subfield managing system (code 1, value 1, code 2, value 2, etc.)
- Added replace_repeatable_subf_content_not_matching_regexp_for_tag(), which is the same as replace_specific_repeatable_subfield_content_not_matching_regexp() except that it takes a record and a tag as arguments and edits all fields with that tag, like all functions except the two using regular expressions
- Added field_as_string() and record_as_string(), which return the field / record as a string in WinIBW style
- Added all functions to retrieve dates from the record (get_years_in_specific_subfield(), get_year_from_UNM_100(), get_years_less_accurate() & get_years())
- Added merge_all_subfields_with_code() to merge all subfields with the same code for each field with given tag
See the pymarc releases in the GitLab repository for important changes in the library.
Returns a list of ints containing the first 4 consecutive digits found in the specified field-subfield pair.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag
- code (str) : the subfield code
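The extraction described above can be pictured with a minimal sketch. It works on plain strings standing for subfield values rather than on a pymarc record, and the names first_year and years_in_values are hypothetical helpers, not functions of this module.

```python
import re


def first_year(value):
    """Return the first run of 4 consecutive digits in value as an int, or None."""
    match = re.search(r"[0-9]{4}", value)
    return int(match.group()) if match else None


def years_in_values(values):
    """Collect the year found in each subfield value, skipping values without one."""
    years = []
    for value in values:
        year = first_year(value)
        if year is not None:
            years.append(year)
    return years
```

With this sketch, a value such as "Paris, 1998-2003" contributes 1998 only, because only the first group of 4 digits is read.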
Returns a list of ints containing positions 9-12 or 0-3 of the 100$a if they are 4 consecutive digits.
Takes as argument :
- record (pymarc.record.Record)
- [Optional] creation (bool, defaulted to False) : return the creation date (pos. 0-3) instead of the publication one (pos. 9-12)
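As an illustration of the positions involved, here is a hedged sketch that reads a 100$a value passed as a plain string. The function name year_from_100a and the sample value are assumptions made for the example; the real function takes a record.

```python
import re


def year_from_100a(value, creation=False):
    """Return [year] read at pos. 9-12 (or pos. 0-3 if creation is True), else []."""
    start = 0 if creation else 9
    chunk = value[start:start + 4]
    # Only accept the slice if it is exactly 4 digits (e.g. "19uu" is rejected)
    if re.fullmatch(r"[0-9]{4}", chunk):
        return [int(chunk)]
    return []


sample = "20150312d2014    m  y0frey50      ba"
publication = year_from_100a(sample)
created = year_from_100a(sample, creation=True)
```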
Returns a list of ints containing the first 4 consecutive digits found in the specified field (all subfields are analyzed).
Note : years below 1700 or above 2100 are removed before the list is returned.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag
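The range filter matters because any group of 4 digits is picked up, not only years. The sketch below shows the idea on a field represented as a list of (code, value) tuples; this representation and the name years_in_field are assumptions for the example, and the bounds are treated as inclusive.

```python
import re


def years_in_field(subfields, low=1700, high=2100):
    """Read the first 4 consecutive digits of every subfield, then drop implausible years."""
    years = []
    for _code, value in subfields:
        match = re.search(r"[0-9]{4}", value)
        if match:
            years.append(int(match.group()))
    return [year for year in years if low <= year <= high]
```

Here a subfield containing an ISBN fragment such as "9782070" would yield 9782, which the filter discards.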
Returns a list of ints containing either :
- The first 4 consecutive digits in the specified field-subfield (calls get_years_in_specific_subfield())
- The first 4 consecutive digits in the specified field (calls get_years_less_accurate())
Takes as argument :
- record (pymarc.record.Record)
- tags (list of tuples of 2 str) : a list of (field, subfield) or (field, None) pairs
Example :
marc_utils.get_years(record, [
("214", "d"),
("330", None),
("615", "a"),
("200", None),
])
# Will call get_years_in_specific_subfield() for 214$d
# Then get_years_less_accurate() for 330
# Then get_years_in_specific_subfield() for 615$a
# Then get_years_less_accurate() for 200
Every function that sorts subfields follows this logic :
- If a subfield code is not in the sort argument, its position stays the same relative to the other subfields that are not moved
- If a subfield code is in the sort argument, it is moved to the position given by that order
- To sort at the end, use * as a code to separate the codes sorted at the beginning from the codes sorted at the end
For example :
- ["a", "b"] will put at the beginning all subfields using code a, followed by all subfields using code b, followed by all other subfields, keeping their original order
- ["*", "y", "z"] will keep every subfield in its original order except those with codes y & z, followed by all subfields with code y, followed at the end by all subfields with code z
- ["a", "*", "z"] will put at the beginning all subfields using code a, followed by all other subfields except those with code z, followed at the end by all subfields with code z
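The sort logic can be sketched in a few lines. This is not the module's __sort_subfields() implementation, only an illustration under the assumption that subfields are (code, value) tuples; the name sort_subfields is hypothetical.

```python
def sort_subfields(subfields, sort):
    """Reorder (code, value) tuples following the sort argument logic."""
    if "*" in sort:
        star = sort.index("*")
        head, tail = sort[:star], sort[star + 1:]
    else:
        head, tail = list(sort), []
    # Ignore repeated codes so that no subfield is emitted twice
    head = list(dict.fromkeys(head))
    tail = [code for code in dict.fromkeys(tail) if code not in head]
    listed = set(head) | set(tail)

    result = []
    for code in head:  # codes sorted at the beginning
        result += [sf for sf in subfields if sf[0] == code]
    # every other subfield keeps its original relative order
    result += [sf for sf in subfields if sf[0] not in listed]
    for code in tail:  # codes sorted at the end
        result += [sf for sf in subfields if sf[0] == code]
    return result
```

Subfields sharing the same code keep their original order among themselves, since each pass walks the original list from left to right.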
Sorts the record fields by their tag.
Takes as argument : record (pymarc.record.Record)
Sorts subfields for all fields with given tag.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to sort
- sort (list of str) : list of subfield codes to sort. See the sort argument logic to see how to configure it.
Forces the indicators on every field with given tag.
If an indicator is set to None, the field keeps its current value.
The function does not check whether the given values are legal.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to edit
- ind1 (str, defaulted to None) : first indicator value
- ind2 (str, defaulted to None) : second indicator value
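The None behaviour can be summed up by the following sketch, which only computes the resulting pair of indicators. The name resolve_indicators and the use of a 2-item list for the indicators are assumptions for the example, not the module's code.

```python
def resolve_indicators(current, ind1=None, ind2=None):
    """Return the indicators to write: None keeps the current value."""
    first = current[0] if ind1 is None else ind1
    second = current[1] if ind2 is None else ind2
    return [first, second]
```

Passing " " explicitly still writes a blank, which is how the former default behaviour can be reproduced.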
Adds a new subfield with given code & value to every field with given tag if they do not already have a subfield with that code.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to edit
- code (str) : the subfield code
- val (str) : the subfield value to add
- [Optional] pos (int, defaulted to 999) : the position of the new subfield (if added). First position is 0
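A minimal sketch of this behaviour for a single field, assuming its subfields are (code, value) tuples (the function name add_missing_subfield_sketch is hypothetical):

```python
def add_missing_subfield_sketch(subfields, code, val, pos=999):
    """Insert (code, val) at pos unless a subfield with that code already exists."""
    new_subfields = list(subfields)
    if any(existing_code == code for existing_code, _ in new_subfields):
        return new_subfields
    # list.insert() appends when pos is past the end, so 999 means "at the end"
    new_subfields.insert(pos, (code, val))
    return new_subfields
```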
Applies a substitution using a regular expression to all subfields with given codes for all fields with given tag. No flags are used and none can be set.
The alternate version edit_specific_repeatable_subfield_content_with_regexp() is not described here, so that all documented functions use the same argument logic.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to edit
- codes (list of str) : list of subfield codes to edit
- pattern (str) : regular expression matching pattern
- repl (str) : regular expression substitution expression
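For one field, the substitution could look like the sketch below, which relies on re.sub() without flags. The (code, value) tuple representation and the name edit_subfields_sketch are assumptions made for the example.

```python
import re


def edit_subfields_sketch(subfields, codes, pattern, repl):
    """Apply re.sub(pattern, repl, value) to every subfield whose code is in codes."""
    return [
        (code, re.sub(pattern, repl, value)) if code in codes else (code, value)
        for code, value in subfields
    ]


field = [("a", "Paris : Seuil"), ("b", "x : y")]
edited = edit_subfields_sketch(field, ["a"], r"\s*:\s*", ": ")
```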
Replaces the value of all subfields with given codes for all fields with given tag if they do not match the given regular expression. No flags are used and none can be set.
The alternate version replace_specific_repeatable_subfield_content_not_matching_regexp() is not described here, so that all documented functions use the same argument logic.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to edit
- codes (list of str) : list of subfield codes to edit
- pattern (str) : regular expression matching pattern
- repl (str) : the replacement text to use
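The sketch below shows the idea for one field. It assumes that matching is done with re.search() (the module may use another matching function), that subfields are (code, value) tuples, and the name replace_not_matching_sketch is hypothetical.

```python
import re


def replace_not_matching_sketch(subfields, codes, pattern, repl):
    """Replace the whole value of targeted subfields that do not match pattern."""
    result = []
    for code, value in subfields:
        if code in codes and re.search(pattern, value) is None:
            result.append((code, repl))
        else:
            result.append((code, value))
    return result


field = [("a", "fre"), ("a", "French"), ("c", "zzz")]
fixed = replace_not_matching_sketch(field, ["a"], r"^[a-z]{3}$", "und")
```

Unlike the substitution function, repl here is a literal value written in place of the whole subfield content.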
Changes tags in 7XX fields to make sure that only one 7X0 is in the record and that there is at least one 7X0 if there are 7X1 or 7X2.
Takes as argument :
- record (pymarc.record.Record)
- [Optional] prioritize_71X (bool, defaulted to False) : prioritize 710 over 700
Merges all fields with given tag, sorting the subfields if wanted.
Edits the record and returns the new field (pymarc.field.Field).
Indicators used are those from the first field occurrence.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to merge
- [Optional] sort (list of str, defaulted to no sort) : list of subfield codes to sort. See the sort argument logic to see how to configure it.
Merges all subfields with given code in every field with given tag. The position of the first subfield will be kept.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to edit
- code (str) : the subfield code to merge
- separator (str) : the separator to use between subfields
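For a single field, the merge can be sketched as follows, assuming (code, value) tuples and a hypothetical function name merge_subfields_sketch:

```python
def merge_subfields_sketch(subfields, code, separator):
    """Join all values sharing code into one subfield placed where the first one was."""
    values = [value for sf_code, value in subfields if sf_code == code]
    if len(values) < 2:
        return list(subfields)  # nothing to merge
    merged = []
    already_written = False
    for sf_code, value in subfields:
        if sf_code != code:
            merged.append((sf_code, value))
        elif not already_written:
            merged.append((code, separator.join(values)))
            already_written = True
    return merged
```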
Splits all fields with given tag into multiple fields if the field has multiple subfields with given code. Every other subfield is copied into all created fields.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to split
- code (str) : the subfield code to check
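A possible reading of this split, for one field given as (code, value) tuples, is sketched below. It assumes that each kept subfield stays at its original position inside the new field, which the description does not specify; the name split_on_code_sketch is hypothetical.

```python
def split_on_code_sketch(subfields, code):
    """Return one list of subfields per occurrence of code, copying all other subfields."""
    occurrences = sum(1 for sf_code, _ in subfields if sf_code == code)
    if occurrences < 2:
        return [list(subfields)]  # the field is left as it is
    new_fields = []
    for kept in range(occurrences):
        seen = -1
        new_field = []
        for sf_code, value in subfields:
            if sf_code == code:
                seen += 1
                if seen != kept:
                    continue  # this occurrence belongs to another new field
            new_field.append((sf_code, value))
        new_fields.append(new_field)
    return new_fields
```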
Splits all fields with given tag into multiple fields if there are multiple subfields with the same code. If a subfield code has fewer occurrences than the others, the first occurrence of that code is copied.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to split
Deletes every empty subfield in the whole record.
Takes as argument : record (pymarc.record.Record)
Deletes every empty field in the whole record.
Takes as argument : record (pymarc.record.Record)
For all fields with given tag, deletes the entire field if all subfields with given code match the given regular expression.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to delete
- code (str) : the subfield code to check
- pattern (str) : regular expression matching pattern
- [Optional] keep_if_no_subf (bool, defaulted to True) : if no subfield has given code, should the field be kept
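The deletion rule, including the keep_if_no_subf case, can be expressed as a small predicate. This sketch assumes matching with re.search() and subfields given as (code, value) tuples; the name should_delete is hypothetical.

```python
import re


def should_delete(subfields, code, pattern, keep_if_no_subf=True):
    """Tell whether a field should be deleted according to the rule described above."""
    values = [value for sf_code, value in subfields if sf_code == code]
    if not values:
        # No subfield with that code: the optional argument decides
        return not keep_if_no_subf
    return all(re.search(pattern, value) is not None for value in values)
```

A single non-matching subfield is enough to keep the whole field.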
For all fields with given tag, only keeps the first subfield with given code.
Takes as argument :
- record (pymarc.record.Record)
- tag (str) : the field tag to edit
- code (str) : the subfield code to keep only once
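For one field, keeping only the first occurrence amounts to the sketch below, again assuming (code, value) tuples and a hypothetical function name:

```python
def keep_first_subfield_sketch(subfields, code):
    """Drop every subfield with the given code except the first one."""
    kept = []
    seen = False
    for sf_code, value in subfields:
        if sf_code == code:
            if seen:
                continue  # later occurrences are discarded
            seen = True
        kept.append((sf_code, value))
    return kept
```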
Returns the field as a string in WinIBW style (blank indicators are returned as #).
Takes as argument :
- field (pymarc.field.Field)
Returns the record as a string in WinIBW style (blank indicators are returned as #).
Takes as argument :
- record (pymarc.record.Record)