Common syntax to reference PICA fields, independent from content #271

nichtich · 2021-08-16T19:30:49Z

Split from #248. The syntax to select and filter PICA records or record content should be the same for all tools, at least for the basic use cases. The most basic use case is to reference a list of PICA fields, independent from their content.

In PICA Path Expression (based on MARCSpec) the current syntax is

# tag                      # optional occurrence or occurrence range
([012.][0-9.][0-9.][A-Z@.])(\[([0-9.]{2,3}|[0-9]+-[0-9]+)\])?
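
To illustrate what this expression accepts, here is a minimal Python sketch that wraps it in a parser. The pattern is copied unchanged from above. The names `FIELD` and `parse_field` are made up for this example and are not part of PICA Path or of any tool mentioned here.

```python
import re

# Current PICA Path field reference, copied from the issue text:
# group 1 = tag, group 3 = occurrence or occurrence range (without brackets).
FIELD = re.compile(r"([012.][0-9.][0-9.][A-Z@.])(\[([0-9.]{2,3}|[0-9]+-[0-9]+)\])?")


def parse_field(expr):
    """Hypothetical helper: return (tag, occurrence) or None if expr is invalid."""
    m = FIELD.fullmatch(expr)
    if m is None:
        return None
    return m.group(1), m.group(3)
```

With this sketch, `parse_field("045Q[01]")` yields the tag and the occurrence separately, while `parse_field("028B/01")` is rejected because the current syntax expects brackets around occurrences.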

[...] was used instead of / for occurrences because MARCSpec already uses / for substring ranges. Substring ranges are mainly relevant to fixed-width MARC fields (which have no occurrences or subfields), and PICA Plain uses / before occurrences anyway. So the common syntax can use / (partly breaking backwards compatibility in Catmandu::PICA). Open issues:

Wildcard characters in tags (PICA Path supports .)
Wildcard character in occurrences (PICA Path supports .)
Lists of fields (not part of PICA Path yet)

PICA fields are often grouped in levels (first digit) and ranges (second and third digit); this is what wildcards in tags are mainly used for. Wildcard characters in occurrences are less relevant because they can mostly be replaced by ranges. The . clashes with its use as subfield indicator (alternative to $) in pica-rs, so we could introduce * at the end of a tag instead (e.g. 0* for level 0 or 001* for system tags on level 0). The syntax would then be (spaces added for readability):

(\* | [012] (\* | [0-9] (\* | ([0-9] ([A-Z@*]) ) ) ) )
(\/ ([0-9]{2,3} | [0-9]{1,3} - [0-9]{1,3} ) )?
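
The proposed syntax can be tried out with a short Python sketch. It makes some assumptions: the tag group is written with balanced parentheses (the expression as posted appears to lack one closing parenthesis), named groups replace the numbered ones, and a trailing * is taken to mean "any remaining tag characters". `PROPOSED`, `parse_proposed` and `tag_matches` are hypothetical names, not part of any existing tool.

```python
import re

# Proposed field reference; re.VERBOSE ignores the spaces, as in the issue text.
PROPOSED = re.compile(r"""
    (?P<tag> \* | [012] ( \* | [0-9] ( \* | [0-9] [A-Z@*] ) ) )
    ( / (?P<occ> [0-9]{2,3} | [0-9]{1,3} - [0-9]{1,3} ) )?
""", re.VERBOSE)


def parse_proposed(expr):
    """Hypothetical helper: return (tag pattern, occurrence) or None."""
    m = PROPOSED.fullmatch(expr)
    if m is None:
        return None
    return m.group("tag"), m.group("occ")


def tag_matches(pattern, tag):
    """Assumed semantics: a trailing * matches any remaining tag characters."""
    if pattern.endswith("*"):
        return tag.startswith(pattern[:-1])
    return pattern == tag
```

Under these assumptions, `tag_matches("0*", "003@")` holds for every level 0 field, and `tag_matches("001*", "001A")` holds only for system tags on level 0. The . wildcard is rejected by `parse_proposed`, which is the trade-off mentioned in the alternative below.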

Alternatively keep the . as wildcard.

Lists of multiple fields could be separated by a comma (,), a pipe (|) or any whitespace character (?)
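
Since the separator is still undecided, the following Python sketch simply accepts all three candidates (comma, pipe, whitespace) at once. This is only an illustration of the open question, not a decision. `split_fields` is a hypothetical name.

```python
import re


def split_fields(field_list):
    """Hypothetical helper: split a list of field references.

    Accepts a comma, a pipe or any whitespace as separator,
    which is an assumption and not part of PICA Path.
    """
    parts = re.split(r"[,|\s]+", field_list.strip())
    # Drop empty strings caused by leading or trailing separators.
    return [p for p in parts if p]
```

For example, `split_fields("003@, 021A|028B/01 045Q")` returns the four references as separate strings, each of which could then be checked against the field syntax.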


nwagner84 · 2021-12-07T08:49:57Z

I don't think it is necessary to have a unified syntax for selecting fields/subfields. pica-rs provides a first set of syntax rules to express selection and projection operations. At the moment these two basic operations are enough, but I already have ideas to extend or change this syntax (template expressions, aggregation functions, etc.). This is a specific pica-rs feature which should not be unified with other tools.

nichtich · 2021-12-07T09:35:29Z

I fully agree, this was more of an idea or a duplicate. With #346 we have a common subset to reference fields and subfields (and optionally character ranges within subfield values). I've just updated the specification with more explanation (in German); minor details can still be discussed. PICA Path covers referencing as the most important use case. Everything beyond that (conditions, aggregations, mappings...) depends on the particular tool.

nwagner84 added the wontfix and discussion labels and removed the wontfix label on Dec 7, 2021

nichtich closed this as completed Dec 7, 2021
