Rethink PICA Path Expression syntax #66

nichtich · 2020-06-29T07:08:20Z

The PICA Path expression syntax is aligned with MARCSpec but this has some drawbacks:

The occurrence syntax (123A[01]) differs from the syntax used in the PICA Plain serialization format (123A/01).
The WinIBW Excel export uses a different syntax.

I think WinIBW compatibility is more important than MARCSpec compatibility.

Examples from WinIBW Excel export:

021A               Full field
022A/00            Full field, select occurrence
004A $A            Subfield
029A $8 $a         Multiple subfields, implicit OR
021A $a+$d         Multiple subfields, explicit AND
021A $a+" : $d     String template

There are three issues here:

occurrence syntax with / instead of [..]
whether to allow whitespace
how to express multiple subfields

Currently the syntax for multiple subfields is implicit AND (021A $ad). We could extend it with an explicit form (021A $a+$d), add implicit OR (029A $8 $a), and add string templates. Does WinIBW support escapes in string templates? I'd expect JSON escaping rules, no?

I'd deprecate the current position syntax with / and also allow the slash as an alternative syntax for occurrences.


nichtich · 2020-06-29T07:34:32Z

New grammar:

EXPRESSION := TAG OCCURRENCE? ( WS* SUBFIELDS )?
TAG        := [012.][0-9.][0-9.][A-Z@.]
OCCURRENCE := `[` [0-9.]{1,3} `]` | `/` [0-9.]{1,3}
SUBFIELDS  := SHORTLIST | ANDLIST | ORLIST
SHORTLIST  := `$` SFCODE+
ANDLIST    := ( TEMPLATE | SFREF ) ( WS* `+` WS* ( TEMPLATE | SFREF ) )*
ORLIST     := ( TEMPLATE | SFREF ) ( WS* ( TEMPLATE | SFREF ) )+
SFREF      := `$` SFCODE
SFCODE     := [0-9A-Za-z]
TEMPLATE   := `"` ( [^"] | `\"` )* `"`
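To try the proposal against the WinIBW examples, the grammar can be translated into a regular expression. This is only a sketch of one reading of the grammar: it assumes that the first element of an AND or OR list is grouped as (TEMPLATE | SFREF), that no extra `$` follows the `+`, and that the subfield part is optional so a bare tag such as 021A is accepted. The function name `is_valid_path` is made up for illustration and is not part of PICA::Data.

```python
import re

# Building blocks, named after the grammar rules above.
TAG = r"[012.][0-9.][0-9.][A-Z@.]"
OCCURRENCE = r"(?:\[[0-9.]{1,3}\]|/[0-9.]{1,3})"
SFCODE = r"[0-9A-Za-z]"
SFREF = r"\$" + SFCODE
TEMPLATE = r'"(?:\\"|[^"])*"'           # double-quoted, \" allowed inside
ITEM = f"(?:{TEMPLATE}|{SFREF})"        # assumption: grouped alternative
SHORTLIST = r"\$" + SFCODE + "+"
ANDLIST = ITEM + r"(?:\s*\+\s*" + ITEM + ")*"
ORLIST = ITEM + r"(?:\s*" + ITEM + ")+"
SUBFIELDS = f"(?:{SHORTLIST}|{ANDLIST}|{ORLIST})"

# Assumption: subfields are optional, so that "021A" (full field) is valid.
PATH = re.compile(f"{TAG}{OCCURRENCE}?(?:\\s*{SUBFIELDS})?")


def is_valid_path(expression: str) -> bool:
    """Return True if the whole expression matches the sketched grammar."""
    return PATH.fullmatch(expression) is not None


if __name__ == "__main__":
    for example in ["021A", "022A/00", "004A $A", "029A $8 $a", "021A $a+$d"]:
        print(example, is_valid_path(example))
```

Note that with WS* in ORLIST, `$8$a` is accepted as an OR list while `$8a` is a short list, which may or may not be intended.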

jorol · 2020-06-29T08:44:55Z

> I think WinIBW compatibility is more important than MARCSpec compatibility.

No, not for me. I'm working primarily with Catmandu, PICA & MARC. I aligned the *_map() fixes because I got confused by their differences.

I would be fine with the change of the occurrence syntax if we keep the rest aligned with marc_map(). Perhaps we should discuss these changes with @phochste.

cKlee · 2020-06-29T11:46:29Z

Subfield x is also essential; it often contains a counter. It would be nice to be able to select by it as well:

209Ax00 $a
209Ax09 $a

In the item (copy) record there are fields that contain a counter in subfield "x". Example: "x00" and "x09" distinguish fields 7100 and 7109. Display a record in PicaPlus format and this information becomes clearer!

209A/01 ƒfLSƒaBio Evo 77ƒdiƒx00
209A/01 ƒaA 2012/123ƒduƒx09

In MARCspec this is a subspec.
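A minimal sketch of what selection by counter could mean, using the two 209A fields shown above. The `Field` type and the function `select_by_counter` are hypothetical and only illustrate the idea; they are not an existing API of PICA::Data or Catmandu.

```python
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    """Hypothetical in-memory form of one PICA+ field."""
    tag: str
    occurrence: str
    subfields: List[Tuple[str, str]]  # (code, value) pairs in record order


def select_by_counter(fields: List[Field], tag: str, counter: str, code: str) -> List[str]:
    """Values of subfield `code` from fields with `tag` whose subfield x equals `counter`."""
    values = []
    for field in fields:
        if field.tag != tag:
            continue
        if not any(c == "x" and v == counter for c, v in field.subfields):
            continue
        values.extend(v for c, v in field.subfields if c == code)
    return values


# The two example fields from the PicaPlus display above.
record = [
    Field("209A", "01", [("f", "LS"), ("a", "Bio Evo 77"), ("d", "i"), ("x", "00")]),
    Field("209A", "01", [("a", "A 2012/123"), ("d", "u"), ("x", "09")]),
]

if __name__ == "__main__":
    print(select_by_counter(record, "209A", "00", "a"))  # corresponds to 209Ax00 $a
    print(select_by_counter(record, "209A", "09", "a"))  # corresponds to 209Ax09 $a
```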

nichtich · 2020-06-29T14:13:10Z

> I'm working primarily with Catmandu, PICA & MARC. I aligned the *_map() fixes because I got confused by their differences.

Ok, so the different occurrence syntax cannot be solved without breaking changes, unless positions only make sense in combination with subfields. In that case we can tell from context whether / starts an occurrence or a position (!).

How about the other extensions to express multiple subfields?

jorol · 2020-06-30T09:04:34Z

> How about the other extensions to express multiple subfields?

I suggest discussing this with @phochste to see whether we should implement them for pica_map() and marc_map().

nichtich · 2020-07-06T08:51:29Z

I'm still using this thread to collect ideas for possible changes and extensions before discussing whether and which to implement. So far:

Allow / to be used to indicate occurrence (in addition or as an alternative to [...])
Support string templates (e.g. ": $a")
Support number spans of occurrences. The cataloguing rules contain some fields with spans of occurrences, e.g. 147B/07-09 => Extend PICA Path with occurrence ranges #96
Support selection by counter in subfield x (e.g. 209Ax00 $a) as provided by WinIBW => PICA Path: support selection by counter in subfield x #97
Support multiple subfields combined by OR or by AND
Allow whitespace
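The occurrence range idea (e.g. 147B/07-09) can be sketched as a small matching function. This is an illustration only, assuming plain digits on both sides of the hyphen and no wildcard dots; the name `occurrence_matches` is made up here and does not describe how #96 is implemented.

```python
import re

# A single occurrence ("08") or a span of occurrences ("07-09").
RANGE = re.compile(r"(\d{1,3})(?:-(\d{1,3}))?")


def occurrence_matches(spec: str, occurrence: str) -> bool:
    """True if `occurrence` (e.g. "08") lies within `spec` (e.g. "07-09" or "08")."""
    m = RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"not an occurrence or occurrence range: {spec!r}")
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    if start > end:
        raise ValueError(f"range start exceeds range end: {spec!r}")
    # A field without occurrence (empty string) never matches.
    return occurrence.isdigit() and start <= int(occurrence) <= end


if __name__ == "__main__":
    for occ in ["06", "07", "08", "09", "10"]:
        print(occ, occurrence_matches("07-09", occ))
```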

cKlee · 2020-07-06T09:07:04Z

What is the benefit of allowing whitespace?

nichtich · 2020-07-06T09:09:47Z

> What is the benefit of allowing whitespace?

It improves readability and, most importantly, gives the same syntax as WinIBW rules. We might strip whitespace instead, but if string templates are allowed this gets complex and has little benefit anyway.
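To illustrate why stripping whitespace gets more complex once string templates exist: a plain replace would also remove spaces inside a template such as " : ", so the stripper has to track quoting and escapes. The following is a sketch under the assumption that templates are double-quoted and use backslash escapes; `strip_whitespace` is a made-up name.

```python
def strip_whitespace(expression: str) -> str:
    """Remove whitespace outside double-quoted templates; keep templates verbatim."""
    out = []
    in_template = False
    i = 0
    while i < len(expression):
        ch = expression[i]
        if in_template:
            out.append(ch)
            if ch == "\\" and i + 1 < len(expression):
                # Copy the escaped character too, so \" does not end the template.
                out.append(expression[i + 1])
                i += 1
            elif ch == '"':
                in_template = False
        elif ch == '"':
            in_template = True
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_whitespace('021A $a + " : " + $d'))
```

Stripping also turns `$8 $a` into `$8$a`, so it only works if both forms are meant to be equivalent, which this thread does not settle.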

nichtich · 2021-06-10T08:47:39Z

I'm in favor of not supporting whitespace and string templates. The remaining issues are:

Support occurrence ranges (Extend PICA Path with occurrence ranges #96). This is not a real extension but fills a gap.
Allow / to indicate occurrence. This would break MARCSpec compatibility, so the solution is to transform the path in the client and warn if a path still contains positions (partly implemented in picadata so far, at least for command explain)
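The client-side transformation could look roughly like the sketch below: a slash directly after the tag is rewritten to bracket syntax, and a warning is issued if a slash remains, since that would still be a position. This is an assumption about the general approach, not the code in picadata; `slash_to_bracket` is a hypothetical name.

```python
import re
import warnings

TAG = r"[012.][0-9.][0-9.][A-Z@.]"
# Slash directly after the tag: occurrence or occurrence range in PICA Plain style.
SLASH_OCCURRENCE = re.compile(rf"^({TAG})/([0-9.]{{1,3}}(?:-[0-9.]{{1,3}})?)")


def slash_to_bracket(path: str) -> str:
    """Rewrite TAG/NN to TAG[NN]; warn if the result still contains a position."""
    rewritten = SLASH_OCCURRENCE.sub(r"\1[\2]", path, count=1)
    if "/" in rewritten:
        warnings.warn(f"path still contains a position: {rewritten!r}")
    return rewritten


if __name__ == "__main__":
    print(slash_to_bracket("022A/00"))
    print(slash_to_bracket("147B/07-09$a"))
    print(slash_to_bracket("003@$0/0-3"))  # triggers the warning
```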

jorol · 2021-06-15T10:36:51Z

> Support occurrences ranges (#96). This is not a real extension but closes a missing feature.

ok

> Allow / to indicate occurrence. This would break MARCSpec compatibility, so the solution is to transform the path in the client and warn if a path still contains positions (partly implemented in picadata so far, at least for command explain)

Transforming the path in "clients" like pica_map() could be a solution. I would keep the positionally defined substrings in pica_map and add the functionality there.

Could you create a developer release or branch with the new syntax? I would refactor the Catmandu modules based on that. Not sure when I will have time for this...

nichtich · 2021-06-15T13:32:41Z

> Could you create a developer release or branch with the new syntax?

I thought about adding the functionality only in the picadata command line client because it will not support selection of field values via positions anyway. See these lines for the implementation. The documentation should be extended to say that occurrences can be specified via /... (PICA Plain syntax) or [...] (PICA Path syntax).

Changelog diff is:

diff --git a/Changes b/Changes
index 9650b54..07fbcdb 100644
--- a/Changes
+++ b/Changes
@@ -1,6 +1,8 @@
 Revision history for PICA::Data
 {{$NEXT}}
+
+1.25 2021-06-16T14:18:46Z
 - Implement occurrence ranges (#96)
 - Add option position_as_occurrence (see #66)

nichtich · 2021-06-23T08:46:12Z

Closed in favor of #109, #108 and #97. Use of / to denote occurrences instead of positions is only supported as an additional feature, enabled in the picadata client, see https://metacpan.org/dist/PICA-Data/view/script/picadata#-path,-p and https://metacpan.org/pod/PICA::Path#new(-$expression-%5B,-position_as_occurrence-=%3E-1-%5D-)

nichtich added a commit that referenced this issue Jun 16, 2021

Add option position_as_occurrence (see #66)

d43471d

nichtich closed this as completed Jun 23, 2021

nichtich mentioned this issue Jul 6, 2021

Formally describe filter syntax deutsche-nationalbibliothek/pica-rs#248

Closed
