Rethink PICA Path Expression syntax #66

Closed
nichtich opened this issue Jun 29, 2020 · 12 comments

@nichtich
Member

nichtich commented Jun 29, 2020

The PICA Path expression syntax is aligned with MARCSpec but this has some drawbacks:

  • occurrence syntax (123A[01]) differs from the syntax used in the PICA Plain serialization format (123A/01)
  • WinIBW Excel export uses a different syntax

I think WinIBW compatibility is more important than MARCSpec compatibility.

Examples from WinIBW Excel export:

  • 021A Full field
  • 022A/00 Full field, select occurrence
  • 004A $A Subfield
  • 029A $8 $a Multiple subfields, implicit OR
  • 021A $a+$d Multiple subfields, explicit AND
  • 021A $a+" : $d String template

There are three issues here:

  1. occurrence syntax with / instead of [..]
  2. whether to allow whitespace
  3. how to express multiple subfields

Currently the syntax for multiple subfields is implicit AND (021A $ad). We could extend it with the explicit form 021A $a+$d, add implicit OR (029A $8 $a), and add string templates. Does WinIBW support escapes in string templates? I'd expect JSON escaping rules, no?

I'd deprecate the current position syntax with / and use the slash as an alternative syntax for occurrences.

@nichtich
Member Author

New grammar:

EXPRESSION := TAG OCCURRENCE? WS* SUBFIELDS
TAG        := [012.][0-9.][0-9.][A-Z@.]
OCCURRENCE := `[` [0-9.]{1,3} `]` | `/` [0-9.]{1,3}
SUBFIELDS  := SHORTLIST | ANDLIST | ORLIST
SHORTLIST  := `$` SFCODE+
ANDLIST    := ( TEMPLATE | SFREF ) ( WS* `+` WS* ( TEMPLATE | SFREF ) )*
ORLIST     := ( TEMPLATE | SFREF ) ( WS* ( TEMPLATE | SFREF ) )+
SFREF      := `$` SFCODE
SFCODE     := [0-9A-Za-z]
TEMPLATE   := `"` ( [^"] | `\"` )* `"`
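The grammar above can be tried out against the WinIBW examples with a few regular expressions. This is only a minimal sketch in Python under some assumptions: each list item is either a template or a single subfield reference, the subfield part is optional so that a bare tag such as 021A is accepted, and a backslash escapes the following character inside a template. `parse_path` is a hypothetical helper, not part of PICA::Data.

```python
import re

TAG = r"[012.][0-9.][0-9.][A-Z@.]"
# Occurrence in either bracket form (123A[01]) or slash form (123A/01)
OCCURRENCE = r"(?:\[(?P<bracket>[0-9.]{1,3})\]|/(?P<slash>[0-9.]{1,3}))"
# One list item: a subfield reference ($a) or a quoted string template
ITEM = r'(?:\$[0-9A-Za-z]|"(?:\\.|[^"\\])*")'

HEAD = re.compile(rf"(?P<tag>{TAG}){OCCURRENCE}?\s*")
SHORTLIST = re.compile(r"\$[0-9A-Za-z]{2,}")
ANDLIST = re.compile(rf"{ITEM}(?:\s*\+\s*{ITEM})*")
ORLIST = re.compile(rf"{ITEM}(?:\s*{ITEM})+")


def parse_path(expr):
    """Split an expression into tag, occurrence, combination mode and items."""
    head = HEAD.match(expr)
    if not head:
        raise ValueError(f"not a PICA Path expression: {expr!r}")
    rest = expr[head.end():]
    result = {
        "tag": head["tag"],
        "occurrence": head["bracket"] or head["slash"],
        "mode": None,   # None means "full field"
        "items": [],
    }
    if not rest:
        return result
    if SHORTLIST.fullmatch(rest):
        # $ad is shorthand for the subfields $a and $d combined with AND
        result["mode"] = "and"
        result["items"] = ["$" + code for code in rest[1:]]
    elif ANDLIST.fullmatch(rest):
        result["mode"] = "and"
        result["items"] = re.findall(ITEM, rest)
    elif ORLIST.fullmatch(rest):
        result["mode"] = "or"
        result["items"] = re.findall(ITEM, rest)
    else:
        raise ValueError(f"cannot parse subfields: {rest!r}")
    return result
```

With this sketch, `parse_path("029A $8 $a")` is classified as an OR list and `parse_path("123A[01]$ad")` as an AND list with occurrence `01`. Note that accepting `WS*` between OR items means `$a$d` (OR) and `$ad` (AND) differ only by one character, which may be an argument for requiring whitespace there.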

@jorol
Contributor

jorol commented Jun 29, 2020

> I think WinIBW compatibility is more important than MARCSpec compatibility.

No, not for me. I'm working primarily with Catmandu, PICA & MARC. I aligned the *_map() fixes because I got confused by their differences.

I would be fine with the change of the occurrence syntax if we keep the rest aligned with marc_map(). Perhaps we should discuss these changes with @phochste.

@cKlee
Collaborator

cKlee commented Jun 29, 2020

Subfield x is also essential. It often contains a counter. It would be nice to be able to select on it as well:

209Ax00 $a
209Ax09 $a

In the item record there are fields that contain a counter in subfield "x". Example: "x00" and "x09" distinguish fields 7100 and 7109. Display a record in PicaPlus format and this information becomes clearer!

209A/01 ƒfLSƒaBio Evo 77ƒdiƒx00
209A/01 ƒaA 2012/123ƒduƒx09

In MARCspec this is a subspec.
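To illustrate what such a selection would have to do, here is a minimal sketch in Python. It assumes fields are given as PICA Plain lines with `ƒ` as the subfield indicator, as in the two example lines above. `parse_plain_line` and `select_by_counter` are hypothetical helpers, not part of PICA::Data or Catmandu.

```python
def parse_plain_line(line):
    """Split a line like '209A/01 ƒaBioƒx00' into (tag, occurrence, subfields)."""
    head, _, body = line.partition(" ")
    tag, _, occurrence = head.partition("/")
    # Each chunk starts with the one-character subfield code
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("ƒ") if chunk]
    return tag, occurrence or None, subfields


def select_by_counter(fields, tag, counter, code):
    """Values of subfield `code` in fields `tag` whose subfield x equals `counter`."""
    values = []
    for field_tag, _occurrence, subfields in fields:
        if field_tag == tag and ("x", counter) in subfields:
            values.extend(value for c, value in subfields if c == code)
    return values


fields = [
    parse_plain_line("209A/01 ƒfLSƒaBio Evo 77ƒdiƒx00"),
    parse_plain_line("209A/01 ƒaA 2012/123ƒduƒx09"),
]
```

Here `select_by_counter(fields, "209A", "00", "a")` picks the value from the first line only, which is what a path like 209Ax00 $a is meant to express.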

@nichtich
Member Author

nichtich commented Jun 29, 2020

> I'm working primarily with Catmandu, PICA & MARC. I aligned the *_map() fixes because I got confused by their differences.

OK, so the differing occurrence syntax cannot be resolved without breaking changes, unless position only makes sense in combination with subfields, in which case we can tell whether / starts an occurrence or a subfield (!).

How about the other extensions to express multiple subfields?

@jorol
Contributor

jorol commented Jun 30, 2020

> How about the other extensions to express multiple subfields?

I suggest discussing this with @phochste to see whether we should implement them for pica_map() and marc_map().

@nichtich
Member Author

nichtich commented Jul 6, 2020

I'm still using this thread to collect ideas for possible changes and extensions before discussing whether and which to implement. So far:

@cKlee
Collaborator

cKlee commented Jul 6, 2020

What is the benefit of allowing whitespace?

@nichtich
Member Author

nichtich commented Jul 6, 2020

> What is the benefit of allowing whitespace?

It improves readability and, most importantly, keeps the syntax consistent with WinIBW rules. We could strip whitespace, but once string templates are allowed this gets complex and has little benefit anyway.

@nichtich
Member Author

In favor of not supporting whitespace and string templates, the remaining issues are:

  • Support occurrence ranges (Extend PICA Path with occurrence ranges #96). This is not a real extension but fills in a missing feature.
  • Allow / to indicate occurrence. This would break MARCSpec compatibility, so the solution is to transform the path in the client and warn if a path still contains positions (partly implemented in picadata so far, at least for the explain command)
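As an illustration of the first point, matching a field's occurrence against a range could look roughly like this. The sketch assumes a range is written as two numbers separated by a hyphen (for example 01-03); the actual syntax is whatever #96 settles on, wildcards are not handled, and `occurrence_matches` is a hypothetical helper.

```python
def occurrence_matches(spec, occurrence):
    """Check a field occurrence against a single value or an assumed 'low-high' range."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= int(occurrence) <= int(high)
    return int(spec) == int(occurrence)
```

Comparing as integers rather than as strings means a range such as 1-10 behaves as expected even when occurrences are written with different numbers of digits.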

@jorol
Contributor

jorol commented Jun 15, 2021

> Support occurrence ranges (#96). This is not a real extension but closes a missing feature.

ok

> Allow / to indicate occurrence. This would break MARCSpec compatibility, so the solution is to transform the path in the client and warn if a path still contains positions (partly implemented in picadata so far, at least for command explain)

Transforming the path in "clients" like pica_map() could be a solution. I would keep the positionally defined substrings in pica_map and add the functionality there.

Could you create a developer release or branch with the new syntax? I would refactor the Catmandu modules based on that. Not sure when I will have time for this...

@nichtich
Member Author

nichtich commented Jun 15, 2021

> Could you create a developer release or branch with the new syntax?

I thought about adding the functionality only in the picadata command line client because it will not support selecting field values via positions anyway. See these lines for the implementation. The documentation should be extended to state that occurrences can be specified via /... (PICA Plain syntax) or [...] (PICA Path syntax).
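The client-side transformation could be as small as a single substitution. The following Python sketch is not the picadata implementation. It assumes that a slash counts as an occurrence only when it directly follows the tag and is followed by one to three digits or dots, and it leaves a slash after subfields (a position) untouched. `slash_to_bracket` is a hypothetical name.

```python
import re

# Tag, then "/", then 1-3 digits or dots, followed by end, whitespace or "$"
SLASH_OCCURRENCE = re.compile(
    r"^([012.][0-9.]{2}[A-Z@.])/([0-9.]{1,3})(?=$|[\s$])"
)


def slash_to_bracket(path):
    """Rewrite 123A/01 (PICA Plain style) to 123A[01] (PICA Path style)."""
    return SLASH_OCCURRENCE.sub(r"\1[\2]", path, count=1)
```

For example, `slash_to_bracket("123A/01$a")` gives `123A[01]$a`, while `021A$a/0-3` is returned unchanged because its slash follows a subfield and therefore denotes a position.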

nichtich added a commit that referenced this issue Jun 16, 2021
Changelog diff is:

diff --git a/Changes b/Changes
index 9650b54..07fbcdb 100644
--- a/Changes
+++ b/Changes
@@ -1,6 +1,8 @@
 Revision history for PICA::Data

 {{$NEXT}}
+
+1.25 2021-06-16T14:18:46Z
     - Implement occurrence ranges (#96)
     - Add option position_as_occurrence (see #66)

@nichtich
Member Author

nichtich commented Jun 23, 2021

Closed in favor of #109, #108 and #97. Use of / to denote occurrences instead of positions is only supported as an additional feature, enabled in the picadata client, see https://metacpan.org/dist/PICA-Data/view/script/picadata#-path,-p and https://metacpan.org/pod/PICA::Path#new(-$expression-%5B,-position_as_occurrence-=%3E-1-%5D-)
