Commit a569896

Merge pull request #1677 from michaelhkay/1675-csv-parsing-corrections
1675 Fixes for CSV parsing
2 parents f6a4463 + be078f5 commit a569896

2 files changed: +22 -31 lines changed

specifications/xpath-functions-40/src/function-catalog.xml

Lines changed: 13 additions & 27 deletions
@@ -317,10 +317,10 @@
 option is <code>true</code>. If there are no data rows in the CSV, the
 value will be an empty sequence.</p></fos:meaning>
 </fos:field>
-<fos:field name="get" type="function(xs:positiveInteger, (xs:positiveInteger | xs:string)) as xs:string?" required="true">
+<fos:field name="get" type="function(xs:positiveInteger, (xs:positiveInteger | xs:string)) as xs:string" required="true">
 <fos:meaning><p>A function providing ready access to a given field in a given
 row. The <code>get</code> function has signature:</p>
-<eg>function($row as xs:integer, $column as union(xs:string, xs:integer)) as xs:string?</eg>
+<eg>function($row as xs:positiveInteger, $column as (xs:positiveInteger | xs:string)) as xs:string</eg>
 <p>The function takes two arguments: the first is an
 integer giving the row number (1-based), the second
 identifies a column either by its name or by its 1-based
@@ -330,7 +330,7 @@
 the function call <code>$csv?get($R, $C)</code>, where <code>$C</code>
 is an integer, returns the value of <code>$csv?rows[$R] => array:get($C, fn { "" })</code>,
 and the function call <code>$csv?get($R, $K)</code>, where <code>$K</code>
-is a string, returns the value of <code>$csv?get($R, $csv?column-numbers($K))</code>.</p>
+is a string, returns the value of <code>$csv?get($R, $csv?column-index($K))</code>.</p>

 <p>The properties of the function are as follows:</p>
 <glist>
@@ -344,7 +344,7 @@
 </gitem>
 <gitem>
 <label>Signature</label>
-<def><p><code>(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string?</code></p></def>
+<def><p><code>(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string</code></p></def>
 </gitem>
 <gitem>
 <label>Non-local variable bindings</label>
@@ -26072,26 +26072,10 @@ return json-to-xml($json, $options)]]></eg>
 </fos:value>
 </fos:values>
 </fos:option>
-<!--<fos:option key="normalize-newlines">
-<fos:meaning>Determines whether CR and CRLF character sequences
-are treated as equivalent to NL characters.</fos:meaning>
-<fos:type>xs:boolean</fos:type>
-<fos:default>false</fos:default>
-<fos:values>
-<fos:value value="false">No normalization takes place.
-</fos:value>
-<fos:value value="true">The character sequences CR (<char>U+000D</char>)
-and CRLF (<char>U+000D</char>, <char>U+000A</char>) are treated as equivalent to the
-character NL (<char>U+000A</char>), except when they appear within a quoted field.
-The normalization is done prior to recognition of row delimiters, and happens
-whether or not NL is used as the row delimiter.
-</fos:value>
-</fos:values>
-</fos:option>-->
 <fos:option key="header">
 <fos:meaning>Determines whether the first row of the CSV should be treated as a list
 of column names, or whether column names are being supplied by the caller.
-The value must either be a single boolean, or a sequence of one or more strings.
+The value must either be a single boolean, or a sequence of zero or more strings.
 </fos:meaning>
 <fos:type>item()*</fos:type>
 <fos:default>false</fos:default>
@@ -26100,7 +26084,7 @@ return json-to-xml($json, $options)]]></eg>
 first row of the CSV data.</fos:value>
 <fos:value value="false">Column names are not available; all references
 to columns are by ordinal position.</fos:value>
-<fos:value value="xs:string+">Supplies explicit names for the columns. The <var>N</var>th
+<fos:value value="xs:string*">Supplies explicit names for the columns. The <var>N</var>th
 name in the list applies to the <var>N</var>th column after any filtering or rearrangement.
 A zero-length string can be used when there is a column that requires no name.
 </fos:value>
@@ -26448,7 +26432,7 @@ return (
 the CSV string. An instance of
 <code>xs:string</code> whose length is exactly one.
 Defaults to a single newline character (<char>U+000A</char>).</fos:meaning>
-<fos:type>xs:string+</fos:type>
+<fos:type>xs:string</fos:type>
 <fos:default>char('\n')</fos:default>
 </fos:option>
 <fos:option key="quote-character">
@@ -26459,14 +26443,14 @@ return (
 </fos:option>
 <fos:option key="trim-whitespace">
 <fos:meaning>Determines whether leading and trailing whitespace
-is removed from the content of fields.</fos:meaning>
+is removed from the content of unquoted fields.</fos:meaning>
 <fos:type>xs:boolean</fos:type>
 <fos:default>false</fos:default>
 <fos:values>
-<fos:value value="false">Fields will be returned with any leading or trailing
+<fos:value value="false">Unquoted fields will be returned with any leading or trailing
 whitespace intact.
 </fos:value>
-<fos:value value="true">Fields will be returned with leading or trailing
+<fos:value value="true">Unquoted fields will be returned with leading or trailing
 whitespace removed, and all other whitespace preserved.
 </fos:value>
 </fos:values>
@@ -26712,7 +26696,9 @@ return document {
 }</csv>
 }]]></eg>

-<p>The namespace prefix used in the names of elements (or its absence) is
+<p>The elements in the returned XML are in the namespace
+<code>http://www.w3.org/2005/xpath-functions</code>;
+the namespace prefix that is used (or its absence) is
 <termref def="implementation-dependent"/>.</p>

 <p>If the function is called twice with the same arguments, it is <termref
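
Editorial note on the `get` changes in this file: the return type tightens from `xs:string?` to `xs:string` because a column position past the end of a short row falls back to the zero-length string (the `array:get($C, fn { "" })` part of the definition), and a name is first resolved through `column-index`. The following Python sketch only illustrates that lookup rule; `make_get`, `rows`, and `column_index` are hypothetical names, and the errors raised for a missing row or an unknown column name are assumptions of the sketch, not behavior stated in the diff.

```python
from typing import Callable, Union


def make_get(rows: list[list[str]],
             column_index: dict[str, int]) -> Callable[[int, Union[int, str]], str]:
    """Build a get(row, column) accessor over parsed CSV rows (1-based)."""

    def get(row: int, column: Union[int, str]) -> str:
        if isinstance(column, str):
            # A name is resolved to a 1-based position first, mirroring
            # $csv?get($R, $csv?column-index($K)). An unknown name raises
            # KeyError here; that is an assumption of this sketch.
            column = column_index[column]
        if row < 1 or column < 1:
            raise ValueError("row and column numbers are 1-based")
        fields = rows[row - 1]  # IndexError for a missing row (assumption)
        # Mirrors array:get($C, fn { "" }): a position past the end of a
        # short row yields "", so the result is always exactly one string.
        return fields[column - 1] if column <= len(fields) else ""

    return get


# Hypothetical data: the second row is shorter than the first.
get = make_get(
    rows=[["Alice", "2023-07-14", "1.23"], ["Bob"]],
    column_index={"name": 1, "date": 2, "amount": 3},
)
```

With this data, `get(1, "date")` finds a real field, while `get(2, 3)` and `get(2, "amount")` both land past the end of the short row and return the zero-length string.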

specifications/xpath-functions-40/src/xpath-functions.xml

Lines changed: 9 additions & 4 deletions
@@ -7252,10 +7252,15 @@ Bob,2023-07-14,2.34
 </csv>
 ]]></eg>

-<p>If column names were not extracted, then implementations <rfc2119>should
-not</rfc2119> include the <code><![CDATA[<header>]]></code> element, and
-<code><![CDATA[<field>]]></code> elements <rfc2119>should not</rfc2119> have
-the <code>column</code> attribute:</p>
+<p>If no non-empty column names are available, then the <code>columns</code>
+element and all <code>column</code> attributes are absent.
+If non-empty column names are available for some columns but not for others,
+then (a) an empty <code>column</code> element is included
+within the <code>columns</code> element if and only if there is a subsequent
+column with a non-empty name, and (b) the <code>column</code> attribute
+for the corresponding <code>field</code> elements is absent.</p>
+
+<p>For example (when no column names are available):</p>

 <eg><![CDATA[
 <csv xmlns="http://www.w3.org/2005/xpath-functions">
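
Editorial note on the new column-name rule in this hunk: the `columns` element disappears entirely when every name is empty, trailing unnamed columns are dropped from it, and an unnamed column gets an empty `column` placeholder only when a named column follows. The Python sketch below is one possible reading of that rule, not the specification's algorithm; `columns_element` and `field_element` are hypothetical helpers, and the namespace declaration is omitted for brevity.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def columns_element(names: list[str]) -> Optional[str]:
    """Serialize the <columns> element, or return None when it is absent."""
    # Position of the last non-empty name; -1 when there is none.
    last = max((i for i, name in enumerate(names) if name), default=-1)
    if last < 0:
        return None  # no non-empty names: the columns element is absent
    # Keep names up to the last non-empty one, so an empty <column/> appears
    # only when a later column has a non-empty name.
    cols = "".join(
        f"<column>{escape(name)}</column>" if name else "<column/>"
        for name in names[: last + 1]
    )
    return f"<columns>{cols}</columns>"


def field_element(value: str, name: str = "") -> str:
    """Serialize one <field>; the column attribute is absent for empty names."""
    attr = f" column={quoteattr(name)}" if name else ""
    return f"<field{attr}>{escape(value)}</field>"
```

For the names `["name", "", "amount", ""]` this yields three `column` children: the second is empty because `amount` follows it, and the trailing unnamed column is dropped.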
