Docs: Improve format documentation #963

126 changes: 0 additions & 126 deletions pages/understanding-json-schema/reference/string.md
@@ -115,129 +115,3 @@ with an optional area code:
// props { "indent": true, "valid": false }
"(800)FLOWERS"
```

## Format[#format]

The `format` keyword allows for basic semantic identification of certain
kinds of string values that are commonly used. For example, because JSON
doesn\'t have a \"DateTime\" type, dates need to be encoded as strings.
`format` allows the schema author to indicate that the string value
should be interpreted as a date. By default, `format` is just an
annotation and does not effect validation.

Optionally, validator [implementations](../../learn/glossary#implementation) can provide a configuration option
to enable `format` to function as an assertion rather than just an
annotation. That means that validation will fail if, for example, a
value with a `date` format isn\'t in a form that can be parsed as a
date. This can allow values to be constrained beyond what the other
tools in JSON Schema, including [Regular Expressions](../../understanding-json-schema/reference/regular_expressions) can
do.

> Implementations may provide validation for only a subset of the built-in
> formats or do partial validation for a given format. For example, some
> implementations may consider a string an email if it contains a `@`,
> while others might do additional checks for other aspects of a well
> formed email address.

There is a bias toward networking-related formats in the JSON Schema
specification, most likely due to its heritage in web technologies.
However, custom formats may also be used, as long as the parties
exchanging the JSON documents also exchange information about the custom
format types. A JSON Schema validator will ignore any format type that
it does not understand.

### Built-in formats[#built-in-formats]

The following is the list of formats specified in the JSON Schema
specification.

#### Dates and times

Dates and times are represented in [RFC 3339, section 5.6](https://tools.ietf.org/html/rfc3339#section-5.6). This is a subset
of the date format also commonly known as [ISO8601 format](https://www.iso.org/iso-8601-date-and-time-format.html).

- `"date-time"`: Date and time together, for example,
`2018-11-13T20:20:39+00:00`.
- `"time"`: <StarInline label="New in draft 7" /> Time, for example, `20:20:39+00:00`
- `"date"`: <StarInline label="New in draft 7" /> Date, for example, `2018-11-13`.
- `"duration"`: <StarInline label="New in draft 2019-09" /> A duration as defined by the [ISO 8601 ABNF for \"duration\"](https://datatracker.ietf.org/doc/html/rfc3339#appendix-A).
For example, `P3D` expresses a duration of 3 days.

<Keywords label="single: email single: idn-email single: format; email single: format; idn-email" />

#### Email addresses

- `"email"`: Internet email address, see [RFC 5321, section 4.1.2](http://tools.ietf.org/html/rfc5321#section-4.1.2).
- `"idn-email"`: <StarInline label="New in draft 7" /> The internationalized form of an Internet email
address, see [RFC 6531](https://tools.ietf.org/html/rfc6531).

<Keywords label="single: hostname single: idn-hostname single: format; hostname single: format; idn-hostname" />

#### Hostnames

- `"hostname"`: Internet host name, see [RFC 1123, section 2.1](https://datatracker.ietf.org/doc/html/rfc1123#section-2.1).
- `"idn-hostname"`: <StarInline label="New in draft 7" /> An internationalized Internet host name, see
[RFC5890, section 2.3.2.3](https://tools.ietf.org/html/rfc5890#section-2.3.2.3).

<Keywords label="single: ipv4 single: ipv6 single: format; ipv4 single: format; ipv6" />

#### IP Addresses

- `"ipv4"`: IPv4 address, according to dotted-quad ABNF syntax as
defined in [RFC 2673, section 3.2](http://tools.ietf.org/html/rfc2673#section-3.2).
- `"ipv6"`: IPv6 address, as defined in [RFC 2373, section 2.2](http://tools.ietf.org/html/rfc2373#section-2.2).

<Keywords label="single: uuid single: uri single: uri-reference single: iri single: iri-reference single: format; uuid single: format; uri single: format; uri-reference single: format; iri single: format; iri-reference" />

#### Resource identifiers

- `"uuid"`: <StarInline label="New in draft 2019-09" /> A Universally Unique Identifier as defined by [RFC 4122](https://datatracker.ietf.org/doc/html/rfc4122). Example:
`3e4666bf-d5e5-4aa7-b8ce-cefe41c7568a`
- `"uri"`: A universal resource identifier (URI), according to
[RFC3986](http://tools.ietf.org/html/rfc3986).
- `"uri-reference"`: <StarInline label="New in draft 6" /> A URI Reference (either a URI or a
relative-reference), according to [RFC3986, section 4.1](http://tools.ietf.org/html/rfc3986#section-4.1).
- `"iri"`: <StarInline label="New in draft 7" /> The internationalized equivalent of a \"uri\", according to
[RFC3987](https://tools.ietf.org/html/rfc3987).
- `"iri-reference"`: <StarInline label="New in draft 7" /> The internationalized equivalent of a
\"uri-reference\", according to
[RFC3987](https://tools.ietf.org/html/rfc3987)

If the values in the schema have the ability to be relative to a
particular source path (such as a link from a webpage), it is generally
better practice to use `"uri-reference"` (or `"iri-reference"`) rather
than `"uri"` (or `"iri"`). `"uri"` should only be used when the path
must be absolute.

<Keywords label="single: uri-template single: format; uri-template" />

#### URI template

- `"uri-template"`: <StarInline label="New in draft 6" /> A URI Template (of any level) according to
[RFC6570](https://tools.ietf.org/html/rfc6570). If you don\'t
already know what a URI Template is, you probably don\'t need this
value.

<Keywords label="single: json-pointer single: relative-json-pointer single: format; json-pointer single: format; relative-json-pointer" />

#### JSON Pointer

- `"json-pointer"`: <StarInline label="New in draft 6" /> A JSON Pointer, according to
[RFC6901](https://tools.ietf.org/html/rfc6901). There is more
discussion on the use of JSON Pointer within JSON Schema in
[Structuring a complex schema](../../understanding-json-schema/structuring). Note that this should be used only when
the entire string contains only JSON Pointer content, e.g.
`/foo/bar`. JSON Pointer URI fragments, e.g. `#/foo/bar/` should use
`"uri-reference"`.
- `"relative-json-pointer"`: <StarInline label="New in draft 7" /> A [relative JSON pointer](https://tools.ietf.org/html/draft-handrews-relative-json-pointer-01).

<Keywords label="single: regex single: format; regex" />

#### Regular Expressions

- `"regex"`: <StarInline label="New in draft 7" /> A regular expression, which should be valid according to
the [ECMA 262](https://www.ecma-international.org/publications-and-standards/standards/ecma-262/)
[dialect](../../learn/glossary#dialect).

Be careful, in practice, JSON schema validators are only required to
accept the safe subset of [regular expressions](../../understanding-json-schema/reference/regular_expressions) described elsewhere in this document.
61 changes: 61 additions & 0 deletions pages/understanding-json-schema/reference/type.md
@@ -175,3 +175,64 @@ types. For example, numeric types have a way of specifying a numeric
range, that would not be applicable to other types. In this reference,
these validation keywords are described along with each of their
corresponding types in the following chapters.

## Format[#format]

The `format` keyword conveys semantic information for values that may be difficult or impossible to describe using JSON Schema. Typically, this semantic information is described by other documents. The JSON Schema Validation specification defines several formats, but this keyword also allows schema authors to define their own formats.

For example, because JSON doesn't have a "DateTime" type, dates need to be encoded as strings. `format` allows the schema author to indicate that the string value should be interpreted as a date. By default, `format` is just an annotation and does not affect validation.

Optionally, validator [implementations](../../learn/glossary#implementation) can provide a configuration option to enable `format` to function as an assertion rather than just an annotation. That means that validation fails when, for example, a value with a `date` format isn't in a form that can be parsed as a date. This allows values to be constrained beyond what other tools in JSON Schema, including [Regular Expressions](../../understanding-json-schema/reference/regular_expressions), can do.
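
The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration, not any real validator's API: `validate_string`, `is_full_date`, and the `assert_format` switch are hypothetical names, and the date check only approximates the RFC 3339 `full-date` rule.

```python
import re
from datetime import date


def is_full_date(value: str) -> bool:
    """Rough check for a date such as 2018-11-13 (assumed shape: YYYY-MM-DD)."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value, re.ASCII)
    if match is None:
        return False
    try:
        # Let the standard library reject impossible dates such as February 30.
        date(int(match[1]), int(match[2]), int(match[3]))
    except ValueError:
        return False
    return True


def validate_string(instance, schema: dict, assert_format: bool = False) -> bool:
    """Hypothetical mini-validator for {"type": "string", "format": ...} schemas."""
    if not isinstance(instance, str):
        return False
    # Annotation mode (the default): "format" is recorded but never fails validation.
    if assert_format and schema.get("format") == "date":
        return is_full_date(instance)
    return True


schema = {"type": "string", "format": "date"}
print(validate_string("next Tuesday", schema))                      # annotation mode
print(validate_string("next Tuesday", schema, assert_format=True))  # assertion mode
```

With the switch off, any string passes; with it on, only strings that parse as a date pass.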

> Implementations may provide validation for only a subset of the built-in formats or do partial validation for a given format. For example, some implementations may consider a string an email if it contains an `@`, while others might perform additional checks for other aspects of a well-formed email address.
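
To make the note above concrete, here is a sketch of two email checks of differing strictness. Both functions are hypothetical and neither implements RFC 5321; the rules in `stricter_email` are arbitrary examples of "additional checks".

```python
def loose_email(value: str) -> bool:
    """Accept anything that contains an "@"."""
    return "@" in value


def stricter_email(value: str) -> bool:
    """Also require a non-empty local part and domain, and no whitespace."""
    local, sep, domain = value.rpartition("@")
    if not sep or not local or not domain:
        return False
    return not any(ch.isspace() for ch in value)


for candidate in ("user@example.com", "user@", "@"):
    print(candidate, loose_email(candidate), stricter_email(candidate))
```

The same instance can therefore pass under one implementation and fail under another, which is why schemas should not rely on a particular level of `format` checking.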

The JSON Schema specification has a bias toward networking-related formats, likely due to its roots in web technologies. However, custom formats may also be used if the parties exchanging the JSON documents share information about the custom format types. A JSON Schema validator will ignore any format type it does not understand.
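
One way to picture custom formats and the "ignore what you don't understand" rule is a lookup table of checks. The sketch below is an assumption-laden illustration: the `product-code` format, its pattern, and the `format_ok` helper are all invented for this example.

```python
import re
from typing import Callable, Dict

# Hypothetical custom format that both parties have agreed on out of band.
FORMAT_CHECKS: Dict[str, Callable[[str], bool]] = {
    "product-code": lambda s: re.fullmatch(r"[A-Z]{3}-[0-9]{4}", s) is not None,
}


def format_ok(value: str, format_name: str) -> bool:
    """Apply a known format check; treat unknown formats as always passing."""
    check = FORMAT_CHECKS.get(format_name)
    if check is None:
        return True  # format not understood: ignored rather than rejected
    return check(value)


print(format_ok("ABC-1234", "product-code"))
print(format_ok("abc", "product-code"))
print(format_ok("anything", "some-unknown-format"))
```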

### Built-in Formats

`format` is not limited to a fixed set of values, nor to strings. Users may define their own custom formats for any data type, such as `integer`, `double`, or `float`. Below, we cover the commonly used `string` formats defined in the JSON Schema specification.

#### Dates and Times

Dates and times are represented in [RFC 3339, section 5.6](https://tools.ietf.org/html/rfc3339#section-5.6). This is a subset of the date format also commonly known as [ISO8601 format](https://www.iso.org/iso-8601-date-and-time-format.html).

Member

> This is a subset of the date format also commonly known as ISO8601 format.

This is not true. Here's a really good comparison: https://ijmacd.github.io/rfc3339-iso8601/

Contributor Author @nikhilkalburgi, Oct 3, 2024

Hi @gregsdennis ,

I have two different suggestions for this:

1:

Dates and times are represented in RFC 3339, section 5.6. While RFC 3339 is often considered a subset of the ISO 8601 format, there are important differences between the two. For a detailed comparison, refer to this resource: RFC 3339 vs. ISO 8601.

2:

Dates and times are represented in RFC 3339, section 5.6. While RFC 3339 is often seen as a simpler version of the ISO 8601 format, there are some differences. RFC 3339 focuses on a smaller set of date and time formats and has stricter rules. It doesn't include all the features of ISO 8601, like week dates or very long years. For a clear comparison, check this resource: RFC 3339 vs. ISO 8601.

What do you think?

Member

Is 3339 considered a simpler version of 8601? I never thought that. I always thought they were just different. I was actually surprised when I discovered how much they overlap.

Contributor Author

RFC 3339 is a simplified and restricted subset of ISO 8601.

They have some similarities: both formats support the basic date-time structure YYYY-MM-DDTHH:MM:SS. They also differ: ISO 8601 allows reduced precision in the time format (e.g., hours and minutes only: 14:30), whereas RFC 3339 requires the full HH:MM:SS format.

ISO 8601 provides more options than RFC 3339.

So, what should our conclusion be? Do you prefer one of the two options above, or do you have anything else to add?

Member @gregsdennis, Oct 4, 2024

> RFC 3339 is a simplified and restricted subset of ISO 8601.

I have a problem with the word "subset". It's not a subset. There are 3339 formats that 8601 doesn't accept. In the comparison, you can plainly see this. It's not a subset because the 3339 circle does not exist completely within the 8601 circle.

Member

Now you have the opposite problem that "distinct" means there's no overlap. Just say they're separate specs. The RFC is by IETF, and the ISO is by, well, ISO. They're different because they are attempts to standardize dates by two different standardization bodies.

Contributor Author @nikhilkalburgi, Oct 14, 2024

Dates and times are represented in RFC 3339, section 5.6. RFC 3339 and ISO 8601 are standards from different standardization bodies. RFC 3339 is from the IETF and ISO 8601 from the ISO. They are separate date format specifications with overlapping features. RFC 3339 enforces stricter rules and supports certain formats that ISO 8601 does not, while omitting features like week dates and very long years, which are part of ISO 8601. To explore their similarities and differences, refer to: RFC 3339 vs. ISO 8601.

I have rephrased it as per your suggestion. I will add the link to IETF, ISO official site and RFC vs ISO.

Member

I think we're getting lost in the weeds here. We should focus on RFC3339 and maybe casually remark that, while it overlaps somewhat with ISO8601, it is a different specification with a different set of formats.

Copy link

Why are we mentioning the ISO standard at all? It isn't actually relevant. There are innumerable other standards that it isn't defined by.

Member

That's really what I'm trying to get at.


- `"date-time"`: Date and time together, for example, `2018-11-13T20:20:39+00:00`.
- `"time"`: <StarInline label="New in draft 7" /> Time, for example, `20:20:39+00:00`.
- `"date"`: <StarInline label="New in draft 7" /> Date, for example, `2018-11-13`.
- `"duration"`: <StarInline label="New in draft 2019-09" /> A duration as defined by the [ISO 8601 ABNF for "duration"](https://datatracker.ietf.org/doc/html/rfc3339#appendix-A). For example, `P3D` expresses a duration of 3 days.
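
The shapes of these four formats can be approximated with regular expressions built from the RFC 3339 grammar (section 5.6 for dates and times, appendix A for durations). The sketch below rests on assumptions: the function names are hypothetical, any `:60` leap second is accepted regardless of date, and year `0000` is rejected because Python's `date` cannot represent it.

```python
import re
from datetime import date

_TIME = (r"(?:[01]\d|2[0-3]):[0-5]\d:(?:[0-5]\d|60)(?:\.\d+)?"
         r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)")
# Duration grammar: a year may be followed by months, months by days, and so on.
_DUR_DATE = r"(?:\d+Y(?:\d+M(?:\d+D)?)?|\d+M(?:\d+D)?|\d+D)"
_DUR_TIME = r"T(?:\d+H(?:\d+M(?:\d+S)?)?|\d+M(?:\d+S)?|\d+S)"
_DURATION = rf"P(?:{_DUR_DATE}(?:{_DUR_TIME})?|{_DUR_TIME}|\d+W)"


def is_date(value: str) -> bool:
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value, re.ASCII)
    if match is None:
        return False
    try:
        date(int(match[1]), int(match[2]), int(match[3]))  # rejects month 13, etc.
    except ValueError:
        return False
    return True


def is_time(value: str) -> bool:
    # IGNORECASE lets a lowercase "z" offset through, which RFC 3339 permits.
    return re.fullmatch(_TIME, value, re.ASCII | re.IGNORECASE) is not None


def is_date_time(value: str) -> bool:
    # full-date, then "T" (or "t"), then full-time.
    return (len(value) > 11 and value[10] in "Tt"
            and is_date(value[:10]) and is_time(value[11:]))


def is_duration(value: str) -> bool:
    return re.fullmatch(_DURATION, value, re.ASCII) is not None


print(is_date_time("2018-11-13T20:20:39+00:00"), is_duration("P3D"))
```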

#### Email Addresses

- `"email"`: Internet email address, see [RFC 5321, section 4.1.2](http://tools.ietf.org/html/rfc5321#section-4.1.2).
- `"idn-email"`: <StarInline label="New in draft 7" /> The internationalized form of an Internet email address, see [RFC 6531](https://tools.ietf.org/html/rfc6531).

#### Hostnames

- `"hostname"`: Internet host name, see [RFC 1123, section 2.1](https://datatracker.ietf.org/doc/html/rfc1123#section-2.1).
- `"idn-hostname"`: <StarInline label="New in draft 7" /> An internationalized Internet host name, see [RFC5890, section 2.3.2.3](https://tools.ietf.org/html/rfc5890#section-2.3.2.3).
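
As a rough illustration of what a `hostname` check might involve, the sketch below tests each dot-separated label. The helper name and the length limits (63 characters per label, 253 overall) are assumptions drawn from common DNS practice rather than a full reading of RFC 1123, and a trailing dot is rejected for simplicity.

```python
import re

# One label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname(value: str) -> bool:
    if not value or len(value) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in value.split("."))


print(is_hostname("example.com"), is_hostname("-bad.example"))
```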

#### IP Addresses

- `"ipv4"`: IPv4 address, according to dotted-quad ABNF syntax as defined in [RFC 2673, section 3.2](http://tools.ietf.org/html/rfc2673#section-3.2).
- `"ipv6"`: IPv6 address, as defined in [RFC 2373, section 2.2](http://tools.ietf.org/html/rfc2373#section-2.2).
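
For IP addresses, Python's standard `ipaddress` module can serve as a stand-in parser. This is only a sketch: the wrapper names are hypothetical, and the module's accepted syntax is not guaranteed to match the cited RFCs exactly (recent Python versions, for example, accept IPv6 scope IDs).

```python
import ipaddress


def is_ipv4(value) -> bool:
    if not isinstance(value, str):  # IPv4Address also accepts integers; exclude them
        return False
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def is_ipv6(value) -> bool:
    if not isinstance(value, str):
        return False
    try:
        ipaddress.IPv6Address(value)
    except ValueError:
        return False
    return True


print(is_ipv4("192.168.0.1"), is_ipv6("2001:db8::1"))
```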

#### Resource Identifiers

- `"uuid"`: <StarInline label="New in draft 2019-09" /> A Universally Unique Identifier as defined by [RFC 4122](https://datatracker.ietf.org/doc/html/rfc4122). Example: `3e4666bf-d5e5-4aa7-b8ce-cefe41c7568a`.
- `"uri"`: A universal resource identifier (URI), according to [RFC3986](http://tools.ietf.org/html/rfc3986).
- `"uri-reference"`: <StarInline label="New in draft 6" /> A URI Reference (either a URI or a relative-reference), according to [RFC3986, section 4.1](http://tools.ietf.org/html/rfc3986#section-4.1).
- `"iri"`: <StarInline label="New in draft 7" /> The internationalized equivalent of a "uri", according to [RFC3987](https://tools.ietf.org/html/rfc3987).
- `"iri-reference"`: <StarInline label="New in draft 7" /> The internationalized equivalent of a "uri-reference", according to [RFC3987](https://tools.ietf.org/html/rfc3987).
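
Two of these checks are simple enough to sketch. The code below is illustrative only: `is_uuid` tests the hyphenated textual form shown in the example, and `has_scheme` tests only for a leading RFC 3986 scheme, which is what separates a `"uri"` from a relative `"uri-reference"`. Neither is a complete validator, and both names are hypothetical.

```python
import re

_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# RFC 3986 scheme: a letter, then letters, digits, "+", "-" or ".", then ":".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_uuid(value: str) -> bool:
    return _UUID.fullmatch(value) is not None


def has_scheme(value: str) -> bool:
    """True when the string starts with a scheme, as a "uri" must."""
    return _SCHEME.match(value) is not None


print(is_uuid("3e4666bf-d5e5-4aa7-b8ce-cefe41c7568a"))
print(has_scheme("https://example.com/a"), has_scheme("../a/b"))
```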

#### URI Template

- `"uri-template"`: <StarInline label="New in draft 6" /> A URI Template (of any level) according to [RFC6570](https://tools.ietf.org/html/rfc6570). If you don't already know what a URI Template is, you probably don't need this value.

#### JSON Pointer

- `"json-pointer"`: <StarInline label="New in draft 6" /> A JSON Pointer, according to [RFC6901](https://tools.ietf.org/html/rfc6901). There is more discussion on using JSON Pointer within JSON Schema in [Structuring a complex schema](../../understanding-json-schema/structuring). Note that this should be used only when the entire string contains only JSON Pointer content, e.g., `/foo/bar`. JSON Pointer URI fragments, e.g., `#/foo/bar/`, should use `"uri-reference"`.
- `"relative-json-pointer"`: <StarInline label="New in draft 7" /> A [relative JSON pointer](https://tools.ietf.org/html/draft-handrews-relative-json-pointer-01).
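
The distinction between a plain JSON Pointer and a URI fragment can be sketched with the RFC 6901 grammar: a pointer is either empty or a sequence of `/`-prefixed tokens in which `~` appears only as `~0` or `~1`. The helper names below are hypothetical.

```python
import re
from typing import List

_JSON_POINTER = re.compile(r"(?:/(?:[^/~]|~[01])*)*")


def is_json_pointer(value: str) -> bool:
    return _JSON_POINTER.fullmatch(value) is not None


def split_pointer(pointer: str) -> List[str]:
    """Split a valid pointer into unescaped reference tokens."""
    # "~1" must be decoded before "~0" so that "~01" becomes "~1", not "/".
    return [tok.replace("~1", "/").replace("~0", "~")
            for tok in pointer.split("/")[1:]]


print(is_json_pointer("/foo/bar"), is_json_pointer("#/foo/bar/"))
print(split_pointer("/a~1b/c~0d"))
```

The fragment form `#/foo/bar/` fails this check because of the leading `#`, which is why it belongs under `"uri-reference"` instead.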

#### Regular Expressions

- `"regex"`: <StarInline label="New in draft 7" /> A regular expression that should be valid according to the [ECMA 262](https://www.ecma-international.org/publications-and-standards/standards/ecma-262/) [dialect](../../learn/glossary#dialect). Be careful: in practice, JSON Schema validators are only required to accept the safe subset of [regular expressions](../../understanding-json-schema/reference/regular_expressions) described elsewhere in this document.