Either $ref resolution doesn't work, or $id is ignored. #81
Comments
Couple new points:
(Note: Neither In other words, the only working scenario is if the This is very much not correct resolution behavior. From 8.2.1 (emphasis mine; first two quoted paras included to give context to third):
And 9.1.1 echoes this:
Also, from 9.1.2 (emphasis mine):
Furthermore, from 9.2 (emphasis mine):
In other words, unless there is a valid reason in some specific circumstance, resolution (for local files) is supposed to work like this:
And the implication is that the documents should be loaded first, before $ref'ed schemas are resolved (or at least, in an appropriate order, circular refs notwithstanding), so that the canonical URIs can be determined and mapped to the appropriate [sub]schemas. This means, then, that:
So either Wetzel is doing something wrong on the resolution end, or it's just plain ignoring |
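A minimal sketch of the "load first, resolve afterwards" order described above, assuming Python and its standard `urllib.parse` module. The function names `build_registry` and `resolve_ref`, the file paths, and the `example.com` identifiers are all hypothetical; this is not Wetzel's implementation, and fragments and nested `$id`s are left out.

```python
from urllib.parse import urljoin, urldefrag


def build_registry(documents):
    """Map canonical URI -> root schema for every loaded document.

    `documents` maps a retrieval URI (where a file was loaded from)
    to its parsed JSON content. All names here are hypothetical.
    """
    registry = {}
    for retrieval_uri, schema in documents.items():
        # A root "$id" is resolved against the retrieval URI; without
        # "$id", the retrieval URI itself serves as the canonical URI.
        canonical = urljoin(retrieval_uri, schema.get("$id", ""))
        registry[urldefrag(canonical).url] = schema
    return registry


def resolve_ref(ref, base_uri, registry):
    """Resolve "$ref" against base_uri and look it up; no network access."""
    target, _fragment = urldefrag(urljoin(base_uri, ref))
    if target not in registry:
        raise KeyError(f"no loaded schema has the canonical URI {target}")
    return registry[target]  # fragment handling intentionally omitted


# Phase 1: load every document (hypothetical stand-ins for real files).
documents = {
    "file:///work/schemas/common.json": {
        "$id": "https://example.com/schemas/common.json",
        "type": "object",
    },
    "file:///work/schemas/item.json": {
        "$id": "https://example.com/schemas/item.json",
        "properties": {"common": {"$ref": "common.json"}},
    },
}
registry = build_registry(documents)

# Phase 2: resolve references against the referring schema's canonical URI.
item = documents["file:///work/schemas/item.json"]
resolved = resolve_ref(item["properties"]["common"]["$ref"], item["$id"], registry)
```

Because the registry is keyed by canonical URI, the lookup succeeds even though nothing is served at `https://example.com/schemas/common.json`.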
It's both. As far as I know, the I cannot point my finger at "the" reason. And I agree that we could consider making wetzel more compliant with the specification in this regard. But some aspects to keep in mind:
These points may appear to be a bit shallow and handwaving. But maybe some background is relevant here: wetzel was mainly intended for generating the property reference for the glTF schema. The glTF schema uses IDs like It may not be perfect in terms of spec compliance. But it works for glTF and other schemas. An aside: In the refactored state that I pointed to in another issue, I tried to at least carry along some information about the 'base URI' together with the schema. This 'base URI' still consists of a 'directory name' in the current state, but at least, there is a structure for carrying that sort of information, which could either be derived from the |
Maybe you are overthinking it :). The meaning is straightforward and sensible, I think: Ahead of time = before resolving Really, it's pretty much the same set of informational requirements that would be needed to enable handling of circular references, except it would also include this information from some explicit list of available schemas (like
I believe this may be the source of your reservations. The spec is very explicit about this matter: From 8.2.1 (emphasis mine):
From 8.2.3 regarding resolution of references (emphasis mine):
Therefore the premise of the question, "if retrieving a URL indicated by the $id yields a 404 then why should it work?", is not valid: no attempt to access a URL indicated by the $id should ever have been made, and attempts to access $refs by URL are only an optional plan B (see below).
That aside, the answer is: because the spec is very clear that the $id defines the canonical URI and that schemas should be identifiable by said URI, and that these URIs (which may not even have addressable schemes like http/file) aren't required to be the retrieval URLs.
I am not sure which draft made that explicitly clear but it would've been around draft 7. Draft 4 is where the role of ID was clarified and the idea of a "resolution scope" was introduced. The resolution scope was never linked to the retrieval URL; clarification on the "network operation" point as well as the idea of internal vs. external references was added later, but the intent was there in 04.
Note also these are URIs, not URLs, after all. The difference is that a URI (uniform resource identifier) names a resource without necessarily giving it a location, while a URL (uniform resource locator) provides a path to obtain the resource.
As an aside, I think the most confusing bit is that it's just become almost ubiquitous to use the "http" scheme in arbitrary URIs, and so examples become misleading. Personally, I think that they should've registered e.g. a "schema" URI scheme and stuck with that. It's for that reason that I actually prefer to use "xri" for identifiers where possible rather than "http". In fact, I may make a proposal along those lines, or to at least switch some of the examples over to a different scheme.
Additionally, you write:
In fact,
Heh, yeah; and it doesn't help that they seem to be huge fans of "SHOULD" instead of "MUST"... Still, the specification and behavior of "id" has been present since draft-03 and the basic definition "id identifies the schema, ref uses those ids in resolution" has never changed:
So, really, the modern behavior dates back to draft-04 or draft-05 (depending on what was taken to be implied in 04), and ids themselves go back to 03. In other words,
And that would normally be totally fine -- Wetzel isn't obligated to do anything for strangers, heh -- except (and this is a lot of the motivation behind my post) that json-schema.org includes Wetzel in its documentation generator implementation list. Additionally, it states:
Of course "some limitations" is completely reasonable, but schema IDs are a fundamental feature of JSON Schema. In my opinion, having a lack of ID support is somewhat (granted this is an exaggeration) like saying "Wetzel supports all versions of the spec, with some limitations" where "some limitations" includes inability to parse JSON. :) IDs are in fact so fundamental that the "identifier" is listed as one of the primitive keyword categories. Described in 7.4 as:
So e.g. in the Wetzel readme: "Currently it accepts JSON Schema drafts 3, 4, 7, and 2020-12" could be seen as misleading: Since it's missing a fundamental feature, it could be compellingly argued that it doesn't accept any of those drafts, even "with limitations". To be clear, my intent isn't to dish out negative criticism or make demands. What I mean is: Wetzel can obviously do whatever you want it or need it to do, but I strongly feel that if it's not going to be given some more compliant behavior, then it at least ought to be removed from json-schema.org's front page given its current level of compliance.
That's definitely helpful. Essentially, the following URIs are defined:
And resolution is performed by:
Much of the above is actually defined in the URI RFC. Incidentally, AJV's Also note that the specific basic case of ...
... still works fine under the full resolution scheme, as all reference URIs would resolve to absolute URIs in the filesystem since root schema base URIs are their retrieval URIs when no glTFAs for glTF, those schemas are technically non-compliant with the current draft. In particular, note that 8.2.1.1 specifically calls for all root schema documents to not only contain an
Not only is that technically recommended, but in practice it becomes important in complex environments: If the glTF schema exists on a system with many other schemas and applications, then it is important for the glTF schemas to have absolute identifiers -- that is, those schemas can be referenced by IDs that do not depend on their location. There are many reasons for this that I won't go into since this is pretty long already. Also, in addition to location-independence, absolute URIs also of course provide namespacing.
Now, the thing here is: While you can currently say, "well, it works" (and that's fine), I could very reasonably go over to the glTF issue page and request that their schemas be given absolute URIs (don't worry, I won't, that'd be kind of a dick move given the current conversation, 😂). This request would be entirely justified and theoretically easy to implement; but it would not be possible given Wetzel's compliance level.
The thing is: the glTF schema in its current form works with Wetzel and if it works, it works; but, otoh, the glTF schema will always remain in a form that coincides with Wetzel's limitations, because it would be silly to break a working system. That is, if you were to say "the primary motivation to update Wetzel is to keep up with glTF schema support", that equates to not being a motivation to update Wetzel, as the glTF schema is unlikely to change in a way that would require Wetzel to be updated (the path of least resistance is to just force glTF into Wetzel-compliant form).
Anyways... the TL;DR is that IDs are pretty fundamental and have been in the spec for a while, and even though Wetzel + glTF can work together without them, it would greatly improve Wetzel's usability outside of glTF and the most basic schemas.
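A minimal sketch of the base-URI point made in this comment, assuming Python's `urllib.parse.urljoin` as the relative-reference resolver. The helper name `base_uri`, the file paths, and the `example.com` identifier are hypothetical and are not taken from glTF or Wetzel.

```python
from urllib.parse import urljoin


def base_uri(retrieval_uri, root_schema):
    """Base URI of a root schema: its "$id" if present, else its retrieval URI."""
    return urljoin(retrieval_uri, root_schema.get("$id", ""))


ref = "common.json"  # a relative "$ref" as it might appear in a schema

# Without "$id", the reference resolves relative to the file's location,
# so the plain "files next to each other" case keeps working.
no_id = {"type": "object"}
print(urljoin(base_uri("file:///work/schemas/item.json", no_id), ref))

# With an absolute "$id", the resolved URI no longer depends on where the
# file happens to live on disk.
with_id = {"$id": "https://example.com/schemas/item.json"}
for location in ("file:///work/schemas/item.json", "file:///tmp/copy/item.json"):
    print(urljoin(base_uri(location, with_id), ref))
```

In the second case both locations produce the same absolute URI, which is the location-independence (and namespacing) that absolute identifiers provide.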
And I thought that my issue comments were long 😌
I might be overthinking this, but I have seen too many effects of 'underthinking', and this may just be a countermeasure. If you think that you can implement "The Right Solution®", then feel free to open a pull request. As long as the updated state is still generating the same output for glTF, the repository maintainers will probably be willing to merge it. But if you try, just a word of warning:
The
I'm roughly (!) aware of some of these caveats. I occasionally looked at https://json-schema.org/understanding-json-schema/structuring.html , which explains some of these concepts in a slightly less formal way than the specs that you linked to (but I won't claim to have thoroughly understood all that, and admit that I did not read the technical version of the specs and all the RFCs that are necessary to really understand that). My (somewhat shallow) understanding seems to be in line with what you said in a more profound and elaborate form. Roughly:
So this still leaves the question open about where and how exactly a
but doing that in a 'spec-compliant' way that works in all cases that are covered by the spec, and (!) in all cases that appear in the real world can be difficult. Imagine you find a real-world schema that contains a
You could argue that this is wrong, and it should use a proper ID (and that's correct). But that's not what's happening. So where, exactly, is the An aside: All this does not yet address the issue of fragments in
is not entirely trivial. (Some related code is in some branch, but again: This is faaar from perfect - it just 'worked for me', as far as I needed it...) You seem to read the specs on a more detailed level than I do. So maybe I can throw in that random question here, which I carved out as some sort of "quiz". Consider the following schema:
It defines Now consider this one, "extending" it (even though there is no real 'inheritance' going on) :
It defines Question 1: What type may the additional properties have so that they conform to the second schema?
Question 2: Are you sure about your answer to question 1?
I have read this 'change log', but admittedly, I will not read through each link. But a big 👍 for that nevertheless, because I might take a closer look at the links when this becomes immediately relevant for my work, and in any case, it is a useful overview (maybe for the case that someone wants to support multiple draft versions). To summarize it, subjectively: The
This has never been followed in glTF, and it was never implemented in wetzel.
That's all fine for me. I'm also only a user of wetzel. It does what it was intended for, but there are many aspects of the JSON schema that it never handled correctly, and many aspects that it never handled at all (roughly: because it wasn't necessary for glTF). Or to put it that way: Wetzel
I went through some of these steps/approaches while I tried to use wetzel for a more complex schema. I originally tried to do these changes incrementally, in a somewhat backward-compatible way. But at some point, I had to 'burn some bridges', because the necessary changes completely changed the original implementation, and of course, the refactored state is still far from perfect, and vastly different from something that one could do when...
I also considered to use the
I occasionally looked at AJV. It is a project with ~11000 stars, ~2600 commits, ~150 releases, billion-dollar companies as sponsors, 180 contributors, (and still, 169 open issues and 29 pending pull requests). It's an entirely different category of project than wetzel. One may find some "inspiration" there, in terms of spec-compliant handling of details like
I just did that dick move: KhronosGroup/glTF#2182 . It is a valid point, so why not. The fact that glTF and wetzel are somewhat "coupled" should not prevent |
I used to be self-conscious about my long comments, but now it's just 🤷‍♂️, haha. I actually edited it down. Anyways, I will 100% read your reply and address what I can; at the moment I accidentally went down a bit of a rabbit hole. You might be interested in the active conversations at https://github.com/orgs/json-schema-org/discussions/197 -- in particular the thread starting from here.
I skimmed over that thread, but may have to re-read it (and some of the spec references mentioned here and there) to get a clearer picture. A very high-level recommendation seems to be: "Rely on the That certainly could simplify some structures and the implementation tremendously (sorting out actual responsibilities of code paths - divide et impera). And as I mentioned above: I tried that, to some extent - essentially, to populate the But (with a bit of handwaving: ) given the lack of 'proper' IDs in real schemas, and the existing lookup mechanisms in wetzel, and the difficulties of sorting out and carrying along actual 'retrieval URIs', possible ambiguities for |
Wetzel version: Whatever it is in git right now.
OS: Windows 10
Node: 16.13.1
Given the following two schemas placed in a subdirectory named schemas:
schemas\a.json:
schemas\b.json:
When I run:
Wetzel fails with:
Why isn't it loading a.json and how do I make it find the references? Is my understanding of the -i option incorrect? I tried hand-wavily adding -s schemas as well, but the result was the same. Thanks!