Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg: Use filtered_formula to represent dependencies #10918

Merged
merged 19 commits into from
Nov 3, 2024

Conversation

Leonidas-from-XIV
Copy link
Collaborator

At the moment when an OPAM package in the local packages we only accept a list of Atoms, so if it has an disjunction in the dependencies we fail when loading as we attempt to parse the list of Atoms into a Dependency_set.t structure. This structure only supports a sequence of conjunctions (as that is the only syntax that dune-project files support at the moment).

This PR changes the internal representation of the dependencies to be OpamTypes.filtered_formula as when reading OPAM files this is what we get from the OPAM file parser. When we pass the data to the solver it is also OpamTpyes.filtered_formula that we have to encode back to. Thus I suggest keeping OpamTypes.filtered_formula as dependency specification and compute the information we need from it directly. The information is fairly limited as we only need one example package name for an error message and a hash of the dependencies.

The downside of this is that all existing dependency hashes change (see tests), as the representation of the package constraints the hash is calculated over is changing, but I consider this an acceptable compromise.

Fixes #10837

Copy link
Collaborator

@ElectreAAS ElectreAAS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests expectedly fail with hash incongruences, but some others fail with this:

-  File "x.opam", line 1, characters 0-0:
-  Error: Expected formula to be a conjunction of atoms but encountered non-atom
-  term 'a | b'
+  Error: Unable to solve dependencies for the following lock directories:
+  Lock directory dune.lock:
+  Can't find all required versions.
+  Selected: x.dev a
+  - a -> (problem)
+      No known implementations at all

Shouldn't these tests now pass with a proper build?

src/dune_pkg/local_package.ml Outdated Show resolved Hide resolved
@Leonidas-from-XIV
Copy link
Collaborator Author

@ElectreAAS The failure you quoted makes sense, because the test in opam-file-errors.t does not define a package a or b. To merge this PR I think the best way would be to change it to to define a (and maybe b) and show that these are supported now. The other tests in the file can be deleted to as we don't convert filtered_formulas into dune conditions anymore, so the error case that this is exercising would not happen anymore.

The hash changes are an unfortumate side-effect as the non-local dependency set is calculated from a different representation and bar of implementing the OPAM filters into Dune's condition language and then hashing that there is no good way of preserving the previous hashes (or running a hybrid approach of encoding them one way if they can and another if they cannot be represented using dune-lang conditions). I don't think we guarantee that the hashes are stable between versions of Dune.

@Leonidas-from-XIV
Copy link
Collaborator Author

I've replaced the failing test with one that shows that disjunctions in opam files lead to valid lock directory solutions using either side of the disjunction.

@ElectreAAS
Copy link
Collaborator

Oh nice! Thanks for updating the test

let deps =
List.map pkg.dependencies ~f:(fun (d : Package_dependency.t) -> d.name)
in
let deps = Local_package.For_solver.dependency_names pkg in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this wrong? You should not consider packages that haven't been selected for the reachability check. This would mean that you're including packages in the lock directory that are not needed for the actual build. This is a good thing to add a test for as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're completely right, thanks for pointing this out. This code needs to be changed because it uses the input to the solver as the output instead of using the solution that the solver outputs.

I'll add a test that checks that dependencies that don't get picked by the solver don't end up in the lock directory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking what the semantics of reachable is and according to #10814 it is about removing post dependencies (and also post-deps-solving.t is the only test failing if I comment out the reachability test).

Since this is basically about filtering the solver output via the possible deps from the input, how about still collecting all possible packages (since they are potentially reachable) except those that are post and using that for reachability?

That way packages that the solver is not picking will not be part of the lock dir (as the solution just does not contain them) and post packages won't be picked either (as they will be filtered out by the reachability check). This makes post a bit special but post is already a magic variable in opam.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test showing an unnecessary post dependencies being eliminated.

how about still collecting all possible packages (since they are potentially reachable) except those that are post and using that for reachability?

I think it's just simpler to do this without any hacks. We've already resolved the or deps when writing the dependency list in the lock directory. Might as well just use that here instead of the formula

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, I'm thinking of just using this snippet to evaluate the formula:

      Resolve_opam_formula.filtered_formula_to_package_names
        ~with_test:false
        (add_self_to_filter_env opam_package (Solver_env.to_env solver_env))
        version_by_package_name
        what

We already do this for external packages. No reason why we can't do it for local packages for the reachability check. You might want to wait until #11022 is merged though, as I'm going to move this snippet somewhere where it can be shared.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed the comment. Ok, I'll try to evaluate the formula instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the documentation of filtered_formula_to_package_names makes me question whether this is the right thing to use:

Given a map associating package names with the version of that package in a solution, this function evaluates an OpamFormula.t to a concrete list of package names all contained in the solution with versions which satisfy the formula, or an error if the the version constraints or packages in the formula can't be satisfied with the packages in the solution. If the formula contains disjunctions (the | operator in opam) then there may be multiple sets of packages that satisfy the formula. In this case the choice of which set to return is made arbitrarily.

This sounds to me that if the solver chooses one side of the disjunction and filtered_formula_to_package_names the other, this will lead to all the solver-selected legitimate dependencies being filtered out. Or, if I pass in the result from the solver as the package name map, the function will fail to find a solution.

Copy link
Member

@rgrinberg rgrinberg Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds to me that if the solver chooses one side of the disjunction and filtered_formula_to_package_names the other, this will lead to all the solver-selected legitimate dependencies being filtered out. Or, if I pass in the result from the solver as the package name map, the function will fail to find a solution.

If this is an issue, then it's already present with packages from opam repository. Working around it for local packages defined using the opam syntax is going to be an insufficient work around anyway.

But I think this isn't a huge issue. Rephrased less pessimistically, it tell us that solver can find at least one way to satisfy the disjunction. If there's only one way, we will find it. If there's more than one way, we will pick some arbitrary way - but it will still satisfy the formula. We're only planning to execute one build plan using this solution, so we indeed need only one solution.

I agree that It's rather unsatisfying that we can't tell the user there's more than one solution. I would really prefer to tell the user to write more constraints so that we don't have to choose things arbitrarily. But in the end, only one build plan is executed, and therefore only one side of any disjunction needs to be reachable.

@rgrinberg
Copy link
Member

Looked over the approach again. I agree that emitting a dependency that wasn't selected for the hash check as an example isn't so bad. It's also an improvement to have the dependency hash reflect the formula. There's an issue introduced to the reachability check though. That one should most definitely only consider effective dependency sets. Doing otherwise would include useless packages in the lock dir

@Leonidas-from-XIV Leonidas-from-XIV force-pushed the filtered-formula-passthrough branch 2 times, most recently from 48312d3 to 6848f33 Compare October 17, 2024 15:06
@Leonidas-from-XIV
Copy link
Collaborator Author

That one should most definitely only consider effective dependency sets. Doing otherwise would include useless packages in the lock dir

In my tests this seems to work, as the reachability is tested by filtering over a solver solution, so if the solver doesn't pick it, it isn't ever checked for reachability. That works even in the presence of post dependencies, the post dependency tests all pass.

Dune_sexp.List [ Dune_sexp.atom "or"; l; r ]
;;

let rec to_sexp v =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a shakem to introduce this much boilerplate. I just noticed we have some converters in Opam_dyn. How about we try using that instead of the sexp?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works too, I've replaced it with converting to Dyn.t and hashing that. I currently hash it via to_string but maybe a nicer solution would be using Dyn.hash and converting the int into a hex digest?

The `Dependency_set.t` representation can't deal with disjunctions but
in most cases that is not even necessary as the set gets turned into a
`filtered_formula` again. Thus it might be easier to keep the original
representation and implement the necessary dependency set functionality
on top of that.

Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Leonidas-from-XIV and others added 12 commits November 3, 2024 17:27
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
Signed-off-by: Marek Kubica <marek@tarides.com>
_
Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
_
Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
@rgrinberg
Copy link
Member

LGTM. I just replaced the recursion with some existing opam library functions.

@rgrinberg rgrinberg merged commit b409963 into ocaml:main Nov 3, 2024
26 of 27 checks passed
@Leonidas-from-XIV Leonidas-from-XIV deleted the filtered-formula-passthrough branch November 4, 2024 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error solving project with deps specified in opam file when deps contain a disjunction
3 participants