Merge pull request #271 from AkashKumar7902/selection-strategy-research
design of selection strategy
Showing 3 changed files with 178 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
<h2>KCL version selection strategy</h2> | ||
|
||
**Author**: Akash Kumar | ||
|
||
**Abstract** | ||
|
||
In popular programming languages such as Go and Rust, two binaries of the same package with different versions cannot coexist in one system, so these languages require a version selection strategy. KCL, however, is not a general-purpose language, and KCL packages are not intended to be executed as binaries, which is why multiple versions of the same package are currently allowed to reside in the file system. At first glance, then, no version selection seems to be required. | ||
|
||
Before explaining why a selection strategy is still needed, let's first review semantic versioning: | ||
|
||
Semantic versioning (SemVer) is a widely adopted versioning scheme for software that aims to communicate changes in a clear and standardized way. In SemVer, version numbers are composed of three parts: MAJOR.MINOR.PATCH. Major versions signal incompatible (breaking) changes, minor versions denote backward-compatible feature additions, and patch versions indicate backward-compatible bug fixes. Crucially, SemVer dictates that once a version is released, subsequent updates within the same major version should not introduce breaking changes. This lets users upgrade to newer versions within a major version without fear of their software breaking due to unexpected changes. By adhering to SemVer principles, developers can maintain compatibility and predictability in their software releases, fostering smoother adoption and integration for end users. | ||
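The ordering and compatibility rules described above can be sketched in a few lines of Python. This is only an illustration: the names `Version`, `parse`, `same_major`, and `later` are hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build metadata.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain MAJOR.MINOR.PATCH version; tuples compare numerically, field by field."""
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a string such as '1.4.2' (assumption: no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def same_major(a: Version, b: Version) -> bool:
    """Under SemVer, releases that share MAJOR are expected not to break each other."""
    return a.major == b.major


def later(a: Version, b: Version) -> Version:
    """Return the newer of two versions."""
    return max(a, b)


old, new = parse("1.4.2"), parse("1.10.0")
print(same_major(old, new))                 # same MAJOR, so expected to be compatible
print(later(old, new))                      # numeric comparison: 1.10.0 is newer than 1.4.2
print(same_major(old, parse("2.0.0")))      # different MAJOR, so possibly breaking
```

Comparing parsed integers rather than raw strings matters here, because a string comparison would wrongly rank `1.4.2` above `1.10.0`.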
|
||
All packages in the KCL ecosystem follow semantic versioning. This means that if two or more versions of the same package are required somewhere in the dependency graph, and the versions don't differ in MAJOR, it seems intuitive to include only one version (the later one) of the package, as there is no breaking change between them. This is the reason why we need a selection strategy. | ||
|
||
We propose a minimal version selection strategy for the kpm package manager. Minimal version selection gives us *high-fidelity builds*, in which the dependencies a user builds are as close as possible to the ones the author developed against. | ||
|
||
**Background** | ||
|
||
Here are the shortcomings of the current package manager. | ||
|
||
Case 1: | ||
|
||
Currently, if a package has two dependencies that point to two different versions of the same package (see the dependency graph below), both versions of the package get downloaded. | ||
|
||
![](/docs/research/dep-graph.png) | ||
|
||
Case 2: | ||
|
||
Currently, even if a package is already present locally, kpm downloads it again. | ||
|
||
|
||
```bash | ||
$ ls -d /home/akash/.kcl/kpm/k8s* | ||
/home/akash/.kcl/kpm/k8s /home/akash/.kcl/kpm/k8s_1.27 | ||
/home/akash/.kcl/kpm/k8s_1.14 /home/akash/.kcl/kpm/k8s_1.28 | ||
/home/akash/.kcl/kpm/k8s_1.17 /home/akash/.kcl/kpm/k8s_1.29 | ||
``` | ||
```bash | ||
$ kcl mod add k8s | ||
adding dependency 'k8s' | ||
the lastest version '1.29' will be added | ||
downloading 'kcl-lang/k8s:1.29' from 'ghcr.io/kcl-lang/k8s:1.29' | ||
downloading 'kcl-lang/k8s:1.28' from 'ghcr.io/kcl-lang/k8s:1.28' | ||
downloading 'kcl-lang/k8s:1.29' from 'ghcr.io/kcl-lang/k8s:1.29' | ||
add dependency 'k8s' successfully | ||
``` | ||
|
||
Case 3: | ||
|
||
Instead of upgrading all modules, cautious developers typically want to upgrade only one module, with as few other changes to the build list as possible. There is currently no support for this, nor for downgrading a dependency. | ||
|
||
**Proposal** | ||
|
||
We will follow the MVS approach used by the Go package manager, because the underlying strategy achieves reproducible builds without the need for a lock file. \ | ||
\ | ||
The Go package manager adopts a Minimal Version Selection (MVS) approach to determine which packages to include in the final list for building. MVS aims to create builds that closely mirror the dependencies used by the package author during development. This means that when a user builds a project, the dependencies chosen are as similar as possible to the ones the original author developed against. | ||
|
||
Minimal Version Selection (MVS) operates on the assumption that each module specifies only the minimum versions of its dependencies, adhering to the import compatibility rule where newer versions are expected to be compatible with older ones. This means dependency requirements include only minimum versions, without specifying maximum versions or incompatible later versions. | ||
|
||
The version selection strategy is meant to provide algorithms for four operations on the build list: | ||
|
||
- **Construct the current build list:** | ||
|
||
The rough build list for package M is the list of all modules reachable in the requirement graph starting at M and following arrows. It can be computed with a straightforward recursive traversal of the graph that skips nodes already visited. The rough build list is then converted to the final build list by keeping only the newest version of each listed module. | ||
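A minimal sketch of this construction, assuming the requirement graph is held in memory as a dictionary. The graph contents and the names `GRAPH` and `build_list` are hypothetical and are not taken from kpm or from Go's `mvs` package.

```python
from typing import Dict, List, Set, Tuple

Version = Tuple[int, ...]
Module = Tuple[str, Version]                 # (name, version)
Graph = Dict[Module, List[Module]]           # module -> its direct requirements

# Hypothetical requirement graph for a target package A.
GRAPH: Graph = {
    ("A", (1, 0)): [("B", (1, 2)), ("C", (1, 2))],
    ("B", (1, 2)): [("D", (1, 3))],
    ("C", (1, 2)): [("D", (1, 4))],
    ("D", (1, 3)): [("E", (1, 2))],
    ("D", (1, 4)): [("E", (1, 2))],
    ("E", (1, 2)): [],
}


def build_list(graph: Graph, target: Module) -> Dict[str, Version]:
    """Return the final build list as a mapping of module name to selected version."""
    rough: Set[Module] = set()

    def visit(node: Module) -> None:
        if node in rough:                    # skip nodes that were already visited
            return
        rough.add(node)
        for requirement in graph.get(node, []):
            visit(requirement)

    visit(target)                            # rough build list: everything reachable

    final: Dict[str, Version] = {}
    for name, version in rough:              # keep only the newest version per module
        if name not in final or version > final[name]:
            final[name] = version
    return final


for name, version in sorted(build_list(GRAPH, ("A", (1, 0))).items()):
    print(name, ".".join(map(str, version)))
```

In this graph both D 1.3 and D 1.4 are reachable, so the final build list keeps only D 1.4.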
|
||
- **Upgrade all modules to their latest versions:** | ||
|
||
This can be achieved by running `go get -u`, which upgrades all the modules to their latest versions. | ||
|
||
Upgrading the modules means that all arrows in the dependency graph now point to the latest versions of the modules. This results in an upgraded dependency graph, but changes in the dependency graph alone won't cause future builds to use the updated modules. To achieve that, we need to change our build list in a way that won't affect the build lists of dependent packages, as upgrades should be limited to our package alone. | ||
|
||
At first glance, it would seem intuitive to include all the updated packages in our build list. However, not all packages are necessary, and we want to include as few additional modules as possible. To produce a minimal requirement list, a helper algorithm R is introduced. | ||
|
||
**Algorithm R:** | ||
|
||
To compute a minimal requirement list inducing a given build list below the target, reverse postorder traversal is employed, ensuring modules are visited after all those pointing into them. Each module is added only if it's not implied by previously visited ones. | ||
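One way to read Algorithm R is sketched below. It is an illustration under assumptions, not the code in Go's `mvs` package: the graph, the `WANTED` build list, and the function names are hypothetical, and the requirement graph is assumed to be acyclic.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
Module = Tuple[str, Version]
Graph = Dict[Module, List[Module]]

# Hypothetical graph below the target, after an upgrade of C and E.
GRAPH: Graph = {
    ("B", (1, 2)): [("D", (1, 3))],
    ("C", (1, 3)): [("D", (1, 4)), ("F", (1, 1))],
    ("D", (1, 3)): [("E", (1, 2))],
    ("D", (1, 4)): [("E", (1, 2))],
    ("E", (1, 2)): [],
    ("E", (1, 3)): [],
    ("F", (1, 1)): [("G", (1, 1))],
    ("G", (1, 1)): [],
}
# The build list (below the target) that the requirement list must induce.
WANTED: Dict[str, Version] = {
    "B": (1, 2), "C": (1, 3), "D": (1, 4), "E": (1, 3), "F": (1, 1), "G": (1, 1),
}


def reachable(graph: Graph, start: Module) -> List[Module]:
    """All modules reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        for requirement in graph.get(stack.pop(), []):
            if requirement not in seen:
                seen.add(requirement)
                stack.append(requirement)
    return list(seen)


def minimal_requirements(graph: Graph, wanted: Dict[str, Version]) -> List[Module]:
    """Return a minimal requirement list whose build list equals `wanted`."""
    order: List[Module] = []
    visited = set()

    def walk(node: Module) -> None:          # depth-first postorder
        if node in visited:
            return
        visited.add(node)
        for requirement in graph.get(node, []):
            walk(requirement)
        order.append(node)

    for module in sorted(wanted.items()):
        walk(module)

    implied: Dict[str, Version] = {}         # newest version implied so far, per module
    requirements: List[Module] = []
    for name, version in reversed(order):    # reverse postorder: dependents come first
        if wanted.get(name) != version:
            continue                         # an older version, not part of the build list
        if name in implied and implied[name] >= version:
            continue                         # already implied by an earlier requirement
        requirements.append((name, version))
        for dep_name, dep_version in reachable(graph, (name, version)):
            if dep_name not in implied or dep_version > implied[dep_name]:
                implied[dep_name] = dep_version
    return sorted(requirements)


print(minimal_requirements(GRAPH, WANTED))
```

For this graph the result lists only B 1.2, C 1.3, and E 1.3: D 1.4, F 1.1, and G 1.1 are already implied by C 1.3, while E 1.3 has to be listed because nothing else requires it.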
|
||
- **Upgrade one module to a specific newer version:** | ||
|
||
Upgrading all modules to their latest versions can be risky, so developers often opt to upgrade only one module. | ||
|
||
Upgrading one module means that the arrow which earlier pointed to that module now points to the upgraded version. We can construct a build list from the updated dependency graph, which can then be fed to Algorithm R to get a minimal requirement list. | ||
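The single-module upgrade can be sketched as re-pointing one arrow and rebuilding the build list. The graph and the names `upgrade_one` and `build_list` are hypothetical; the sketch assumes the upgraded version and its requirements are already present in the graph.

```python
from typing import Dict, List, Set, Tuple

Version = Tuple[int, ...]
Module = Tuple[str, Version]
Graph = Dict[Module, List[Module]]

# Hypothetical graph that already knows about the newer release C 1.3.
GRAPH: Graph = {
    ("A", (1, 0)): [("B", (1, 2)), ("C", (1, 2))],
    ("B", (1, 2)): [("D", (1, 3))],
    ("C", (1, 2)): [("D", (1, 4))],
    ("C", (1, 3)): [("D", (1, 4)), ("F", (1, 1))],
    ("D", (1, 3)): [("E", (1, 2))],
    ("D", (1, 4)): [("E", (1, 2))],
    ("E", (1, 2)): [],
    ("F", (1, 1)): [("G", (1, 1))],
    ("G", (1, 1)): [],
}


def build_list(graph: Graph, target: Module) -> Dict[str, Version]:
    """Newest version of every module reachable from target."""
    seen: Set[Module] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    final: Dict[str, Version] = {}
    for name, version in seen:
        final[name] = max(final.get(name, version), version)
    return final


def upgrade_one(graph: Graph, target: Module, name: str, version: Version) -> Graph:
    """Return a copy of the graph in which target's arrow to `name` points at `version`."""
    updated = dict(graph)
    kept = [req for req in graph.get(target, []) if req[0] != name]
    updated[target] = kept + [(name, version)]
    return updated


target = ("A", (1, 0))
upgraded = upgrade_one(GRAPH, target, "C", (1, 3))
print(sorted(build_list(upgraded, target).items()))
```

Only C changes version here; F 1.1 and G 1.1 enter the build list because C 1.3 requires them, and every other module stays where it was.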
|
||
- **Downgrade one module to a specific older version:** | ||
|
||
The downgrade algorithm examines each of the target's requirements separately. If a requirement conflicts with the proposed downgrade, meaning its build list contains a version of a module that is no longer allowed, the algorithm iterates through older versions until finding one that aligns with the downgrade. | ||
|
||
Downgrades are changes to the build list made by removing requirements. | ||
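The rewinding step can be sketched as follows. This is a simplified reading of the description above under stated assumptions: every published version of a module appears as a key in the hypothetical `GRAPH`, a requirement with no acceptable older version is simply dropped, and the names `downgrade` and `build_list` are illustrative.

```python
from typing import Dict, List, Set, Tuple

Version = Tuple[int, ...]
Module = Tuple[str, Version]
Graph = Dict[Module, List[Module]]

# Hypothetical graph that includes older releases of B, C, D, and E.
GRAPH: Graph = {
    ("A", (1, 0)): [("B", (1, 2)), ("C", (1, 2))],
    ("B", (1, 1)): [("D", (1, 1))],
    ("B", (1, 2)): [("D", (1, 3))],
    ("C", (1, 1)): [],
    ("C", (1, 2)): [("D", (1, 4))],
    ("D", (1, 1)): [("E", (1, 1))],
    ("D", (1, 2)): [("E", (1, 1))],
    ("D", (1, 3)): [("E", (1, 2))],
    ("D", (1, 4)): [("E", (1, 2))],
    ("E", (1, 1)): [],
    ("E", (1, 2)): [],
}


def build_list(graph: Graph, target: Module) -> Dict[str, Version]:
    """Newest version of every module reachable from target."""
    seen: Set[Module] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    final: Dict[str, Version] = {}
    for name, version in seen:
        final[name] = max(final.get(name, version), version)
    return final


def downgrade(graph: Graph, target: Module, name: str, limit: Version) -> List[Module]:
    """Rewind each requirement of target until its build list has `name` <= `limit`."""
    new_requirements: List[Module] = []
    for req_name, req_version in graph[target]:
        # Known versions of this requirement, newest first, no newer than the current one.
        candidates = sorted(
            (v for (n, v) in graph if n == req_name and v <= req_version),
            reverse=True,
        )
        for candidate in candidates:
            selected = build_list(graph, (req_name, candidate))
            if selected.get(name, limit) <= limit:   # no conflict with the downgrade
                new_requirements.append((req_name, candidate))
                break
        # If no candidate fits, the requirement is removed (no append happens).
    return new_requirements


target = ("A", (1, 0))
rewound = downgrade(GRAPH, target, "D", (1, 2))
print(rewound)
print(sorted(build_list({**GRAPH, target: rewound}, target).items()))
```

In this example B 1.2 and C 1.2 both pull in a D newer than 1.2, so each is rewound to 1.1. The resulting build list selects D 1.1, which is at or below the requested limit.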
|
||
**Implementation** | ||
|
||
The mvs library already implemented in the Go codebase can be reused with a few modifications. \ | ||
https://github.com/golang/go/tree/master/src/cmd/go/internal/mvs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# A Tour of Various Selection Strategies | ||
|
||
In almost every package manager, there are four main actors: | ||
|
||
**Project code** is the code whose dependencies we want to manage. | ||
|
||
**Manifest file** is a file in which the dependencies for the project code are listed. | ||
|
||
**Lock file** is a package manager generated file that contains all the information necessary to reproduce the same dependency tree across any platform. | ||
|
||
**Dependency code** is the fetched code of the resolved dependencies. | ||
|
||
Package managers use a selection strategy to prevent dependency conflicts, resolve dependencies, and optimize the dependency tree. | ||
|
||
Let's study the selection strategies of popular package managers: | ||
|
||
### Cargo | ||
|
||
In Rust, dependencies are specified in the `Cargo.toml` file in the format <name\> = <version\>. [SemVer](https://semver.org/) is used when specifying version numbers. | ||
|
||
To update a dependency safely, Cargo relies on the concept of version compatibility. | ||
|
||
Cargo uses Semantic Versioning to constrain the compatibility between different versions of a package. It uses the leftmost nonzero number of the version to determine compatibility: for example, versions 1.0.16 and 1.1.16 are considered compatible. Cargo considers it safe to update within the compatible range, but updates outside the compatibility range are not allowed. | ||
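The "leftmost nonzero number" rule can be sketched as a small key function. This illustrates the rule as stated above and is not Cargo's own code; `compat_key` and `compatible` are hypothetical names, and versions are assumed to be plain `(major, minor, patch)` tuples.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def compat_key(version: Version) -> Tuple[int, ...]:
    """Return the prefix of the version up to and including its leftmost nonzero part."""
    major, minor, patch = version
    if major != 0:
        return (major,)
    if minor != 0:
        return (0, minor)
    return (0, 0, patch)


def compatible(a: Version, b: Version) -> bool:
    """Two versions are treated as compatible when their keys match."""
    return compat_key(a) == compat_key(b)


print(compatible((1, 0, 16), (1, 1, 16)))   # same leftmost nonzero number (1)
print(compatible((0, 1, 0), (0, 2, 0)))     # leftmost nonzero numbers differ (1 vs 2)
print(compatible((1, 9, 0), (2, 0, 0)))     # different major versions
```

The key makes the `0.x` case explicit: before 1.0.0, a change in the minor number already moves a release into a different compatibility range.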
|
||
Let's see how SemVer requirements are considered during dependency resolution. | ||
|
||
• When multiple packages require a common dependency, the resolver aims to ensure they utilize the same version within a SemVer compatibility range, favoring the latest version within that range. For example, if package 1 depends on `foo = "1.0"` and package 2 depends on `foo = "1.1"`, then if the highest version during lock file generation is 1.2.1, both packages will utilize this version. Even if a new version like 2.0.0 is released later, it won't be automatically chosen as it's deemed incompatible. | ||
<br> | ||
• If multiple packages have a common dependency with semver-incompatible versions, then Cargo will allow this, but will build two separate copies of the dependency. | ||
<br> | ||
• If the resolver is constrained to two different versions within the same compatibility range, it will raise an error, as multiple versions within the range are not permitted. | ||
<br> | ||
• Many published versions are pre-releases, which Cargo does not usually select. To use a pre-release, the user must specify the pre-release version explicitly; such versions are often unstable. | ||
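The first two rules in the list above can be modelled with a toy resolver. This is a deliberately simplified sketch and not Cargo's actual resolver: it assumes every requirement is a bare minimum version with default (caret) semantics, ignores features and pre-releases, and uses hypothetical names (`compat_key`, `resolve`).

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def compat_key(version: Version) -> Tuple[int, ...]:
    """Compatibility range identified by the leftmost nonzero version number."""
    major, minor, patch = version
    if major != 0:
        return (major,)
    if minor != 0:
        return (0, minor)
    return (0, 0, patch)


def resolve(requirements: List[Version],
            available: List[Version]) -> Dict[Tuple[int, ...], Version]:
    """Pick one version per compatibility range: the highest available one that fits."""
    minimum_per_range: Dict[Tuple[int, ...], Version] = {}
    for minimum in requirements:
        key = compat_key(minimum)
        # Requirements in the same range are unified on the strictest minimum.
        minimum_per_range[key] = max(minimum_per_range.get(key, minimum), minimum)

    chosen: Dict[Tuple[int, ...], Version] = {}
    for key, minimum in minimum_per_range.items():
        candidates = [v for v in available if compat_key(v) == key and v >= minimum]
        if not candidates:
            raise LookupError(f"no available version satisfies {minimum}")
        chosen[key] = max(candidates)        # favor the latest version in the range
    return chosen


published = [(1, 0, 0), (1, 1, 0), (1, 2, 1), (2, 0, 0)]
print(resolve([(1, 0, 0), (1, 1, 0)], published))             # one shared copy of foo
print(resolve([(1, 0, 0), (1, 1, 0), (2, 0, 0)], published))  # two separate copies
```

With requirements `1.0` and `1.1` both packages end up on 1.2.1, and 2.0.0 is never picked unless some package asks for the 2.x range, in which case a second copy is selected alongside the first.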
|
||
Cargo's dependency resolver considers various factors beyond Semantic Versioning requirements, including package features, dependency types, resolver versions, and numerous other rules. | ||
|
||
Running `cargo build` resolves the dependencies listed in the manifest file and saves the result in the `Cargo.lock` file. | ||
|
||
#### Advantages | ||
|
||
• **Compatibility Assurance**: Cargo ensures that dependencies adhere to Semantic Versioning (SemVer) rules, promoting compatibility and reducing potential conflicts between packages. | ||
|
||
• **Integration with Rust Ecosystem**: Cargo is tightly integrated with the Rust ecosystem, facilitating seamless dependency management for Rust projects. Its integration with tools like rustc, the Rust compiler, and rustup, the Rust toolchain installer, enhances developer productivity and simplifies the development workflow. | ||
|
||
#### Disadvantages | ||
|
||
• **Security Risks in Package Ecosystem**: The use of yanked versions and of the `unsafe` keyword in real-world Rust libraries and applications contributes to these risks. | ||
|
||
• **Dependency Bloat**: In some cases, Cargo's dependency resolution may result in the inclusion of unnecessary or overly large dependencies, leading to increased binary sizes or longer build times. This can impact the performance and efficiency of the final application, especially in resource-constrained environments. | ||
|
||
### Go Package Manager | ||
|
||
The Go package manager adopts a Minimal Version Selection (MVS) approach to determine which packages to include in the final list for building. MVS aims to create builds that closely mirror the dependencies used by the package author during development. This means that when a user builds a project, the dependencies chosen are as similar as possible to the ones the original author developed against. | ||
|
||
Minimal Version Selection (MVS) operates on the assumption that each module specifies only the minimum versions of its dependencies, adhering to the import compatibility rule where newer versions are expected to be compatible with older ones. This means dependency requirements include only minimum versions, without specifying maximum versions or incompatible later versions. | ||
|
||
The version selection strategy is meant to provide algorithms for four operations on the build list: | ||
|
||
1. Construct the current build list: | ||
|
||
The rough build list for package M is the list of all modules reachable in the requirement graph starting at M and following arrows. It can be computed with a straightforward recursive traversal of the graph that skips nodes already visited. The rough build list is then converted to the final build list by keeping only the newest version of each listed module. | ||
|
||
2. Upgrade all modules to their latest versions: | ||
|
||
This can be achieved by running `go get -u`, which upgrades all the modules to their latest versions. | ||
Upgrading the modules means that all arrows in the dependency graph now point to the latest versions of the modules. This results in an upgraded dependency graph, but changes in the dependency graph alone won't cause future builds to use the updated modules. To achieve that, we need to change our build list in a way that won't affect the build lists of dependent packages, as upgrades should be limited to our package alone. | ||
|
||
At first glance, it would seem intuitive to include all the updated packages in our build list. However, not all packages are necessary, and we want to include as few additional modules as possible. To produce a minimal requirement list, a helper algorithm R is introduced. | ||
|
||
**Algorithm R**: | ||
|
||
To compute a minimal requirement list inducing a given build list below the target, reverse postorder traversal is employed, ensuring modules are visited after all those pointing into them. Each module is added only if it's not implied by previously visited ones. | ||
|
||
3. Upgrade one module to a specific newer version: | ||
|
||
Upgrading all modules to their latest versions can be risky, so developers often opt to upgrade only one module. | ||
|
||
Upgrading one module means that the arrow which earlier pointed to that module now points to the upgraded version. We can construct a build list from the updated dependency graph, which can then be fed to Algorithm R to get a minimal requirement list. | ||
|
||
4. Downgrade one module to a specific older version: | ||
|
||
The downgrade algorithm examines each of the target's requirements separately. If a requirement conflicts with the proposed downgrade, meaning its build list contains a version of a module that is no longer allowed, the algorithm iterates through older versions until finding one that aligns with the downgrade. | ||
|
||
Downgrades are changes to the build list made by removing requirements. |