Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement installing packages over Git #309

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Bilal2453
Copy link
Contributor

@Bilal2453 Bilal2453 commented Feb 22, 2022

The Pull Request

This is an initial PR of many planned PRs aiming at polishing the Lit experience. In this PR I wrote the basics of getting a lit install https://github.com/author/repo command working, as well as refactoring some of the code, and documenting it. I will go through every little change that I have made, and the expected behavior of such changes, as well as potential bugs that I spotted. But first I will explain how to use this implementation.


How to use it

I wanted to keep this implementation as simple as it could be, to not confuse the end-user and keep it familiar to them. Therefor my choice was to use Git's URL schemes, similar to how git clone URL handle it (and similar to values lit make currently accept). Some examples of the command:

  • lit install git@github.com:SinisterRectus/discordia (over ssh)
  • lit install https://github.com/truemedian/rethink-luvit (over https)
  • lit install https://github.com/Bilal2453/discordia-replies.git (over https, single-file package)

All of the above commands are expected to work. It will fetch the repo tree, calculate the dependencies, install the package correctly and then install the package's dependencies; Even if such dependencies are Git repos too. For example, a possible package.lua:

return {
    name = "author/package",
    version = "0.0.1",
    -- etc
    dependencies = { "git://github.com/Bilal2453/discordia-replies", "https://gitlab.com/Bilal2453/lit-git-test.git" },
}

It is as well expected to work with any Git upstream, in the above example GitLab and GitHub are used.


How it works

While in the process of making this implementation small and simple, I decided to reuse most of the Lit logic regarding Git DB, this implementation will use the Git CLI to fetch the given repo into Lit's DB, then it queries the package in-db (just like Lit already does), then marks the package for installation. The installation process itself is untouched, the behavior from this point is identical to what Lit already does.

Single-file packages though, those have been very problematic to implement since you basically cannot tell a service (GitHub for example) to create tag pointing to the desired blob (how Lit currently does it), making us unable to determine which blob in the tree is the package file. I've been in front of two choices, either I don't implement it, or I search the tree for a Lua file, then if and only if the tree only has a single Lua file, use that as the package. I've decided that "something is better than nothing" and went ahead with latter method, I will have to document this behavior really well, to make sure the end-user will expect this.


Not yet implemented

TL;DR:

  • Implement a way of specifying the targeted branch, or tag, or commit hash (Any or all of them will do)
  • Internally implement Git protocol over HTTPS (smart HTTPS) to be used if Git binary is not available.
  • Handle Git submodules (?).

As you might have noticed, this make use of git binary, that decision was for three reasons, first is lit make already making use of that, second for simplicity and to match the friendly behavior of Git as much as I could, and third me being limited on time. I am planning to implement the Git protocol internally over HTTPS, this will be soon in a future PR.

This implementation will fetch whatever latest default branch is. A way of specifying a commit, or a branch, or even a tag is mandatory. I although have been very conflicted over this and over how I will be implementing it. Since we are supporting all Git accepted URL schemes I am afraid that something similar to lit install URL@tag is going to conflict with the repo URL, and since we also have the SCP-like syntax git@host: being also used.
I basically am not sure what syntax to use for specifying a version(a commit hash, tag, branch, etc), and afraid of introducing a syntax that could potentially conflict with what Git expects/accept. I've thought about implementing an install-git command just to solve this problem (with something like lit install-git repo version, but I will leave deciding this up to you, or to future me.

Lastly, Git submodules. As you can see from the commit history of this PR, I used to actually have --recurse-submodules thinking that was everything into handle submodules... apparently no, handling submodules requires a very specific implementation that I didn't even find at the Git docs, the only mention of it I found was this answer on StackOverflow explaining how it is implemented. It is not terribly hard but it also isn't a straightforward, being tide up to time, I decided to not implement it in this PR, although in future PRs I will be trying to... and perhaps failing.


Potential bugs in Lit

While trying to understand how Lit works, I've had multiple "are you sure?" moments. None of which I fixed or even tried to. Here are ones I remember:

  1. calculate-deps.lua. This indicates that lit install author/ is allowed. And indeed when I tried it with lit install SinisterRectus/ it tried to do... something, not quite sure what it is trying to do (probably install first-to-match package) but it quickly fails with a somewhat unexpected error. Will be opening an issue about it.

  2. calculate-deps.lua. As I understand, this will try to process the dependencies of the package... what happens if two packages lists each other as a dependency? I haven't confirmed this but from what I am reading... it would be stuck in a recursive loop. No checks to prevent this from happening are provided.

  3. core.lua. The field name has a typo, and this field is never used anywhere. Correcting it should break nothing...?

  4. core.lua. I wanted to match Git's accepted scheme for fetch, and found this in core... although I could just take the pattern part and leave the handler one, the name of this field does not imply it is related to lit make nor I think it makes sense to expose it as such.


Changes overview

Here I will list most behavior changes, additions, or deletions I've made to the existence code.

  1. rdb.lua:
    • Added db.offlineLoadAny. Because rdb patches db.load, it makes it impossible to use db.loadAny locally without triggering the upstream matching. This solves it by re-implementing loadAny but with the offloadLoad reference.
  2. pkg.lua:
    • Added pkg.queryGit. It is pretty similar to queryDb except in that it is offline only, and handles single-file packages differently (as mentioned previously).
    • Edited the doc comment to include pkg.queryGit, and to add hash as a return of queryDb.
  3. calculate-deps.lua (PR's main focus):
    • Refactoring the code to stop using nesting functions inside other functions.
    • addDep & processDeps logic is merged together into resolveDeps.
    • Everything is now documented, with hopefully clear comments.
    • Old Lit behavior is untouched, except when package name starts with a Git scheme.
    • Implemented resolveGitDeps that handles git calculations (and fetching).

Those should be all of the changes introduced in this PR, hope I missed nothing. There should be 0 change over how Lit currently work. If you observe any, it is a bug.


Tests

To test this implementation I've made a really small repo at https://github.com/Bilal2453/lit-git-test/ which should be enough to show any odd behavior, it tests old dependencies resolution behavior, single-file package over Git, using multiple Git upstreams (GitHub and GitLab in this test), and rule excluding. Overall, it makes use of 3 upstreams, GitHub, GitLab and Lit.


Finally, feel free to give me your feedback, review the code and test it, and contribute to the implementation. I hope this is up to the task of making Lit even better.

@SinisterRectus
Copy link
Member

Hi, thank you for this PR and for your patience. I am going to go through it one step at a time so that I understand it.

@SinisterRectus
Copy link
Member

Overall I think it looks okay. But this is honestly outside of my understanding. I would legitimately have to rewrite all of lit to figure out how lit and git work.

The single file package problem is unfortunate.

Can you open issues or PRs for the potential bugs that you found? I think they are worth addressing.

@Bilal2453
Copy link
Contributor Author

The single file package problem is unfortunate.

It indeed is, the implementation does its best to solve it but, that is as far as it is possible with Git.

But this is honestly outside of my understanding.

So it is for almost anyone frankly, no one fully understands how Lit (except @creationix perhaps) works, so goes to Git, nor I expect anyone to spend two days reading Lit's source code and Git docs – like I have. Therefore I was thinking of reviewing this PR using its behaviour, heavily testing it to ensure it is behaving, and deciding upon that how the PR should be proceeded.

Although that of course won't tell us anything beyond "Is it working or not", which is sad.

Can you open issues or PRs for the potential bugs that you found?

Surely, I will be glad to fix those and continue working on this PR. Hopefully that will be on the next Sunday.

@SinisterRectus
Copy link
Member

Maybe you can write a test file for this? We have limited testing on lit. Always room for more.

@creationix
Copy link
Member

Nice work! I hope to be able to review this later.

@creationix
Copy link
Member

A way of specifying a commit, or a branch, or even a tag is mandatory. I although have been very conflicted over this and over how I will be implementing it.

The convention used by npm is url#sha1 or url#tag. We can use the same here.

On single-file packages, the git data model supports this only via annotated tags. When you publish to lit, the object is literally a git tag object. Unlike commit objects that can only point to a tree, tags can point to anything (commits, trees, or blobs) You may be able to fetch the tag via the git CLI tool and then read the tag and find the blob using the local git files after fetching.

@creationix
Copy link
Member

Internally implement Git protocol over HTTPS (smart HTTPS)

I've done this before in JavaScript, it's not too terrible. The hardest part is reading the packfile which I seem to remember we already have in the git implementation in lua.

Handle Git submodules (?).

This would be great, what use cases do you have in mind? Is the goal to treat submodules as symlinks basically so their contents are in-place? This could be a way to manage recursive dependencies using only git since luvit's require resolution algorithm looks for libs and deps locally first.

@Bilal2453
Copy link
Contributor Author

Bilal2453 commented Jun 22, 2022

The convention used by npm is url#sha1 or url#tag. We can use the same here.

I will be using something similar here as well then.

On single-file packages, the git data model supports this only via annotated tags. When you publish to lit, the object is literally a git tag object. Unlike commit objects that can only point to a tree, tags can point to anything (commits, trees, or blobs) You may be able to fetch the tag via the git CLI tool and then read the tag and find the blob using the local git files after fetching.

Right, I knew that much. The problem here is the Git repo owner has to create a tag for each release manually and point it to the right blob, sounded to me quite inconvenient, will see what I can do about this later on.

I've done this before in JavaScript, it's not too terrible. The hardest part is reading the packfile which I seem to remember we already have in the git implementation in lua.

Nice. I think this should be my next task here, maybe before that make some tests to ensure default behavior isn't changed. I think it should be fairly straightforward.

This could be a way to manage recursive dependencies using only git since luvit's require resolution algorithm looks for libs and deps locally first.

Yep, that is what I had in mind. A way to basically do recursive dependencies. I can see that being quite useful if the said repo wasn't published to Lit or something along those lines, then you can just have it as a Git submodule. But at the same time I also have some serious concerns.

  1. It will definitely complicate everything. Having to handle how submodules themselves work, make sure we don't run into recursive loops but yet handle legitimate recursive deps, makes it much harder to track what is currently installed, etc.

  2. Dependency hell . This may very much create a dependency hell where some submodule depend on a ton of others that in turn depend on more. Some of which may already be locally installed, but we will have to install it yet again to satisfy the package expectations.

  3. Version conflicts. I can imagine a scenario where some package wants a specific commit of a Git submodule, when there is a locally installed version of that package with a different version. Or two packages that are a git submodule dependency each want their own version of some package.

I will be holding off on this for now. But even in the future, I am not sure how should this look like. Seems like a quite complicated feature that may very well end up un-used/over-used.

cc: @creationix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants