Skip to content

http: do not ignore proxy path#1767

Closed
rhendric wants to merge 1 commit intogitgitgadget:masterfrom
rhendric:rhendric/http-proxy-path
Closed

http: do not ignore proxy path#1767
rhendric wants to merge 1 commit intogitgitgadget:masterfrom
rhendric:rhendric/http-proxy-path

Conversation

@rhendric
Copy link

@rhendric rhendric commented Jul 26, 2024

cc: Ryan Hendrickson ryan.hendrickson@alum.mit.edu
cc: Jeff King peff@peff.net

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

Welcome to GitGitGadget

Hi @rhendric, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <revi.ewer@example.com>, Ill Takalook <ill.takalook@example.net>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description,
because it will result in a malformed CC list on the mailing list. See
example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@dscho
Copy link
Member

dscho commented Jul 26, 2024

/allow

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

User rhendric is now allowed to use GitGitGadget.

WARNING: rhendric has no public email address set on GitHub;
GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.

@rhendric
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

Submitted as pull.1767.git.1722009170590.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1767/rhendric/rhendric/http-proxy-path-v1

To fetch this version to local tag pr-1767/rhendric/rhendric/http-proxy-path-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1767/rhendric/rhendric/http-proxy-path-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Ryan Hendrickson via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
>
> The documentation for `http.proxy` describes that option, and the
> environment variables it overrides, as supporting "the syntax understood
> by curl". curl allows SOCKS proxies to use a path to a Unix domain
> socket, like `socks5h://localhost/path/to/socket.sock`. Git should
> therefore include, if present, the path part of the proxy URL in what it
> passes to libcurl.
>
> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
> ---
>     http: do not ignore proxy path
>     
>      * Documentation: do I need to add anything?

http.proxy documentation says

  The syntax thus is [protocol://][user[:password]@]proxyhost[:port].

but the updated code pays attention to what can come after the
"host[:post]" part, does it not?

> diff --git a/http.c b/http.c
> index 623ed234891..0cd75986a6b 100644
> --- a/http.c
> +++ b/http.c

I am unfamiliar with this code path, so let me follow along aloud.

> @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void)
>  		if (!proxy_auth.host)
>  			die("Invalid proxy URL '%s'", curl_http_proxy);

We grabbed the value from the configuration variable (or various
environment variables like $http_proxy) in the curl_http_proxy
variable, and then passed it to credential_from_url() to parse parts
out of [protocol://][user[:password]@]proxyhost[:port][/p/a/t/h].
The parsed result is in proxy_auth struct and there is no hope it
can be usable if the .host member is missing.

> -		curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);

We used to only use the .host member but the curl_http_proxy could
have had a path after it, held in the .path member.

> +		if (proxy_auth.path) {
> +			struct strbuf proxy = STRBUF_INIT;
> +			strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
> +			strbuf_release(&proxy);
> +		} else
> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);

Style.  If "if" side needs {braces} because it consists of multiple
statements, the corresponding "else" side should also have {braces}
around its body, even if it only has a single statement.

If you have the proxy strbuf in a bit wider scope, then the above becomes

	if (proxy_auth.path)
		strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
	else
		strbuf_addstr(&proxy, proxy_auth.host);
	curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
	strbuf_release(&proxy);

which might (I am not decided) be easier to follow.

Could existing users have been taking advantage of the fact that the
extra /path at the end of http.proxy (and $http_proxy and friends)
are ignored?  For them, this change will appear as a regression.

Other than that (and the lack of any documentation and test
updates), looking good.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-07-26 09:29-0700, Junio C Hamano <gitster@pobox.com> sent:

> http.proxy documentation says
>
>  The syntax thus is [protocol://][user[:password]@]proxyhost[:port].
>
> but the updated code pays attention to what can come after the
> "host[:post]" part, does it not?

Correct; I'll add a "[/path]" to that construction.

>> +		if (proxy_auth.path) {
>> +			struct strbuf proxy = STRBUF_INIT;
>> +			strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
>> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
>> +			strbuf_release(&proxy);
>> +		} else
>> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
>
> Style.  If "if" side needs {braces} because it consists of multiple
> statements, the corresponding "else" side should also have {braces}
> around its body, even if it only has a single statement.
>
> If you have the proxy strbuf in a bit wider scope, then the above becomes
>
> 	if (proxy_auth.path)
> 		strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
> 	else
> 		strbuf_addstr(&proxy, proxy_auth.host);
> 	curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
> 	strbuf_release(&proxy);
>
> which might (I am not decided) be easier to follow.

For what it's worth, I was following the precedent set by the if-statement starting at line 1256 (a few lines above this patch):

> 		if (strstr(curl_http_proxy, "://"))
> 			credential_from_url(&proxy_auth, curl_http_proxy);
> 		else {
> 			struct strbuf url = STRBUF_INIT;
> 			strbuf_addf(&url, "http://%s", curl_http_proxy);
> 			credential_from_url(&proxy_auth, url.buf);
> 			strbuf_release(&url);
> 		}

<https://github.com/git/git/blob/ad57f148c6b5f8735b62238dda8f571c582e0e54/http.c#L1256>

I have no problem with being inconsistent with surrounding code in these style choices; just let me know what I should do in light of that.

> Could existing users have been taking advantage of the fact that the
> extra /path at the end of http.proxy (and $http_proxy and friends)
> are ignored?  For them, this change will appear as a regression.

That is possible, though I have difficulty imagining a scenario in which it would be intentional.

What do you recommend I do about that possibility?

R

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

User Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> For what it's worth, I was following the precedent set by the
> if-statement starting at line 1256 (a few lines above this patch):

It is worth nothing.  Existing violation does not make it a good
idea to add more of them.  It makes it more work to clean them all
up later.

>> Could existing users have been taking advantage of the fact that the
>> extra /path at the end of http.proxy (and $http_proxy and friends)
>> are ignored?  For them, this change will appear as a regression.
>
> That is possible, though I have difficulty imagining a scenario in
> which it would be intentional.
>
> What do you recommend I do about that possibility?

I have no idea.  I already said that I am not familiar with this
code path, and it is likely I am a worse person than you are to find
a potential creative (ab)uses of the mechanism to take advantage of
the fact that the current code ignores the path part.

Having said that, given that http.proxy falls back to environment
variables that have been established a lot longer than Git itself,
like HTTPS_PROXY etc., if we _know_ that setting HTTPS_PROXY to such
a value with /path part causes curl (or other popular programs) to
try using such a value without stripping the path part (and even
better if that causes them to fail), then such an observation would
make a very good argument why it is extremely unlikely that existing
users used such a setting, hence it is unlikely this patch would
make a regression.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Fri, Jul 26, 2024 at 03:52:50PM +0000, Ryan Hendrickson via GitGitGadget wrote:

>      * Tests: I could use a pointer on how best to add a test for this.
>        Adding a case to t5564-http-proxy.sh seems straightforward but I
>        don't think httpd can be configured to listen to domain sockets; can
>        I use netcat?

I don't offhand know of a way to test this without a custom program like
netcat. If it's the only option, it's OK to use tools that might not be
available everywhere as long as the tests are marked as optional with
the appropriate prerequisite. You can find prior art by looking for
test_lazy_prereq calls (e.g., the ones for GZIP or ZIPINFO are pretty
straight-forward).

I would warn that there are several not-quite-compatible variants of
netcat floating around, which can create headaches. You might be better
off with a short perl script using IO::Socket::UNIX or similar.

> diff --git a/http.c b/http.c
> index 623ed234891..0cd75986a6b 100644
> --- a/http.c
> +++ b/http.c
> @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void)
>  		if (!proxy_auth.host)
>  			die("Invalid proxy URL '%s'", curl_http_proxy);
>  
> -		curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
> +		if (proxy_auth.path) {
> +			struct strbuf proxy = STRBUF_INIT;
> +			strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
> +			strbuf_release(&proxy);
> +		} else
> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);

The fields in the proxy_auth struct have been parsed from the url, with
any url encoding removed. But then we paste them back together into a
pseudo-url without doing any further encoding. Is that correct?

I doubt that the host contains a "/", but if you had a path that
contained a "%", then the URL form of that is going to be %25. Which is
curl expecting to get here?

I say "pseudo-url" because it is weird to slap a path on the end of the
hostname but leave off the scheme, etc. Which kind of makes me wonder
why we pass proxy_auth.host in the first place, and not simply the
original curl_http_proxy. It looks like a weird interaction between
372370f167 (http: use credential API to handle proxy authentication,
2016-01-26) and 57415089bd (http: honor empty http.proxy option to
bypass proxy, 2017-04-11). The former added a _second_ CURLOPT_PROXY
call, and the latter removed the first one.

I wonder if we could go back to passing the string straight to curl (as
we did prior to 2016), and keeping the proxy_auth struct purely as a
mechanism for gathering credentials. That would presumably fix your use
case. And this is also perhaps an interesting data point for Junio's
question about regressions (if people were passing paths, it used to
work and was broken in 2016).

-Peff

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

User Jeff King <peff@peff.net> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 26, 2024

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-07-26 17:11-0400, Jeff King <peff@peff.net> sent:

> I would warn that there are several not-quite-compatible variants of
> netcat floating around, which can create headaches. You might be better
> off with a short perl script using IO::Socket::UNIX or similar.

Ah, okay, thanks for the pointer!

>> diff --git a/http.c b/http.c
>> index 623ed234891..0cd75986a6b 100644
>> --- a/http.c
>> +++ b/http.c
>> @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void)
>>  		if (!proxy_auth.host)
>>  			die("Invalid proxy URL '%s'", curl_http_proxy);
>>
>> -		curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
>> +		if (proxy_auth.path) {
>> +			struct strbuf proxy = STRBUF_INIT;
>> +			strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
>> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
>> +			strbuf_release(&proxy);
>> +		} else
>> +			curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
>
> The fields in the proxy_auth struct have been parsed from the url, with
> any url encoding removed. But then we paste them back together into a
> pseudo-url without doing any further encoding. Is that correct?
>
> I doubt that the host contains a "/", but if you had a path that
> contained a "%", then the URL form of that is going to be %25. Which is
> curl expecting to get here?

Oh, I see! Yes, this is an issue with my patch: if I create a socket file named "%30", command-line curl wants http_proxy to contain "%2530", and patched Git wants http_proxy to contain "%252530". Good edge case to put in a test.

> I wonder if we could go back to passing the string straight to curl (as
> we did prior to 2016), and keeping the proxy_auth struct purely as a
> mechanism for gathering credentials.

Hmm, that would be nice, but I think curl doesn't deal well with the extra case that Git supports of specifying a username but no password. It causes one of the existing tests to fail if I pass the string straight through.

On top of that, all of those starts_with tests for checking the protocol are written quite loosely, so in practice Git "supports" the protocols "socks://" and "socksonmyfeet://" by mapping them both to SOCKS4, and curl would not like it if it received those strings directly.

So given that Git wants to handle the protocol and the credentials, it makes sense that only the host and the path are passed to curl. I just have to make sure that they are correctly re-encoded.

R

@rhendric rhendric force-pushed the rhendric/http-proxy-path branch 2 times, most recently from 2091b59 to 372f6e9 Compare July 27, 2024 02:54
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 27, 2024

There are issues in commit b445c4c:
DROP ME: trying to discover why this test fails in CI
Commit checks stopped - the message is too short
Commit not signed off

@rhendric rhendric force-pushed the rhendric/http-proxy-path branch 4 times, most recently from 9ebc260 to b4715eb Compare July 27, 2024 06:11
@rhendric
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 27, 2024

Submitted as pull.1767.v2.git.1722062682858.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1767/rhendric/rhendric/http-proxy-path-v2

To fetch this version to local tag pr-1767/rhendric/rhendric/http-proxy-path-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1767/rhendric/rhendric/http-proxy-path-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 29, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Fri, Jul 26, 2024 at 06:43:29PM -0400, Ryan Hendrickson wrote:

> > I wonder if we could go back to passing the string straight to curl (as
> > we did prior to 2016), and keeping the proxy_auth struct purely as a
> > mechanism for gathering credentials.
> 
> Hmm, that would be nice, but I think curl doesn't deal well with the extra
> case that Git supports of specifying a username but no password. It causes
> one of the existing tests to fail if I pass the string straight through.

OK, thanks for trying. It would have been nice, but I'm not surprised
that there's an unusual interaction.

> On top of that, all of those starts_with tests for checking the protocol are
> written quite loosely, so in practice Git "supports" the protocols
> "socks://" and "socksonmyfeet://" by mapping them both to SOCKS4, and curl
> would not like it if it received those strings directly.
>
> So given that Git wants to handle the protocol and the credentials, it makes
> sense that only the host and the path are passed to curl. I just have to
> make sure that they are correctly re-encoded.

Makes sense.

-Peff

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 29, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Sat, Jul 27, 2024 at 06:44:42AM +0000, Ryan Hendrickson via GitGitGadget wrote:

>     Re earlier discussion about regressions, it turns out that curl has only
>     supported this syntax since 2022 [https://curl.se/ch/7.84.0.html].
>     Earlier versions of curl, if provided an http_proxy that contained a
>     path, would also ignore it. curl didn't seem to consider this a breaking
>     change when the feature was introduced
>     [https://github.com/curl/curl/pull/8668], though; is that a valid
>     argument for Git not to lose sleep over it either?

IMHO it is probably OK. The path did not do anything before then, so why
would anybody pass it?

Looking into the curl support, I did notice two things:

  - unix sockets are only supported for socks proxies. Is it meaningful
    to have a path on an http proxy? I don't think so (there is no place
    to send it in the "GET" line). But perhaps we should limit the new
    code only to socks.

  - curl says you should put "localhost" in the host field for this,
    though obviously it is not otherwise used. Should we likewise
    require that to kick in the code?

> @@ -1265,7 +1266,24 @@ static CURL *get_curl_handle(void)
>  		if (!proxy_auth.host)
>  			die("Invalid proxy URL '%s'", curl_http_proxy);
>  
> -		curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
> +		strbuf_addstr(&proxy, proxy_auth.host);
> +		if (proxy_auth.path) {

There's one gotcha here I wondered about: we usually throw away the path
element, unless credential.useHTTPPath is set. That only happens after
we load the per-credential config, though, via credential_apply_config(),
which in turn is triggered by a call to credential_fill(), etc.

I think this is OK here, since we have just called credential_from_url()
ourselves, and the earliest credential_fill() we'd hit is from
init_curl_proxy_auth(), which is called right after the code you're
touching.

> +			int path_is_supported = 0;
> +			/* curl_version_info was added in curl 7.10 */
> +#if LIBCURL_VERSION_NUM >= 0x070a00
> +			curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
> +			path_is_supported = ver->version_num >= 0x075400;
> +#endif

We've been trying to keep our curl version #ifdefs in git-curl-compat.h,
with human-readable names. But in this case, I think we can ditch it
entirely, as we require curl 7.21.3 or later already.

This will be the first time we check curl_version_info() instead of a
build-time option. I think that makes sense here. Most features require
extra information at build-time (e.g., CURLOPT_* constants), but in this
case it is purely a check on the runtime behavior.

We perhaps could get away with just letting an older curl quietly ignore
the path field, but what you have here is probably a bit friendlier for
users.

> diff --git a/t/socks5-proxy-server/src/socks5.pl b/t/socks5-proxy-server/src/socks5.pl
> new file mode 100755
> index 00000000000..3ef66a1a287
> --- /dev/null
> +++ b/t/socks5-proxy-server/src/socks5.pl

This is a lot more code than I was hoping for. There are definitely
parts of it we don't care about (e.g, threading, handling multiple
connections at once) that could be pared down a bit. I don't think we
particularly care about auth either (we just want to make sure we talk
to the unix-socket proxy at all).

What if we switched to socks4, which drops all of the auth bits and
supports only IPs, not hostnames (that came in socks4a). This is the
shortest I could come up with (only lightly tested):

-- >8 --
use strict;
use IO::Select;
use IO::Socket::UNIX;
use IO::Socket::INET;

my $path = shift;

unlink($path);
my $server = IO::Socket::UNIX->new(Listen => 1, Local => $path)
	or die "unable to listen on $path: $!";

$| = 1;
print "ready to rumble\n";

while (my $client = $server->accept()) {
	sysread $client, my $buf, 8;
	my ($version, $cmd, $port, $ip) = unpack 'CCnN', $buf;
	next unless $version == 4; # socks4
	next unless $cmd == 1; # TCP stream connection

	# skip NUL-terminated id
	while (sysread $client, my $char, 1) {
		last unless ord($char);
	}

	# version(0), reply(5a == granted), port (ignored), ip (ignored)
	syswrite $client, "\x00\x5a\x00\x00\x00\x00\x00\x00";

	my $remote = IO::Socket::INET->new(PeerHost => $ip, PeerPort => $port)
		or die "unable to connect to $ip/$port: $!";

	my $io = IO::Select->new($client, $remote);
	while ($io->count) {
		for my $fh ($io->can_read(0)) {
			for my $pair ([$client, $remote], [$remote, $client]) {
				my ($from, $to) = @$pair;
				next unless $fh == $from;

				my $r = sysread $from, my $buf, 1024;
				if (!defined $r || $r <= 0) {
					$io->remove($from);
					next;
				}
				syswrite $to, $buf;
			}
		}
	}
}
-- >8 --

That's still more than I'd like, but quite a bit smaller. I dunno.
Probably the one you found is more battle-tested, but I really think we
have pretty simple requirements.

> diff --git a/t/t5564-http-proxy.sh b/t/t5564-http-proxy.sh
> index bb35b87071d..bafaa31adf8 100755
> --- a/t/t5564-http-proxy.sh
> +++ b/t/t5564-http-proxy.sh
> @@ -39,4 +39,22 @@ test_expect_success 'clone can prompt for proxy password' '
>  	expect_askpass pass proxuser
>  '
>  
> +start_socks() {
> +	# The %30 tests that the correct amount of percent-encoding is applied
> +	# to the proxy string passed to curl.
> +	"$PERL_PATH" "$TEST_DIRECTORY/socks5-proxy-server/src/socks5.pl" \
> +		"$TRASH_DIRECTORY/%30.sock" proxuser proxpass &
> +	socks_pid=$!
> +}
> +
> +test_lazy_prereq LIBCURL_7_84 'expr x$(curl-config --vernum) \>= x075400'

This is the first time we'd be relying on curl-config in the test suite.
I suspect it would _usually_ work, but I'd worry about a few things:

  1. The Makefile allows for a different path for $(CURL_CONFIG), or
     even skipping it entirely by setting $(CURLDIR).

  2. We'd usually build and test in the same environment, but not
     necessarily. In particular, I know Windows uses separate CI jobs
     for this, and I'm not sure if curl-config will be around in the
     test jobs.

I can see two paths forward:

  a. There's been recent discussion about adding an option for Git to
     report the run-time curl version. That doesn't exist yet, though
     "git version --build-options" will report the build-time version.
     That might be good enough for the purposes of testing a build.

  b. Write the test such that it expects the request to work _or_ we see
     the "version too old" failure.

> +test_expect_success PERL,LIBCURL_7_84 'clone via Unix socket' '

I'm not sure if this PERL prereq is sufficient. We also need to know
that we can make unix sockets. Probably we need to actually run the
socks proxy and make sure it gets to the "starting..." line before
accepting that it works. Which also gets us to...

> +	start_socks &&
> +	test_when_finished "rm -rf clone && kill $socks_pid" &&
> +	test_config_global http.proxy "socks5://proxuser:proxpass@localhost$PWD/%2530.sock" &&
> +	GIT_TRACE_CURL=$PWD/trace git clone "$HTTPD_URL/smart/repo.git" clone &&

This is racy. We started the proxy in the background, but we don't know
that it's ready to accept connections by the time we run "git clone". My
first few runs all failed, but putting a "sleep 1" in between fixed it
(which obviously is not a real fix, but just a debugging aid).

If you usually see success, try running the test script with "--stress"
to create extra load, and you should see failures.

The usual trick here is to start the process in the background, and then
in the foreground wait for some "I'm ready" output, which involves ugly
tricks with fifos. There's prior art in lib-git-daemon.sh (though it's a
little more complicated there, because we want to relay its stderr, too,
but with a script under our control we can just put the ready message on
stdout).

So coupled with the prereq fix that I mentioned above, you might be able
to do something like (totally untested):

  start_socks_proxy () {
	mkfifo socks_output &&
	{
		perl proxy.pl ... >socks_output &
		socks_pid=$!
	} &&
	read line <socks_output &&
	test "$line" = "Starting..."
  }

  test_expect_success 'try to start socks proxy' '
	if start_socks_proxy
	then
		test_seq_prereq SOCKS_PROXY
	fi
  '

  test_expect_success SOCKS_PROXY 'clone via unix socket' '
	...
  '

-Peff

@rhendric rhendric force-pushed the rhendric/http-proxy-path branch 6 times, most recently from 58800f3 to 7a58da7 Compare July 31, 2024 15:30
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 12:28-0700, Junio C Hamano <gitster@pobox.com> sent:

>>>> Is this blocking feedback? This strikes me as speculative
>>>> over-engineering
>>>
>>> No, it is loosening a pattern that is overly tight and as a side
>>> effect shortening the line to more readable length ;-).
>>
>> Blocking or not?
>
> If we are updating anyway, that question is irrelevant, no?  This
> version may hit 'seen' but until the next version comes it will not
> advance to 'next'.

I can't figure out what you mean by this so I am going to proceed as if you had simply said οΏ½non-blockingοΏ½.

R

The documentation for `http.proxy` describes that option, and the
environment variables it overrides, as supporting "the syntax understood
by curl". curl allows SOCKS proxies to use a path to a Unix domain
socket, like `socks5h://localhost/path/to/socket.sock`. Git should
therefore include, if present, the path part of the proxy URL in what it
passes to libcurl.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
@rhendric rhendric force-pushed the rhendric/http-proxy-path branch from fa101a3 to b136f60 Compare August 2, 2024 19:45
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> At 2024-08-02 12:28-0700, Junio C Hamano <gitster@pobox.com> sent:
>
>>>>> Is this blocking feedback? This strikes me as speculative
>>>>> over-engineering
>>>>
>>>> No, it is loosening a pattern that is overly tight and as a side
>>>> effect shortening the line to more readable length ;-).
>>>
>>> Blocking or not?
>>
>> If we are updating anyway, that question is irrelevant, no?  This
>> version may hit 'seen' but until the next version comes it will not
>> advance to 'next'.
>
> I can't figure out what you mean by this so I am going to proceed as
> if you had simply said ‘non-blocking’.

It does not make much sense to ask if a suggestion is "blocking" or
"non-blocking".  If you respond with a reasonable explanation why
you do not want to take a suggestion, I may (or may not) say that
your reasoning makes sense.  IOW, making me say "it is blocking"
means you want to me to say that I won't listen to you no matter
what you say.  That is rarely be the case.

In this case, I do not think it makes sense to insist with -Fx that
the error message has the exact message.  And I do not think your
"strikes me as" qualifies as a "reasonable explanation".

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 14:13-0700, Junio C Hamano <gitster@pobox.com> sent:

>>>>>> Is this blocking feedback? This strikes me as speculative
>>>>>> over-engineering
>>>>>
>>>>> No, it is loosening a pattern that is overly tight and as a side
>>>>> effect shortening the line to more readable length ;-).
>>>>
>>>> Blocking or not?
>>>
>>> If we are updating anyway, that question is irrelevant, no?  This
>>> version may hit 'seen' but until the next version comes it will not
>>> advance to 'next'.
>>
>> I can't figure out what you mean by this so I am going to proceed as
>> if you had simply said ‘non-blocking’.
>
> It does not make much sense to ask if a suggestion is "blocking" or
> "non-blocking".  If you respond with a reasonable explanation why
> you do not want to take a suggestion, I may (or may not) say that
> your reasoning makes sense.  IOW, making me say "it is blocking"
> means you want to me to say that I won't listen to you no matter
> what you say.  That is rarely be the case.

I want you to do what Documentation/ReviewingGuidelines.txt says reviewers should do:

- When providing a recommendation, be as clear as possible about whether you
  consider it "blocking" (the code would be broken or otherwise made worse if an
  issue isn't fixed) or "non-blocking" (the patch could be made better by taking
  the recommendation, but acceptance of the series does not require it).
  Non-blocking recommendations can be particularly ambiguous when they are
  related to - but outside the scope of - a series ("nice-to-have"s), or when
  they represent only stylistic differences between the author and reviewer.

Because I would rather not have what is likely to be a highly subjective argument about this particular choice in a test script if we don't have to have one.

I would also rather not put time into figuring out how best to rename the function "old_curl_version" if it no longer checks for the particular error produced when the curl version is too old, nor would I want to think ahead about whether it is correct for these tests that use this function to continue to pass if different variations on this error are raised. There is one qualifying error currently, and that's what the test matches against. Matching against hypothetical future errors is speculative overengineering if it is not obvious how the errors will vary and what it may mean if they do.

R

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> I would also rather not put time into figuring out how best to rename
> the function "old_curl_version" if it no longer checks for the
> particular error produced when the curl version is too old,

Now, that is a good argument to make sure the "libcurl 7.84 or later"
I suggested to omit for the sake of brevity is in the pattern.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> what the test matches against. Matching against hypothetical future
> errors is speculative overengineering if it is not obvious how the
> errors will vary and what it may mean if they do.

The easiest you can imagine is that it turns out the cut-off version
of cURL for the feature turned out to be not 7.84, or versions of
cURL shipped by some distros have backports of the feature to an
older version that explicitly naming 7.84 causes more confusion than
naming the exact feature we rely on, and we end up rephrasing the
error message.  We have done such changes in the past, so it is not
really speculating "hypothetical future errors", but applying common
sense gained over years working on this project.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 14:47-0700, Junio C Hamano <gitster@pobox.com> sent:

> Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:
>
>> what the test matches against. Matching against hypothetical future
>> errors is speculative overengineering if it is not obvious how the
>> errors will vary and what it may mean if they do.
>
> The easiest you can imagine is that it turns out the cut-off version
> of cURL for the feature turned out to be not 7.84, or versions of
> cURL shipped by some distros have backports of the feature to an
> older version that explicitly naming 7.84 causes more confusion than
> naming the exact feature we rely on, and we end up rephrasing the
> error message.  We have done such changes in the past, so it is not
> really speculating "hypothetical future errors", but applying common
> sense gained over years working on this project.

Okay, so is it a problem to also change the message in the test if that happens? My concern is that if I pick some fragment of the message to elide in the test, the message could still get reworded in a way that requires the test to be changed anyway. Even if not, the comment above it would likely need revision too; and if the test doesn't fail, whoever amends the message is unlikely to notice this.

The way the test is written in the current version of the patch, there is no ambiguity about what is being tested and what would need to change if the message changes. Future contributors can grep the code for references to that error message and find this test regardless of which part and how much of the message they choose to grep for. Shaving a dozen or so characters out of line is not more important than those considerations, is it?

R

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2024

This patch series was integrated into seen via git@d2bad54.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 3, 2024

This patch series was integrated into seen via git@2a3761c.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2024

This branch is now known as rh/http-proxy-path.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2024

This patch series was integrated into seen via git@9b3fcf1.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2024

This patch series was integrated into next via git@d6f1fb1.

@gitgitgadget gitgitgadget bot added the next label Aug 5, 2024
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2024

There was a status update in the "Cooking" section about the branch rh/http-proxy-path on the Git mailing list:

The value of http.proxy can have "path" at the end for a socks
proxy that listens to a unix-domain socket, but we started to
discard it when we taught proxy auth code path to use the
credential helpers, which has been corrected.

Will merge to 'master'.
source: <pull.1767.v5.git.1722576007398.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2024

This patch series was integrated into seen via git@a19c007.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 8, 2024

This patch series was integrated into seen via git@3d00122.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 8, 2024

This patch series was integrated into seen via git@a97c000.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 9, 2024

There was a status update in the "Cooking" section about the branch rh/http-proxy-path on the Git mailing list:

The value of http.proxy can have "path" at the end for a socks
proxy that listens to a unix-domain socket, but we started to
discard it when we taught proxy auth code path to use the
credential helpers, which has been corrected.

Will merge to 'master'.
source: <pull.1767.v5.git.1722576007398.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 9, 2024

This patch series was integrated into seen via git@eb2b240.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2024

This patch series was integrated into seen via git@64e6fac.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2024

There was a status update in the "Cooking" section about the branch rh/http-proxy-path on the Git mailing list:

The value of http.proxy can have "path" at the end for a socks
proxy that listens to a unix-domain socket, but we started to
discard it when we taught proxy auth code path to use the
credential helpers, which has been corrected.

Will merge to 'master'.
source: <pull.1767.v5.git.1722576007398.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 15, 2024

There was a status update in the "Graduated to 'master'" section about the branch rh/http-proxy-path on the Git mailing list:

The value of http.proxy can have "path" at the end for a socks
proxy that listens to a unix-domain socket, but we started to
discard it when we taught proxy auth code path to use the
credential helpers, which has been corrected.
source: <pull.1767.v5.git.1722576007398.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2024

This patch series was integrated into seen via git@4bad011.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2024

This patch series was integrated into master via git@4bad011.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2024

This patch series was integrated into next via git@4bad011.

@gitgitgadget gitgitgadget bot added the master label Aug 17, 2024
@gitgitgadget gitgitgadget bot closed this Aug 17, 2024
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2024

Closed via 4bad011.

@rhendric rhendric deleted the rhendric/http-proxy-path branch August 17, 2024 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants