Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HEAD requestes changed to GET #2107

Closed
b3k opened this issue Oct 4, 2016 · 14 comments
Closed

HEAD requestes changed to GET #2107

b3k opened this issue Oct 4, 2016 · 14 comments

Comments

@b3k
Copy link

b3k commented Oct 4, 2016

It looks like varnish replaces HEAD requests to GET in before vcl_backend_fetch.

Not sure if this is bug or feature, but I didn't found anything about that in documentation.

VCL looks like (less important parts was cut-off):

sub vcl_recv {
set req.http.X-Req-Method-Entry = req.method;
return (hash);
}

sub vcl_backend_response {
set beresp.http.X-Varnish-Method-Entry = bereq.http.X-Req-Method-Entry;
set beresp.http.X-Req-Method = bereq.method;
return (deliver);
}

varnishlog (headers: X-Req-Method-Entry and X-Req-Method):

-   ReqStart       185.24.255.255 16410
-   ReqMethod      HEAD
-   ReqURL         /api/rest/health
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: 178.216.255.255
-   ReqHeader      User-Agent: curl/7.50.3
-   ReqHeader      Accept: */*
-   ReqHeader      X-Forwarded-For: 185.24.255.255
-   VCL_call       RECV
-   ReqUnset       Host: 178.216.255.255
-   ReqHeader      Host: 178.216.255.255
-   ReqURL         /api/rest/health
-   ReqHeader      X-Varnish-URL: /api/rest/health
-   ReqHeader      Surrogate-Capability: key=ESI/1.0
-   ReqHeader      X-Req-Method-Entry: HEAD
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 32771 fetch
-   Timestamp      Fetch: 1475573555.869367 0.079345 0.079345
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Server: nginx
-   RespHeader     Date: Tue, 04 Oct 2016 09:32:35 GMT
-   RespHeader     Content-Type: application/json; charset=utf-8
-   RespHeader     Content-Length: 2
-   RespHeader     X-Powered-By: React/alpha
-   RespHeader     Set-Cookie: frontend=d8ce5dc1d896a1c3ef344f7a1b; expires=Tue, 04 Oct 2016 12:32:35 GMT; max-age=10800; domain=test.net; path=/; httponly
-   RespHeader     X-Varnish-URL: /api/rest/health
-   RespHeader     X-Varnish-Method-Entry: HEAD
-   RespHeader     X-Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
-   RespHeader     X-Req-Method: GET
-   RespHeader     X-Passed: 200
-   RespHeader     X-Varnish: 32770
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish-v4
-   VCL_call       DELIVER
-   RespHeader     X-Cache: MISS
-   RespHeader     X-Cache-Hits: 0
-   RespUnset      X-Powered-By: React/alpha
-   RespUnset      Server: nginx
-   RespUnset      X-Varnish: 32770
-   RespUnset      Via: 1.1 varnish-v4
-   RespUnset      X-Varnish-URL: /api/rest/health
-   VCL_return     deliver

My workaround is to add vcl_backend_fetch subroutine and set the bereq.method using X-Req-Method-Entry header set in vcl_recv:

sub vcl_backend_fetch {
set bereq.method = bereq.http.X-Req-Method-Entry;
return (fetch);
}

varnishd version:

/# varnishd -V
varnishd (varnish-4.1.3 revision 5e3b6d2)

OS:

Linux 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
@nigoroll
Copy link
Member

nigoroll commented Oct 4, 2016

Yes, varnish has "always" done this for misses - the idea being that caching the whole object makes more sense than just the headers, but interestingly it actually looks undocumented in the tree, but finding the explanation is easy: http://www.gossamer-threads.com/lists/varnish/misc/14319
Notice that if you actually do want to cache HEADs as separate objects, you need to change the cache key.

hermunn pushed a commit that referenced this issue Oct 13, 2016
(Backport of 2bbcde1 with the fix b698aec squashed in)

doc fixes #2107
@hermunn
Copy link
Member

hermunn commented Oct 13, 2016

Backport review: Backported as c9598de.

@sbraz
Copy link

sbraz commented Dec 6, 2024

Hi,
Would it be possible to add a configuration example to the documentation in addition to this paragraph please?

In particular, to cache non-GET requests, req.method needs to be
saved to a header or variable in :ref:vcl_recv and restored to
bereq.method. Notice that caching non-GET requests typically also
requires changing the cache key in :ref:vcl_hash e.g. by also
hashing the request method and/or request body.

I implemented the following, is that the correct way to do it?

sub vcl_hash {
    hash_data(req.method);
}

sub vcl_recv {
    set req.http.X-Original-Method = req.method;
}

sub vcl_backend_fetch {
    if (bereq.http.X-Original-Method == "HEAD") {
        set bereq.method = bereq.http.X-Original-Method;
    }
}

This seems to work but I have some questions:

  • what causes Varnish to not include the X-Original-Method in requests sent to the backend? This is the expected behaviour for me but I'm curious as to why this happens because it seems like I added a header to the request object.
  • with this setup, if a GET was cached, it can't be used to satisfy subsequent HEAD requests, is there a workaround?
  • the doc mentions "needs to be saved to a header or variable". In the example above, I'm saving the method to a header, how would I save it to a variable?

@dridi
Copy link
Member

dridi commented Dec 9, 2024

Right now headers are the only pseudo-variables we have, and the only thing that transfers between client and backend contexts.

My suggestion to solve this would be a new req.hash_always_get flag (true by default to match the current behavior) to decide whether to turn any cache miss into a GET. We could then add an (optional?) OA_METHOD object attribute to account for the method during lookups.

@gquintard
Copy link
Member

regarding documentation: @sbraz, would you mind having a look at https://www.varnish-software.com/developers/tutorials/caching-post-requests-varnish/ and tell us if it's missing information from your perspective?

@sbraz
Copy link

sbraz commented Dec 10, 2024

@dridi thanks for the explanation. Then why does the doc read "header or variable"? Shouldn't the variable part just be dropped as it could be confusing (it did confuse me)?

My suggestion to solve this

Do you mean to solve the issue of satisfying HEADs from previously cached GETs?

@gquintard, I've read the the documentation and it does cover a similar use case, yes. However, I think having something specific to HEAD requests would still be useful as it seems different.

In the POST example, vcl_recv contains an explicit return (hash), my VCL does not and yet it works, why? Is it because the built-in VCL for vcl_recv already returns hash for HEAD requests but not for POST requests?
Also, in my VCL, there is no need to unset the X-Original-Method header as it gets overwritten in any case, right?

Can you confirm my VCL is valid and could be included in the documentation (maybe just as a snippet here) ?

Sorry to insist but could you also explain what causes Varnish to not include the X-Original-Method in requests sent to the backend? This is the expected behaviour for me but I'm curious as to why this happens because it seems like I added a header to the request object. Maybe this is documented somewhere, but if it isn't, could something be added to the "VCL Steps" doc? I couldn't find anything explaining when frontend request headers are copied to backend header requests and why mine isn't.

Unrelated: https://www.varnish-software.com/developers/tutorials/varnish-builtin-vcl/ contains a typo ("response headeres"), can you please fix it?

@dridi
Copy link
Member

dridi commented Dec 11, 2024

@dridi thanks for the explanation. Then why does the doc read "header or variable"? Shouldn't the variable part just be dropped as it could be confusing (it did confuse me)?

I agree that this looks confusing. The documentation should probably be updated but @nigoroll recently announced a substantial update to the documentation so I would probably revisit this afterwards (not sure when though).

My suggestion to solve this

Do you mean to solve the issue of satisfying HEADs from previously cached GETs?

No, you can already do that out of the box. A cache miss on a HEAD request triggers a GET request, stores the whole response, and only replies with response headers to the client. A cache hit will also produce a proper HEAD response.

The problem my suggestion addresses is that any cache miss turns into a GET, this is not limited to HEAD requests (the only method for which this is appropriate).

You shouldn't need a workaround at all. HEAD is handled transparently by Varnish.

A workaround is needed when you need to cache the response for a different method like POST.

Sorry to insist but could you also explain what causes Varnish to not include the X-Original-Method in requests sent to the backend? This is the expected behaviour for me but I'm curious as to why this happens because it seems like I added a header to the request object.

You receive a client request (req) and if you need to trigger a fetch (for example because of a cache miss) then a backend request (bereq) is derived from the client request (that you may have modified) in the state it was when the fetch got triggered (modulus other changes like a x-varnish header).

Regarding your original snippet, it would actually look like this if you wanted to avoid sending the header to your backend:

sub vcl_recv {
    unset req.http.X-Fetch-Method; # prevent header injection
}

sub vcl_hash {
    if (req.method != "GET" && req.method != "HEAD") {
        hash_data(req.method);
    }
}

sub vcl_miss {
    if (req.method != "GET" && req.method != "HEAD") {
        set req.http.X-Fetch-Method = req.method;
    }
}

sub vcl_backend_fetch {
    if (bereq.http.X-Fetch-Method) {
        set bereq.method = bereq.http.X-Fetch-Method;
    }
    unset bereq.method = bereq.http.X-Fetch-Method;
}

This snippet treats GET/HEAD as the same, and only overrides non-HEAD cache misses. If we bypass the cache (return (pass)) Varnish leaves the method alone. After conditionally overriding the backend request method, the header is removed. This way the override logic is more robust and can handle retries.

Unrelated: https://www.varnish-software.com/developers/tutorials/varnish-builtin-vcl/ contains a typo ("response headeres"), can you please fix it?

@gquintard one two three not it!

Finally, replying to myself:

My suggestion to solve this would be a new req.hash_always_get flag [...]

I think a better name (considering existing flags) would be req.miss_always_get.

@sbraz
Copy link

sbraz commented Dec 11, 2024

No, you can already do that out of the box. A cache miss on a HEAD request triggers a GET request, stores the whole response, and only replies with response headers to the client. A cache hit will also produce a proper HEAD response.

Maybe my initial problem was not clear. I host large files for Linux distribution mirroring. Said distributions tend to crawl the mirror daily and send a large amount of HEAD requests. At the moment, all these HEAD requests result in GETs to the backend, which is wasteful.
My goal is to send HEAD requests to the backend for HEAD requests to the frontend. However, when I do this, the out-of-the-box behaviour you refer to no longer works. I was wondering if there was a way to:

  • cache HEADs separately
  • still satisfy HEAD requests from previous GETs

If there isn't, it's not a deal-breaker. I can live with duplicated objects when a GET came before a HEAD, I just want to know if keeping the out-of-the-box optimisation is possible.

Regarding your original snippet, it would actually look like this if you wanted to avoid sending the header to your backend

You are right, the header did get sent to the backend, my bad. This makes sense: if it's not unset, it should be present. I understand this part better now.

This snippet treats GET/HEAD as the same, and only overrides non-HEAD cache misses

Your example VCL doesn't really work for my use case, it sends GET for HEAD, isn't it meant for caching POST requests instead? Also the last line from vcl_backend_fetch should be unset bereq.http.X-Fetch-Method.

I tried to tweak it to set the header in vcl_miss and it does work but I end up with the same result as my previous VCL where the header was set in vcl_recv. The following works fine but if I send a client HEAD after a GET, Varnish still sends a HEAD to the backend. Am I missing something?

sub vcl_recv {
    unset req.http.X-Fetch-Method; # prevent header injection
}

sub vcl_hash {
    # If I kept the "HEAD" part from your code, a client GET after a client HEAD returned the result from HEAD
    if (req.method != "GET") {
        hash_data(req.method);
    }
}

sub vcl_miss {
    # If I kept the "HEAD" from your code, Varnish kept sending GETs to the backend for HEADs
    if (req.method != "GET") { 
        set req.http.X-Fetch-Method = req.method;
    }       
}

sub vcl_backend_fetch {
    if (bereq.http.X-Fetch-Method) {
        set bereq.method = bereq.http.X-Fetch-Method;
    }
    unset bereq.http.X-Fetch-Method;
}

To me, this works more or less like my previous VCL (plus the unset bit which is nice). Why is it better to set the header in vcl_miss instead of vcl_recv?

Thanks again for your help, it is much appreciated.

@dridi
Copy link
Member

dridi commented Dec 12, 2024

You can actually do that, however it's a little tricky:

sub vcl_recv {
    unset bereq.http.X-Fetch-Method;
    if (req.restarts == 1) {
        return (hash);
    }
}

sub vcl_hash {
    if (req.restarts == 1) {
        hash_data(req.method);
    }
}

sub vcl_miss {
    if (req.method == "HEAD") {
        if (req.restarts == 0) {
            return (restart);
        }
        set req.http.X-Fetch-Method = "HEAD";
    }
}

sub vcl_backend_fetch {
    if (bereq.http.X-Fetch-Method) {
        set bereq.method = bereq.http.X-Fetch-Method;
    }
    unset bereq.http.X-Fetch-Method;
}

If a cache entry exists for a GET request, you can reuse it for subsequent HEAD requests. Otherwise, exceptionally send a HEAD request to the backend with a different cache key.

Restarts tend to make the VCL state machine harder to reason about, use them sparingly.

@gquintard
Copy link
Member

@dridi, PR posted to the dev portal, @sbraz, thank you for pointing it out!

@sbraz
Copy link

sbraz commented Dec 13, 2024

You can actually do that, however it's a little tricky:

Thanks a lot, now that is something that could be worth adding to the documentation :) There's a typo in vcl_recv, it should unset unset req.http.X-Fetch-Method; but other than that, it works perfectly.

If I understand correctly (thanks in part to std.log), assuming the first request is a GET:

  • vcl_recv doesn't do anything special (except unsetting the internal header), neither do vcl_hash, vcl_miss or vcl_backend_fetch.
  • for a GET on the same URL, vcl_recv doesn't do anything special either and neither does vcl_hash. Then it's a hit.
  • a HEAD on the same URL behaves similarly, none of the conditions are entered, so we satisfy the HEAD from the cached GET, as per the documentation.

On the other hand, if the first request is a HEAD:

  • vcl_miss triggers a restart of the request processing.
  • vcl_recv immediately returns hash, I assume this is done to avoid executing the rest of the function?
  • vcl_hash hashes the HEAD method in addition to other parameters.
  • vcl_miss sets req.http.X-Fetch-Method.
  • vcl_backend_fetch performs a HEAD on the backend
  • for a HEAD on the same URL, we still enter vcl_miss once and the processing restarts, but then the hash is updated with the HEAD method, so we end up with a hit

Restarts tend to make the VCL state machine harder to reason about, use them sparingly.

This would be the only place in my VCL where I use them. I assume it's safe to reuse your snippet then?

I have one last question regarding backend requests. With the default behaviour of GET for all frontend requests, if I check varnishncsa's %r for a backend request, it contains HEAD /path although the method actually being used is GET, is that expected? The same goes for BereqMethod in varnishlog. Is this because BereqMethod is set before vcl_backend_fetch is executed?

@sbraz
Copy link

sbraz commented Dec 19, 2024

@dridi I noticed that all cached HEAD requests have Content-Length: 0 and it causes problems (this doesn't happen if a HEAD is satisfied from GET). Is that a Varnish bug or am I missing something? I'm running varnish-7.1.1 revision 7cee1c581bead20e88d101ab3d72afb29f14d87a.
EDIT: Same problem with varnish-7.6.1 revision c3d5882003eb87e5e93dc09fb9513ca96db3ca3c.

@dridi
Copy link
Member

dridi commented Dec 30, 2024

vcl_recv immediately returns hash, I assume this is done to avoid executing the rest of the function?

Correct, you already processed the original request once. Processing the request twice could "corrupt" it if operations are not idempotent.

This would be the only place in my VCL where I use them. I assume it's safe to reuse your snippet then?

It's probably fine. It becomes really messy once you have more than one reason to restart.

With the default behaviour of GET for all frontend requests, if I check varnishncsa's %r for a backend request, it contains HEAD /path although the method actually being used is GET, is that expected? The same goes for BereqMethod in varnishlog. Is this because BereqMethod is set before vcl_backend_fetch is executed?

Sounds like a bug, maybe open a new issue describing this behavior. It might actually be solved by #4213, can you give it a try first?

I noticed that all cached HEAD requests have Content-Length: 0 and it causes problems (this doesn't happen if a HEAD is satisfied from GET).

Sounds like a bug, please open a new issue with steps to reproduce.

@sbraz
Copy link

sbraz commented Dec 30, 2024

Thanks, I created #4244 for BereqMethod shown as HEAD although a GET is sent. Actually the biggest problem was that varnishncsa showed HEAD for %r but #4213 fixes this.
And, more importantly, #4245 for the zero Content-Length problem.

Once the Content-Length problem is fixed, could the example VCL with restarts be added to the documentation somewhere?

nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jan 2, 2025
Previously, we would only keep the Content-Length header for HEAD requests on
hit-for-miss objects, now we simply keep it always to enable "fallback" caching
of HEAD requests.

The added vtc implements the basics of the logic to enable the (reasonable) use
case documented in
varnishcache#2107 (comment)
but using Vary instead of cache key modification plus restart.

Fixes varnishcache#4245
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jan 3, 2025
Previously, we would only keep the Content-Length header for HEAD requests on
hit-for-miss objects, now we simply keep it always to enable "fallback" caching
of HEAD requests.

The added vtc implements the basics of the logic to enable the (reasonable) use
case documented in
varnishcache#2107 (comment)
but using Vary instead of cache key modification plus restart.

Fixes varnishcache#4245
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Feb 6, 2025
Previously, we would only keep the Content-Length header for HEAD requests on
hit-for-miss objects, now we simply keep it always to enable "fallback" caching
of HEAD requests.

The added vtc implements the basics of the logic to enable the (reasonable) use
case documented in
varnishcache#2107 (comment)
but using Vary instead of cache key modification plus restart.

Fixes varnishcache#4245
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants