Skip to content

Fix and optimize 304 object copying #4013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 5, 2023

Conversation

nigoroll
Copy link
Member

@nigoroll nigoroll commented Oct 31, 2023

(edit: Added a second commit which depends on the first)

TL;DR: Fix trimstore use in 304 and copy objects efficiently ("defragmented")

Commit 1: Call trimstore only once when copying the body after a 304

It is my understanding that the objtrimstore stevedore API function is only to be called once when the object is complete, which I believe is also in line with the comment on ObjExtend() that The final flag must be set on the last call.

If this understanding of the API is correct, we did not adhere to it in the fetch code when we made a copy of an existing stale object after a 304 response: There, we iterated over the stale object and did not set the final flag just once when the object was complete, but rather after each storage segment was copied.

This commit fixes this, adds some pedentry to the simple storage and extends b00062.vtc to test this behavior specifically. On top, g6.vtc also triggered without the fix but the duplicate trim detection in place.

This issue has originally surfaced in the SLASH/fellow storage where the trimstore function implicitly asserted to only be called once.

Ref 115742b
Ref https://gitlab.com/uplex/varnish/slash/-/issues/33

Commit 2: Optimize 304 template object copying

When copying a stale object after a 304 revalidation, we iterated over it and allocated new storage for each storage segment.

So, at best, we kept the fragmentation of the existing object, or made it even worse. This is particularly relevant when the existing object was created from a chunked response, in which case the original segments might have been particularly small and, consequently, many in number.

Because we wait for the stale object to complete, we know the total length (OA_LEN) upfront, and can ask the storage serving the copy for exactly the right length. This is much more efficient, as fragmentation is lowered and the storage engine might make a much better allocation decision when it knows the full length rather than getting piecemeal requests.

We implement the improved allocation scheme through additional state kept for the duration of the template object copy: struct vbf_objiter_priv holds the pointer to the newly created busy object, the total yet unallocated length (initialized to OA_LEN of the existing object) and pointer/length to the unused portion of the currently open segment, if any.

@nigoroll nigoroll changed the title Call trimstore only once when copying the body after a 304 Fix and optimize 304 object copying Nov 1, 2023
@dridi dridi requested a review from mbgrydeland November 13, 2023 14:58
Copy link
Contributor

@mbgrydeland mbgrydeland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a problem with this patch set wrt to revalidation of objects that are still streaming?

That is, due to a TTL that is shorter than the time it takes to fetch the object, a 304 revalidation is started which then needs to stream the data from the still fetching object while creating the revalidated object.

I wonder if this will then be using the wrong length (OA_LEN not yet set, which happens in vbf_stp_fetchend), causing an assert it looks like. It would I think regardless mess up both the allocation sizes and the logic to set VFP_END once and only once.

@mbgrydeland
Copy link
Contributor

Isn't there a problem with this patch set wrt to revalidation of objects that are still streaming?

Hmm, seems I was confused about the behaviour here. We do an explicit wait for BOS_FINISHED before creating the revalidated object, so that should be OK.

@mbgrydeland
Copy link
Contributor

There is a problem with the patch when revalidating zero length objects. Varnish won't attempt to do a revalidation for these objects (when the length is zero), but a missbehaving backend answering 304 anyways will cause a panic. See the test case.

varnishtest ""

server s1 {
	rxreq
	txresp -nolen -hdr "Content-Length: 0" -hdr {Etag: "foo"}

	rxreq
	txresp -status 304 -nolen -hdr "Content-Length: 0" -hdr {Etag: "foo"}
} -start

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		set beresp.ttl = 1s;
	}
} -start

client c1 {
	txreq
	rxresp

	txreq
	rxresp
} -run

delay 1

client c2 {
	txreq
	rxresp
} -run

delay 0.1

client c3 {
	txreq
	rxresp
} -run

@nigoroll
Copy link
Member Author

nigoroll commented Dec 4, 2023

@mbgrydeland good catch! I will add the test case and then improve this PR.

On the previous comment: Yes, we only copy the stale object once it is complete.

nigoroll added a commit that referenced this pull request Dec 4, 2023
Test case by Martin Blix Grydeland, taken from #4013
@nigoroll
Copy link
Member Author

nigoroll commented Dec 4, 2023

@mbgrydeland I am going to force-push a simple removal of the triggering assertion:

diff --git a/bin/varnishd/cache/cache_fetch.c b/bin/varnishd/cache/cache_fetch.c
index 3e0d7cdc3..6fb47975f 100644
--- a/bin/varnishd/cache/cache_fetch.c
+++ b/bin/varnishd/cache/cache_fetch.c
@@ -883,7 +883,6 @@ vbf_stp_condfetch(struct worker *wrk, struct busyobj *bo)
        INIT_OBJ(vop, VBF_OBITER_PRIV_MAGIC);
        vop->bo = bo;
        vop->l = ObjGetLen(bo->wrk, stale_oc);
-       assert(vop->l > 0);
        if (ObjIterate(wrk, stale_oc, vop, vbf_objiterate, 0))
                (void)VFP_Error(bo->vfc, "Template object failed");
 

IMHO, skipping the iterator for a zero length object would also be an option, but we did call it before this PR and there might be future code depending on it. So let's keep it and have the iterator do its noop.

@nigoroll nigoroll force-pushed the fetch_304_dup_trimstore branch from 8ec7719 to d7e0b26 Compare December 4, 2023 11:09
@mbgrydeland
Copy link
Contributor

I'm wondering if there aren't some potential problems unrelated to this patch that we have uncovered. The fact that Varnish will enter 304 revalidation logic even though we never asked for it (didn't send out any IMS/INM headers) is worrying to me. I wonder if we shouldn't keep track somehow of whether to expect a 304, and error out if we still got a 304. Could prevent a future gotcha at least.

@nigoroll
Copy link
Member Author

nigoroll commented Dec 4, 2023

bugwash: waiting for @mbgrydeland's ok

Copy link
Contributor

@mbgrydeland mbgrydeland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you end up dropping the panic test case from the PR? I can't see it as part of the commits currently.

@nigoroll nigoroll requested a review from mbgrydeland December 4, 2023 15:23
@nigoroll
Copy link
Member Author

nigoroll commented Dec 4, 2023

Did you end up dropping the panic test case from the PR? I can't see it as part of the commits currently.

I added it directly to trunk: 3fc690d 9a4653b

@mbgrydeland sorry for accidentally re-requesting your review too early.

It is my understanding that the objtrimstore stevedore API function is
only to be called once when the object is complete, which I believe is
also in line with the comment on ObjExtend() that "The final flag must
be set on the last call".

If this understanding of the API is correct, we did not adhere to it in
the fetch code when we made a copy of an existing stale object after a
304 response: There, we iterated over the stale object and did not set
the final flag just once when the object was complete, but rather after
each storage segment was copied.

This commit fixes this, adds some pedentry to the simple storage and
extends b00062.vtc to test this behavior specifically. On top, g6.vtc
also triggered without the fix but the duplicate trim detection in place.

This issue has originally surfaced in the SLASH/fellow storage where the
trimstore function implicitly asserted to only be called once.

Ref 115742b
Ref https://gitlab.com/uplex/varnish/slash/-/issues/33
When copying a stale object after a 304 revalidation, we iterated over
it and allocated new storage for each storage segment.

So, at best, we kept the fragmentation of the existing object, or made
it even worse. This is particularly relevant when the existing object
was created from a chunked response, in which case the original
segments might have been particularly small and, consequently, many in
number.

Because we wait for the stale object to complete, we know the total
length (OA_LEN) upfront, and can ask the storage serving the copy for
exactly the right length. This is much more efficient, as
fragmentation is lowered and the storage engine might make a much
better allocation decision when it knows the full length rather than
getting piecemeal requests.

We implement the improved allocation scheme through additional state
kept for the duration of the template object copy: struct
vbf_objiter_priv holds the pointer to the newly created busy object,
the total yet unallocated length (initialized to OA_LEN of the existing
object) and pointer/length to the unused portion of the currently open
segment, if any.
@nigoroll nigoroll force-pushed the fetch_304_dup_trimstore branch from d7e0b26 to 59fe52c Compare December 4, 2023 18:49
@dridi dridi merged commit 98b6b95 into varnishcache:master Dec 5, 2023
@nigoroll nigoroll deleted the fetch_304_dup_trimstore branch December 5, 2023 20:42
nigoroll added a commit that referenced this pull request Dec 5, 2023
we need a signal if the fix is in
nigoroll added a commit that referenced this pull request Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants