Fix: Service Lambda timeouts cause user-facing 5xx responses (#6284) #6624
base: develop
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6624      +/-   ##
===========================================
- Coverage    85.59%   85.57%   -0.03%
===========================================
  Files          155      154       -1
  Lines        20903    20899       -4
===========================================
- Hits         17892    17884       -8
- Misses        3011     3015       +4
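Coverage here is Hits divided by Lines (Hits + Misses). The sketch below recomputes the three percentages in the table, assuming Codecov's default configuration (two decimal places, rounding down) and assuming the change is computed from the unrounded values before rounding. Under those assumptions, a change of about -0.022 points rounds down to the -0.03% shown.

```python
import math
from fractions import Fraction


def coverage(hits: int, lines: int) -> Fraction:
    # Exact percentage of covered lines; Fraction avoids float rounding error
    return Fraction(100 * hits, lines)


def round_down(value: Fraction, precision: int = 2) -> Fraction:
    # Floor toward negative infinity at the given number of decimal places
    scale = 10 ** precision
    return Fraction(math.floor(value * scale), scale)


base = coverage(17892, 17892 + 3011)  # develop: 20903 lines
head = coverage(17884, 17884 + 3015)  # #6624: 20899 lines

for label, value in [('develop', base), ('#6624', head), ('change', head - base)]:
    print(f'{label}: {float(round_down(value)):.2f}%')
# develop: 85.59%
# #6624: 85.57%
# change: -0.03%
```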
47cdf38 to e703ece
d54245e to db02ce0
Testing 504 response, note the included
Looking good! ✅
A few observations.
Commit '[a] Add a retry-after header to 504 responses (#6284)' announces an API change, but no associated API version bump is visible.
You should reconsider your approach in the commits 'Remove metric alarm threshold and period default values' and 'Set indexer and service metric alarm threshold to one per day (#6284)'. To me, they seem like unnecessary changes: the defaults are removed, only for the latter commit to introduce a constraint in a generic helper, which I think is better specified at the call site. Think about going through the routes in the app.py file and trying to determine the alarm rate, period or threshold for a given Lambda; with your current approach, finding these values may take some clicking around. However, I'm not 100% sure of the intent here, so perhaps I'm missing something.
Finally, your drop! commit is thoughtful, but do consider adding test coverage, perhaps a small unit test that mocks a timeout right away and checks that the desired status code is returned.
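To illustrate the trade-off described above, here is a minimal sketch with hypothetical names (`AlarmSpec`, `metric_alarm`, and the Lambda name `service`); it is not Azul's actual helper. With no defaults, every call site must state the threshold and period, so the alarm rate is readable where the alarm is declared. The keys loosely mirror arguments of Terraform's `aws_cloudwatch_metric_alarm` resource.

```python
from dataclasses import dataclass

ONE_DAY = 24 * 60 * 60  # seconds


@dataclass(frozen=True)
class AlarmSpec:
    # Hypothetical container; both fields are required, there are no defaults
    threshold: int
    period: int


def metric_alarm(lambda_name: str, spec: AlarmSpec) -> dict:
    # Hypothetical generic helper: it only assembles the resource and makes
    # no policy decision about how sensitive the alarm should be
    return {
        'alarm_name': f'{lambda_name}_errors',
        'namespace': 'AWS/Lambda',
        'metric_name': 'Errors',
        'dimensions': {'FunctionName': lambda_name},
        'statistic': 'Sum',
        'comparison_operator': 'GreaterThanOrEqualToThreshold',
        'evaluation_periods': 1,
        'threshold': spec.threshold,
        'period': spec.period,
    }


# The rate is visible at the call site: at least one error per day
service_alarm = metric_alarm('service', AlarmSpec(threshold=1, period=ONE_DAY))
```

Whether the constraint lives in the helper or at the call site is exactly the design question raised in the review; this sketch shows the call-site variant.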
src/azul/terraform.py (outdated)
**{
    f'DEFAULT_{response_type}': {
        'responseParameters': {
            # Static value response header parameters must be enclosed
            # within a pair of single quotes.
            #
            # https://docs.aws.amazon.com/apigateway/latest/developerguide/request-response-data-mappings.html#mapping-response-parameters
            # https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-swagger-extensions-gateway-responses.html
            #
            # Note that azul.strings.single_quote() is not used here
            # since API Gateway allows internal single quotes in the
            # value, which that function would prohibit.
            #
            f'gatewayresponse.header.{k}': f"'{v}'"
            for k, v in AzulChaliceApp.security_headers.items()
        }
    } for response_type in ['4XX', '5XX']
},
**{
    response_type: {
        'responseParameters': {
            **{
                f'gatewayresponse.header.{k}': f"'{v}'"
                for k, v in AzulChaliceApp.security_headers.items()
            },
            'gatewayresponse.header.Retry-After': "'10'"
        },
        'responseTemplates': {
            "application/json": json.dumps({
                'message': '504 Gateway Timeout. Wait the number of'
                           ' seconds given in the `Retry-After`'
                           ' header before retrying the request.'
            })
        }
    } for response_type in ['INTEGRATION_TIMEOUT', 'INTEGRATION_FAILURE']
}
Consider applying the changes in the following patch to make the diff smaller.
Index: src/azul/terraform.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/terraform.py b/src/azul/terraform.py
--- a/src/azul/terraform.py (revision 1f2033d1536b4b60bc2d1632e597fb3c74aa2675)
+++ b/src/azul/terraform.py (date 1728947754159)
@@ -816,7 +816,7 @@
openapi_spec[key] = config.minimum_compression_size
assert 'aws_api_gateway_gateway_response' not in resources, resources
openapi_spec['x-amazon-apigateway-gateway-responses'] = {
- **{
+ {
f'DEFAULT_{response_type}': {
'responseParameters': {
# Static value response header parameters must be enclosed
@@ -833,8 +833,7 @@
for k, v in AzulChaliceApp.security_headers.items()
}
} for response_type in ['4XX', '5XX']
- },
- **{
+ } | {
response_type: {
'responseParameters': {
**{
PyCharm wants to format
some_variable = {
'foo': 'FOO'
} | {
'bar': 'BAR'
}
as
some_variable = {
'foo': 'FOO'
} | {
'bar': 'BAR'
}
so I wrapped it in parens to get
some_variable = (
{
'foo': 'FOO'
} | {
'bar': 'BAR'
}
)
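A self-contained sketch of what the snippet under review builds, rewritten with the dict union operator (`|`, Python 3.9+) in the parenthesized layout settled on above. `SECURITY_HEADERS` is an assumed stand-in for `AzulChaliceApp.security_headers`, `gateway_responses` is a hypothetical name, and the message wording is illustrative. For duplicate keys, `a | b` keeps the value from `b`, just like `{**a, **b}`, so the suggested patch should not change the result.

```python
import json

# Assumed stand-in for AzulChaliceApp.security_headers
SECURITY_HEADERS = {
    'X-Content-Type-Options': 'nosniff',
    'X-Frame-Options': 'DENY',
}


def gateway_responses(headers: dict[str, str]) -> dict:
    # Static header values must be enclosed in single quotes, as the
    # comment in the reviewed code explains
    params = {
        f'gatewayresponse.header.{k}': f"'{v}'"
        for k, v in headers.items()
    }
    return (
        {
            f'DEFAULT_{response_type}': {
                'responseParameters': dict(params)
            }
            for response_type in ['4XX', '5XX']
        } | {
            response_type: {
                'responseParameters': params | {
                    'gatewayresponse.header.Retry-After': "'10'"
                },
                'responseTemplates': {
                    'application/json': json.dumps({
                        'message': '504 Gateway Timeout. Wait the number of '
                                   'seconds specified in the `Retry-After` '
                                   'header before retrying the request.'
                    })
                }
            }
            for response_type in ['INTEGRATION_TIMEOUT', 'INTEGRATION_FAILURE']
        }
    )


responses = gateway_responses(SECURITY_HEADERS)

# For duplicate keys, `a | b` keeps the value from `b`, exactly like
# `{**a, **b}`, so switching to the union operator does not change the result
a, b = {'k': 'left'}, {'k': 'right'}
assert (a | b) == {**a, **b} == {'k': 'right'}
```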
1cebdf5 to ebee142
These changes (removing the defaults & increasing the period to one day) were requested in the ticket.
Such a test wouldn't be able to verify the response headers, since they come from API Gateway.
ebee142 to 56a65c9
> These changes (removing the defaults & increasing the period to one day) were requested in #6284 (comment).
I see, I originally missed that. Apologies.
Approved ✅
Next time, no fixups, please.
src/azul/terraform.py (outdated)
'responseTemplates': {
    "application/json": json.dumps({
        'message': '504 Gateway Timeout. Wait the number of'
                   ' seconds given in the `Retry-After`'
Suggested change
- ' seconds given in the `Retry-After`'
+ ' seconds specified in the `Retry-After`'
src/azul/openapi/responses.py (outdated)
return {
    '504': {
        'description': 'Request timed out. When handling this response,'
                       ' clients should wait the number of seconds given in'
Suggested change
- ' clients should wait the number of seconds given in'
+ ' clients should wait the number of seconds specified in'
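In OpenAPI 3, a Response Object pairs a required `description` with optional `headers`, each a Header Object with its own `schema`. A sketch of a 504 entry that documents `Retry-After` might look like the following. The `header()` helper is a simplified, hypothetical version of the one in `azul.openapi.responses` (the real one calls `schema.make_type`), and the description wording is illustrative, since the string under review is truncated above.

```python
def header(type_: str, **kwargs) -> dict:
    # Simplified stand-in: a Header Object with a schema plus extra fields
    return {'schema': {'type': type_}, **kwargs}


def http_504_response() -> dict:
    return {
        '504': {
            'description': 'Request timed out. When handling this response, '
                           'clients should wait the number of seconds '
                           'specified in the `Retry-After` header before '
                           'retrying the request.',
            'headers': {
                'Retry-After': header('integer',
                                      description='Number of seconds to wait '
                                                  'before retrying the request.')
            }
        }
    }


# Merge into a hypothetical operation spec next to its existing responses
method_spec = {'responses': {'200': {'description': 'OK'}}}
method_spec['responses'].update(http_504_response())
```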
src/azul/terraform.py (outdated)
{
    f'DEFAULT_{response_type}': {
        'responseParameters': {
            # Static value response header parameters must be enclosed
This comment should be moved up so it is evident that it applies to both dictionaries.
src/azul/openapi/responses.py (outdated)
@@ -41,3 +41,14 @@ def header(type_: TYPE, **kwargs: PrimitiveJSON) -> JSON:
        'schema': schema.make_type(type_),
        **kwargs
    }


def http_504_response() -> JSON:
Wrong place. There is already precedent for shared specs.
src/azul/openapi/spec.py (outdated)
return {
    '504': {
        'description': 'Request timed out. When handling this response,'
                       ' clients should wait the number of seconds'
I don't think we typically place the space at the beginning of the continuation. If you disagree, please provide evidence.
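Adjacent string literals are concatenated at compile time, so the position of the separating space affects only readability, not the resulting string. A quick check, with the trailing-space layout the reviewer prefers first:

```python
trailing = ('Request timed out. When handling this response, '
            'clients should wait')
leading = ('Request timed out. When handling this response,'
           ' clients should wait')

# Both layouts produce the same string; only the style differs
assert trailing == leading
```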
src/azul/terraform.py (outdated)
'responseTemplates': {
    "application/json": json.dumps({
        'message': '504 Gateway Timeout. Wait the number of'
                   ' seconds specified in the `Retry-After`'
Same here.
ae04ba7 to 4a4f6ba
src/azul/openapi/spec.py (outdated)
@@ -232,3 +232,15 @@ def version(self) -> JSON:
                }
            }
        }

    @classmethod
The fact that this is a class method while the other methods are not is a smell. The method serves the same purpose as the other methods, but its usage pattern is different, and that is the smell.
src/azul/chalice.py (outdated)
@@ -267,6 +267,10 @@ def route(self,
methods = kwargs['methods']
self.non_interactive_routes.update((path, method) for method in methods)
methods = kwargs.get('methods', ())
if method_spec:
    import azul.openapi.spec
This also smells.
Please dissolve CommonEndpointSpecs into AzulChaliceApp. That should happen as the first commit in this PR. The methods from CommonEndpointSpecs should be appended at the end of AzulChaliceApp.
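A rough sketch of the direction suggested here, using hypothetical, heavily simplified classes: `App` stands in for `AzulChaliceApp` and is not Chalice's API. Once the shared spec helpers are ordinary instance methods on the app class, `route()` can merge them in through `self`, without a class method or a deferred import.

```python
from typing import Callable, Optional


class App:
    """Hypothetical stand-in for AzulChaliceApp, not Chalice's real API."""

    def __init__(self) -> None:
        # Collected OpenAPI operation specs, keyed by path
        self.specs: dict[str, dict] = {}

    def route(self, path: str, *, method_spec: Optional[dict] = None) -> Callable:
        def decorator(f: Callable) -> Callable:
            if method_spec is not None:
                # Every documented route also documents the shared 504
                # response; no deferred import of a separate spec module
                responses = {
                    **method_spec.get('responses', {}),
                    **self._http_504_response()
                }
                self.specs[path] = {**method_spec, 'responses': responses}
            return f
        return decorator

    # Shared specs formerly held by a separate class, appended at the end of
    # the app class and called through `self` like the other methods
    def _http_504_response(self) -> dict:
        return {'504': {'description': 'Request timed out.'}}


app = App()


@app.route('/version', method_spec={'responses': {'200': {'description': 'OK'}}})
def version() -> dict:
    return {}
```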
Please ignore the previous review.
71002ad to 4a1ae7e
I did the refactoring mentioned in my previous commit review. It passes unit and integration tests. Please rebase your changes on top of mine. I have a backup of your branch if you need it.
… and dissolve CommonEndpointSpecs
965a968 to 8c128b0
Connected issues: #6284