Replies: 4 comments 4 replies
-
@viniarck, excellent documentation, thanks for putting so much effort into describing the changes. Other than the unit tests, what kind of impact should we expect for the end-to-end tests in terms of refactoring? The results are impressive and I am ok with it for 2023.1, I just need more data about the effort to make all the changes.
-
Hi @viniarck, have you also considered quart + hypercorn combined with flask?
-
For the record, since …
-
I appreciated everybody's feedback on this discussion. We'll go for it. I'll start to map the related tasks to carry on the implementation. If unexpected major blockers show up, they'll be handled as issues. I'll close this issue since it has stayed open long enough collecting feedback.
-
This discussion is for issue #301.
These are the problems being solved and the proposed solutions:
Problems:
- `SocketIO` should be ready for running in production out of the box #168

Proposed Solutions (to the respective problems):

- Replace `werkzeug` with `uvicorn`, which is well maintained and one of the most battle-tested and widely used ASGI servers in Python land, and it supports being programmatically embedded and shut down.
- Replace `flask` (and `flask-socketio`) with `starlette`, which is the base async framework that FastAPI is built on top of; it's one of the most widely used today in many projects, and is consistently well ranked in benchmarks (as shown in the figure below).
`Flask` 2+ was an important intermediary milestone in Kytos-ng 2022.3 that unlocked `async` routes, but it was still running on the `werkzeug` server, so it wasn't async turtles all the way down; async was bolted on while remaining compatible as a WSGI server. Moving more towards `asyncio` instead of `gevent` or `eventlet` (which `flask-socketio` recommends) is the way to go, since Python upstream is developing heavily towards `async` and most of the well known web/backend libs are also moving in this direction; consequently, our team can leverage well maintained upstream code and libraries.
Initially, I was inclined to propose `FastAPI` instead of `starlette`. `FastAPI` essentially brings `starlette` + `pydantic` + leveraging typing a bit more + OpenAPI generation utilities. However, we have already invested in extensive openapi.yml documentation on NApps, and we already have a plan to move `openapi-core` from `mef_eline` to core to be reused, so we'd only use `pydantic` to validate DB models; and since we're already using `openapi-core`, in the future we can leverage openapi-schema-validator to reuse the schema components/models from the openapi specs to also validate KytosEvent content. Also, we wouldn't benefit much from the auto generated OpenAPI that FastAPI provides, since we've already been maintaining it in a dedicated file instead. `typing` is welcome and we'll continue to use it. Consequently, `starlette` is more suitable: it doesn't introduce too much that could overlap with existing code base functionalities/responsibilities, it doesn't require much extra effort or bring additional liabilities, and `starlette` is also more likely to be around for years to come, since `FastAPI` wouldn't even exist without it anyway. If one day we decide not to maintain external OpenAPI files, we can revisit this part of the discussion.
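To illustrate the openapi-schema-validator idea, here's a minimal sketch of reusing a component schema from an openapi.yml to validate a KytosEvent content dict. The spec path, schema name, and payload below are hypothetical, not taken from an actual NApp:

```python
# Hypothetical sketch: validate a KytosEvent content dict against a reusable
# schema from an openapi.yml (schema name and payload are illustrative).
import yaml
from openapi_schema_validator import validate  # openapi-schema-validator package

with open("openapi.yml") as spec_file:
    spec = yaml.safe_load(spec_file)

# Pick a reusable schema from the spec's components section.
circuit_schema = spec["components"]["schemas"]["NewCircuit"]

event_content = {"name": "evc_1", "enabled": True}
validate(event_content, circuit_schema)  # raises a jsonschema ValidationError if invalid
```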
`uvicorn` with `starlette` won't completely solve every type of instability when a high rate of requests is sent over a long period of time, but `asyncio` Tasks are less resource intensive than Threads, so they can contribute to more throughput and stability for IO-bound parts. Only rate limiting will completely mitigate the original problem, but with asyncio/uvicorn/starlette, as the experiments below will show, our HTTP endpoints will be much more stable and predictable in terms of latency and responses. In fact, the original issue High rate of requests can lead to runtime instability #225 no longer results in instability resetting client connections.

Initial Experiments:
I researched and conducted the following pre-requisite experiments to confirm that the proposed solutions would fit well:
e1) Confirm that threadpools are working well, stress test also with at least 300 req/sec during 1 min

- `uvicorn` handling 500 req/sec on a route similar to `GET topology/v3` without breaking a sweat, with the 95th percentile under 78 ms over 1 minute:
- `werkzeug` trying to handle 500 req/s on `GET topology/v3` (scenario from issue #225):
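For context, here's a minimal sketch of the kind of app that can be stress tested this way; the routes and payload are placeholders, not the actual `topology` NApp code. With `starlette`/`uvicorn`, plain `def` endpoints run in a threadpool while `async def` endpoints run on the event loop:

```python
# Minimal illustrative app (not the actual kytos/topology code): starlette runs
# "def" endpoints in a threadpool and "async def" endpoints on the event loop.
import uvicorn
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route


def get_topology(request):  # sync endpoint -> offloaded to a threadpool
    return JSONResponse({"topology": {"switches": {}, "links": {}}})


async def get_topology_async(request):  # async endpoint -> runs on the event loop
    return JSONResponse({"topology": {"switches": {}, "links": {}}})


app = Starlette(routes=[
    Route("/api/kytos/topology/v3/", get_topology),
    Route("/api/kytos/topology/v3/async", get_topology_async),
])

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8181)
```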
e2) Make sure APM instrumentation is capturing requests/responses as expected

- `starlette` is supported by Elastic APM, I've also confirmed it in practice, and `pymongo` instrumentation still works as expected:
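As a rough sketch of the wiring that was verified (the service name and APM server URL below are placeholders), Elastic APM ships a starlette middleware:

```python
# Hedged sketch: instrumenting a starlette app with Elastic APM
# (service name and server URL are placeholders).
from elasticapm.contrib.starlette import ElasticAPM, make_apm_client
from starlette.applications import Starlette

apm_client = make_apm_client({
    "SERVICE_NAME": "kytos",               # placeholder service name
    "SERVER_URL": "http://localhost:8200",  # placeholder APM server URL
})

app = Starlette(routes=[])
app.add_middleware(ElasticAPM, client=apm_client)
```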
e3) Make sure `uvicorn` won't conflict with `kytosd` console ctrl-d:

- `uvicorn` embedded server shutdown capabilities worked as expected, and it shuts down gracefully, including from the console (that `[uvicorn.error]` log entry is at `INFO` level; the `uvicorn` team picked an unfortunate logger name for some modules):
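For reference, a minimal sketch of embedding `uvicorn` programmatically with a graceful shutdown hook; how `kytosd` actually wires this into its lifecycle is not shown here:

```python
# Hedged sketch of embedding uvicorn programmatically and shutting it down
# gracefully (the actual kytosd integration may differ).
import asyncio

import uvicorn
from starlette.applications import Starlette

app = Starlette(routes=[])


async def main():
    config = uvicorn.Config(app, host="127.0.0.1", port=8181, log_level="info")
    server = uvicorn.Server(config)
    serve_task = asyncio.create_task(server.serve())

    await asyncio.sleep(5)      # stand-in for the controller's lifetime
    server.should_exit = True   # ask uvicorn to shut down gracefully (e.g., on console ctrl-d)
    await serve_task


if __name__ == "__main__":
    asyncio.run(main())
```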
e4) Make sure `openapi-core` openapi.yml validator is still compatible:

- `openapi-core` 0.16+ supports it; we'll need to upgrade this dependency. I've also quickly prototyped it to double confirm: they had some breaking changes in some Python imports, but it works:
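As a small, hedged illustration of the import change (the exact validator wiring in kytos isn't shown here), newer `openapi-core` releases load specs through the `Spec` class instead of the old `create_spec` helper:

```python
# Hedged sketch for openapi-core 0.16+: spec loading moved to the Spec class.
from openapi_core import Spec

spec = Spec.from_file_path("openapi.yml")
# The resulting Spec object is what openapi-core's request/response validators consume.
```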
e5) Adapt `rest` decorator

The `rest` decorator will still be compatible as a drop-in: if the handler is a `coroutine` then it'll run in the asyncio event loop context, otherwise it'll use the `starlette`/`uvicorn` ThreadPool (see the sketch after this experiment), so it's entirely compatible with NApps that still use it synchronously, and NApps can become more async gradually when it makes sense. Other than that, since I haven't adapted the `rest` decorator yet, I ended up temporarily duplicating some endpoints to prototype this faster. I ran these experiments:

- `POST /v2/flowsx` is equivalent to `flow_manager/v2/flows/{dpid}`, except it's being served by uvicorn/starlette. Notice that even with `pymongo` using a blocking driver, in this case the 95th percentile latency ended up being 95 times faster, and the request rate was 100 req/sec over 1 min, which is quite expressive for a real use case networking scenario:
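Here's a rough sketch of that dispatch idea; the decorator name and response handling are illustrative, not the actual kytos implementation:

```python
# Hedged sketch of how a rest-style decorator could dispatch handlers under
# starlette: coroutines run on the event loop, plain functions go to the
# threadpool. Illustrative only.
import asyncio
import functools

from starlette.concurrency import run_in_threadpool
from starlette.responses import JSONResponse


def rest_endpoint(handler):
    """Wrap a NApp handler so both sync and async callables are supported."""

    @functools.wraps(handler)
    async def wrapper(request):
        if asyncio.iscoroutinefunction(handler):
            result = await handler(request)                      # async def: event loop
        else:
            result = await run_in_threadpool(handler, request)   # def: threadpool
        return JSONResponse(result)

    return wrapper
```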
e6) Adapt `kytos/lib/helpers.py` `get_test_client(controller, napp)`

I haven't done this yet, but it should be doable, no surprises expected here. We'd also ship `httpx`, which is recommended and maintained by the same uvicorn/starlette team (encode); `httpx` works both synchronously and asynchronously, providing a very convenient interface. As we've been adopting `async` gradually when it makes sense, `httpx` can also replace `requests`, including providing async capabilities when needed, so NApps can still use `requests`, but as they start leveraging more `async` they can switch over.

Example of `httpx` that I demoed before when showing some of the async capabilities that are supported:
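Something along these lines (the URL and payloads are placeholders; this isn't the original demo snippet):

```python
# Hedged httpx example: the same client library works synchronously and
# asynchronously (URL and endpoints are placeholders).
import asyncio

import httpx


def sync_call():
    # Drop-in replacement style for requests
    response = httpx.get("http://localhost:8181/api/kytos/topology/v3/")
    return response.json()


async def async_calls():
    # Async client: fire several requests concurrently on the event loop
    async with httpx.AsyncClient(base_url="http://localhost:8181") as client:
        responses = await asyncio.gather(
            *(client.get("/api/kytos/topology/v3/") for _ in range(10))
        )
    return [resp.status_code for resp in responses]


if __name__ == "__main__":
    print(sync_call())
    print(asyncio.run(async_calls()))
```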
Adjacent opportunities:

I've also taken the opportunity to experiment with an adjacent related library, `motor`, to also have fully async DB calls; this is the last significant blocking IO part left before we have asyncio everywhere on our platform/NApps. With `motor`, the official async `pymongo` driver, we could potentially provide an optional async client when needed, in addition to maintaining `pymongo`. However, it's been shown in practice that, for the upcoming `2023.1` version, it's not worth it at the moment, for these two main reasons:

- `motor` has been async since `tornado` times; it supports `asyncio`, but it's not a first class turtles-all-the-way-down `asyncio` implementation: it runs executors on top of blocking IO, so there are cases where performance will be better, but the MongoDB Python core team acknowledges that on average the results might still be relatively similar.
- Elastic APM doesn't support `motor` yet, only `pymongo`, so `motor` calls wouldn't show up on charts.

Here's an experiment: I simplified two endpoints to upsert `flows` using `/flow_manager/v2/sync_upsert/{dpid}` (flask/werkzeug/pymongo) and `/v2/async_upsert` (uvicorn/starlette/motor). Notice that in this particular case, with 100 req/sec over 1 min, `motor` ended up having slightly worse overall latency:
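For reference, a hedged sketch of roughly what a `motor`-backed upsert handler could look like; the database/collection names, match key, and response body are made up here, not the prototype's code:

```python
# Hedged sketch of an async upsert handler using motor under starlette
# (names and wiring are illustrative).
from motor.motor_asyncio import AsyncIOMotorClient
from starlette.responses import JSONResponse

client = AsyncIOMotorClient("mongodb://localhost:27017")
flows = client["napps"]["flows"]  # database/collection names are illustrative


async def async_upsert(request):
    flow = await request.json()
    result = await flows.update_one(
        {"flow_id": flow["flow_id"]},  # match key is illustrative
        {"$set": flow},
        upsert=True,
    )
    return JSONResponse({"matched": result.matched_count})
```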
Maybe in a case where the synchronous endpoint is blocking too much, `motor` would outperform, but this confirms that, out of the gate, the current implementation of `motor` won't always bring better latencies, even when there's a significant number of requests. So we'll keep an eye on it, see in a next opportunity how `motor` evolves, and see when Elastic APM also supports it. As of now, synchronous `pymongo` driver calls still benefit from `uvicorn` thread pools and the event loop, as shown in experiment e5, which also leaves us in a great position, as we've always been gradually moving to `async`.

Feedback
Let me know your thoughts, suggestions or concerns. This is being planned to be shipped on `2023.1`.