FAB Provider - 'Server has gone away' #63718
Replies: 8 comments
-
Please check your server logs. "Server has gone away" is always accompanied by a message on the server side explaining what went wrong. Alternatively, inspect your firewall rules. Often, when there is a lack of activity on connections, firewalls close such open connections. The same applies if your infrastructure has similar behaviour for closing long-running connections to a database. Such issues can be solved by enabling pooling, per-connection pings, or several other techniques described in https://docs.sqlalchemy.org/en/21/core/pooling.html#dealing-with-disconnects -> look at the docs; you can pass engine parameters when you configure Airflow, so you should be able to experiment with that. Let us know how your experiments went.
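As a concrete sketch of that suggestion, here are settings one might try in airflow.cfg. The option names are from Airflow's `[database]` configuration section; the numeric values are illustrative assumptions, not recommendations:

```ini
[database]
# Keep pooling on and bound the pool size (example values only).
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 5

# Issue a lightweight "ping" before reusing a pooled connection,
# so stale connections are transparently replaced.
sql_alchemy_pool_pre_ping = True

# Recycle connections older than 30 minutes; set this below your
# firewall / MySQL idle timeout so connections are refreshed first.
sql_alchemy_pool_recycle = 1800
```

The same options can also be set via environment variables, e.g. `AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_PRE_PING=True`.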
-
It happened in two different isolated environments - two Airflow deployments with two different database servers - roughly the same amount of time after deployment, with the same stacktrace. Meanwhile, Airflow 3.1.6 with FAB provider 3.1.2 has worked well under exactly the same conditions for months. I don't think this has any relation to the firewall, network configuration, etc.
-
Without you looking at your server logs, it's impossible to help you with your problem.
-
Sometimes there are other circumstances, and you can unknowingly have differences you are not aware of - it's simply impossible to figure out all the possibilities and guess what the problem could be, or whether it was caused by the upgrade or by deployment differences. But looking at the logs on your server might actually bring some answers. I know there is an easy temptation to say "the only thing that changed is the Airflow version" and hope that maintainers will magically guess what your problem is, but maintainers are volunteers, and they can help others only if they have at least some information that lets them make intelligent guesses. Generally, this is the kind of "help" and "support" you get for software you receive absolutely for free - it's free as a puppy: you need to take care of it, even if you got it for free. Usually that means people will gladly help and even direct you what to do (for example, look for logs telling you what happened, or at some firewall settings), but sometimes that is the maximum they can do, because ultimately you are the only person who has access to your deployment and can diagnose things (aka the Deployment Manager).
-
Also, this is what Claude says: That's a classic MySQL error. It typically means the connection between your application and the MySQL server was dropped. The most common causes are:
-
The captured pod logs don't contain timestamps, so I can't correlate them with the RDS server logs. I also can't reproduce the installation of Airflow 3.1.8, because downgrading the database in this version fails and requires recreating the entire database.

What I can add is that after installation, Airflow ran without issues for several hours and only then failed (in two separate environments). This error doesn't appear when using the same database on the same cluster but with a different version of the FAB provider. So, let's close this issue as "unreproducible". Next time, when a new version of Airflow is released and it fails, I'll try to capture more info.

P.S. Claude offered a long list of possible causes (Valkey cache tuning, Celery settings, and so on) for the "Task state changed externally" error. But the real root cause turned out to be simple: the API server was restarting because of a memory leak.
-
Converted to discussion.
-
This looks like the classic "MySQL server has gone away" issue with Airflow, usually caused by stale or idle connections in the SQLAlchemy pool. Since it happens after a few hours, it's likely that MySQL (RDS) is closing idle connections (wait_timeout) and Airflow is trying to reuse them. The fix is to enable connection recycling and health checks: set sql_alchemy_pool_recycle to a value lower than the MySQL timeout, and sql_alchemy_pool_pre_ping = True so dead connections are refreshed automatically. Also double-check RDS settings like wait_timeout and max_allowed_packet. This is more of a connection-lifecycle issue than one of load or max connections. That said, in rare cases, if the MySQL database itself is corrupted or tables become inaccessible, similar errors can show up. If you start seeing InnoDB errors in the logs or queries failing inconsistently, you can try third-party recovery tools such as Stellar Repair for MySQL to repair and extract data from damaged tables.
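The pre-ping behavior described above can be illustrated with a small self-contained sketch. This is not SQLAlchemy's implementation - it uses Python's built-in sqlite3 module, and the PrePingPool class is invented for the example - but it shows the idea: test each pooled connection with a cheap SELECT 1 before reuse, and silently discard it if the ping fails.

```python
import sqlite3
from collections import deque

class PrePingPool:
    """Minimal sketch of the pool_pre_ping idea: validate a pooled
    connection with a cheap query before handing it out, and replace
    it with a fresh one if the ping fails."""

    def __init__(self, dsn=":memory:"):
        self.dsn = dsn
        self._idle = deque()

    def _connect(self):
        return sqlite3.connect(self.dsn)

    def checkout(self):
        while self._idle:
            conn = self._idle.popleft()
            try:
                conn.execute("SELECT 1")  # the "ping"
                return conn               # still alive: reuse it
            except sqlite3.Error:
                pass                      # stale: discard, try next
        return self._connect()            # pool empty: fresh connection

    def checkin(self, conn):
        self._idle.append(conn)

pool = PrePingPool()
conn = pool.checkout()
conn.close()             # simulate the server dropping the connection
pool.checkin(conn)
fresh = pool.checkout()  # ping fails on the dead one, a new one is made
print(fresh.execute("SELECT 1").fetchone()[0])  # -> 1
```

With pre-ping disabled, the application would receive the dead connection and fail mid-request; with it enabled, the cost is one trivial round-trip per checkout.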
-
Apache Airflow Provider(s)
fab
Versions of Apache Airflow Providers
apache-airflow-providers-fab 3.4.0
Apache Airflow version
3.1.8
Operating System
linux
Deployment
Official Apache Airflow Helm Chart
Deployment details
Airflow - AWS EKS K8s cluster, database - MySQL AWS RDS.
What happened
After several hours, the Airflow UI becomes unavailable and an exception is raised. Confirmed in two different environments (with different database servers), so it is not a problem with a specific database.
What you think should happen instead
No response
How to reproduce
Deploy Airflow 3.1.8 with fab provider v3.4.0 and MySQL as Airflow's metadata database.
Anything else
No response
Are you willing to submit PR?
Code of Conduct