
Conversation

@ilevkivskyi (Member) commented Nov 21, 2025

Fixes #933

This is not very polished, but it is a fully functional implementation. It gives a ~1.5x performance improvement for self-check. I think we can keep this feature hidden while we iterate on it. At a very high level: we start n workers, each of which loads the graph; the coordinator process then submits SCCs one by one as they become unblocked by their dependencies. Workers use the regular cache to get information about SCCs processed by other workers. There are more details in the docstring of worker.py.
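
The scheduling idea can be sketched like this (an illustrative toy model, not the actual coordinator code; dispatch_order and the argument names are made up):

```python
from collections import deque

def dispatch_order(sccs: list[str], deps: dict[str, set[str]]) -> list[str]:
    """Toy coordinator loop: an SCC is submitted as soon as all of its
    dependencies have been processed. Returns the dispatch order."""
    remaining = {scc: set(deps.get(scc, set())) for scc in sccs}
    dependents: dict[str, list[str]] = {}
    for scc, ds in remaining.items():
        for dep in ds:
            dependents.setdefault(dep, []).append(scc)
    ready = deque(scc for scc in sccs if not remaining[scc])
    order = []
    while ready:
        scc = ready.popleft()          # submit to a free worker
        order.append(scc)              # ...and pretend it finished instantly
        for child in dependents.get(scc, []):
            remaining[child].discard(scc)
            if not remaining[child]:   # last dependency done: unblocked
                ready.append(child)
    return order
```

In the real implementation the "pretend it finished" step is of course asynchronous: the coordinator unblocks dependents only when a worker reports the SCC's results back.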

Some notes:

  • I moved some code around so that the daemon and the workers can share as much IPC code as possible.
  • For now I use a hybrid binary/JSON format for messages, but this is temporary. I am going to switch to a proper fixed binary format soon.
  • Windows is not supported yet; the missing part is def ready_to_read(conns: list[IPCClient]) -> list[int].
  • Right now workers use the default stdout/stderr. This is easier for debugging, but I think we may switch to writing to a log file at some point (like the daemon does).
  • I add a GC freeze trick for the initial graph loading. It is very similar to the GC freeze trick for warm runs. I don't see any visible memory use increase, while it gives an 8-10% speedup (even for single-process runs). Note that I disable it in tests, since all tests run in the same process.
  • Testing in general was the trickiest part. There are various implicit assumptions that don't hold for parallel checking. I use environment variables to "propagate" those assumptions.
  • I add two CI jobs (regular and compiled) that run ~60% of all tests with 4 parallel workers. For now I skip 15 tests in parallel mode (all because of some incremental mode bugs).
  • We should probably switch mypy/ipc.py to using librt.base64. This may not be critical now, but it will be important with the new parser, when we will be sending larger chunks of data over the sockets.
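
For the POSIX side, the missing ready_to_read helper could plausibly be built on select; this is only a sketch under the assumption that worker connections expose a fileno() (sockets, pipe ends), not the actual mypy implementation. A Windows version would need a different mechanism for named pipes:

```python
import select

def ready_to_read(conns):
    """Block until at least one connection has data to read, and
    return the indices of the readable ones. POSIX-only sketch:
    `conns` can be any list of objects with a fileno()."""
    readable, _, _ = select.select(conns, [], [])
    readable_set = set(readable)
    return [i for i, c in enumerate(conns) if c in readable_set]
```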

I am going to address some of the above issues, and re-enable tests gradually, in follow-up PRs. Longer term there are three main areas for further improvement:

  • Parallelizing parsing, not just type checking. It looks like parsing is currently the main bottleneck.
  • Improving SCC packing, by splitting type checking into a public-interface phase and an implementation phase. We can notify the coordinator after the first phase completes.
  • Switching to lazy loading of the cache. This will become important once we address the other two bottlenecks and can use more workers.

@ilevkivskyi requested a review from JukkaL, November 21, 2025 00:23
@JukkaL (Collaborator) left a comment

Thanks for working on this -- parallel processing has huge potential since every CPU has multiple cores, and core counts only seem to keep increasing year after year. Not a full review, but I left some minor comments.

mypy/build.py Outdated
for worker in manager.workers:
data = receive(worker.conn)
assert data["status"] == "ok"
send(worker.conn, {"sccs": [(list(scc.mod_ids), scc.id, list(scc.deps)) for scc in sccs]})
Collaborator:

Precompute the data outside the loop, since it's the same for each worker.
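
The suggested hoisting could look roughly like this (SCC here is a hypothetical stand-in for the real class in the hunk above, just to make the shape concrete):

```python
from dataclasses import dataclass, field

@dataclass
class SCC:
    # Minimal stand-in for the real SCC record.
    mod_ids: list[str]
    id: int
    deps: list[int] = field(default_factory=list)

def build_scc_message(sccs: list[SCC]) -> dict:
    # Built once, before the per-worker send loop, since the payload
    # is identical for every worker.
    return {"sccs": [(list(s.mod_ids), s.id, list(s.deps)) for s in sccs]}
```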

mypy/main.py Outdated
internals_group.add_argument("--export-ref-info", action="store_true", help=argparse.SUPPRESS)

# Experimental parallel type-checking support.
internals_group.add_argument("--num-workers", type=int, default=0, help=argparse.SUPPRESS)
Collaborator:

What about also allowing -n for this, similar to pytest?

mypy/defaults.py Outdated

RECURSION_LIMIT: Final = 2**14

WORKER_START_INTERVAL: Final = 0.03
Collaborator:

30ms can be a large fraction of Python process startup time. It might be a bit more efficient to have this as 10ms, for example, to speed up small builds a little.

* Load graph using the sources, and send "ok" to coordinator.
* Receive SCC structure from coordinator, and ack it with an "ok".
* Receive an SCC id from coordinator, process it, and send back the results.
* When prompted by coordinator (with s "final" message), cleanup and shutdown.
Collaborator:

What's s "final"?

Member Author:

This is just a typo, should be a "final" :-)

python: '3.14'
os: ubuntu-24.04-arm
toxenv: py
tox_extra_args: "-n 4 --mypy-num-workers=4 mypy/test/testcheck.py"
Collaborator:

Would it make sense to add some test cases that are specifically designed to test parallel type checking, e.g. a long import chain, or a graph with the potential for a large amount of parallelism (no need to do this in this PR)?

Member Author:

Yes, I was thinking about this. I will add it to the list of follow-up items in the PR description so that I don't forget about it.


workers = []
if options.num_workers > 0:
pickled_options = pickle.dumps(options.snapshot())
Collaborator:

Later on, we may want to use something more efficient than pickle (but it's fine for now). Maybe add a TODO comment about it?

mypy/build.py Outdated
for worker in workers:
# Start loading graph in each worker as soon as it is up.
worker.connect()
source_tuples = [(s.path, s.module, s.text, s.base_dir, s.followed) for s in sources]
Collaborator:

Calculate the list outside the loop, since it's the same for each worker.

@ilevkivskyi (Member Author):

@JukkaL I addressed your comments. Please let me know if you want to take a look again before this is merged.

@JukkaL (Collaborator) commented Nov 24, 2025

I want to have another look and try this out a little before merging, probably by Tue/Wed this week.

@JukkaL (Collaborator) commented Nov 27, 2025

I tried it on a huge codebase at work, on macOS, and encountered this crash:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "mypy/build_worker/worker.py", line 84, in main
  File "mypy/build_worker/worker.py", line 137, in serve
  File "mypy/build.py", line 3674, in process_stale_scc
  File "mypy/build.py", line 2205, in reload_meta
  File "mypy/build.py", line 1488, in find_cache_meta
  File "mypy/build.py", line 1328, in _load_json_file
  File "mypy/metastore.py", line 186, in read
  File "mypy/metastore.py", line 173, in _query
OperationalError: database is locked

This is probably related to using sqlite for the cache.

More complete output:

/Users/jukka/src/server/mypy-stubs/redis/commands/search/result.pyi: error: INTERNAL ERROR -- Please try using mypy master on GitHub:
https://mypy.readthedocs.io/en/stable/common_issues.html#using-a-development-mypy-build
Please report a bug at https://github.com/python/mypy/issues
version: 1.19.0+dev.d63a2fcae851176310575f33ac328559481f82d1
/Users/jukka/src/server/mypy-stubs/redis/commands/search/result.pyi: : note: use --pdb to drop into pdb
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "mypy/build_worker/worker.py", line 84, in main
  File "mypy/build_worker/worker.py", line 137, in serve
  File "mypy/build.py", line 3674, in process_stale_scc
  File "mypy/build.py", line 2205, in reload_meta
  File "mypy/build.py", line 1488, in find_cache_meta
  File "mypy/build.py", line 1328, in _load_json_file
  File "mypy/metastore.py", line 186, in read
  File "mypy/metastore.py", line 173, in _query
OperationalError: database is locked
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/jukka/src/mypy/mypy/__main__.py", line 37, in <module>
    console_entry()
  File "/Users/jukka/src/mypy/mypy/__main__.py", line 15, in console_entry
    main()
  File "mypy/main.py", line 127, in main
  File "mypy/main.py", line 211, in run_build
  File "mypy/build.py", line 305, in build
  File "mypy/build.py", line 402, in build_inner
  File "mypy/build.py", line 3158, in dispatch
  File "mypy/build.py", line 3564, in process_graph
  File "mypy/build.py", line 1050, in wait_for_done
  File "mypy/build.py", line 1064, in wait_for_done_workers
  File "mypy/ipc.py", line 387, in receive
OSError: No data received

@ilevkivskyi (Member Author):

Oh yes, using the sqlite cache may be tricky; multiple processes probably can't write at the same time. I will check what the standard workaround for this is (maybe just a retry).

@ilevkivskyi (Member Author):

@JukkaL I think --sqlite-cache should work now with parallel checking. It looks like committing after each SCC fixes the problem (at least for me).
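
For reference, a common pattern for multiple sqlite writers is to combine short transactions (like the commit-after-each-SCC fix above) with a busy timeout, so a writer blocked by another process's transaction waits instead of raising "database is locked" immediately. A hedged sketch (open_cache_db is a made-up name, not mypy's actual code):

```python
import sqlite3

def open_cache_db(path: str) -> sqlite3.Connection:
    # timeout= is sqlite3's busy timeout in seconds; the PRAGMA sets the
    # same thing at the sqlite level, in milliseconds. With these, a
    # write that hits a competing lock retries for up to 10s.
    conn = sqlite3.connect(path, timeout=10.0)
    conn.execute("PRAGMA busy_timeout = 10000")
    return conn
```

With short transactions, two such connections can interleave writes to the same database file without either one observing a lock error.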

@JukkaL (Collaborator) commented Nov 28, 2025

Thanks! I will test using --sqlite-cache today. Without it, parallel checking worked on our big internal codebase, and with 3 workers I saw about a 30-35% speedup (though I only did a few measurements). It was using an absolutely massive amount of memory, though (which is expected until we have the new parser).

@ilevkivskyi (Member Author):

@JukkaL if you don't have any "large-scale" comments, I would prefer to merge this soon and fix smaller things incrementally in follow-up PRs (this is hidden behind a flag anyway). Otherwise it will just gather dust and merge conflicts.

@github-actions (Contributor):

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

Linked issue: Faster, parallel type checking