
Conversation

@ilevkivskyi (Member) commented Nov 21, 2025

Fixes #933

This is not very polished, but it is a fully functional implementation. It gives a ~1.5x performance improvement for self-check. I think we can keep this feature hidden while we iterate on it. At a very high level: we start n workers, each of which loads the graph; the coordinator process then submits SCCs one by one as they become unblocked by their dependencies. Workers use the regular cache to get information about SCCs processed by other workers. There are more details in the docstring of worker.py.
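
The scheduling idea can be sketched like this (an illustrative toy model, not the actual coordinator code; dispatch_order and the argument names are made up):

```python
from collections import deque

def dispatch_order(sccs: list[str], deps: dict[str, set[str]]) -> list[str]:
    """Toy coordinator loop: an SCC is submitted as soon as all of its
    dependencies have been processed. Returns the dispatch order."""
    remaining = {scc: set(deps.get(scc, set())) for scc in sccs}
    dependents: dict[str, list[str]] = {}
    for scc, ds in remaining.items():
        for dep in ds:
            dependents.setdefault(dep, []).append(scc)
    ready = deque(scc for scc in sccs if not remaining[scc])
    order = []
    while ready:
        scc = ready.popleft()          # submit to a free worker
        order.append(scc)              # ...and pretend it finished instantly
        for child in dependents.get(scc, []):
            remaining[child].discard(scc)
            if not remaining[child]:   # last dependency done: unblocked
                ready.append(child)
    return order
```

In the real implementation the "pretend it finished" step is of course asynchronous: the coordinator unblocks dependents only when a worker reports the SCC's results back.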

Some notes:

  • I moved some code around so that the daemon and the workers can share as much IPC code as possible.
  • For now I use a hybrid binary/JSON format for messages, but this is temporary. I am going to switch to a proper fixed binary format soon.
  • Windows is not supported yet; the missing part is def ready_to_read(conns: list[IPCClient]) -> list[int].
  • Right now workers use the default stdout/stderr. This is easier for debugging, but I think we may switch to writing to a log file at some point (like the daemon does).
  • I add a GC freeze trick for the initial graph loading. It is very similar to the GC freeze trick for warm runs. I don't see any visible memory use increase, while it gives an 8-10% speedup (even for single-process runs). Note that I disable it in tests, since all tests run in the same process.
  • Testing in general was the trickiest part. There are various implicit assumptions that don't hold for parallel checking. I use environment variables to "propagate" those assumptions.
  • I add two CI jobs (regular and compiled) that run ~60% of all tests with 4 parallel workers. For now I skip 15 tests in parallel mode (all because of some incremental mode bugs).
  • We should probably switch mypy/ipc.py to using librt.base64. This may not be critical now, but it will be important with the new parser, when we will be sending larger chunks of data over the sockets.
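
For the POSIX side, the missing ready_to_read helper could plausibly be built on select; this is only a sketch under the assumption that worker connections expose a fileno() (sockets, pipe ends), not the actual mypy implementation. A Windows version would need a different mechanism for named pipes:

```python
import select

def ready_to_read(conns):
    """Block until at least one connection has data to read, and
    return the indices of the readable ones. POSIX-only sketch:
    `conns` can be any list of objects with a fileno()."""
    readable, _, _ = select.select(conns, [], [])
    readable_set = set(readable)
    return [i for i, c in enumerate(conns) if c in readable_set]
```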

I am going to address some of the above issues, and re-enable tests gradually, in follow-up PRs. Longer term there are three main areas for further improvement:

  • Parallelizing parsing, not just type checking. It looks like parsing is currently the main bottleneck.
  • Improving SCC packing, by splitting type checking into a public-interface phase and an implementation phase. We can notify the coordinator after the first phase completes.
  • Switching to lazy loading of the cache. This will become important once we address the other two bottlenecks and can use more workers.

@ilevkivskyi requested a review from JukkaL, November 21, 2025 00:23
@JukkaL (Collaborator) left a comment

Thanks for working on this -- parallel processing has huge potential since every CPU has multiple cores, and core counts only seem to keep increasing year after year. Not a full review, but I left some minor comments.

mypy/build.py Outdated
for worker in manager.workers:
data = receive(worker.conn)
assert data["status"] == "ok"
send(worker.conn, {"sccs": [(list(scc.mod_ids), scc.id, list(scc.deps)) for scc in sccs]})
Collaborator:

Precompute the data outside the loop, since it's the same for each worker.
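
The suggested hoisting could look roughly like this (SCC here is a hypothetical stand-in for the real class in the hunk above, just to make the shape concrete):

```python
from dataclasses import dataclass, field

@dataclass
class SCC:
    # Minimal stand-in for the real SCC record.
    mod_ids: list[str]
    id: int
    deps: list[int] = field(default_factory=list)

def build_scc_message(sccs: list[SCC]) -> dict:
    # Built once, before the per-worker send loop, since the payload
    # is identical for every worker.
    return {"sccs": [(list(s.mod_ids), s.id, list(s.deps)) for s in sccs]}
```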

mypy/main.py Outdated
internals_group.add_argument("--export-ref-info", action="store_true", help=argparse.SUPPRESS)

# Experimental parallel type-checking support.
internals_group.add_argument("--num-workers", type=int, default=0, help=argparse.SUPPRESS)
Collaborator:

What about also allowing -n for this, similar to pytest?

mypy/defaults.py Outdated

RECURSION_LIMIT: Final = 2**14

WORKER_START_INTERVAL: Final = 0.03
Collaborator:

30ms can be a large fraction of Python process startup time. It might be a bit more efficient to have this as 10ms, for example, to speed up small builds a little.

* Load graph using the sources, and send "ok" to coordinator.
* Receive SCC structure from coordinator, and ack it with an "ok".
* Receive an SCC id from coordinator, process it, and send back the results.
* When prompted by coordinator (with s "final" message), cleanup and shutdown.
Collaborator:

What's s "final"?

Member Author:

This is just a typo, should be a "final" :-)

python: '3.14'
os: ubuntu-24.04-arm
toxenv: py
tox_extra_args: "-n 4 --mypy-num-workers=4 mypy/test/testcheck.py"
Collaborator:

Would it make sense to add some test cases that are specifically designed to test parallel type checking, e.g. a long import chain, or a graph with the potential for a large amount of parallelism (no need to do this in this PR)?

Member Author:

Yes, I was thinking about this. I will add it to the list of follow-up items in the PR description so that I don't forget about it.


workers = []
if options.num_workers > 0:
pickled_options = pickle.dumps(options.snapshot())
Collaborator:

Later on, we may want to use something more efficient than pickle (but it's fine for now). Maybe add a TODO comment about it?

mypy/build.py Outdated
for worker in workers:
# Start loading graph in each worker as soon as it is up.
worker.connect()
source_tuples = [(s.path, s.module, s.text, s.base_dir, s.followed) for s in sources]
Collaborator:

Calculate the list outside the loop, since it's the same for each worker.

@ilevkivskyi (Member Author):

@JukkaL I addressed your comments. Please let me know if you want to take a look again before this is merged.

@JukkaL (Collaborator) commented Nov 24, 2025

I want to have another look and try this out a little before merging, probably by Tue/Wed this week.

@JukkaL (Collaborator) commented Nov 27, 2025

I tried it on a huge codebase at work, on macOS, and encountered this crash:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "mypy/build_worker/worker.py", line 84, in main
  File "mypy/build_worker/worker.py", line 137, in serve
  File "mypy/build.py", line 3674, in process_stale_scc
  File "mypy/build.py", line 2205, in reload_meta
  File "mypy/build.py", line 1488, in find_cache_meta
  File "mypy/build.py", line 1328, in _load_json_file
  File "mypy/metastore.py", line 186, in read
  File "mypy/metastore.py", line 173, in _query
OperationalError: database is locked

This is probably related to using sqlite for the cache.

More complete output:

/Users/jukka/src/server/mypy-stubs/redis/commands/search/result.pyi: error: INTERNAL ERROR -- Please try using mypy master on GitHub:
https://mypy.readthedocs.io/en/stable/common_issues.html#using-a-development-mypy-build
Please report a bug at https://github.com/python/mypy/issues
version: 1.19.0+dev.d63a2fcae851176310575f33ac328559481f82d1
/Users/jukka/src/server/mypy-stubs/redis/commands/search/result.pyi: : note: use --pdb to drop into pdb
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "mypy/build_worker/worker.py", line 84, in main
  File "mypy/build_worker/worker.py", line 137, in serve
  File "mypy/build.py", line 3674, in process_stale_scc
  File "mypy/build.py", line 2205, in reload_meta
  File "mypy/build.py", line 1488, in find_cache_meta
  File "mypy/build.py", line 1328, in _load_json_file
  File "mypy/metastore.py", line 186, in read
  File "mypy/metastore.py", line 173, in _query
OperationalError: database is locked
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/jukka/src/mypy/mypy/__main__.py", line 37, in <module>
    console_entry()
  File "/Users/jukka/src/mypy/mypy/__main__.py", line 15, in console_entry
    main()
  File "mypy/main.py", line 127, in main
  File "mypy/main.py", line 211, in run_build
  File "mypy/build.py", line 305, in build
  File "mypy/build.py", line 402, in build_inner
  File "mypy/build.py", line 3158, in dispatch
  File "mypy/build.py", line 3564, in process_graph
  File "mypy/build.py", line 1050, in wait_for_done
  File "mypy/build.py", line 1064, in wait_for_done_workers
  File "mypy/ipc.py", line 387, in receive
OSError: No data received

@ilevkivskyi (Member Author):

Oh yes, using the sqlite cache may be tricky; multiple processes probably can't write at the same time. I will check what the standard workaround for this is (maybe just a retry).

@ilevkivskyi (Member Author):

@JukkaL I think --sqlite-cache should work now with parallel checking. It looks like committing after each SCC fixes the problem (at least for me).
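
For reference, a common pattern for multiple sqlite writers is to combine short transactions (like the commit-after-each-SCC fix above) with a busy timeout, so a writer blocked by another process's transaction waits instead of raising "database is locked" immediately. A hedged sketch (open_cache_db is a made-up name, not mypy's actual code):

```python
import sqlite3

def open_cache_db(path: str) -> sqlite3.Connection:
    # timeout= is sqlite3's busy timeout in seconds; the PRAGMA sets the
    # same thing at the sqlite level, in milliseconds. With these, a
    # write that hits a competing lock retries for up to 10s.
    conn = sqlite3.connect(path, timeout=10.0)
    conn.execute("PRAGMA busy_timeout = 10000")
    return conn
```

With short transactions, two such connections can interleave writes to the same database file without either one observing a lock error.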

@JukkaL (Collaborator) commented Nov 28, 2025

Thanks! I will test using --sqlite-cache today. Without it, parallel checking worked on our big internal codebase, and with 3 workers I saw about a 30-35% speedup (though I only did a few measurements). It was using an absolutely massive amount of memory, though (which is expected until we have the new parser).

@ilevkivskyi (Member Author):

@JukkaL if you don't have any "large-scale" comments, I would prefer to merge this soon and fix smaller things incrementally in follow-up PRs (this is hidden behind a flag anyway). Otherwise it will just gather dust and merge conflicts.

@github-actions (Contributor):

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

Linked issue: Faster, parallel type checking