@metascroy metascroy commented Dec 3, 2025

This PR:

With these changes, we estimate performance of Llama1B on iPhone 15 Pro / iOS 26 at:

  • 30 tok/sec decode
  • 1900 tok/sec prefill
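
To put these throughput estimates in terms of user-facing latency, here is a rough back-of-the-envelope sketch. The prompt and generation lengths (512 and 128 tokens) are hypothetical values chosen for illustration, and the calculation ignores model load time and any other fixed overhead.

```python
# Throughput estimates quoted in the PR description above.
PREFILL_TOK_PER_SEC = 1900
DECODE_TOK_PER_SEC = 30


def estimated_latency_sec(prompt_tokens, generated_tokens):
    """Return (prefill, decode, total) time in seconds for one request."""
    prefill = prompt_tokens / PREFILL_TOK_PER_SEC
    decode = generated_tokens / DECODE_TOK_PER_SEC
    return prefill, decode, prefill + decode


# Hypothetical request: 512-token prompt, 128 generated tokens.
prefill, decode, total = estimated_latency_sec(512, 128)
print(f"prefill {prefill:.2f}s, decode {decode:.2f}s, total {total:.2f}s")
```

Under these assumptions, prefill takes roughly 0.27 s and decode roughly 4.3 s, so decode speed dominates end-to-end latency for this kind of request.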

Differential Revision: D88083155

Summary:
This fixes issues with the ANE-friendly Llama on iOS 26. See the updated readme.md for more information.

A key change is decomposing SDPA (scaled dot-product attention) into explicit matmuls and a softmax, because iOS 26 has a bug in its implementation of SDPA on the ANE.
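
For readers unfamiliar with the decomposition, the following is a minimal pure-Python sketch of the math, not the code from this PR. It computes `softmax(Q K^T / sqrt(d) + mask) V` as separate matmul, scale, softmax, and matmul steps instead of a single fused SDPA call. The helper names (`matmul`, `softmax_rows`, `decomposed_sdpa`) and the additive-mask convention are assumptions made for illustration; in a PyTorch model the same steps would presumably be written with `torch.matmul` and `torch.softmax`.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def softmax_rows(scores):
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in scores:
        row_max = max(row)
        exps = [math.exp(value - row_max) for value in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def decomposed_sdpa(q, k, v, mask=None):
    """Attention built from explicit steps rather than a fused SDPA op.

    q: [seq_q, d], k: [seq_k, d], v: [seq_k, d_v].
    mask (optional): additive [seq_q, seq_k] matrix, with -inf marking
    positions that must not be attended to (an illustrative convention).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))                       # step 1: Q @ K^T
    scores = [[s * scale for s in row] for row in scores]  # step 2: scale
    if mask is not None:                                   # step 3: add mask
        scores = [[s + m for s, m in zip(s_row, m_row)]
                  for s_row, m_row in zip(scores, mask)]
    weights = softmax_rows(scores)                         # step 4: softmax
    return matmul(weights, v)                              # step 5: weights @ V


# Usage: two positions with a causal mask, so position 0 sees only itself.
neg_inf = float("-inf")
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
causal_mask = [[0.0, neg_inf], [0.0, 0.0]]
out = decomposed_sdpa(q, k, v, causal_mask)
```

Because every step is an ordinary matmul or softmax, a backend can lower each one individually and avoid whatever code path the fused attention op would take.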

@metascroy metascroy requested a review from cccclai as a code owner December 3, 2025 00:01

pytorch-bot bot commented Dec 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16057

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 7 Cancelled Jobs

As of commit c78b111 with merge base 144a37d:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Dec 3, 2025

meta-codesync bot commented Dec 3, 2025

@metascroy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88083155.


github-actions bot commented Dec 3, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Labels

ciflow/trunk, CLA Signed, fb-exported, meta-exported

Development

Successfully merging this pull request may close these issues.

[CoreML Backend] macOS 26.1 ANE regression: fp16 LLaMA inference produces inf/nan (worked on macOS 15.7)