-
-
Notifications
You must be signed in to change notification settings - Fork 783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
common/swdptap: make SWD timing more consistent #1714
Conversation
I think the cycles saved by this change are small enough that my logic analyzer can't reliably see them, so I'm going to mark this as ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a code perspective this all looks good to us. We'll try to get this tested on our local setup in the next couple of days to see what the before-and-after on the timing and waveforms looks like with a mind to merge this if it all looks fine.
We'll post our testing results as images and traces here to be able to reference back to if needed.
I wonder if it's useful to hang some extra capacitance off of the probe wiring, to slow things down enough to A/B test the change to the bit read timing. (I don't have a MSO or similar at the moment, which would likely help a lot with this kind of test.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've just given this a test, screenshots of the resulting waveforms to follow, and while the delay routines on minimum delay now have a very slight duty cycle imbalance on SWCLK, the clarity improvements and other changes more than make up for that and it doesn't cause bad behaviour from parts tested. If all involved are happy, we're happy to merge this.
Here are our results from testing this PR first is a set of before captures from
|
Thanks for the review! I noticed similar changes as what you saw, but was also limited to a logic analyzer with 24MHz sample rate. Many of the asymmetries are on the order of one LA sample, and aren't that consistent. I can take a closer look to see if there are any asymmetries that are somewhat more consistent, and try to adjust the timing on those. On BMP native clocked at 72MHz (default), it seems like a 1-cycle difference could easily result in an inconsistent 1-sample asymmetry at 24MSamp/sec. It would be nice to get some traces from a faster logic analyzer. |
What we tend to do is assume that the asymmetry errs on the side that appears more commonly - so that final capture is showing an asymmetry ~2/3 of the way to the right between the two seen asymetries - they're both asymetrical on the side of the high time being low; just.. slightly closer to even than the captures showing one high period would otherwise suggest Agreed, though, that getting some high sample density captures would be great - if we can get our scope set up for it'll we'll try to get some captures with that - but that'll be a few days out. |
Force-push is to resolve a merge conflict; I haven't dug into the asymmetries in detail yet. |
After poking at this a bit, staring at LA traces and disassembly, most of the calculations that occur inside the loop happen during the clock low phase for output. Input calculations can be moved around a bit. In any case, moving the clock clear to the top of the loop for output makes it easier to rebalance the duty cycle. ( |
Thanks for the review and detailed timing analysis! I've made some adjustments that seem to improve the duty cycle for me, at least as well as I can see at 24MSamp/sec. |
Ah, sorry the ball got dropped a bit on this one - could you rebase this on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies this PR slipped away from us.. We've a few review notes from this round.
With the up-vs-down loops, we don't recall if we've already had a conversation about it and what the outcome was, so please forgive us if you've already explained to us before if there's a merit for the proposed change and we've just forgotten it. If there's a clearly demonstrated technical reason for using the count-down style here, please add comments in the code explaining the choice and otherwise ignore those 4 lints.
We're working to tidy up and merge outstanding good quality PRs at the moment, and notice this one's not seen any movement since our review. If you're happy with us sorting the outstanding items and rebasing this on |
Catching up… I'll try to look at it this week as well. I first have to see if I can even get the tree to build on macOS at the moment (I remember hearing about some Meson problems). |
Sounds good! 🤞🏼 you can figure out the macOS build - there was a set of bug fixes pushed to |
Got Meson to work on macOS. Native doesn't fit into ROM with the default config on Still puzzling through at least one merge conflict. |
That sounds like some fairly reasonable progress. Which compiler version are you using for |
|
Ah, yes, that toolchain has worse codegen than 12.2.Rel1 so that makes sense of that - this is why we've held back on upgrading toolchains for a while now because the project's trying to squeeze every possible bit of space out at the moment.. |
Make SWD timing more consistent: output changes (including floating) take place after the falling edge, and reads take place before the rising edge. This includes moving a gpio_get to occur after the clock-low delay, which would help reduce incorrect reads from lower-bandwidth target connections. Eliminate redundant GPIO operations, typically ones that set the clock low, by making the bit string input/output primitives always end by outputting a falling clock edge.
Make clock cycle more symmetric, especially for `no_delay` variants. `swdptap_seq_out_no_delay` seems to need the clock clear to be moved back to the top of the loop, to ensure minimal delay between the falling edge and the SWDIO change. Also use a different loop counter that compiles to a `SUBS` `BCC`/`BCS` pair, instead of using an explicit `CMP`.
Force push is to resolve merge conflicts. I'll check the other comments shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All review items address, so LGTM - merging. Thank you for the contribution!
Make SWD timing more consistent: output changes (including floating) take place after the falling edge, and reads take place before the rising edge.
This includes moving a gpio_get to occur after the clock-low delay, which would help reduce incorrect reads from lower-bandwidth target connections.
Eliminate redundant GPIO operations, typically ones that set the clock low, by making the bit string input/output primitives always end by outputting a falling clock edge.
Detailed description
This makes the bit banging code more consistent and easier to analyze. It also removes some redundant clearing of the clock, typically because most existing big banging functions ended on a falling clock edge, and also began with an explicit clearing of the clock. This is no longer necessary, if we adopt a convention that the "resting" phase of the clock is low.
It also moves a misplaced
gpio_get
from before a clock-low delay to after it, so it comes right before the rising clock edge. This can help avoid incorrect reads from lower-bandwidth target connections. It might be helpful for @stansotn or someone else with such target connections to test it out.I'll mark it ready for review after testing on BMDA, though I suspect that BMDA might slow things down too enough to make the differences harder to see on the logic analyzer.
Note that adopting an inlined busy-loop function as in #1688 might be needed to gain more significant clock consistency.
Your checklist for this pull request
make PROBE_HOST=native
)make PROBE_HOST=hosted
)Closing issues
N/A