DMA for pulse sequences #553
Yea. You are describing the design that has been thrown around for a few years now.
my_burst = DMA("my_burst")
with my_burst.record():
    delay(10*ns)
    ttl0.pulse(20*ns)
    for i in range(100):
        dds2.pulse(300*MHz + i*1*MHz, 220*ns)
# timeline is unaltered and rewound to before the `with`

# ... new experiment, new kernel
my_pulse = DMA("my_burst")
t = now_mu()
for i in range(100):
    ttl2.pulse(3*us)
    my_pulse.play()
    # timeline advanced by length of my_pulse
assert t + seconds_to_mu(100*(3*us + 250*ns)) == now_mu()
Since the DMA'ed events, and thus the full list of all channels to which RTIO events will be sent by DMA, would necessarily be known at compile time, would it be possible to have […]?
Not necessarily.
OK, but the full list of channels for both branches would be known at compile time:

if ttl0.input():
    dma0.play()
    ttl4.pulse(10*us)
else:
    dma1.play()
    ttl4.pulse(10*us)

If […]. If another channel is added:

if ttl0.input():
    dma0.play()
    ttl4.pulse(10*us)
    ttl7.pulse(2*us)
else:
    dma1.play()
    ttl4.pulse(10*us)
    ttl7.pulse(2*us)
Yes, we could add the list of touched RTIO channels into function signatures, much like there is iodelay in the compiler-assisted interleaving.
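To make that concrete, here is a rough sketch in plain Python (not the actual compiler machinery; the helper name and channel numbers are made up) of the kind of channel-set bookkeeping such signatures would imply for the if/else example above:

# Sketch only: these names are not part of the ARTIQ compiler; this just
# illustrates propagating "touched channel" sets the way iodelay is propagated.

def branch_channels(dma_channels, cpu_channels):
    # Channels touched by one branch: the channels recorded into the DMA
    # sequence plus the channels the CPU writes to directly.
    return dma_channels | cpu_channels

# Hypothetical channel numbers; in practice they would come from the
# recorded DMA sequences and the device database.
dma0_channels = {2, 3}          # known when dma0 was recorded
dma1_channels = {2, 5}          # known when dma1 was recorded
ttl4_channel = {4}

if_branch = branch_channels(dma0_channels, ttl4_channel)
else_branch = branch_channels(dma1_channels, ttl4_channel)

# Either branch may execute, so the enclosing function's signature would
# carry the union of both.
touched_channels = if_branch | else_branch   # {2, 3, 4, 5}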
with concurrent:  # instead of `parallel`
    my_burst.play()
    ttl2.pulse()

and it should do that with "true" parallelism. But trying to hack concurrent DMA and CPU RTIO access into this right now, before having CPU concurrency, seems misguided to me.
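For contrast, a hedged sketch of what the existing parallel construct already provides (assuming ttl1 and ttl2 are TTLOut devices on an experiment object): parallel only manipulates the timeline cursor, while the CPU still submits the events one after the other, which is why a separate concurrent construct would be needed for true DMA/CPU concurrency.

@kernel
def pulse_both(self):
    # `parallel` today: both pulses are scheduled to start at the same RTIO
    # timestamp, but the CPU computes and submits the two events sequentially.
    with parallel:
        self.ttl1.pulse(10*us)
        self.ttl2.pulse(10*us)
    # A `concurrent` block as sketched above would go further: the DMA engine
    # and the CPU would both feed the RTIO core at the same wall-clock time.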
Not […]
Good points here @jordens. Seems like the subtleties on this are pretty problematic in the version discussed above. One variant: how about allowing the CPU to perform RTIO reads, but not writes, during the DMA, as well as any operations not involving the RTIO core (calculations, RPCs, etc.)? This would enable one to profitably use the DMA download time for tasks like […]. As a variant, one could block on any RTIO commands (read or write) but allow all other types of CPU operations to proceed during the DMA.
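A purely hypothetical sketch of that variant; play_nonblocking(), wait_done(), analyze_last_shot() and the device names are invented here only to illustrate overlapping DMA playback with CPU work that stays off the RTIO core, and are not existing ARTIQ calls:

@kernel
def run(self):
    self.core.reset()
    delay(100*us)
    self.my_burst.play_nonblocking()    # hypothetical: DMA engine starts streaming events
    # While the DMA engine fills the FIFOs, the CPU stays off the RTIO bus:
    fit = self.analyze_last_shot()      # pure computation, or an RPC to the host
    self.my_burst.wait_done()           # hypothetical: block before touching RTIO again
    self.ttl4.pulse(10*us)              # safe again: DMA playback has finished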
Funded by Oxford.
Awesome! Will the specification be posted somewhere?
Sure. Currently the specification is the (virtual/perceived) consensus of this issue plus a bunch of details from IRC. It'll probably also change a bit when we take a step back, see DRTIO in its full glory, and have a clear perspective on how to hook DMA into it.
No specific questions, just wanted to know if there would be a public place where the current "vision" is maintained.
Yeah. That's something we want to do.
Basic gateware written, works in functional simulation. |
Has a timeline been posted for the completion of this? |
Work on this will begin as soon as the network stack fiasco is resolved, and it shouldn't take long to get the basic functionality working (~week at most) unless there is another series of obscure bugs.
It would be helpful to have a development checklist for keeping track of what the steps are and how things are progressing. Based on the wiki.
There was another series of obscure bugs; they are now dealt with. Only #700 remains.
@sbourdeauducq @jordens @whitequark @dleibrandt @r-srinivas @dtcallcock I'm posting this here to provide a central place for discussions regarding how one might use direct memory access (DMA) to program pulse sequences that for whatever reason are undesirable to implement in the typical way (processor calculating and pushing timestamps to RTIO core one by one).
For long pulse sequences where the average time between RTIO output events is less than the time it takes for the processor to compute and push timestamps (roughly 1 us each), the slack will be steadily reduced and will eventually cause an RTIO underflow. The solution to this is to calculate timestamps farther in advance. The simplest way to do this is to make sufficiently deep FIFOs for the relevant channels, and then set the initial slack sufficiently large (e.g. using a single long delay) to enable the processor to calculate and push all the timestamps with enough slack remaining at the end.
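A minimal sketch of that approach, assuming a ttl0 TTLOut device and a channel FIFO deep enough to hold the whole sequence; the numbers are illustrative only:

from artiq.experiment import *

class DeepFIFOSlack(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("ttl0")   # assumed TTLOut channel

    @kernel
    def run(self):
        self.core.reset()
        # One long delay up front pushes the timeline cursor far ahead of the
        # wall clock, so the CPU may fall behind while filling a sufficiently
        # deep FIFO without raising an RTIOUnderflow.
        delay(500*us)
        for i in range(1000):
            # Each iteration costs the CPU roughly 1 us but only advances the
            # timeline by 600 ns, steadily eating into the initial slack.
            self.ttl0.pulse(300*ns)
            delay(300*ns)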
This method may not be desirable in some cases (e.g. where experimental duty cycle is important), so another solution would be to precompute RTIO timestamps on the PC, load them into RAM on the core device, and have the processor call for the timestamps to be pushed directly from RAM to the output FIFOs at the appropriate time.
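As a purely illustrative sketch of the host-side half of this idea (nothing below is an existing ARTIQ API; the raw event layout and helper names are made up), the PC could flatten a pulse pattern into sorted, timestamped RTIO words ready to be uploaded to core device RAM:

from dataclasses import dataclass
from typing import List

@dataclass
class RawEvent:
    timestamp_mu: int   # RTIO timestamp in machine units
    channel: int        # RTIO channel number
    address: int        # sub-channel address
    data: int           # output word

def precompute_square_wave(channel: int, t0_mu: int, period_mu: int, n: int) -> List[RawEvent]:
    # Flatten n periods of a square wave into raw, already-ordered RTIO events.
    events = []
    for i in range(n):
        t = t0_mu + i*period_mu
        events.append(RawEvent(t, channel, 0, 1))                   # rising edge
        events.append(RawEvent(t + period_mu//2, channel, 0, 0))    # falling edge
    return events

# The resulting list would then be serialized and uploaded to core device RAM,
# and a kernel would later hand the buffer to the DMA engine for playback.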
There are numerous technical/implementation questions to be discussed. I list a few below, some with suggested answers and some without.
- How should such a sequence be specified in the experiment? One suggestion is a with statement that the compiler recognizes as something that needs to be converted to timestamps on the PC and uploaded to RAM.
- What should happen if the DMA'ed events conflict with events pushed by the processor on the same channel: should that raise an RTIOSequenceError?