-
Notifications
You must be signed in to change notification settings - Fork 2
/
TODO
201 lines (160 loc) · 8.71 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
* sort/align endpoint eventq fields for better locality
* group similar fields in user structure for cache effects, cache-align some fields?
+ add a counter per list and group them as well to reduce the progression loop overhead
* idr for the peer table
* dynamically alloc the sendq_map index array out of the medium request?
* drop the mmapped jiffies
* use clock_gettime(CLOCK_MONOTONIC_COARSE) when available,
it has the HZ resolution and cost about 10ns
* use clock_gettime(CLOCK_MONOTONIC) otherwise,
but it costs about 30ns
* Symlinks for both the static and shared libraries are
created even if --disable-shared and/or --disable-static
is given
+ or drop libopen-mx.so and only ship libmyriexpress.so
(and always enable abi compat)
* set the library ABI version in the library filename
* fix more icc warnings
* constify the open-mx interface ?
* add restrict when c99 is enabled?
* stop supporting mtu!=1500/9000 ?
+ go back to mtu/wire-compat indep library, with send
function pointers initialized at connect
- how to generate the multiple instances ?
* cleanup the skbfrags stuff:
+ is it the number of attached frags, or also the linear part?
+ enforce it or just try to optimize?
+ fix the way to compute frags in medium_sends
+ or just make it a binary?
* how to tune mtu/packet-ring-size since 8k-mediums do not always help?
* when invalidating a pinned region, change its status first, with a memory barrier
and rcu_synchronize if possible so that people using it are gone and won't come back
* push-based model for large
* make omx_prepare_binding more flexible:
+ work on a single board, with --append or --clearfile
+ pass a modulo, or even a hash function
+ pass a irq-name/pattern
* add a tool to auto-tune the interface coalescing and binding?
+ do it within the startup script?
+ driver-specific ethtool configs
* when removing a peer (for instance a local iface), make the index available again
in case of adding another peer/iface
+ need to add omx_peers_nr since we only have omx_peer_next_nr
* single cmd to send the whole mediumsq message, with single done event ?
+ get_user_pages/dev_queue_xmit, put_pages in the last callback
+ less pipelining copy/queue_xmit
* pre-alloc many requests in the critical path to avoid ENOMEM in the background later ?
+ stop aborting on failure to alloc a fake recv notify when discard an unexp rndv
* regcache
+ disable regcache in omx_rcache_test when the driver feature flag is missing
* if killed while registering, needed to mark the region as failed?
* huge page pinning support
+ add a pagesize in the region segment structure and use either pagesize of hugepagesize when possible
* if failing to deregister region
*** glibc detected *** tests/omx_pingpong: malloc(): memory corruption: 0x000000000064edd0 ***
* rework counters, distinguish packet/shared level transfer from
commands and events ?
* add a flag in the endpoint desc for skbfrags
+ use mediumva by default if not skbfrags
+ add driver-specific skbfrags quirks?
* only poll what's really need to be polled in the progression loop
+ only poll the exp event queue if some events are expected
+ only if jiffies changed
- check enough progression
- resend queue
- partners to ack
+ only process delayed requests if something happened in other loops
+ what about check endpoint desc ?
+ make request_alloc_check() called in debug only
* merge the endpoint management stuff from the kernel-matching branch?
* endpoint_close cannot be called from interrupt context
+ we can destroy region at first closing too?
* cleanup endpoint acquiring/locking/closing again
+ synchronize_rcu before freeing after unlinking
+ rcu_deference+acquire+check status during ioctl and network
+ stop allocating during the whole fd life
+ drop free/init status
+ drop the status_lock
* drop the iface endpoint array mutex and use the big ifaces mutex?
* cleanup the idr+BHlock for handles
+ sleep if not enough slot available
* fix/improve ioat support
+ sleep a bit before polling on synchronous copies
- add an empirical ioat speed, tested at startup, tuned at runtime, configurable with a module param
- use the empirical ioat speed to sleep using msleep during large
synchronous copy instead of polling
- if copy completed on first poll, increase the speed by 12.5% or so
- no need to reduce the speed ever
* support non-ethernet networks by providing ways to extract the mac address differently
+ use eth_hdr(), dev->header_ops->create() and ->parse() to clarify the accessing
to eth headers of packets and make it less ethernet specific
* affinity
+ add OMX_BH_PREWARM_CACHE= to use non-temporal only if not on the process core
- store the process core binding when opening the endpoint
and assume the binding won't change if the env var was set
* error handler when completing zombie send/recv only if ep totally open, with special message
* drop no-progression-during-one-second warning if no requests are pending?
what's a pending request? anything but a posted recv?
* resource allocation cleanups
+ add a counter of failure to alloc small buffer with ENOMEM and abort if too many?
+ add a counter of failure to pull with ENOMEM and abort if too many?
* 64bit internals by default
* 64bit API by default, unless OMX_32BITS_LENGTH is defined
+ how to force the #define OMX_32BITS_LENGTH once compiled and installed?
* look at vringfd (http://lwn.net/Articles/276364/) to replace event ring
* endpoint parameters
+ unexp queue length
* don't free partner on disconnect since we can't change the session anymore?
+ or randomify the initial session?
* thread safety
+ filter wakeup on recv/send/connect/...
+ filter on specific request id too
+ progress thread only woken up if nobody else
+ split the progression timer out of the timeout timer and make it global
and wakeup a single process
+ change the wakeup_jiffies mapped value into an ioctl parameter?
+ add a last_poll_jiffies so that the driver knows if the progress thread is needed after a timeout
* skb_clone and alloc_skb_fclone for pull and pull replies?
* wire compat
+ fix lib ack contents?
* use large region id in the lib, only allocate uint8_t id for the wire when
posting the rndv message or the pull request
* export rdma_get and rdma window management functions
* rdma_put
* write parameter in ioctl to register a region, check it when reading/writing from/to the region
+ different rdmawin id for sender/receiver
- no need to check for deadlock if too many sender's rdmawin registered
* move recv lib in the kernel
+ early ioatcopy expected medium, move the other in the recv ring, drop when no space anymore
- keep unexpected data in the ring for ever
- when unexpected is posted, notify the kernel that we acquired the ring slots and let it
finish receiving in the target buffer
+ limit unexpq size to avoid memory starvation
- or allocate default unexp buffers upto 32kB
- use a usual recv queue for unexp?
+ need to keep partner's recv_seqnum in the kernel
- and synchronize with user-space which takes care of acking
- share mapping of peer-index based array of recv seqnums between kernel and user-space?
+ use ioat to offload the whole recv medium copy and wait in the last frag
+ move the whole lib in the driver and only keep basic request status in userspace?
* add warning about mtu in nic/switches when truncated packet arrives, with printk_rate
=== obsolete ideas
* why shared memory from/to same send and recv buffer is very bad from 64k+ 128k ?
* change --iface into ---iface, and use --iface as a "close when no more endpoint"
* make NEED_SEQNUM a resource and use DELAYED for throttling as well ?
* what's in src_generation ? unset/unchecked in OMX
* wire-compat at runtime (8kB frame needs 17bits for medium length)
* req->recv.specific.medium.frags_received_mask and .accumulated_length are redundant
+ the mask is needed to detect duplicates, and the length prevents from precomputing the missing mask
* make shared comms runtime-configurable in the driver
* use mediumva for shared medium? seems slower
+ and mediums are not that useful for shared anyway (lower rndv threshold)
* fix make install -j
+ just support make -j followed by make install -j, otherwise it's a useless mess
* regcache malloc hook (if no driver feature flag)
* merge small/tiny and change the packet type in the driver to TINY only in wire-compat mode?
+ putting the data inside the request/event really helps performance, cannot remove this path
* improve event delivery
+ per cpu rings? not scalable with number of cups
+ prefetch non-temporal from the event ring ro prevent cache line bounces?
+ or write event with non-temporal? hard in the kernel