Optimize VRP struct sizes. Use netip.Prefix instead of net.IPNet. #103
Conversation
On the surface this looks fine. I would, however, want to see two stayrtr instances (old and this PR) running head to head for a few days, with rtrmon comparing the two, just to make sure :)
Very nice analysis and a clear change!
That is probably the best way to be sure it is functionally identical. This way you can check that the VRPs converge after a set delay. The relevant metric is
For clarification, is this an experiment you need me to run, or are you doing this?
I would prefer it if you ran it (I likely won't have time until 2024). Do you have a working setup with Prometheus? If we agree that convergence should be within 256s, I would check the history for violations of that property.
Branch force-pushed from 271420b to 236c4bf
OK, hopefully I did this right. I ran something like this on a Linux host, with binaries built from my branch:
Output looks like the below in prom. I do not see anything with the label. For what it's worth, running a brief test using binaries built from upstream looks the same. Anyway, I will leave this running for a couple of days and report back. Thanks!
Those labels should be there eventually. Let's see!
This blocks not only #105 but also additional work to make the delta handling for ASPA correct.
@ties I have had 2 instances running. I'm not sure exactly what you are looking for here; happy to share any additional views you may want to see.
Thanks for doing this!
I hoped to see a flat line, as in "the worst case divergence converges within 256s".
Alternatively,
I just realised the default refresh interval is 600s; in that case, 851 or 1024 are the first values that are likely to converge. I am also collecting data myself now, and I should be able to give an update tomorrow.
I'm starting to wonder whether this is the best way to check this, or whether I am doing something wrong. Yes, I am using default timers for everything, but both upstream and my fork show a non-zero delta, so either the testing methodology is incorrect or I'm doing something wrong. For what it's worth, the upstream deltas look "worse" to my eye. So it sounds like you suggest running two stayrtr instances (my fork and the master branch) and configuring rtrmon to diff those two instead? I can give that a try and report back.
That seems helpful indeed - before merging
I'm evaluating stayrtr in a memory-constrained environment. The memory footprint of stayrtr could be optimized a bit. It seems the bulk of the memory is allocated during updates, but there are also some things that could be modified to reduce memory usage during steady state. To do this I'm proposing the following changes:

- Move from Go 1.17 to 1.21.
- Use `netip.Prefix` instead of `net.IPNet` in a few structs to reduce field sizes (this requires at least Go 1.18, which is why we change the base Go version in go.mod). The bulk of the lines changed in this PR relate to this change, but they generally do not change much of the logic: we simply swap in equivalent functions from `netip` in place of those from `net`.
- Reorder fields in the `VRP` struct to reduce its size from 64 bytes to 40 bytes.
- Reorder fields in the `PDUIPv4Prefix` and `PDUIPv6Prefix` structs to reduce their size from 72 bytes to 40 bytes.
- Simplify `ComputeDiff()` except in tests, as these values are not needed other than for debug logging. We also modify `ConvertSDListToMap()` to return a map whose values contain only the previous flags instead of the entire struct; as best I can tell, we reference the value of this map in only one place, and only to get the previous flag values.

Doing a simple side-by-side test, with stayrtr started at roughly the same time on two hosts, we can compare the current `master` branch (yellow) memory RSS against my forked branch (green).

For what it's worth, it seems most of the memory allocation occurs when updating VRPs. The bulk of this appears to be related to copying data, which I haven't found a clean workaround for, but it may be something that could be addressed in a future PR. A view of this from pprof shows a lot of it coming from `VRP.Copy()`.
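The `ConvertSDListToMap()` change can be sketched roughly like this (types and names are illustrative, not stayrtr's actual definitions):

```go
package main

import "fmt"

// Hypothetical record standing in for a VRP-like struct.
type record struct {
	Key   string // e.g. a prefix/ASN/maxlen key
	Flags uint8
	// ... other fields that the map's single caller never reads
}

// Before: map values hold full structs, so building the map copies
// every record in its entirety.
func toMapFull(list []record) map[string]record {
	m := make(map[string]record, len(list))
	for _, r := range list {
		m[r.Key] = r
	}
	return m
}

// After: map values hold only the previous flags, the one field the
// single call site actually reads, shrinking each map entry.
func toMapFlags(list []record) map[string]uint8 {
	m := make(map[string]uint8, len(list))
	for _, r := range list {
		m[r.Key] = r.Flags
	}
	return m
}

func main() {
	list := []record{{Key: "a", Flags: 1}, {Key: "b", Flags: 2}}
	fmt.Println(toMapFlags(list)["b"]) // 2
}
```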
Beyond ensuring all test cases pass, I have also run integration tests with BIRD 2.14 and confirmed that RTR functionality continues to work as before with these changes.