-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dfp fpar #178
Conversation
…of parallel global links in case that spine router has multiple global links to another group
…eplay.c method. Add UR, Broadcast, Tornado, 3d stencil, background traffic pattern for multi workload simulation
2) Add model-net level message app id identification & window based message receiving rate per application on spine router. Behavior is controlled by providing a start, end time and window size in dragonfly plus configuration file.
…nection to every other group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall good job! Address the comments and I'll be confident in merging it once I just run some tests on it!
deleting unnecessay comments / unused variables add small random value for tw_event_new in issue_event_non_delay function define max_app max_msg constants to avoid hardcoding
- add more comments/printing messages indicating that fpar works with alpha scoring and uses a routing threshold value between [0-100]
…olute_best_connection_from_conns() when choosing a random vector element
global link connection causing fpar routing failure
FYI: I haven't forgotten about this - just have a bit of backlog. Will address it following my refactoring merge at the end of this month. |
be evaluated instead of "traffic" pattern
The additions to make this synthetic workload generator allow multiple synthetic workloads were done in a way that made it break for just a standard synthetic run. This was because the jobmap was not initialized for the case when there was only a single job and so when that jobmap is polled to find what job ID an MPI rank belongs to, it segfaults. The solution to this is to initialize the jobmap_ctx value with jobmap_ctx = codes_jobmap_configure(CODES_JOBMAP_IDENTITY, &jobmap_p); where the IDENTITY jobmapper provides the funcionality so that if the job ID lookup is called when there's only a single job, it doesn't segfault.
Commit a4780ae changed the behavior of the LPIO output files generated by the Dragonfly Plus model so that if there was no traffic on a port, then stats for that port would not be printed. While this might make for a cleaner output file, it makes the length of said file dependent on the simulation that it ran. Comparing two output files visually by a human between two different types of simulations becomes impossible. It is also limiting in that it becomes more difficult to immediately tell if there's a problem. For example, not long ago I noticed that there was a port that wasn't getting any traffic due to a problem in the routing implementation. This was easily visible because there was a line with 0 traffic. If that line hadn't been printed at all, it might not have been as easy to see had I not been looking for it. If this change was made to make creating graphs from non-negligible data easier, then that is something that should be implemented as a data pre-processing step and not hardcoded behavior in a generalized simulation model.
During a code review a long time ago when this change was initially made, I suggested the TODO here. That was a bad on my part, it could have just been implemented as such. Scoring a terminal connection doesn't really mean much but maybe it could. At this point I see no reason to not allow it.
This commit changes the way that the recently introduced dfp_src_terminal_id value encoded into messages is initialized. Previously it was set based on codes_mapping_get_lp_relative_id() relative to the server that generated the packet. This is coincidentally the same as the terminal ID if there is only a single server per terminal but this is not always the case. It is now set during packet_generate() which is the function called by the terminal that is generating the packet and so we can cut out the call to codes_mapping_get_lp_relative_id() entirely.
I added a note here. There are major changes to this file, enough that it probably warrants being a new file. I intend on unifying synthetic workload generators going forward so this is more of a note for anyone who wanted to have access to the old behavior and a reminder to myself what the old version was when I'm making the new workloads.
I am merging this now. @yao-kang, I made some alterations. Most are pretty self explanatory and minor - the only one that might make a difference to you is commit 7f65fd8. It undoes the change that made it so that routers/terminals wouldn't output data for ports that didn't have any traffic on them. I'm also not a huge fan of splitting the link stat file into two as that's another pre-processing type operation, but I let it go since that will be covered when I tackle #182 |
Added FPAR to dragonfly-plus model