Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dfp fpar #178

Merged
merged 20 commits into from
Nov 12, 2019
Merged

Dfp fpar #178

merged 20 commits into from
Nov 12, 2019

Conversation

yao-kang
Copy link
Collaborator

Added FPAR to dragonfly-plus model

yao-kang added 5 commits July 10, 2019 10:08
…of parallel global links in case that spine router has multiple global links to another group
…eplay.c method.

Add UR, Broadcast, Tornado, 3d stencil, background traffic pattern for multi workload simulation
2) Add model-net level message app id identification & window based message receiving rate per
application on spine router. Behavior is controlled by providing a start, end time and window size
in dragonfly plus configuration file.
@nmcglo nmcglo self-assigned this Jul 18, 2019
Copy link
Member

@nmcglo nmcglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good job! Address the comments and I'll be confident in merging it once I just run some tests on it!

src/network-workloads/model-net-mpi-replay.c Outdated Show resolved Hide resolved
src/network-workloads/model-net-synthetic-dfly-plus.c Outdated Show resolved Hide resolved
src/network-workloads/model-net-synthetic-dfly-plus.c Outdated Show resolved Hide resolved
src/networks/model-net/dragonfly-plus.C Show resolved Hide resolved
src/networks/model-net/dragonfly-plus.C Show resolved Hide resolved
src/networks/model-net/dragonfly-plus.C Outdated Show resolved Hide resolved
src/networks/model-net/dragonfly-plus.C Show resolved Hide resolved
src/networks/model-net/dragonfly-plus.C Show resolved Hide resolved
deleting unnecessay comments / unused variables
add small random value for tw_event_new in issue_event_non_delay function
define max_app max_msg constants to avoid hardcoding
- add more comments/printing messages indicating that fpar works with alpha scoring
and uses a routing threshold value between [0-100]
…olute_best_connection_from_conns()

when choosing a random vector element
global link connection causing fpar routing failure
@nmcglo
Copy link
Member

nmcglo commented Oct 25, 2019

FYI: I haven't forgotten about this - just have a bit of backlog. Will address it following my refactoring merge at the end of this month.

yao-kang and others added 7 commits October 26, 2019 18:03
be evaluated instead of "traffic" pattern
The additions to make this synthetic workload generator allow multiple
synthetic workloads were done in a way that made it break for just a
standard synthetic run.

This was because the jobmap was not initialized for the case when there
was only a single job and so when that jobmap is polled to find what
job ID an MPI rank belongs to, it segfaults.

The solution to this is to initialize the jobmap_ctx value with
jobmap_ctx = codes_jobmap_configure(CODES_JOBMAP_IDENTITY, &jobmap_p);
where the IDENTITY jobmapper provides the funcionality so that if
the job ID lookup is called when there's only a single job, it doesn't
segfault.
Commit a4780ae changed the behavior of the LPIO output files
generated by the Dragonfly Plus model so that if there was no
traffic on a port, then stats for that port would not be printed.

While this might make for a cleaner output file, it makes the
length of said file dependent on the simulation that it ran.
Comparing two output files visually by a human between two
different types of simulations becomes impossible.

It is also limiting in that it becomes more difficult to immediately
tell if there's a problem. For example, not long ago I noticed that
there was a port that wasn't getting any traffic due to a problem
in the routing implementation. This was easily visible because
there was a line with 0 traffic. If that line hadn't been printed
at all, it might not have been as easy to see had I not been looking
for it.

If this change was made to make creating graphs from non-negligible
data easier, then that is something that should be implemented as a
data pre-processing step and not hardcoded behavior in a generalized
simulation model.
During a code review a long time ago when this change was initially
made, I suggested the TODO here. That was a bad on my part, it could
have just been implemented as such. Scoring a terminal connection
doesn't really mean much but maybe it could. At this point I see
no reason to not allow it.
This commit changes the way that the recently introduced
dfp_src_terminal_id value encoded into messages is initialized.

Previously it was set based on codes_mapping_get_lp_relative_id()
relative to the server that generated the packet. This is
coincidentally the same as the terminal ID if there is only a
single server per terminal but this is not always the case.

It is now set during packet_generate() which is the function
called by the terminal that is generating the packet and so
we can cut out the call to codes_mapping_get_lp_relative_id()
entirely.
I added a note here. There are major changes to this file, enough
that it probably warrants being a new file. I intend on unifying
synthetic workload generators going forward so this is more of a
note for anyone who wanted to have access to the old behavior
and a reminder to myself what the old version was when I'm making
the new workloads.
@nmcglo nmcglo changed the base branch from master to develop November 12, 2019 01:12
@nmcglo
Copy link
Member

nmcglo commented Nov 12, 2019

I am merging this now. @yao-kang, I made some alterations. Most are pretty self explanatory and minor - the only one that might make a difference to you is commit 7f65fd8. It undoes the change that made it so that routers/terminals wouldn't output data for ports that didn't have any traffic on them.

I'm also not a huge fan of splitting the link stat file into two as that's another pre-processing type operation, but I let it go since that will be covered when I tackle #182

@nmcglo nmcglo merged commit 7b6f6fb into develop Nov 12, 2019
@nmcglo nmcglo deleted the dfp-fpar branch July 12, 2022 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants