Convert DG-RePlAce algorithm to Kokkos #5352
clang-tidy made some suggestions
There were too many comments to post at once. Showing the first 25 out of 52. Check the log or trigger a new build to see more.
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
/////////////////////////////////////////////////////////////////////////////// | ||
|
||
#include "gpl2/MakeDgReplace.h" |
warning: 'gpl2/MakeDgReplace.h' file not found [clang-diagnostic-error]
#include "gpl2/MakeDgReplace.h"
^
// | ||
/////////////////////////////////////////////////////////////////////////////// | ||
|
||
#include <Kokkos_Core.hpp> |
warning: 'Kokkos_Core.hpp' file not found [clang-diagnostic-error]
#include <Kokkos_Core.hpp>
^
// | ||
// | ||
/////////////////////////////////////////////////////////////////////////////// | ||
#include <Kokkos_Core.hpp> |
warning: 'Kokkos_Core.hpp' file not found [clang-diagnostic-error]
#include <Kokkos_Core.hpp>
^
src/gpl2/src/dct.h
Outdated
/////////////////////////////////////////////////////////////////////////////// | ||
#include <Kokkos_Core.hpp> | ||
|
||
void dct_2d_fft(const int M, |
warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
void dct_2d_fft(const int M, | |
void dct_2d_fft(int M, |
src/gpl2/src/dct.h
Outdated
#include <Kokkos_Core.hpp> | ||
|
||
void dct_2d_fft(const int M, | ||
const int N, |
warning: parameter 'N' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
const int N, | |
int N, |
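The `readability-avoid-const-params-in-decls` warnings rest on a language rule: top-level `const` on a by-value parameter is not part of the function's type, so it only has an effect in the definition. A minimal sketch of that rule, using a hypothetical `scale` function (not part of this PR) instead of `dct_2d_fft`:

```cpp
#include <type_traits>

// Top-level const on by-value parameters is dropped from the function type,
// so these two spellings name the same type.
static_assert(std::is_same_v<void(const int, const int), void(int, int)>);

// Declaration without const, which is the form clang-tidy suggests for headers.
// `scale` is a hypothetical example function.
int scale(int m, int n);

// The definition may still mark its local copies const.
// This defines the function declared above; it is not a new overload.
int scale(const int m, const int n)
{
  return m * n;
}
```

Callers see the same signature either way, so dropping `const` from the header changes nothing for them.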
src/gpl2/src/placerBase.cpp
Outdated
binCntY_ = 512; | ||
} | ||
|
||
binSizeX_ = ceil(static_cast<float>((ux_ - lx_)) / binCntX_); |
warning: call to 'ceil' promotes float to double [performance-type-promotion-in-math-fn]
src/gpl2/src/placerBase.cpp:40:
- #include <cstdio>
+ #include <cmath>
+ #include <cstdio>
binSizeX_ = ceil(static_cast<float>((ux_ - lx_)) / binCntX_); | |
binSizeX_ = std::ceil(static_cast<float>((ux_ - lx_)) / binCntX_); |
src/gpl2/src/placerBase.cpp
Outdated
} | ||
|
||
binSizeX_ = ceil(static_cast<float>((ux_ - lx_)) / binCntX_); | ||
binSizeY_ = ceil(static_cast<float>((uy_ - ly_)) / binCntY_); |
warning: call to 'ceil' promotes float to double [performance-type-promotion-in-math-fn]
binSizeY_ = ceil(static_cast<float>((uy_ - ly_)) / binCntY_); | |
binSizeY_ = std::ceil(static_cast<float>((uy_ - ly_)) / binCntY_); |
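For context on the two `performance-type-promotion-in-math-fn` suggestions: `std::ceil` from `<cmath>` has a `float` overload, so a `float` argument stays `float`. The sketch below shows this with a hypothetical `binSize` helper; the `int` return type and the parameter names are assumptions made for illustration, not taken from `placerBase.cpp`.

```cpp
#include <cmath>
#include <type_traits>

// std::ceil has a float overload, so the argument is not promoted to double.
static_assert(std::is_same_v<decltype(std::ceil(1.0f)), float>);

// Hypothetical helper mirroring the shape of the binSizeX_ computation.
// Assumes bin_cnt > 0 and that the result fits in an int.
int binSize(int lower, int upper, int bin_cnt)
{
  return static_cast<int>(
      std::ceil(static_cast<float>(upper - lower) / bin_cnt));
}
```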
#include <string> | ||
#include <vector> | ||
|
||
#include "db_sta/dbNetwork.hh" |
warning: 'db_sta/dbNetwork.hh' file not found [clang-diagnostic-error]
#include "db_sta/dbNetwork.hh"
^
src/gpl2/src/placerBase.h
Outdated
int64_t nesterovInstsArea() const | ||
{ | ||
return stdInstsArea_ | ||
+ static_cast<int64_t>(round(macroInstsArea_ * targetDensity_)); |
warning: call to 'round' promotes float to double [performance-type-promotion-in-math-fn]
src/gpl2/src/placerBase.h:38:
- #include <memory>
+ #include <cmath>
+ #include <memory>
+ static_cast<int64_t>(round(macroInstsArea_ * targetDensity_)); | |
+ static_cast<int64_t>(std::round(macroInstsArea_ * targetDensity_)); |
src/gpl2/src/placerObjects.cpp
Outdated
/////////////////////////////////////////////////////////////// | ||
// Instance | ||
Instance::Instance() | ||
: inst_(nullptr), |
warning: member initializer for 'inst_' is redundant [modernize-use-default-member-init]
: inst_(nullptr), | |
: , |
Earlier, the runtime difference was reported to be minimal, but 0:57.70 vs 1:33.49 is more substantial. Is this expected?
Earlier measurements were done when some parts were still using native CUDA, and with a different design. I'd expect that it should be possible to achieve a similar runtime using Kokkos. These results might suggest that there are some unnecessary memory copies between host and device, but this needs to be investigated further.
Please try to get a more precise measure of the runtime difference, as this is important in deciding whether Kokkos is a good alternative to direct CUDA coding. Do all the various versions produce the same result? That is also important.
What was the thinking behind making kokkos a dependency but kokkos-fft a submodule? It seems like they could both be build dependencies (and added to the DependencyInstaller with an option).
I think I would say direct CUDA coding isn't really a viable option. I would be personally opposed to its inclusion. I think Kokkos or something like it is the only viable path forward. The runtime differences don't look significant if you compare it to the overall speedup achieved. We're going for a pragmatic path forward, and to me this meets my bar for the goals we set out.
Agree that this is important to check. We may need to order the floats to get identical/sufficiently similar results. |
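The remark about ordering the floats follows from floating-point addition not being associative: the same values summed in a different order can give a different total. A minimal sketch with made-up values; the function names are hypothetical, the input is assumed to contain no NaNs, and this is not the approach the PR necessarily took:

```cpp
#include <algorithm>
#include <vector>

// Sum the values in the order they are given.
float sumInOrder(const std::vector<float>& values)
{
  float sum = 0.0f;
  for (float v : values) {
    sum += v;
  }
  return sum;
}

// Sort a copy first, so any permutation of the same values is summed in the
// same order and therefore yields the same total.
float sumSorted(std::vector<float> values)
{
  std::sort(values.begin(), values.end());
  return sumInOrder(values);
}
```

Sorting makes the result reproducible across orderings; it does not by itself make the sum more accurate.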
You personally pushed for the inclusion of gpuSolver.cu and said it was valuable as a template for future development. Shall we delete it? I was never in favor. A 50% overhead is worth exploring to at least understand, if not eliminate.
I think that seems like the right move at this point. With more time and context I don't think it's viable for us to maintain two codebases.
+1. I just want to point out that if this is the fastest we could go, that seems fast enough for me.
No, they don't, and it was quite surprising, as I expected that the original code and Kokkos with the CUDA backend would produce the same result. NVCC should do pre-processing and compilation for device code and produce a CUDA binary, and it should leave host code for the host compiler. We checked that when I suspect that this issue isn't only related to Eigen: when I disabled initial placement, the runtimes of Kokkos and the original code were almost the same, but the results were still different (I haven't investigated the reason for this).
kokkos-fft is a header-only interface library that translates FFT calls to the proper backend by detecting the backends enabled in Kokkos, but I agree: if preferred, both kokkos and kokkos-fft could be dependencies.
I think this overhead is due to the different initial placement; when initial placement is disabled, the runtime is very similar:
I also did precise measurements using RTX 3080, 8 vCPU i9-12900 @ 2.42 GHz and 32GB of RAM with 10 runs using
Thanks for the analysis. It would be good to get to the bottom of the difference as it will make regression testing hard otherwise. Is |
Arguments that are passed to |
another possibility is that it is invoking a different g++ binary from another path |
Converted to a draft due to no progress. |
04d428f to 925dd93
I've rebased this branch onto latest
I've found that not to be the case. Earlier, I recreated the same condition (where Eigen was running slowly) using
To prioritize merging of GPU-accelerated placement, the focus was to get the branch issue-free before optimizing. In my testing, Kokkos-based algorithm on Future / subsequent work:
I added a configuration option to |
I would prefer to see kokkos as part of the dependency installer rather than as a submodule. There should be no need to compile it for each workspace on a machine. |
With the current setup, it would be possible to support both compilation schemes, with the priority set towards the |
If someone wants to put a local copy in-tree that's fine but I'd like to avoid having a submodule. |
I'll add support for |
072e3b1 to 2dcac77
2dcac77 to 960ec72
I've added nested parallelism to the most time consuming kernel -
Additionally, a concern was raised about non-deterministic results returned from Kokkos, depending on the compute device used for processing. To validate the flow, each variant was subjected to a run from synthesis to the Test subjects were:
Metrics collected were taken from the final report and log, and were:
Results:
Very nice! How is the cpu vs gpu runtime with your latest changes? Is this ready for review? |
clang-tidy made some suggestions
There were too many comments to post at once. Showing the first 25 out of 45. Check the log or trigger a new build to see more.
src/gpl2/src/dct.h
Outdated
/////////////////////////////////////////////////////////////////////////////// | ||
#include <Kokkos_Core.hpp> | ||
|
||
void dct_2d_fft(const int M, |
warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
void dct_2d_fft(const int M, | |
void dct_2d_fft(int M, |
src/gpl2/src/dct.h
Outdated
#include <Kokkos_Core.hpp> | ||
|
||
void dct_2d_fft(const int M, | ||
const int N, |
warning: parameter 'N' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
const int N, | |
int N, |
src/gpl2/src/dct.h
Outdated
const Kokkos::View<Kokkos::complex<float>*>& fft, | ||
const Kokkos::View<float*>& post); | ||
|
||
void idct_2d_fft(const int M, |
warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
void idct_2d_fft(const int M, | |
void idct_2d_fft(int M, |
src/gpl2/src/dct.h
Outdated
const Kokkos::View<float*>& post); | ||
|
||
void idct_2d_fft(const int M, | ||
const int N, |
warning: parameter 'N' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
const int N, | |
int N, |
src/gpl2/src/dct.h
Outdated
const Kokkos::View<float*>& ifft, | ||
const Kokkos::View<float*>& post); | ||
|
||
void idxst_idct(const int M, |
warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
void idxst_idct(const int M, | |
void idxst_idct(int M, |
src/gpl2/src/placerObjects.cpp
Outdated
densityScale_(0.0), | ||
haloWidth_(0), | ||
type_(InstanceType::FILLER), | ||
isFixed_(false) |
warning: member initializer for 'isFixed_' is redundant [modernize-use-default-member-init]
isFixed_(false) | |
src/gpl2/src/placerObjects.cpp
Outdated
int lx = 0.0; | ||
int ly = 0.0; | ||
inst->getLocation(lx, ly); | ||
int ux = lx + floor(bbox->getDX() / 2) * 2; |
warning: result of integer division used in a floating point context; possible loss of precision [bugprone-integer-division]
int ux = lx + floor(bbox->getDX() / 2) * 2;
^
src/gpl2/src/placerObjects.cpp
Outdated
int ly = 0.0; | ||
inst->getLocation(lx, ly); | ||
int ux = lx + floor(bbox->getDX() / 2) * 2; | ||
int uy = ly + floor(bbox->getDY() / 2) * 2; |
warning: result of integer division used in a floating point context; possible loss of precision [bugprone-integer-division]
int uy = ly + floor(bbox->getDY() / 2) * 2;
^
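The two `bugprone-integer-division` warnings point out that when `getDX()` and `getDY()` return integers (which the diagnostic implies), the division by 2 has already truncated before `floor` is applied, so `floor` does nothing. A small sketch with hypothetical helper names shows that the expression reduces to plain integer arithmetic:

```cpp
#include <cmath>

// Shape of the expression in the diff: dx / 2 is integer division, so the
// value is truncated before std::floor sees it and floor has no effect.
int evenExtentWithFloor(int dx)
{
  return static_cast<int>(std::floor(dx / 2) * 2);
}

// Same result using integer arithmetic only.
int evenExtentIntOnly(int dx)
{
  return (dx / 2) * 2;
}
```

If rounding the real-valued quotient were intended instead, the operand would need to be converted to a floating-point type before dividing. For non-negative extents the two interpretations agree.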
src/gpl2/src/placerObjects.cpp
Outdated
} | ||
} | ||
|
||
void Instance::dbSetPlacementStatus(odb::dbPlacementStatus ps) |
warning: the parameter 'ps' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
void Instance::dbSetPlacementStatus(odb::dbPlacementStatus ps) | |
void Instance::dbSetPlacementStatus(const odb::dbPlacementStatus& ps) |
src/gpl2/src/placerObjects.h:105:
- void dbSetPlacementStatus(odb::dbPlacementStatus ps);
+ void dbSetPlacementStatus(const odb::dbPlacementStatus& ps);
src/gpl2/src/placerObjects.cpp
Outdated
//////////////////////////////////////////////////////// | ||
// Pin | ||
Pin::Pin() | ||
: pin_(nullptr), |
warning: member initializer for 'pin_' is redundant [modernize-use-default-member-init]
: pin_(nullptr), | |
: , |
Yes, it's ready for review. I've applied the suggested clang-tidy fixes and added the missing RockyLinux9 package. The performance difference between
The test setup is an Intel i7-8700 and a NVIDIA GTX 1080Ti |
a1b101b to 1d136de
clang-tidy made some suggestions
src/gpl2/src/placerObjects.cpp
Outdated
//////////////////////////////////////////////////////////////////////////////////////////////// | ||
// Net | ||
Net::Net() | ||
: net_(nullptr), |
warning: member initializer for 'net_' is redundant [modernize-use-default-member-init]
: net_(nullptr), | |
: , |
src/gpl2/src/placerObjects.cpp
Outdated
// Net | ||
Net::Net() | ||
: net_(nullptr), | ||
netId_(-1), |
warning: member initializer for 'netId_' is redundant [modernize-use-default-member-init]
netId_(-1), | |
, |
src/gpl2/src/placerObjects.cpp
Outdated
ly_(0), | ||
ux_(0), | ||
uy_(0), | ||
isDontCare_(false), |
warning: member initializer for 'isDontCare_' is redundant [modernize-use-default-member-init]
isDontCare_(false), | |
, |
src/gpl2/src/placerObjects.cpp
Outdated
ux_(0), | ||
uy_(0), | ||
isDontCare_(false), | ||
virtualWeight_(0.0), |
warning: member initializer for 'virtualWeight_' is redundant [modernize-use-default-member-init]
virtualWeight_(0.0), | |
, |
src/gpl2/src/placerObjects.cpp
Outdated
uy_(0), | ||
isDontCare_(false), | ||
virtualWeight_(0.0), | ||
weight_(1.0) |
warning: member initializer for 'weight_' is redundant [modernize-use-default-member-init]
weight_(1.0) | |
src/gpl2/src/poissonSolver.h
Outdated
Kokkos::View<float*> electroForceY); | ||
|
||
// Compute Potential Only (not Electric Force) the row-major order | ||
void solvePoissonPotential(const Kokkos::View<float*> binDensity, Kokkos::View<float*> potential); |
warning: parameter 'binDensity' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]
void solvePoissonPotential(const Kokkos::View<float*> binDensity, Kokkos::View<float*> potential); | |
void solvePoissonPotential(Kokkos::View<float*> binDensity, Kokkos::View<float*> potential); |
src/gpl2/src/routeBase.cpp
Outdated
// RouteBase | ||
|
||
RouteBase::RouteBase() | ||
: rbVars_(), db_(nullptr), grouter_(nullptr), nbc_(nullptr), log_(nullptr) |
warning: initializer for member 'rbVars_' is redundant [readability-redundant-member-init]
: rbVars_(), db_(nullptr), grouter_(nullptr), nbc_(nullptr), log_(nullptr) | |
: , db_(nullptr), grouter_(nullptr), nbc_(nullptr), log_(nullptr) |
src/gpl2/src/routeBase.cpp
Outdated
RouteBase::RouteBase(RouteBaseVars rbVars, | ||
odb::dbDatabase* db, | ||
grt::GlobalRouter* grouter, | ||
std::shared_ptr<PlacerBaseCommon> nbc, |
warning: the parameter 'nbc' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
std::shared_ptr<PlacerBaseCommon> nbc, | |
const std::shared_ptr<PlacerBaseCommon>& nbc, |
src/gpl2/src/routeBase.cpp
Outdated
nbVec_ = std::move(nbVec); | ||
} | ||
|
||
RouteBase::~RouteBase() |
warning: use '= default' to define a trivial destructor [modernize-use-equals-default]
src/gpl2/src/routeBase.cpp:98:
- {
- }
+ = default;
} | ||
|
||
TimingBase::TimingBase(std::shared_ptr<PlacerBaseCommon> nbc, | ||
rsz::Resizer* rs, |
warning: the parameter 'nbc' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
rsz::Resizer* rs, | |
TimingBase::TimingBase(const std::shared_ptr<PlacerBaseCommon>& nbc, | |
rsz::Resizer* rs, |
Partial review
if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS "19") | ||
link_libraries(stdc++) | ||
endif() |
What necessitates this?
After bumping clang to version 19, it started defaulting to linking against libc++ in CUDA code. It only affected gpl2; there was a missing libstdc++ definition when linking only this specific module. Should I add a CUDA/GPL2 conditional there as well?
@@ -121,6 +122,9 @@ while [ "$#" -gt 0 ]; do | |||
echo "${1} requires an argument" >&2 | |||
_help | |||
;; | |||
-use_gpl2) |
Add a description to the usage message
Done
etc/DependencyInstaller.sh
Outdated
@@ -76,6 +76,7 @@ _installCommonDev() { | |||
gtestChecksum="a1279c6fb5bf7d4a5e0d0b2a4adb39ac" | |||
bisonVersion=3.8.2 | |||
bisonChecksum="1e541a097cda9eca675d29dd2832921f" | |||
kokkosfftVersion="2c616d29a7ad0c390259efeb9224115bfa6910fd" |
Why isn't this using a release tag and checksum?
It accomplishes the same goal in a single definition and eliminates the need to hash the directory. If it's preferred to keep to convention, I can change it.
It does not, as there is no easy way to know which release (if any) we are using from a commit id.
Changed to a version tag. Other dependencies managed by git were not getting hashed either, as that would require tarring the directory.
# Older version of g++ is needed for compatibility with NVCC | ||
ARGS_KOKKOSFFT+=" -DKokkos_ENABLE_CUDA=ON -DCMAKE_CXX_COMPILER=g++-10" |
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html suggests it works with many default compiler versions. Is this really necessary?
NVlabs/instant-ngp#119 presents the same issue as I have encountered. Ubuntu is using the supposedly supported gcc 11.4, but it does not compile Kokkos properly.
# Older version of g++ is needed for compatibility with NVCC | ||
ARGS_KOKKOSFFT+=" -DKokkos_ENABLE_CUDA=ON -DCMAKE_CXX_COMPILER=g++-10" |
Can you enable both openmp and cuda in one build?
Yes, it will default to CUDA if it's detected but both can be compiled in at once.
src/gpl2/LICENSE
Outdated
There is no need for a separate LICENSE file here.
Removed
src/gpl2/include/gpl2/DgReplace.h
Outdated
// The three main functions | ||
void doInitialPlace(); | ||
int doNesterovPlace(int start_iter = 0); |
I only see two
Fixed the comment. It has been here since the CUDA-native implementation, and there's no clear indication what the third function would have been.
src/gpl2/include/gpl2/DgReplace.h
Outdated
// We should only have one placerBaseCommon, timingBase and routeBase | ||
// But we need multiple placerBases to handle fences and multiple domains |
This comment seems to have become separate from its context (move down 4 lines)
Done
Did you author this code?
It was originally written by @ZhiangWang033 (as can be seen in the commit history), then ported to Kokkos by the other committers (including me).
src/gpl2/src/densityOp.h
Outdated
int coreUx_; | ||
int coreUy_; | ||
|
||
// We need to store all the statictis information for each bin |
typo: statictis
Corrected
974d4c0 to e0f9b1f
Currently, the mainline |
Co-authored-by: Kamil Rakoczy <krakoczy@antmicro.com> Co-authored-by: Jan Bylicki <jbylicki@antmicro.com> Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com> Signed-off-by: Kamil Rakoczy <krakoczy@antmicro.com> Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
…ackends Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>
Removes interpretation of LayoutRight data as LayoutLeft. Fixes `input and output extents must be the same except for the transform axis` gpl2 error caused by an incomplete workaround for differing default 2D data layouts between CUDA and CPU. I considered alternative approaches (such as getting rid of LayoutRight-specific code entirely) but they turned out to be disproportionately complex. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Always calculate fft on host to avoid differing results between impls. NOTE: This may have some performance repercussions. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Device and host may use different implementations of math functions, giving different results, which is not desirable in OpenROAD. The fix relies on the (possibly wrong) assumption that the error of the double-precision built-in function is less than the precision of float. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Replace non-deterministic parallel reduces with serial loops that give the same result regardless of platform. NOTE: This change results in serious performance degradation. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
computeWeightedHPWL() suffered from an implicit lossy (in the case of big numbers) conversion of int64_t to float. After fixing it, the summation can be made parallel without introducing inconsistencies between kokkos configurations. NOTE: computeHPWL() never suffered from this issue, but I had excessively deparallelized it before. In this fix, I added safeguards to both computeWeightedHPWL() and computeHPWL() for consistency. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Replace dNetWith and dNetHeight with a single dNetWidthPlusHeight. The time improvement is small (it could be measurement error as well). Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Serial code is an order of magnitude slower to execute on GPU than on CPU. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
I'm a bit surprised, but this simple change reduced the time from 2m38s to 2m30s. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Calculate individual distances in parallel, then sum them serially. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Before, we executed a serial reduction for X and Y separately. Now we have a parallel calculation of a view with abs(X)+abs(Y), and one serial reduction of it. Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
NOTE: I used static vars for storing plans. While simple and convenient, it assumes that N and M won't change between calls (changing them would result in a runtime error). Signed-off-by: Szymon Gizler <sgizler@antmicro.com>
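The `computeWeightedHPWL()` commit above describes two effects: `float` cannot represent every large `int64_t`, and integer sums do not depend on summation order. The sketch below illustrates both with hypothetical helper names; it is not the PR's code, and it assumes the totals do not overflow `int64_t`.

```cpp
#include <cstdint>
#include <vector>

// float has a 24-bit significand, so integers above 2^24 may not survive a
// conversion to float and back.
bool survivesFloatRoundTrip(std::int64_t value)
{
  return static_cast<std::int64_t>(static_cast<float>(value)) == value;
}

// Integer addition is associative, so the total does not depend on the order
// in which a parallel reduction combines partial sums (absent overflow).
std::int64_t sumLengths(const std::vector<std::int64_t>& lengths)
{
  std::int64_t total = 0;
  for (std::int64_t len : lengths) {
    total += len;
  }
  return total;
}
```

Keeping the accumulator in an integer type is one way a reduction can stay parallel and still give identical results on every backend.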
d841e44 to 76e5a69
This MR converts the DG-RePlAce algorithm, which was originally written for CUDA, to Kokkos.
Kokkos provides an abstraction for writing parallel code that can be translated to several backends, including CUDA, OpenMP, and C++ threads.
Tested on a single run with an RTX 3090 and an i7-8700 CPU @ 3.20GHz using the ariane133 design.