Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parsec_advise_data_on_device for zpotrf_L #118

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

QingleiCao
Copy link

@QingleiCao QingleiCao commented Jun 7, 2024

These are the performance comparisons.

image

image

I changed to Lower in Cholesky testing and will create an issue to track why Upper behaves differently.

@QingleiCao QingleiCao requested a review from a team as a code owner June 7, 2024 13:39
src/zpotrf_L.jdf Outdated Show resolved Hide resolved
src/zpotrf_wrapper.c Outdated Show resolved Hide resolved
@@ -18,7 +18,8 @@ int main(int argc, char ** argv)
{
parsec_context_t* parsec;
int iparam[IPARAM_SIZEOF];
dplasma_enum_t uplo = dplasmaUpper;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to undo that at the end when the U is also done

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you map the A matrix on lower ? You don't need to alter this variable, instead just call the function with dplasmaLower and add a comment on why you choose to do so.

@QingleiCao
Copy link
Author

I uploaded the performance results. BTW, I updated to the latest PaRSEC.

@bosilca
Copy link
Contributor

bosilca commented Jun 18, 2024

I understand the performance benefits but I have many concerns about the approach. We (or at least I) claimed for a long time that JDF algorithms were expressing the dataflow and were independent of other platform related constraints. This PR is clearly at odds with this statement, and I'm not sure we want to go that route.

@QingleiCao
Copy link
Author

As discussed, these changes will be moved to the wrapper to keep the JDF clean.

@QingleiCao QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 2 times, most recently from c67b4b5 to 6743a5a Compare September 5, 2024 02:03
@QingleiCao
Copy link
Author

How about adding this feature in parsec instead of in dplasma?

@bosilca
Copy link
Contributor

bosilca commented Sep 5, 2024

Can you give us a hint on how that would look in parsec ?

@QingleiCao
Copy link
Author

Can you give us a hint on how that would look in parsec ?

I'm thinking of providing this feature in PaRSEC like apply with several predefined functions (like operation in apply). Then, users can call that in the wrapper or other places, as this advice_device needs to be called only once (if my understanding is correct) and no change is needed in JDF

@QingleiCao QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 2 times, most recently from 9bfc9fc to 78f8740 Compare September 27, 2024 12:12
@QingleiCao QingleiCao force-pushed the qinglei/potrf_gpu_advice branch 3 times, most recently from e0a77ea to 252d978 Compare September 27, 2024 17:22
Copy link
Contributor

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other programming paradigms call this a mapper. It would be nice if we reuse a concept that is familiar to [at least some] users.

#if defined(DPLASMA_HAVE_CUDA) || defined(DPLASMA_HAVE_HIP)

/* Find all devices */
void dplasma_find_nb_devices(int **dev_index, int *nb) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has a generic name suggesting it gets all devices when internally it only selects the accelerators. Please change the name, and please add also support for ZE.

if((*nb) == 0) {
char hostname[256];
gethostname(hostname, 256);
fprintf(stderr, "No CUDA device found on rank %d on %s\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have proper methods in parsec to output warnings. Please use parsec_warning instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I don't think it is a good idea to return low level memory upstream, the ownership is unclear. As here we are talking about a very small amount of memory, I would suggest requiring the caller to provide an array of the size parsec_nb_devices into this call, and this function then does a single loop to fill this array up. All memory ownership is then on the caller size.

dplasma_find_nb_devices(&args->gpu_device_index, &args->nb_gpu_devices);

/* Calculate the nested grid for the multiple GPUs on one process
* gpu_rows >= gpu_cols and as square as possible */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the indentation is screwed. Please fix.

@@ -18,7 +18,8 @@ int main(int argc, char ** argv)
{
parsec_context_t* parsec;
int iparam[IPARAM_SIZEOF];
dplasma_enum_t uplo = dplasmaUpper;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you map the A matrix on lower ? You don't need to alter this variable, instead just call the function with dplasmaLower and add a comment on why you choose to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants