Adding support for GPU solvers via pyamgx #567

Merged
38 commits merged Jul 17, 2018

@shwina

shwina commented Apr 28, 2018

@guyer @tkphd

Currently this PR is incomplete, but I am opening it for early feedback and for your thoughts on whether this is the right direction to take.

Summary

  • Adds new solvers fipy.solvers.pyamgx.LinearCGSSolver and fipy.solvers.pyamgx.LinearGMRESSolver. Both are subclasses of fipy.solvers.pyamgx.PyAMGXSolver. Currently, the preconditioners are "baked-in", but I will extend this PR to make them configurable.

  • Adds support for the command-line option --pyamgx or the env variable FIPY_SOLVERS=pyamgx.
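One way the two switches could be resolved is sketched below. This is hypothetical and not FiPy's actual selection code: the helper name, the list of suites, and the rule that a command-line flag wins over FIPY_SOLVERS are all assumptions.

```python
import os

# Illustrative list of solver suites; only "pyamgx" is introduced by this PR.
KNOWN_SUITES = ("pysparse", "trilinos", "scipy", "pyamgx")


def requested_solver_suite(argv, environ=os.environ):
    """Return the suite named by a --<suite> flag, else FIPY_SOLVERS, else None.

    Assumption (not confirmed by FiPy): the command-line flag takes
    precedence over the FIPY_SOLVERS environment variable.
    """
    for suite in KNOWN_SUITES:
        if "--" + suite in argv:
            return suite
    return environ.get("FIPY_SOLVERS") or None


print(requested_solver_suite(["setup.py", "test", "--pyamgx"], {}))  # pyamgx
```

Either route ends with the same suite name, so the rest of the code only has to handle one value.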

Important

One controversial proposed change is the addition of "empty" __enter__ and __exit__ methods to the Solver base class. This enables one to write:

with LinearCGSSolver() as solver:
    eq.solve(var=phi, solver=solver)

For all solvers other than pyamgx, this is equivalent to:

eq.solve(var=phi, solver=LinearCGSSolver())

This change would not break any existing code, as the above code is still valid.

However, for the pyamgx solvers, the former is preferred. To see why, look at the __init__ and __exit__ methods of the PyAMGXSolver class: attributes of this class need to be explicitly cleaned up by calling their destroy() methods, and the with statement ensures that this cleanup is done automatically.

By adding empty __enter__ and __exit__ methods to the Solver class, users can write code that works for the pyamgx backend as well as other available solvers.
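A minimal sketch of the pattern, in Python 3 with hypothetical stand-in classes (Solver, NativeHandle and GPUSolver are illustrations, not FiPy's or pyamgx's real classes): the base class gets no-op hooks, and only the GPU-backed subclass overrides __exit__ to call destroy() on the objects it owns.

```python
class Solver:
    """Stand-in base class with the proposed no-op context-manager hooks."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False  # nothing to release; never swallow exceptions


class NativeHandle:
    """Hypothetical wrapper for a native (GPU) object that must be destroy()ed."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def destroy(self):
        self.alive = False


class GPUSolver(Solver):
    """Hypothetical stand-in for PyAMGXSolver: owns handles released on exit."""

    def __init__(self):
        # Assumed for illustration; the real solver's handles may differ.
        self.handles = [NativeHandle("config"), NativeHandle("resources"),
                        NativeHandle("matrix")]

    def __exit__(self, exc_type, exc_value, traceback):
        for handle in self.handles:
            handle.destroy()
        return False


# The same code works for every solver; only GPUSolver does any cleanup.
with Solver() as plain:
    pass  # e.g. eq.solve(var=phi, solver=plain)

with GPUSolver() as gpu:
    pass  # e.g. eq.solve(var=phi, solver=gpu)

print([h.alive for h in gpu.handles])  # [False, False, False]
```

Because __exit__ also runs when the body of the with block raises, the handles are released on errors as well.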

I hope this is clear, and that this is not too intrusive! I'm open to any alternate approaches.

@wd15
Contributor

wd15 commented Apr 30, 2018

Thanks for the pull request. Please do ask questions about FiPy if there are things we can help you with.

I think that having the __enter__ and __exit__ methods is fine. There are probably other places in FiPy where context managers should be used; FiPy was written before Python had that functionality.

I'm not sure why the tests are failing. The latest version of develop, 43bbbd6, has all the tests passing currently, https://travis-ci.org/usnistgov/fipy/builds/359546736. Did your branch start from there?

Please also add a --pyamgx test option to .travis.yml. Remember that you can run a shorter version of the tests using python setup.py test --modules. That doesn't run the examples, which makes the tests much faster. This can be helpful during the development cycle.

On Travis, it may be easier to just have a python setup.py test --modules --pyamgx test (the short version). We currently have a lot of untested options on Travis, as it's difficult to test everything in a timely way. We can help with that once you're happy with the pull request. You can also have a doctest for pyamgx, which can be switched on or off depending on whether pyamgx is available. Switching a test on or off can be done with register_skipper; see https://github.com/usnistgov/fipy/blob/develop/fipy/tests/doctestPlus.py#L96 and https://github.com/usnistgov/fipy/blob/develop/fipy/variables/distanceVariable.py#L96 for an example of how to use it.
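Conceptually, a skipper ties a doctest directive to a test function. The sketch below re-creates that idea with the standard library's doctest module. It is hypothetical and only loosely modelled on the register_skipper(flag=..., test=..., why=..., skipWarning=...) calls in this PR (FiPy's real implementation in fipy/tests/doctestPlus.py may differ); the PYAMGX flag, its availability check and SkippingRunner are assumptions.

```python
import doctest
import importlib.util

# Hypothetical registry: flag name -> (option bit, test, why, skipWarning).
_SKIPPERS = {}


def register_skipper(flag, test, why, skipWarning=True):
    """Register a doctest directive; tagged examples run only if test() is true."""
    _SKIPPERS[flag] = (doctest.register_optionflag(flag), test, why, skipWarning)


class SkippingRunner(doctest.DocTestRunner):
    """DocTestRunner that drops examples whose skipper test() returns False."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.skipped = {}  # flag name -> number of examples skipped

    def run(self, test, *args, **kwargs):
        kept = []
        for example in test.examples:
            blocked = [name for name, (bit, check, _why, _warn) in _SKIPPERS.items()
                       if example.options.get(bit) and not check()]
            for name in blocked:
                self.skipped[name] = self.skipped.get(name, 0) + 1
            if not blocked:
                kept.append(example)
        test.examples = kept
        return super().run(test, *args, **kwargs)

    def warnings(self):
        """Summary lines in the style of FiPy's test output."""
        return ["Skipped %d doctest examples because %s" % (count, _SKIPPERS[name][2])
                for name, count in sorted(self.skipped.items())
                if _SKIPPERS[name][3]]


# Hypothetical flag: run pyamgx-specific examples only when pyamgx is importable.
register_skipper(flag="PYAMGX",
                 test=lambda: importlib.util.find_spec("pyamgx") is not None,
                 why="`pyamgx` must be installed to run some tests")

DOCS = """
>>> 1 + 1
2
>>> 2 + 2  # doctest: +PYAMGX
4
"""

demo = doctest.DocTestParser().get_doctest(DOCS, {}, "demo", None, 0)
runner = SkippingRunner(verbose=False)
results = runner.run(demo)
print(results, runner.warnings())
```

In this sketch, tagged examples are removed before the runner sees them and are counted, so a missing optional dependency shows up as a summary line rather than as a failure.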

@shwina
Author

shwina commented May 1, 2018

Thanks, @wd15!

I'm not sure why they are failing either; it looks like develop is failing now unless I am overlooking something:

I created a fork of fipy, added an inconsequential commit on top of develop (shwina@5c85042224ff), and it fails:

https://travis-ci.org/shwina/fipy/builds/373447933

I can run tests locally using the command you provided (python setup.py test --modules), but that runs into a different issue. The doctest fipy.tools.dimensions.physicalField.PhysicalField.itemset fails with the following error:

    try:
        a.itemset(PhysicalField("6 ft"))
    except IndexError:
        # NumPy 1.7 has changed the exception type
        raise ValueError("can only place a scalar for an  array of size 1")
Expected:
    Traceback (most recent call last):
        ...
    ValueError: can only place a scalar for an  array of size 1
Got:
    Traceback (most recent call last):
      File "/software/anaconda/5.1.0/lib/python2.7/doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest fipy.tools.dimensions.physicalField.PhysicalField.itemset[4]>", line 2, in <module>
        a.itemset(PhysicalField("6 ft"))
      File "/home/atrikut/tmp/fipy/fipy/tools/dimensions/physicalField.py", line 616, in itemset
        self.value.itemset(value)
    ValueError: can only convert an array of size 1 to a Python scalar


---------------------------------

To make the test pass, I had to change the IndexError in https://github.com/usnistgov/fipy/blob/develop/fipy/tools/dimensions/physicalField.py#L601 to ValueError. Note that I am running NumPy 1.14.0 with Python 2.7.14.
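An alternative to swapping one exception type for the other would be to catch both, so the doctest passes whichever type the installed NumPy raises. A minimal sketch under that assumption; place_scalar and the stand-in setter are hypothetical and are not FiPy's PhysicalField.itemset.

```python
def place_scalar(setter, value):
    """Call setter(value), normalizing the exception type across NumPy versions.

    `setter` is a hypothetical stand-in for something like ndarray.itemset.
    """
    try:
        setter(value)
    except (IndexError, ValueError):
        # The original doctest caught IndexError; NumPy 1.14.0 raised ValueError.
        raise ValueError("can only place a scalar for an  array of size 1") from None


def numpy_1_14_like(value):
    """Stand-in that fails the way NumPy 1.14.0 did in the log above."""
    raise ValueError("can only convert an array of size 1 to a Python scalar")


try:
    place_scalar(numpy_1_14_like, 6)
except ValueError as err:
    print(err)  # can only place a scalar for an  array of size 1
```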

To verify, could you please try triggering a build on Travis for the develop branch?

@guyer
Member

guyer commented May 1, 2018

> To verify, could you please try triggering a build on Travis for the develop branch?

No need. There's something broken with scipy (or the conda package, or the way we're using it). This is a recent change that I haven't had time to diagnose.

@shwina
Author

shwina commented May 1, 2018

Thanks. Shall I proceed to work on this PR independently of this issue?

@guyer
Member

guyer commented May 1, 2018

> There's something broken with scipy

Probably related to this

@wd15
Contributor

wd15 commented May 1, 2018 via email

@shwina
Author

shwina commented May 3, 2018

I'm finally at the stage where all the tests (except one, see #568) are passing (see attached log), but I get a message about memory leaks when the AMGX library is finalized, along with a dump of occupied memory blocks.

This is because, as described above, these objects need to be explicitly destroyed, which the with statement does automatically.

Before proceeding, I thought I'd ask for your opinion on how best to handle this. One seriously hacky way I have considered is to unregister the call to pyamgx.finalize(), which is currently registered with atexit. This will leak GPU memory (and possibly some CPU memory as well), but will not result in the error message and memory dump being printed.
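The workaround amounts to removing an exit hook. A minimal sketch, assuming finalization is registered with atexit as described; finalize here is a hypothetical stand-in for pyamgx.finalize(). Note that atexit.unregister exists only in Python 3, so the Python 2.7 setup used in this thread would have needed a different route.

```python
import atexit


def finalize():
    """Hypothetical stand-in for pyamgx.finalize(), which reports leaked blocks."""
    print("AMGX finalize: memory leak report and block dump")


# As described: finalization is registered to run at interpreter exit.
atexit.register(finalize)

# The workaround: unregister the hook so finalize() never runs at exit.
# Trade-off: GPU memory (and possibly some CPU memory) is leaked, but the
# leak report and memory dump are no longer printed.
atexit.unregister(finalize)
```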

test_results.txt

@wd15
Copy link
Contributor

wd15 commented May 4, 2018

@shwina, if you think that's the best approach then I'm supportive. As long as the change only impacts what you're already contributing then it's fine.

@shwina
Author

shwina commented May 7, 2018

OK thanks! That worked quite well, and to confirm, yes: this leak will only happen when testing the pyamgx solvers.

@guyer
Member

guyer commented May 8, 2018

Not ideal, but probably the best we can do right now. We should probably be exploring much more pervasive use of context managers in FiPy.

wd15 added 2 commits July 6, 2018 15:23
Address usnistgov#567

The tests pass when using solver choices similar to PySparse.

 - The Stokes flow example tests are skipped for PyAMGX as they don't
   pass.
@wd15
Copy link
Contributor

wd15 commented Jul 9, 2018

@shwina, there is a pull request for shwina/fipy. It makes all the tests pass other than a physical field test unrelated to PyAMGX.

Make tests pass when using the pyamgx solvers
@wd15
Contributor

wd15 commented Jul 9, 2018

@guyer, I'm happy with this now. Tests all pass other than fipy.tools.dimensions.physicalField.PhysicalField.itemset, which is most likely related to my setup.

@shwina
Author

shwina commented Jul 9, 2018

@wd15

Forgive me if I'm missing something, but looking at the Travis log, it looks like fipy.tools.dimensions.physicalField.PhysicalField.itemset passes, but examples.flow.stokesCavity fails.

@wd15
Contributor

wd15 commented Jul 9, 2018

@shwina, sorry, I committed a mistake. The new pull request should fix it.

I think the physical field error was just related to my own setup and unrelated to pyamgx. It fails for all solvers.

Fix typo left behind from recent debugging
@shwina
Author

shwina commented Jul 9, 2018

Sweet!

wd15 assigned guyer and unassigned shwina Jul 10, 2018
@shwina
Author

shwina commented Jul 10, 2018

I can confirm that all tests except fipy.tools.dimensions.physicalField.PhysicalField.itemset pass with the pyamgx solvers in my test environment (Python 2.7.15, NVIDIA Tesla V100 GPU, pyamgx master).

@guyer
Member

guyer commented Jul 11, 2018

What is the test failure for fipy.tools.dimensions.physicalField.PhysicalField.itemset?

@shwina
Author

shwina commented Jul 11, 2018


======================================================================
FAIL: itemset (fipy.tools.dimensions.physicalField.PhysicalField)
Doctest: fipy.tools.dimensions.physicalField.PhysicalField.itemset
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/atrikut/.conda/envs/pyamgx2/lib/python2.7/doctest.py", line 2226, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for fipy.tools.dimensions.physicalField.PhysicalField.itemset
  File "/home/atrikut/local/fipy/fipy/tools/dimensions/physicalField.py", line 589, in itemset

----------------------------------------------------------------------
File "/home/atrikut/local/fipy/fipy/tools/dimensions/physicalField.py", line 599, in fipy.tools.dimensions.physicalField.PhysicalField.itemset
Failed example:
    try:
        a.itemset(PhysicalField("6 ft"))
    except IndexError:
        # NumPy 1.7 has changed the exception type
        raise ValueError("can only place a scalar for an  array of size 1")
Expected:
    Traceback (most recent call last):
        ...
    ValueError: can only place a scalar for an  array of size 1
Got:
    Traceback (most recent call last):
      File "/home/atrikut/.conda/envs/pyamgx2/lib/python2.7/doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest fipy.tools.dimensions.physicalField.PhysicalField.itemset[4]>", line 2, in <module>
        a.itemset(PhysicalField("6 ft"))
      File "/home/atrikut/local/fipy/fipy/tools/dimensions/physicalField.py", line 616, in itemset
        self.value.itemset(value)
    ValueError: can only convert an array of size 1 to a Python scalar

@shwina
Author

shwina commented Jul 11, 2018

Related to #568?

@@ -128,3 +136,8 @@ def __init__(self, solver):
test=lambda: solver == 'pysparse',
why="the PySparse solvers are not being used.",
skipWarning=True)

register_skipper(flag='PYAMGX_SOLVER',
Member



The semantics for this skipper are backwards. The other skippers, e.g. SCIPY_SOLVER, are for when the item is being used, not when it's not.

The simplest fix is to rename this to NOT_PYAMGX_SOLVER.
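The polarity convention the review asks for can be sketched as a lookup in which each skipper's test answers "should the tagged examples run?". The NOT_PYAMGX_SOLVER message mirrors the one in the test log at the end of this thread; the SCIPY_SOLVER test and message, the SKIPPERS table, and skip_reason are hypothetical illustrations, not FiPy's code.

```python
# Hypothetical table: flag name -> (test(solver), reason printed when skipping).
# Convention from the review: test() returns True when tagged examples SHOULD run.
SKIPPERS = {
    # Positive flag (assumed test and message): run only when SciPy solvers are used.
    "SCIPY_SOLVER": (lambda solver: solver == "scipy",
                     "the SciPy solvers are not being used."),
    # Negated flag: run whenever PyAMGX is NOT the active solver.
    "NOT_PYAMGX_SOLVER": (lambda solver: solver != "pyamgx",
                          "the PyAMGX solver is being used."),
}


def skip_reason(flag, solver):
    """Return None if examples tagged with `flag` should run, else why they are skipped."""
    test, why = SKIPPERS[flag]
    return None if test(solver) else why


for solver in ("scipy", "pyamgx"):
    print(solver, skip_reason("NOT_PYAMGX_SOLVER", solver))
# scipy None
# pyamgx the PyAMGX solver is being used.
```

With the flag named for the condition under which examples run, the name reads correctly in the "Skipped ... because ..." summary.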

guyer assigned wd15 and shwina and unassigned guyer Jul 13, 2018
@guyer
Member

guyer commented Jul 16, 2018

@shwina @wd15 @tkphd
If somebody will confirm that they've successfully run the tests with the NOT_PYAMGX_SOLVER flag, I will merge this pull request.

@shwina
Author

shwina commented Jul 16, 2018

@guyer

I can confirm that all tests but fipy.tools.dimensions.physicalField.PhysicalField.itemset pass with the pyamgx solvers.

The NOT_PYAMGX_SOLVER skipper works: those 3 tests are skipped when pyamgx is used, but not otherwise.

guyer merged commit 85d93ed into usnistgov:develop Jul 17, 2018
shwina deleted the pyamgx branch July 17, 2018 12:58
@tkphd
Contributor

tkphd commented Jul 18, 2018

Just got a chance to run this (from the merged develop branch):

$ python2 setup.py test
----------------------------------------------------------------------
Ran 545 tests in 89.708s

OK
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Skipped 2 doctest examples because `lsmlib` must be used to run some tests
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
$ python2 setup.py test --pyamgx
----------------------------------------------------------------------
Ran 545 tests in 161.911s

OK
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Skipped 2 doctest examples because `lsmlib` must be used to run some tests
Skipped 2 doctest examples because the PySparse solvers are not being used.
Skipped 3 doctest examples because the PyAMGX solver is being used.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The GTX 1070 GPU peaked at 20% volatile utilization and 550 MB of memory, indicating that the work was indeed done on the card. Thanks for making this happen, @shwina!

@shwina
Author

shwina commented Jul 18, 2018

@tkphd - awesome! Looking forward to trying the GPU for more complex/larger problems!
