Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highs with python-mip is slow #372

Open
colonetti opened this issue Feb 22, 2024 · 5 comments
Open

Highs with python-mip is slow #372

colonetti opened this issue Feb 22, 2024 · 5 comments

Comments

@colonetti
Copy link

colonetti commented Feb 22, 2024

Hello, everyone

Firstly, thanks for the great package and also for adding support for Highs.

However, it seems that creating a optimization model with Highs is rather slow.

Describe the bug
Adding variables to a Highs optimization model is slow.

To Reproduce

import sys
import mip as pm
from timeit import default_timer as dt

def main(solver):
    m = pm.Model(solver_name=solver)
    ini = dt()
    x = {i: m.add_var() for i in range(10000)}
    print(f"{dt() - ini} seconds to add vars", flush=True)

if __name__ == '__main__':
    main(sys.argv[1])

Expected behavior
GRB: 0.021987227955833077 seconds to add vars
CBC: 0.026054809102788568 seconds to add vars
Highs: 6.408406416885555 seconds to add vars

Desktop (please complete the following information):

  • Operating System, version: Ubuntu 23.10
  • Python version: 3.11.6
  • Python-MIP version (we recommend you to test with the latest version): 1.16tc0

Thanks again!

@rschwarz
Copy link
Contributor

rschwarz commented Feb 25, 2024

I have not looked into the code in a long while, but since the above call to add_var passes neither a name nor a variable type (integrality), the python-mip wrapper doesn't do much beyond calling Highs_addCol in a loop (see https://github.com/coin-or/python-mip/blob/master/mip/highs.py#L809). I'm not sure why this takes a lot of time, or how this could be avoided, because models in python-mip are created incrementally, not in bulk (vectorized), right? :-/

EDIT: Maybe the call to Highs_getNumCol is expensive and we could cache this count on the Python side. See also the implementation in the Julia/MOI wrapper for comparison: https://github.com/jump-dev/HiGHS.jl/blob/master/src/MOI_wrapper.jl#L738

@colonetti
Copy link
Author

I have not looked into the code in a long while, but since the above call to add_var passes neither a name nor a variable type (integrality), the python-mip wrapper doesn't do much beyond calling Highs_addCol in a loop (see https://github.com/coin-or/python-mip/blob/master/mip/highs.py#L809). I'm not sure why this takes a lot of time, or how this could be avoided, because models in python-mip are created incrementally, not in bulk (vectorized), right? :-/

EDIT: Maybe the call to Highs_getNumCol is expensive and we could cache this count on the Python side. See also the implementation in the Julia/MOI wrapper for comparison: https://github.com/jump-dev/HiGHS.jl/blob/master/src/MOI_wrapper.jl#L738

Thanks, @rschwarz. I'll take a look into it.

@metab0t
Copy link

metab0t commented May 29, 2024

@rschwarz

The solution is to avoid using Highs_passColName and Highs_passRowName. They are quite expensive and should not be called for each variable/column.

The name of variable is set automatically at

name = "var({})".format(len(self.__vars))
although the user does not specify the name.

I met the same problem in PyOptInterface and decide to maintain the name mapping internally: metab0t/PyOptInterface@0074dc0

Update: A fix is proposed at ERGO-Code/HiGHS#1782 to solve this problem.

@rschwarz
Copy link
Contributor

Since the fix (ERGO-Code/HiGHS#1782) has been merged in the meantime, I wanted to reevaluate the issue.

On my machine, with python-mip on master, and highspy==1.8.1, I can observe the following time measurements (repeated 5 times):

$ python test372.py highs
Running HiGHS 1.8.1 (git hash: 4a7f24a): Copyright (c) 2024 HiGHS under MIT licence terms
0.032399341000200366 seconds to add vars
0.024648570000408654 seconds to add vars
0.024442776999876514 seconds to add vars
0.034494724000069255 seconds to add vars
0.02395381099995575 seconds to add vars

$ python test372.py cbc
0.021088200000122015 seconds to add vars
0.021199468999839155 seconds to add vars
0.012396355999953812 seconds to add vars
0.012266616000033537 seconds to add vars
0.01953656100022272 seconds to add vars

So there is still a difference between Cbc and HiGHS, but it's arguably less than the variation within each solver's run.

When I increase the number of variables by another factor 10, the timings become less varied and there is a more obvious factor 2 from Cbc to HiGHS.

That means that the overhead can maybe still be improved (maybe by caching the names on the Python side) but the order of magnitude is now OK.

@rschwarz
Copy link
Contributor

What is more striking to me is that the test script above doesn't even set a name for the variables. The default value for name in Model.add_var is "", so I thought by doing if name: in SolverHighs.add_var, I would avoid the unnecessary work of Highs_passColName.

But, it turns out that a name is always provided, using this logic:
https://github.com/coin-or/python-mip/blob/master/mip/lists.py#L39-L40

I find this a bit strange, because if generic names are needed, eg, to export a problem from the solver to .mps or .lp, the solver will usually use row and column indices to create them, right?

In any case, removing this default name logic will reduce the time for HiGHS by about a third. This sounds like a lot, but in a more realistic use case that actually builds expressions for constraints etc, I don't think it would make a big difference.

In summary, I'd consider this issue closed (@colonetti).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants