Receiving TimerAlreadyCanceledException in TwoPhaseHASCO when running MLPlan #259

fmohr opened this issue Jul 6, 2021 · 1 comment
fmohr commented Jul 6, 2021

I am observing this error when running MLPlan in cluster experiments:

	Error message: Timer already cancelled.
	Error trace:
		java.util.Timer.sched(Timer.java:397)
		java.util.Timer.scheduleAtFixedRate(Timer.java:328)
		ai.libs.jaicore.concurrent.TrackableTimer.scheduleAtFixedRate(TrackableTimer.java:135)
		ai.libs.hasco.twophase.TwoPhaseHASCO.nextWithException(TwoPhaseHASCO.java:195)
		ai.libs.jaicore.basic.algorithm.AOptimizer.call(AOptimizer.java:134)
		ai.libs.jaicore.components.optimizingfactory.OptimizingFactory.nextWithException(OptimizingFactory.java:63)
		ai.libs.jaicore.components.optimizingfactory.OptimizingFactory.call(OptimizingFactory.java:80)
		ai.libs.mlplan.core.MLPlan.nextWithException(MLPlan.java:258)
		ai.libs.mlplan.core.MLPlan.call(MLPlan.java:291)
		naiveautoml.experiments.NaiveAutoMLExperimentRunner.evaluate(NaiveAutoMLExperimentRunner.java:217)
		ai.libs.jaicore.experiments.ExperimentRunner.conductExperiment(ExperimentRunner.java:217)
		ai.libs.jaicore.experiments.ExperimentRunner.lambda$randomlyConductExperiments$0(ExperimentRunner.java:104)
		java.lang.Thread.run(Thread.java:748)

Logs show that this stack trace is immediately followed by an indication of memory overflow:

java.lang.OutOfMemoryError: Java heap space

One dataset where this occurred was the DNA dataset (https://www.openml.org/d/40670), run with 24 GB of memory.
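For context, the message comes from the JDK itself: java.util.Timer.sched throws IllegalStateException("Timer already cancelled.") whenever a task is scheduled after cancel() has been called, which matches the top frames of the trace above. A plausible (but unconfirmed) chain is that the timer was cancelled, e.g. during cleanup after the memory overflow, while TwoPhaseHASCO still tried to schedule its task. A minimal standalone reproduction of just the JDK behavior:

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerCancelDemo {
    public static void main(String[] args) {
        Timer timer = new Timer();
        timer.cancel(); // e.g. cancelled by some cleanup/shutdown path
        try {
            // Any subsequent scheduling attempt fails inside Timer.sched
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override public void run() { /* never runs */ }
            }, 0, 1000);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Timer already cancelled."
        }
    }
}
```

This only illustrates the JDK mechanism; it does not show which component cancelled the TrackableTimer in the failing runs.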

The following message directly preceding the exception suggests that the error occurred when training a BayesNet:

2021-06-01 17:22:03.846 [ORGraphSearch-worker-1] INFO executor - Fitting the learner (class: ai.libs.mlplan.core.TimeTrackingLearnerWrapper) ai.libs.mlplan.core.TimeTrackingLearnerWrapper -
2021-06-01 17:23:03.691 [Global Timer] INFO InterruptionTimerTask - Executing interruption task 1293092700 with descriptor "Timeout for timed computation with thread Thread[ORGraphSearch-wo
2021-06-01 17:23:03.693 [Global Timer] INFO Interrupter - Interrupting Thread[ORGraphSearch-worker-1,5,main] on behalf of Thread[Global Timer,10,main] with reason InterruptionTimerTask [thr
2021-06-01 17:23:03.694 [Global Timer] INFO Interrupter - Interrupt accomplished. Interrupt flag of Thread[ORGraphSearch-worker-1,5,main]: true
2021-06-01 17:23:03.833 [Global Timer] INFO InterruptionTimerTask - Executing interruption task 1024325039 with descriptor "Timeout for timed computation with thread Thread[ORGraphSearch-wo
2021-06-01 17:23:03.834 [Global Timer] INFO Interrupter - Interrupting Thread[ORGraphSearch-worker-1,5,main] on behalf of Thread[Global Timer,10,main] with reason InterruptionTimerTask [thr
2021-06-01 17:23:03.835 [Global Timer] INFO Interrupter - Interrupt accomplished. Interrupt flag of Thread[ORGraphSearch-worker-1,5,main]: true

The question is really whether this can be avoided without spawning external processes.

fmohr added the question label Jul 6, 2021

fmohr commented Jul 6, 2021

Thinking more about this, I believe there is really no solution to this problem except spawning a new process. The problem with new processes, though, is that one needs to reserve a good deal of memory for each of them to avoid problems, which can easily become a waste of resources.

Probably the best solution is to introduce an option that allows running ML-Plan in a separate-process mode when there is an anticipated risk of memory overflows.

More generally, it would be nice to extend the process project of AILibs with the ability to execute objects that implement both Callable&lt;T extends Serializable&gt; and Serializable in a separate process with specific resource limits. One could then have a general executor for such operations that serializes the object to be executed and launches a new JVM, which deserializes the object, calls it, and serializes the resulting T into an output file that the original process can deserialize again.
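The serialize/deserialize-and-call core of that idea can be sketched as below. All names here (AddTask, serialize, deserializeAndCall) are hypothetical and purely illustrative; in the actual design the byte payload would cross a process boundary, i.e. be written to a file and read back by a fresh JVM launched via ProcessBuilder with its own -Xmx limit, rather than staying in-process as in this demo:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class SerializedCallDemo {

    // A task that is both Callable and Serializable, as the proposal requires.
    static class AddTask implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;
        private final int a, b;
        AddTask(int a, int b) { this.a = a; this.b = b; }
        @Override public Integer call() { return a + b; }
    }

    // Parent-side step: turn the task into a byte payload.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Child-side step: reconstruct the task and execute it.
    @SuppressWarnings("unchecked")
    static <T> T deserializeAndCall(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Callable<T> task = (Callable<T>) ois.readObject();
            return task.call();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = serialize(new AddTask(19, 23));
        // In the proposed executor, `payload` would instead be handed to a
        // fresh JVM (e.g. via ProcessBuilder with a strict -Xmx), which would
        // run deserializeAndCall and write the serialized result to a file.
        Integer result = SerializedCallDemo.<Integer>deserializeAndCall(payload);
        System.out.println(result); // prints 42
    }
}
```

One design consequence worth noting: because the child JVM gets its own heap, an OutOfMemoryError there can only kill the child, so the parent's timers and threads stay intact, which is exactly what the in-process failure above cannot guarantee.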
