Implicit Problem Aggregators #7738
Replies: 2 comments 1 reply
-
What is the reasoning for switching away from the solution that we discussed last week into the 'global implicit' approach? I don't see much gain in this approach versus the one from last week. For the sake of comments, let's call this approach 'implicit' and the one discussed last week 'argument-based'.
How do we guarantee that there is always at least one entry? Who's responsibility is to guarantee that? I like that with the 'argument-based' approach, the API user has to construct a problem aggregator to pass down the API call, so there the API gives us a guarantee that indeed the aggregator is constructed. Here I don't see any guarantee apart from 'we try to remember about it'.
What if I forget to pop off an element from the stack?
This is really more complicated than the other approach. With the 'argument-based' approach, I'd just create a nested aggregator (+- a 'push') and I do not need to care about a 'pop' - the whole structure is summarized once the problem summary is requested. The structure is maintained automatically by passing a nested instance as an argument. There is no risk of forgetting to pop it. No added complexity of exceptions.
Most of the time the Warnings API is sufficient in Enso. We have a I don't understand the reasons behind a switch towards the solution described above. I thought the solution we discussed last week was good enough. I really liked it and found it very elegant. This new solution is IMO not at all as elegant - it requires more work with maintaining the 'stack'. It will not port well to possible multi-threaded scenarios due to the global state. The benefit of this seems to be that we don't need to pass any problem aggregator explicitly. But IMO that is a disadvantage - this creates a risk of unhandled problems. The 'argument-based' approach was giving us assurance that if my function holds a reference to a problem aggregator, someone up the call-chain will summarize these problems and so will handle them. This does no longer have this, and makes the problem handling much more implicit. I don't think using global state is a good tool for this job. Problem handling should be done in some explicit manner IMO. |
Beta Was this translation helpful? Give feedback.
-
Here's how I've been imagining the idea that we discussed together last week could work: package org.enso.table.parsing.problems.proto;
import org.enso.table.data.column.storage.Storage;
import org.enso.table.data.column.storage.type.FloatType;
import org.enso.table.data.column.storage.type.IntegerType;
import org.enso.table.data.column.storage.type.StorageType;
import org.enso.table.data.table.Column;
import org.enso.table.data.table.problems.ArithmeticOverflow;
import org.enso.table.data.table.problems.FloatingPointGrouping;
import org.enso.table.problems.Problem;
import java.util.ArrayList;
import java.util.List;
public class ProblemAggregator {
protected List<Problem> directlyReportedProblems = new ArrayList<>();
protected List<ProblemAggregator> children = new ArrayList<>();
// Called by any processing code to report a simple problem. Specialized implementations may be available too.
public void report(Problem problem) {
directlyReportedProblems.add(problem);
}
// Called by the top-level user after all processing is completed, to summarize problems that happened.
public List<Problem> summarize() {
List<Problem> result = new ArrayList<>();
result.addAll(directlyReportedProblems);
for (ProblemAggregator child : children) {
result.addAll(child.summarize());
}
return result;
}
// The simple constructor is private, so children need to use one that specifies the parent, thus guaranteeing that a parent exists.
private ProblemAggregator() {}
// The constructor to use for inheritors, that guarantees that it is attached to a parent.
protected ProblemAggregator(ProblemAggregator parent) {
parent.children.add(this);
}
/* This should only be used by top-level code, and any call to this method should be paired up with a summarize
* call that translates problems from Java to Enso.
*
* We can guarantee this by relying on an Enso wrapper:
* with_problem_aggregator on_problems code =
* agg = ProblemAggregator.makeTopLevelAggregator
* result = code agg
* on_problems.attach_problems_before (Java_Problems.translate agg.summarize) result
*
* No function other than `with_problem_aggregator` should ever call `makeTopLevelAggregator`, to guarantee correctness.
* Usage:
* with_problem_aggregator Problem_Behavior.Report_Warning problem_aggregator->
* Java_Helper.call args... problem_aggregator
*/
public static ProblemAggregator makeTopLevelAggregator() {
return new ProblemAggregator();
}
}
// We can use a more specific aggregator for specific tasks, that will contain a more efficient implementation.
public class ArithmeticOperationProblemAggregator extends ProblemAggregator {
public ArithmeticOperationProblemAggregator(ProblemAggregator parent) {
super(parent);
}
private long overflowCount = 0;
private Object[] overflowExample = null;
private StorageType overflowTargetType = null;
public void reportOverflow(StorageType targetType, long x, String op, long y) {
overflowCount++;
if (overflowTargetType == null) {
overflowTargetType = targetType;
overflowExample = new Object[] {x, op, y};
}
}
@Override
public List<Problem> summarize() {
List<Problem> problems = super.summarize();
if (overflowCount > 0) {
problems.add(new ArithmeticOverflow(overflowTargetType, overflowCount, overflowExample));
}
return problems;
}
}
// Or we can have an aggregator that provides context info to report more 'rich' problems.
public class ColumnOperationProblemAggregator extends ProblemAggregator {
private final String columnName;
public ColumnOperationProblemAggregator(ProblemAggregator parent, String columnName) {
super(parent);
this.columnName = columnName;
}
public void reportFloatingPointEquality(int row) {
report(new FloatingPointGrouping(columnName, row));
}
}
// Usage
public class Java_Stuff {
public static int enso_facing_method(Column java_column, ProblemAggregator aggregator) {
int x = computeArithmeticOperation("foo", new ArithmeticOperationProblemAggregator(aggregator));
int y = storageMapping(java_column.getStorage(), new ColumnOperationProblemAggregator(aggregator, java_column.getName()));
return x + y;
}
private static int computeArithmeticOperation(String args, ArithmeticOperationProblemAggregator aggregator) {
// ...
aggregator.reportOverflow(IntegerType.INT_64, 123, "+", 42);
// ...
return 23;
}
// alternatively we can create the aggregator at usage site
private static int computeArithmeticOperation2(String args, ProblemAggregator aggregator) {
// Create a specialzied aggregator because we want it to be more efficient, all is passed to parent anyway.
ArithmeticOperationProblemAggregator specialized = new ArithmeticOperationProblemAggregator(aggregator);
// ...
specialized.reportOverflow(IntegerType.INT_64, 123, "+", 42);
// ...
return 23;
}
private static int storageMapping(Storage<?> storage, ColumnOperationProblemAggregator aggregator) {
if (storage.getType() instanceof FloatType) {
// ...
aggregator.reportFloatingPointEquality(0);
// ...
}
return 0;
}
}
/*
* Enso code:
* my_function column =
* # The wrapper guarantees that `summarize` will be called (unless an exception is thrown - but then we don't care about the problems - everything is simply discarded and cleaned up by GC).
* with_problem_aggregator Problem_Behavior.Report_Warning problem_aggregator->
* Java_Stuff.enso_facing_method column.java_column problem_aggregator
*/ Here we can see that we don't need any bookkeeping - thanks to the The specialized The specialized I don't think we can easily achieve this kind of context-information passing nor specialized implementations for efficient handling of big amounts of problems using the global state approach - there is no place to 'specialize' the aggregators there, at least in an elegant way. |
Beta Was this translation helpful? Give feedback.
-
Implicit Problem Aggregators
The Problem
Problem aggregation in the Java Table implementation suffers from a number of issues:
Each way of handling / aggregating problems is slightly different
Many of these isues would be solved by creating a more general problem aggregator (PA) to use everywhere. But the last problem remains: developers have to take care to propagate problems at all times, both in writing new code and modifying existing code.
Automatic Aggregation
The solution to this problem is to provide a facility for reporting problems that will always ensure that the problems are dealt with – propagated to the user by default, or optionally handled in some other way.
We propose that there exists at all times an ambient PA that problems can be reported to, and which will collect those problems and propagate them upwards to the user.
We maintain a stack of PAs. When crossing the boundary from Enso to Java, we push a PA onto the stack. We can push additional PAs onto the stack to wrap particular computations, and then pop them off. Popping a PA from the stack transfers its problems to its parent.
There is no need to pass PAs everywhere – the current PA is always available through the
ProblemAggregators
class.At any point, we can report a problem like this:
Calling
report
adds the problem to the current PA. Since the PA stack always has at least one entry, there is always a place for a problem to go.Developers can inspect and modify problems that have been aggregated, but by default they are simply propagated.
Problem Aggregator Scope
PAs are dynamically scoped. You can add a PA at any time:
This can default to a standard PA:
You only need to specify the PA if you are using a special one:
A push is generally paired with a pop, which should be inside a finally block:
Alternately, you can take care of the pop automatically with a wrapper
or
By default, when a PA is popped, it gives its problems to its parent. Alternatively, you can choose to intervene:
Or
Or
In the simplest case, you only need call
report(problem)
for each problem, knowing that the problem will be propagated correctly. All other calls to the PA system are for special cases: reducing / combining bulk problems; removing misleading or unimportant problems; or adding additional information to problems.Specialized PAs
We currently have a number of aggregators with convenience methods, such as
We could implement these methods on the default PA for many common cases. For more specialized use, you could refer to the PA directly:
Other specialized PAs might be:
report(new ArithmeticError(), columnName)
Extension to Enso code
While this proposal is for reporting problems within Java code, we can also consider using it from Enso. Enso code has some of the same problem aggregation problems: multiple implementations, impedance mismatch between different aggregators, and of course the possibility of dropping problems on the floor.
With Enso, the difference is that problems are attached to values. Reporting problems in Enso then means including the value in the aggregator, which allows the problem to be displayed in the right place in the IDE.
By default, any problem encountered while evaluating a node will be attached to that node. Additional logic could be used to rearrange or reattach problems, if needed.
Using PAs from Enso will look a lot like using it from Java, except perhaps even cleaner:
As before, there is no need to pass PAs around; they’re always available in any scope.
Beta Was this translation helpful? Give feedback.
All reactions