Skip to content

Commit a64ed72

Browse files
authored
Change range in repeat to track counters per iteration (#3241)
https://issues.apache.org/jira/browse/TINKERPOP-3202 This changeset aligns the semantics of traversals with range() and limit() used in repeat() with the behaviour of the equivalent 'unrolled' traversal without repeat(). This is done by tracking per-iteration counters instead of a global counter. The semantics of range() and limit() without repeat() and with local() were not modified. Single and nested loop Traverser implementations were also refactored to reduce much code duplication by consolidating common loop logic into a new NL_SL_Traverser base interface for loop-aware traversers.
1 parent 76a76c1 commit a64ed72

File tree

40 files changed

+2398
-558
lines changed

40 files changed

+2398
-558
lines changed

CHANGELOG.asciidoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -117,6 +117,8 @@ This release also includes changes from <<release-3-7-XXX, 3.7.XXX>>.
117117
* Renamed `MergeElementStep` to `MergeElementStep` as it is a base class to `mergeV()` and `mergeE()`.
118118
* Renamed `MergeStep` of `merge()` to `MergeElementStep` for consistency.
119119
* Modified `RepeatUnrollStrategy` to use a more conservative approach, only unrolling repeat loops containing safe navigation and filtering steps.
120+
* Modified `limit()`, `skip()`, range()` in `repeat()` to track per-iteration counters.
121+
* Moved `Traverser` loop logic into new interface `NL_SL_Traverser` and changed loop-supporting `Traverser`s to extend the common interface.
120122
121123
== TinkerPop 3.7.0 (Gremfir Master of the Pan Flute)
122124

docs/src/reference/the-traversal.asciidoc

Lines changed: 17 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4075,10 +4075,23 @@ g.V().until(has('name','ripple')).
40754075
<1> do-while semantics stating to do `out()` 2 times.
40764076
<2> while-do semantics stating to break if the traverser is at a vertex named "ripple".
40774077
4078-
IMPORTANT: There are two modulators for `repeat()`: `until()` and `emit()`. If `until()` comes after `repeat()` it is
4079-
do/while looping. If `until()` comes before `repeat()` it is while/do looping. If `emit()` is placed after `repeat()`,
4080-
it is evaluated on the traversers leaving the repeat-traversal. If `emit()` is placed before `repeat()`, it is
4081-
evaluated on the traversers prior to entering the repeat-traversal.
4078+
[gremlin-groovy,grateful-dead]
4079+
----
4080+
g.V().has('name','JAM').repeat(out('followedBy').limit(2)).times(3) <1>
4081+
g.V().has('name','DRUMS').repeat(__.in('followedBy').range(1,3)).until(loops().is(2)) <2>
4082+
g.V().has('name','HEY BO DIDDLEY').repeat(out('followedBy').skip(5)).times(2) <3>
4083+
----
4084+
4085+
<1> Starting from the song 'JAM' get 2 songs that have followed, looping 3 times.
4086+
<2> Starting from the song 'DRUMS' get the 2nd and 3rd songs that have preceded, looping twice.
4087+
<3> Starting from the song 'HEY BO DIDDLEY' get the songs that have followed, skipping the first 5 and looping twice.
4088+
4089+
IMPORTANT: There are three modulators for `repeat()`: `times()`, `until()`, and `emit()`. The most straightforward is
4090+
`times()`, which indicates the number of times to execute the loop. Conditional loops can be executed using `until()`.
4091+
If `until()` comes after `repeat()` it is do/while looping. If `until()` comes before `repeat()` it is while/do looping.
4092+
Emission of traversers from the loop are controlled with `emit()`. If `emit()` is placed after `repeat()`, it is evaluated
4093+
on the traversers leaving the repeat-traversal. If `emit()` is placed before `repeat()`, it is evaluated on the
4094+
traversers prior to entering the repeat-traversal.
40824095
40834096
The `repeat()`-step also supports an "emit predicate", where the predicate for an empty argument `emit()` is
40844097
`true` (i.e. `emit() == emit{true}`). With `emit()`, the traverser is split in two -- the traverser exits the code

docs/src/upgrade/release-3.8.x.asciidoc

Lines changed: 64 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -901,15 +901,28 @@ impact.
901901
902902
===== Examples of Affected Traversals =====
903903
904-
Usage of `limit()` inside `repeat()` is an example of a traversal that will no longer be unrolled. The following results
905-
returned from the `modern` graph demonstrate the change of semantics if the `limit()` in `repeat()` were to be unrolled:
904+
Usage of `inject()` inside `repeat()` is an example of a traversal that will no longer be unrolled. The following results
905+
returned from the `modern` graph demonstrate the change of semantics if the `inject()` in `repeat()` were to be unrolled:
906906
907907
[source,text]
908908
----
909-
gremlin> g.V().both().limit(1).both().limit(1)
910-
==>v[1]
911-
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().repeat(both().limit()).times(2)
912-
gremlin>
909+
gremlin> g.V().values('name').inject('foo').inject('foo')
910+
==>foo
911+
==>foo
912+
==>marko
913+
==>vadas
914+
==>lop
915+
==>josh
916+
==>ripple
917+
==>peter
918+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().values('name').repeat(inject('foo')).times(2)
919+
==>foo
920+
==>marko
921+
==>vadas
922+
==>lop
923+
==>josh
924+
==>ripple
925+
==>peter
913926
----
914927
915928
Another example is the usage of `aggregate()` inside `repeat()`. The following results returned from the `modern` graph
@@ -973,6 +986,51 @@ g.V().repeat(both()).times(2).dedup()
973986
974987
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3192[TINKERPOP-3192]
975988
989+
==== Modified limit() skip() range() Semantics in repeat()
990+
991+
The semantics of `limit()`, `skip()`, and `range()` steps called with default `Scope` or explicit `Scope.global` inside
992+
`repeat()` have been modified to ensure consistent semantics across repeat iterations. Previously, these steps would
993+
track global state across iterations, leading to unexpected filtering behavior between loops.
994+
995+
Consider the following examples which demonstrate the unexpected behavior. Note that the examples for version 3.7.4
996+
disable the `RepeatUnrollStrategy` so that strategy optimization does not replace the `repeat()` traversal with a
997+
non-looping equivalent. 3.8.0 examples do not disable the `RepeatUnrollStrategy` as the strategy was modified to be more
998+
restrictive in this version.
999+
1000+
[source,groovy]
1001+
----
1002+
// 3.7.4 - grateful dead graph examples producing no results due to global counters
1003+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().has('name','JAM').repeat(out('followedBy').limit(2)).times(2).values('name')
1004+
gremlin>
1005+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().has('name','DRUMS').repeat(__.in('followedBy').range(1,3)).times(2).values('name')
1006+
gremlin>
1007+
// 3.7.4 - modern graph examples demonstrating too many results with skip in repeat due to global counters
1008+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V(1).repeat(out().skip(1)).times(2).values('name')
1009+
==>ripple
1010+
==>lop
1011+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V(1).out().skip(1).out().skip(1).values('name')
1012+
==>lop
1013+
1014+
// 3.8.0 - grateful dead graph examples producing results as limit counters tracked per iteration
1015+
gremlin> g.V().has('name','JAM').repeat(out('followedBy').limit(2)).times(2).values('name')
1016+
==>HURTS ME TOO
1017+
==>BLACK THROATED WIND
1018+
gremlin> g.V().has('name','DRUMS').repeat(__.in('followedBy').range(1,3)).times(2).values('name')
1019+
==>DEAL
1020+
==>WOMEN ARE SMARTER
1021+
// 3.8.0 - modern graph examples demonstrating consistent skip semantics
1022+
gremlin> g.V(1).repeat(out().skip(1)).times(2).values('name')
1023+
==>lop
1024+
gremlin> g.V(1).out().skip(1).out().skip(1).values('name')
1025+
==>lop
1026+
----
1027+
1028+
This change ensures that `limit()`, `skip()`, and `range()` steps called with default `Scope` or explicit `Scope.global`
1029+
inside `repeat()` are more consistent with manually unrolled traversals. Before upgrading, users should determine if any
1030+
traversals use `limit()`, skip()`, or `range()` with default `Scope` or explicit `Scope.global` inside `repeat()`. If it
1031+
is desired that the limit or range should apply across all loops then the `limit()`, `skip()`, or `range()` step should
1032+
be moved out of the `repeat()` step.
1033+
9761034
=== Upgrading for Providers
9771035
9781036
==== Graph System Providers

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Traverser.java

Lines changed: 12 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,6 @@
2323

2424
import java.io.Serializable;
2525
import java.util.Collections;
26-
import java.util.Comparator;
27-
import java.util.Optional;
2826
import java.util.Set;
2927
import java.util.function.Function;
3028

@@ -90,7 +88,9 @@ public default <A> A path(final Pop pop, final String stepLabel) {
9088
*
9189
* @return The number of times the traverser has gone through a loop
9290
*/
93-
public int loops();
91+
default int loops() {
92+
throw new UnsupportedOperationException("This traverser does not support loops: " + this.getClass().getCanonicalName());
93+
}
9494

9595
/**
9696
* Return the number of times the traverser has gone through the named looping section of a traversal.
@@ -99,7 +99,9 @@ public default <A> A path(final Pop pop, final String stepLabel) {
9999
* @return The number of times the traverser has gone through a loop
100100
* @throws IllegalArgumentException if the loopName is not defined
101101
*/
102-
public int loops(final String loopName);
102+
default int loops(final String loopName) {
103+
throw new UnsupportedOperationException("This traverser does not support loops: " + this.getClass().getCanonicalName());
104+
}
103105

104106
/**
105107
* A traverser may represent a grouping of traversers to allow for more efficient data propagation.
@@ -239,19 +241,22 @@ public interface Admin<T> extends Traverser<T>, Attachable<T> {
239241
* @param stepLabel the label of the step that is being set-up.
240242
* @param loopName the user defined name for referencing the loop counter or null if not set
241243
*/
242-
public void initialiseLoops(final String stepLabel, final String loopName);
244+
default void initialiseLoops(final String stepLabel, final String loopName) {
245+
}
243246

244247
/**
245248
* Increment the number of times the traverser has gone through a looping section of traversal.
246249
*/
247-
public void incrLoops();
250+
default void incrLoops() {
251+
}
248252

249253
/**
250254
* Set the number of times the traverser has gone through a loop back to 0.
251255
* When a traverser exits a looping construct, this method should be called.
252256
* In a nested loop context, the highest stack loop counter should be removed.
253257
*/
254-
public void resetLoops();
258+
default void resetLoops() {
259+
}
255260

256261
/**
257262
* Get the step id of where the traverser is located.

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/branch/RepeatStep.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -237,7 +237,7 @@ protected Iterator<Traverser.Admin<S>> computerAlgorithm() throws NoSuchElementE
237237
return IteratorUtils.of(start);
238238
} else {
239239
start.setStepId(this.repeatTraversal.getStartStep().getId());
240-
start.initialiseLoops(start.getStepId(), this.loopName);
240+
start.initialiseLoops(this.getId(), this.getLoopName());
241241
if (doEmit(start, true)) {
242242
final Traverser.Admin<S> emitSplit = start.split();
243243
emitSplit.resetLoops();

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/RangeGlobalStep.java

Lines changed: 69 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -18,19 +18,25 @@
1818
*/
1919
package org.apache.tinkerpop.gremlin.process.traversal.step.filter;
2020

21+
import java.io.Serializable;
22+
import java.util.ArrayList;
23+
import java.util.Collections;
24+
import java.util.HashMap;
25+
import java.util.List;
26+
import java.util.Map;
27+
import java.util.Set;
28+
import java.util.concurrent.atomic.AtomicLong;
29+
import java.util.function.BinaryOperator;
30+
import org.apache.tinkerpop.gremlin.process.traversal.Step;
2131
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
2232
import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
33+
import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
34+
import org.apache.tinkerpop.gremlin.process.traversal.step.branch.RepeatStep;
2335
import org.apache.tinkerpop.gremlin.process.traversal.traverser.TraverserRequirement;
2436
import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.TraverserSet;
2537
import org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException;
2638
import org.apache.tinkerpop.gremlin.structure.util.StringFactory;
2739

28-
import java.io.Serializable;
29-
import java.util.Collections;
30-
import java.util.Set;
31-
import java.util.concurrent.atomic.AtomicLong;
32-
import java.util.function.BinaryOperator;
33-
3440
/**
3541
* @author Bob Briody (http://bobbriody.com)
3642
* @author Marko A. Rodriguez (http://markorodriguez.com)
@@ -39,7 +45,10 @@ public final class RangeGlobalStep<S> extends FilterStep<S> implements RangeGlob
3945

4046
private long low;
4147
private long high;
42-
private AtomicLong counter = new AtomicLong(0l);
48+
/**
49+
* If this range step is used inside a loop there can be multiple counters, otherwise there should only be one
50+
*/
51+
private Map<String, AtomicLong> counters = new HashMap<>();
4352
private boolean bypass;
4453

4554
public RangeGlobalStep(final Traversal.Admin traversal, final long low, final long high) {
@@ -55,31 +64,37 @@ public RangeGlobalStep(final Traversal.Admin traversal, final long low, final lo
5564
protected boolean filter(final Traverser.Admin<S> traverser) {
5665
if (this.bypass) return true;
5766

58-
if (this.high != -1 && this.counter.get() >= this.high) {
67+
final String counterKey = getCounterKey(traverser);
68+
final AtomicLong counter = counters.computeIfAbsent(counterKey, k -> new AtomicLong(0L));
69+
70+
if (this.high != -1 && counter.get() >= this.high) {
71+
if (hasRepeatStepParent()) {
72+
return false;
73+
}
5974
throw FastNoSuchElementException.instance();
6075
}
6176

6277
long avail = traverser.bulk();
63-
if (this.counter.get() + avail <= this.low) {
78+
if (counter.get() + avail <= this.low) {
6479
// Will not surpass the low w/ this traverser. Skip and filter the whole thing.
65-
this.counter.getAndAdd(avail);
80+
counter.getAndAdd(avail);
6681
return false;
6782
}
6883

6984
// Skip for the low and trim for the high. Both can happen at once.
7085

7186
long toSkip = 0;
72-
if (this.counter.get() < this.low) {
73-
toSkip = this.low - this.counter.get();
87+
if (counter.get() < this.low) {
88+
toSkip = this.low - counter.get();
7489
}
7590

7691
long toTrim = 0;
77-
if (this.high != -1 && this.counter.get() + avail >= this.high) {
78-
toTrim = this.counter.get() + avail - this.high;
92+
if (this.high != -1 && counter.get() + avail >= this.high) {
93+
toTrim = counter.get() + avail - this.high;
7994
}
8095

8196
long toEmit = avail - toSkip - toTrim;
82-
this.counter.getAndAdd(toSkip + toEmit);
97+
counter.getAndAdd(toSkip + toEmit);
8398
traverser.setBulk(toEmit);
8499

85100
return true;
@@ -88,7 +103,7 @@ protected boolean filter(final Traverser.Admin<S> traverser) {
88103
@Override
89104
public void reset() {
90105
super.reset();
91-
this.counter.set(0l);
106+
this.counters.clear();
92107
}
93108

94109
@Override
@@ -109,7 +124,7 @@ public Long getHighRange() {
109124
@Override
110125
public RangeGlobalStep<S> clone() {
111126
final RangeGlobalStep<S> clone = (RangeGlobalStep<S>) super.clone();
112-
clone.counter = new AtomicLong(0l);
127+
clone.counters = new HashMap<>();
113128
return clone;
114129
}
115130

@@ -142,6 +157,43 @@ public void processAllStarts() {
142157

143158
}
144159

160+
private String getCounterKey(final Traverser.Admin<S> traverser) {
161+
final List<String> counterKeyParts = new ArrayList<>();
162+
Traversal.Admin<Object, Object> traversal = this.getTraversal();
163+
if (hasRepeatStepParent()) {
164+
// the range step is inside a loop so we need to track counters per iteration
165+
// using a counter key that is composed of the parent steps to the root
166+
while (!traversal.isRoot()) {
167+
final TraversalParent pt = traversal.getParent();
168+
final Step<?, ?> ps = pt.asStep();
169+
final String pid = ps.getId();
170+
if (traverser.getLoopNames().contains(pid)) {
171+
counterKeyParts.add(pid);
172+
counterKeyParts.add(String.valueOf(traverser.loops(pid)));
173+
}
174+
traversal = ps.getTraversal();
175+
}
176+
}
177+
// reverse added parts so that it starts from root
178+
Collections.reverse(counterKeyParts);
179+
counterKeyParts.add(this.getId());
180+
if (traverser.getLoopNames().contains(this.getId())) {
181+
counterKeyParts.add(String.valueOf(traverser.loops(this.getId())));
182+
}
183+
return String.join(":", counterKeyParts);
184+
}
185+
186+
private boolean hasRepeatStepParent() {
187+
Traversal.Admin<?, ?> traversal = this.getTraversal();
188+
while (!traversal.isRoot()) {
189+
if (traversal.getParent() instanceof RepeatStep) {
190+
return true;
191+
}
192+
traversal = traversal.getParent().asStep().getTraversal();
193+
}
194+
return false;
195+
}
196+
145197
////////////////
146198

147199
public static final class RangeBiOperator<S> implements BinaryOperator<TraverserSet<S>>, Serializable {

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/RepeatUnrollStrategy.java

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,10 @@ public final class RepeatUnrollStrategy extends AbstractTraversalStrategy<Traver
6363

6464
private static final RepeatUnrollStrategy INSTANCE = new RepeatUnrollStrategy();
6565
static final int MAX_BARRIER_SIZE = 2500;
66+
/**
67+
* When adding more 'allowed' steps, consider if tests are needed that use the step 'withoutStrategies(RepeatUnrollStrategy)'
68+
* to validate the traversal with repeat has the same semantics as the unrolled equivalent
69+
*/
6670
private static final Set<Class> ALLOWED_STEP_CLASSES = Set.of(
6771
VertexStepContract.class,
6872
EdgeVertexStep.class,

0 commit comments

Comments
 (0)