Skip to content

Commit f502941

Browse files
committed
Merge branch '3.8-dev'
2 parents 6b5c24d + a64ed72 commit f502941

File tree

40 files changed

+2404
-577
lines changed

40 files changed

+2404
-577
lines changed

CHANGELOG.asciidoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -183,6 +183,8 @@ This release also includes changes from <<release-3-7-XXX, 3.7.XXX>>.
183183
* Renamed `MergeElementStep` to `MergeElementStep` as it is a base class to `mergeV()` and `mergeE()`.
184184
* Renamed `MergeStep` of `merge()` to `MergeElementStep` for consistency.
185185
* Modified `RepeatUnrollStrategy` to use a more conservative approach, only unrolling repeat loops containing safe navigation and filtering steps.
186+
* Modified `limit()`, `skip()`, range()` in `repeat()` to track per-iteration counters.
187+
* Moved `Traverser` loop logic into new interface `NL_SL_Traverser` and changed loop-supporting `Traverser`s to extend the common interface.
186188
187189
== TinkerPop 3.7.0 (Gremfir Master of the Pan Flute)
188190

docs/src/reference/the-traversal.asciidoc

Lines changed: 17 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4079,10 +4079,23 @@ g.V().until(has('name','ripple')).
40794079
<1> do-while semantics stating to do `out()` 2 times.
40804080
<2> while-do semantics stating to break if the traverser is at a vertex named "ripple".
40814081
4082-
IMPORTANT: There are two modulators for `repeat()`: `until()` and `emit()`. If `until()` comes after `repeat()` it is
4083-
do/while looping. If `until()` comes before `repeat()` it is while/do looping. If `emit()` is placed after `repeat()`,
4084-
it is evaluated on the traversers leaving the repeat-traversal. If `emit()` is placed before `repeat()`, it is
4085-
evaluated on the traversers prior to entering the repeat-traversal.
4082+
[gremlin-groovy,grateful-dead]
4083+
----
4084+
g.V().has('name','JAM').repeat(out('followedBy').limit(2)).times(3) <1>
4085+
g.V().has('name','DRUMS').repeat(__.in('followedBy').range(1,3)).until(loops().is(2)) <2>
4086+
g.V().has('name','HEY BO DIDDLEY').repeat(out('followedBy').skip(5)).times(2) <3>
4087+
----
4088+
4089+
<1> Starting from the song 'JAM' get 2 songs that have followed, looping 3 times.
4090+
<2> Starting from the song 'DRUMS' get the 2nd and 3rd songs that have preceded, looping twice.
4091+
<3> Starting from the song 'HEY BO DIDDLEY' get the songs that have followed, skipping the first 5 and looping twice.
4092+
4093+
IMPORTANT: There are three modulators for `repeat()`: `times()`, `until()`, and `emit()`. The most straightforward is
4094+
`times()`, which indicates the number of times to execute the loop. Conditional loops can be executed using `until()`.
4095+
If `until()` comes after `repeat()` it is do/while looping. If `until()` comes before `repeat()` it is while/do looping.
4096+
Emission of traversers from the loop are controlled with `emit()`. If `emit()` is placed after `repeat()`, it is evaluated
4097+
on the traversers leaving the repeat-traversal. If `emit()` is placed before `repeat()`, it is evaluated on the
4098+
traversers prior to entering the repeat-traversal.
40864099
40874100
The `repeat()`-step also supports an "emit predicate", where the predicate for an empty argument `emit()` is
40884101
`true` (i.e. `emit() == emit{true}`). With `emit()`, the traverser is split in two -- the traverser exits the code

docs/src/upgrade/release-3.8.x.asciidoc

Lines changed: 64 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -901,15 +901,28 @@ impact.
901901
902902
===== Examples of Affected Traversals =====
903903
904-
Usage of `limit()` inside `repeat()` is an example of a traversal that will no longer be unrolled. The following results
905-
returned from the `modern` graph demonstrate the change of semantics if the `limit()` in `repeat()` were to be unrolled:
904+
Usage of `inject()` inside `repeat()` is an example of a traversal that will no longer be unrolled. The following results
905+
returned from the `modern` graph demonstrate the change of semantics if the `inject()` in `repeat()` were to be unrolled:
906906
907907
[source,text]
908908
----
909-
gremlin> g.V().both().limit(1).both().limit(1)
910-
==>v[1]
911-
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().repeat(both().limit()).times(2)
912-
gremlin>
909+
gremlin> g.V().values('name').inject('foo').inject('foo')
910+
==>foo
911+
==>foo
912+
==>marko
913+
==>vadas
914+
==>lop
915+
==>josh
916+
==>ripple
917+
==>peter
918+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().values('name').repeat(inject('foo')).times(2)
919+
==>foo
920+
==>marko
921+
==>vadas
922+
==>lop
923+
==>josh
924+
==>ripple
925+
==>peter
913926
----
914927
915928
Another example is the usage of `aggregate()` inside `repeat()`. The following results returned from the `modern` graph
@@ -973,6 +986,51 @@ g.V().repeat(both()).times(2).dedup()
973986
974987
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3192[TINKERPOP-3192]
975988
989+
==== Modified limit() skip() range() Semantics in repeat()
990+
991+
The semantics of `limit()`, `skip()`, and `range()` steps called with default `Scope` or explicit `Scope.global` inside
992+
`repeat()` have been modified to ensure consistent semantics across repeat iterations. Previously, these steps would
993+
track global state across iterations, leading to unexpected filtering behavior between loops.
994+
995+
Consider the following examples which demonstrate the unexpected behavior. Note that the examples for version 3.7.4
996+
disable the `RepeatUnrollStrategy` so that strategy optimization does not replace the `repeat()` traversal with a
997+
non-looping equivalent. 3.8.0 examples do not disable the `RepeatUnrollStrategy` as the strategy was modified to be more
998+
restrictive in this version.
999+
1000+
[source,groovy]
1001+
----
1002+
// 3.7.4 - grateful dead graph examples producing no results due to global counters
1003+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().has('name','JAM').repeat(out('followedBy').limit(2)).times(2).values('name')
1004+
gremlin>
1005+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V().has('name','DRUMS').repeat(__.in('followedBy').range(1,3)).times(2).values('name')
1006+
gremlin>
1007+
// 3.7.4 - modern graph examples demonstrating too many results with skip in repeat due to global counters
1008+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V(1).repeat(out().skip(1)).times(2).values('name')
1009+
==>ripple
1010+
==>lop
1011+
gremlin> g.withoutStrategies(RepeatUnrollStrategy).V(1).out().skip(1).out().skip(1).values('name')
1012+
==>lop
1013+
1014+
// 3.8.0 - grateful dead graph examples producing results as limit counters tracked per iteration
1015+
gremlin> g.V().has('name','JAM').repeat(out('followedBy').limit(2)).times(2).values('name')
1016+
==>HURTS ME TOO
1017+
==>BLACK THROATED WIND
1018+
gremlin> g.V().has('name','DRUMS').repeat(__.in('followedBy').range(1,3)).times(2).values('name')
1019+
==>DEAL
1020+
==>WOMEN ARE SMARTER
1021+
// 3.8.0 - modern graph examples demonstrating consistent skip semantics
1022+
gremlin> g.V(1).repeat(out().skip(1)).times(2).values('name')
1023+
==>lop
1024+
gremlin> g.V(1).out().skip(1).out().skip(1).values('name')
1025+
==>lop
1026+
----
1027+
1028+
This change ensures that `limit()`, `skip()`, and `range()` steps called with default `Scope` or explicit `Scope.global`
1029+
inside `repeat()` are more consistent with manually unrolled traversals. Before upgrading, users should determine if any
1030+
traversals use `limit()`, skip()`, or `range()` with default `Scope` or explicit `Scope.global` inside `repeat()`. If it
1031+
is desired that the limit or range should apply across all loops then the `limit()`, `skip()`, or `range()` step should
1032+
be moved out of the `repeat()` step.
1033+
9761034
=== Upgrading for Providers
9771035
9781036
==== Graph System Providers

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Traverser.java

Lines changed: 12 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,6 @@
2323

2424
import java.io.Serializable;
2525
import java.util.Collections;
26-
import java.util.Comparator;
27-
import java.util.Optional;
2826
import java.util.Set;
2927
import java.util.function.Function;
3028

@@ -90,7 +88,9 @@ public default <A> A path(final Pop pop, final String stepLabel) {
9088
*
9189
* @return The number of times the traverser has gone through a loop
9290
*/
93-
public int loops();
91+
default int loops() {
92+
throw new UnsupportedOperationException("This traverser does not support loops: " + this.getClass().getCanonicalName());
93+
}
9494

9595
/**
9696
* Return the number of times the traverser has gone through the named looping section of a traversal.
@@ -99,7 +99,9 @@ public default <A> A path(final Pop pop, final String stepLabel) {
9999
* @return The number of times the traverser has gone through a loop
100100
* @throws IllegalArgumentException if the loopName is not defined
101101
*/
102-
public int loops(final String loopName);
102+
default int loops(final String loopName) {
103+
throw new UnsupportedOperationException("This traverser does not support loops: " + this.getClass().getCanonicalName());
104+
}
103105

104106
/**
105107
* A traverser may represent a grouping of traversers to allow for more efficient data propagation.
@@ -239,19 +241,22 @@ public interface Admin<T> extends Traverser<T>, Attachable<T> {
239241
* @param stepLabel the label of the step that is being set-up.
240242
* @param loopName the user defined name for referencing the loop counter or null if not set
241243
*/
242-
public void initialiseLoops(final String stepLabel, final String loopName);
244+
default void initialiseLoops(final String stepLabel, final String loopName) {
245+
}
243246

244247
/**
245248
* Increment the number of times the traverser has gone through a looping section of traversal.
246249
*/
247-
public void incrLoops();
250+
default void incrLoops() {
251+
}
248252

249253
/**
250254
* Set the number of times the traverser has gone through a loop back to 0.
251255
* When a traverser exits a looping construct, this method should be called.
252256
* In a nested loop context, the highest stack loop counter should be removed.
253257
*/
254-
public void resetLoops();
258+
default void resetLoops() {
259+
}
255260

256261
/**
257262
* Get the step id of where the traverser is located.

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/branch/RepeatStep.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -236,7 +236,7 @@ protected Iterator<Traverser.Admin<S>> computerAlgorithm() throws NoSuchElementE
236236
return IteratorUtils.of(start);
237237
} else {
238238
start.setStepId(this.repeatTraversal.getStartStep().getId());
239-
start.initialiseLoops(start.getStepId(), this.loopName);
239+
start.initialiseLoops(this.getId(), this.getLoopName());
240240
if (doEmit(start, true)) {
241241
final Traverser.Admin<S> emitSplit = start.split();
242242
emitSplit.resetLoops();

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/RangeGlobalStep.java

Lines changed: 69 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -18,19 +18,25 @@
1818
*/
1919
package org.apache.tinkerpop.gremlin.process.traversal.step.filter;
2020

21+
import java.io.Serializable;
22+
import java.util.ArrayList;
23+
import java.util.Collections;
24+
import java.util.HashMap;
25+
import java.util.List;
26+
import java.util.Map;
27+
import java.util.Set;
28+
import java.util.concurrent.atomic.AtomicLong;
29+
import java.util.function.BinaryOperator;
30+
import org.apache.tinkerpop.gremlin.process.traversal.Step;
2131
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
2232
import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
33+
import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
34+
import org.apache.tinkerpop.gremlin.process.traversal.step.branch.RepeatStep;
2335
import org.apache.tinkerpop.gremlin.process.traversal.traverser.TraverserRequirement;
2436
import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.TraverserSet;
2537
import org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException;
2638
import org.apache.tinkerpop.gremlin.structure.util.StringFactory;
2739

28-
import java.io.Serializable;
29-
import java.util.Collections;
30-
import java.util.Set;
31-
import java.util.concurrent.atomic.AtomicLong;
32-
import java.util.function.BinaryOperator;
33-
3440
/**
3541
* @author Bob Briody (http://bobbriody.com)
3642
* @author Marko A. Rodriguez (http://markorodriguez.com)
@@ -39,7 +45,10 @@ public final class RangeGlobalStep<S> extends FilterStep<S> implements RangeGlob
3945

4046
private long low;
4147
private long high;
42-
private AtomicLong counter = new AtomicLong(0l);
48+
/**
49+
* If this range step is used inside a loop there can be multiple counters, otherwise there should only be one
50+
*/
51+
private Map<String, AtomicLong> counters = new HashMap<>();
4352
private boolean bypass;
4453

4554
public RangeGlobalStep(final Traversal.Admin traversal, final long low, final long high) {
@@ -55,31 +64,37 @@ public RangeGlobalStep(final Traversal.Admin traversal, final long low, final lo
5564
protected boolean filter(final Traverser.Admin<S> traverser) {
5665
if (this.bypass) return true;
5766

58-
if (this.high != -1 && this.counter.get() >= this.high) {
67+
final String counterKey = getCounterKey(traverser);
68+
final AtomicLong counter = counters.computeIfAbsent(counterKey, k -> new AtomicLong(0L));
69+
70+
if (this.high != -1 && counter.get() >= this.high) {
71+
if (hasRepeatStepParent()) {
72+
return false;
73+
}
5974
throw FastNoSuchElementException.instance();
6075
}
6176

6277
long avail = traverser.bulk();
63-
if (this.counter.get() + avail <= this.low) {
78+
if (counter.get() + avail <= this.low) {
6479
// Will not surpass the low w/ this traverser. Skip and filter the whole thing.
65-
this.counter.getAndAdd(avail);
80+
counter.getAndAdd(avail);
6681
return false;
6782
}
6883

6984
// Skip for the low and trim for the high. Both can happen at once.
7085

7186
long toSkip = 0;
72-
if (this.counter.get() < this.low) {
73-
toSkip = this.low - this.counter.get();
87+
if (counter.get() < this.low) {
88+
toSkip = this.low - counter.get();
7489
}
7590

7691
long toTrim = 0;
77-
if (this.high != -1 && this.counter.get() + avail >= this.high) {
78-
toTrim = this.counter.get() + avail - this.high;
92+
if (this.high != -1 && counter.get() + avail >= this.high) {
93+
toTrim = counter.get() + avail - this.high;
7994
}
8095

8196
long toEmit = avail - toSkip - toTrim;
82-
this.counter.getAndAdd(toSkip + toEmit);
97+
counter.getAndAdd(toSkip + toEmit);
8398
traverser.setBulk(toEmit);
8499

85100
return true;
@@ -88,7 +103,7 @@ protected boolean filter(final Traverser.Admin<S> traverser) {
88103
@Override
89104
public void reset() {
90105
super.reset();
91-
this.counter.set(0l);
106+
this.counters.clear();
92107
}
93108

94109
@Override
@@ -109,7 +124,7 @@ public Long getHighRange() {
109124
@Override
110125
public RangeGlobalStep<S> clone() {
111126
final RangeGlobalStep<S> clone = (RangeGlobalStep<S>) super.clone();
112-
clone.counter = new AtomicLong(0l);
127+
clone.counters = new HashMap<>();
113128
return clone;
114129
}
115130

@@ -142,6 +157,43 @@ public void processAllStarts() {
142157

143158
}
144159

160+
private String getCounterKey(final Traverser.Admin<S> traverser) {
161+
final List<String> counterKeyParts = new ArrayList<>();
162+
Traversal.Admin<Object, Object> traversal = this.getTraversal();
163+
if (hasRepeatStepParent()) {
164+
// the range step is inside a loop so we need to track counters per iteration
165+
// using a counter key that is composed of the parent steps to the root
166+
while (!traversal.isRoot()) {
167+
final TraversalParent pt = traversal.getParent();
168+
final Step<?, ?> ps = pt.asStep();
169+
final String pid = ps.getId();
170+
if (traverser.getLoopNames().contains(pid)) {
171+
counterKeyParts.add(pid);
172+
counterKeyParts.add(String.valueOf(traverser.loops(pid)));
173+
}
174+
traversal = ps.getTraversal();
175+
}
176+
}
177+
// reverse added parts so that it starts from root
178+
Collections.reverse(counterKeyParts);
179+
counterKeyParts.add(this.getId());
180+
if (traverser.getLoopNames().contains(this.getId())) {
181+
counterKeyParts.add(String.valueOf(traverser.loops(this.getId())));
182+
}
183+
return String.join(":", counterKeyParts);
184+
}
185+
186+
private boolean hasRepeatStepParent() {
187+
Traversal.Admin<?, ?> traversal = this.getTraversal();
188+
while (!traversal.isRoot()) {
189+
if (traversal.getParent() instanceof RepeatStep) {
190+
return true;
191+
}
192+
traversal = traversal.getParent().asStep().getTraversal();
193+
}
194+
return false;
195+
}
196+
145197
////////////////
146198

147199
public static final class RangeBiOperator<S> implements BinaryOperator<TraverserSet<S>>, Serializable {

gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/RepeatUnrollStrategy.java

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,10 @@ public final class RepeatUnrollStrategy extends AbstractTraversalStrategy<Traver
6363

6464
private static final RepeatUnrollStrategy INSTANCE = new RepeatUnrollStrategy();
6565
static final int MAX_BARRIER_SIZE = 2500;
66+
/**
67+
* When adding more 'allowed' steps, consider if tests are needed that use the step 'withoutStrategies(RepeatUnrollStrategy)'
68+
* to validate the traversal with repeat has the same semantics as the unrolled equivalent
69+
*/
6670
private static final Set<Class> ALLOWED_STEP_CLASSES = Set.of(
6771
VertexStepContract.class,
6872
EdgeVertexStep.class,

0 commit comments

Comments
 (0)