-
Notifications
You must be signed in to change notification settings - Fork 821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix various eagerloading bugs #10862
Fix various eagerloading bugs #10862
Conversation
$manyManyComponent['extraFields'] = $schema->manyManyExtraFieldsForComponent($parentDataClass, $relationName) ?: []; | ||
if (is_a($manyManyComponent['relationClass'], ManyManyThroughList::class, true)) { | ||
$manyManyComponent['joinClass'] = $manyManyComponent['join']; | ||
$manyManyComponent['join'] = $schema->baseDataTable($manyManyComponent['joinClass']); | ||
} else { | ||
$manyManyComponent['joinClass'] = null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make join
always be the join table, and add the join class separately.
Also fetch the extra fields schema up front.
// $join will either be: | ||
// - the join table name for many-many | ||
// - the join data class for many-many-through | ||
$join = $manyManyLastComponent['join']; | ||
$joinTable = $manyManyLastComponent['join']; | ||
$extraFields = $manyManyLastComponent['extraFields']; | ||
$joinClass = $manyManyLastComponent['joinClass']; | ||
|
||
// Get the join records so we can correctly identify which children belong to which parents | ||
$joinRows = DB::query('SELECT * FROM "' . $joinTable . '" WHERE "' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')'); | ||
|
||
// many_many_through | ||
if (is_a($manyManyLastComponent['relationClass'], ManyManyThroughList::class, true)) { | ||
$joinThroughObjs = DataObject::get($join)->filter([$parentIDField => $parentIDs]); | ||
$relationItemIDs = []; | ||
$joinRows = []; | ||
foreach ($joinThroughObjs as $joinThroughObj) { | ||
$joinRows[] = [ | ||
$parentIDField => $joinThroughObj->$parentIDField, | ||
$childIDField => $joinThroughObj->$childIDField | ||
]; | ||
$relationItemIDs[] = $joinThroughObj->$childIDField; | ||
} | ||
if ($joinClass !== null) { | ||
$relationList = ManyManyThroughList::create( | ||
$relationDataClass, | ||
$joinClass, | ||
$childIDField, | ||
$parentIDField, | ||
$extraFields, | ||
$relationDataClass, | ||
$parentDataClass | ||
)->forForeignID($parentIDs); | ||
// many_many + belongs_many_many | ||
} else { | ||
$joinTableQuery = DB::query('SELECT * FROM "' . $join . '" WHERE "' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')'); | ||
$relationItemIDs = $joinTableQuery->column($childIDField); | ||
$joinRows = $joinTableQuery; | ||
$relationList = ManyManyList::create( | ||
$relationDataClass, | ||
$joinTable, | ||
$childIDField, | ||
$parentIDField, | ||
$extraFields | ||
)->forForeignID($parentIDs); | ||
} | ||
|
||
// Get ALL of the items for this relation up front, for ALL of the parents | ||
// Use a real RelationList here so that the extraFields and join record are correctly set for all relations | ||
// Fetched as a map so we can get the ID for all records up front (instead of in another nested loop) | ||
// Fetched after that as an array because for some reason that performs better in the loop | ||
// Note that "Me" is a method on ViewableData that returns $this - i.e. that is the actual DataObject record | ||
$relationArray = DataObject::get($relationDataClass)->byIDs($relationItemIDs)->map('ID', 'Me')->toArray(); | ||
$relationArray = $relationList->map('ID', 'Me')->toArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of this is just using the ManyManyList
and ManyManyThroughList
directly to get the records, instead of crafting our own queries. This means we get the extraFields data and join records set up correctly "for free" (read: without having to add any extra code, but with a ~50ms performance hit)
01d4b2b
to
7e616ac
Compare
87de3a1
to
dd5b8f9
Compare
CI failures are unrelated and will be fixed in #10866 |
$dataList = EagerLoadObject::get()->eagerLoad(...$eagerLoad); | ||
list($results, $selectCount) = $this->iterateEagerLoadData($dataList); | ||
// We can rely on the non-eager-loaded data being correct, since it's validated by other unit tests | ||
list($expectedResults, $_) = $this->iterateEagerLoadData(EagerLoadObject::get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just seems wrong. We shouldn't be assuming that whatever we read from the database is "expected". Expected is something that's hardcoded like it previously was.
Please put old expected values back in like they were here in the old tests/php/ORM/DataListTest.php::expectedEagerLoadData()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're just testing that the eager loading functionality itself is working, yeah? And we know the lazy loading functionality works and gives us correct results because there's a whole bunch of tests for that already. So we can trust that if the eager loaded list matches the lazy loaded list, it's going to be correct.
That said, for the sake of avoiding ping pong over a unit test which ultimately whether it's static data or correct dynamic data it ends out the same, I'll just go ahead and make this change.
I'll leave the new tests as they are though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code seems fine, got a merge conflict
Includes making sure toArray() uses the correct iteration logic, and cloning resets the eagerloaded data for the new clone. Previously, a second iteration would add every relation item to the relation list a second time - so with each iteration your relation list count doubled (though it was the same records time and again).
Add tests for: - filtered, limited, and sorted DataLists with eagerloaded relations - iterating through the list multiple times - eagerLoad method can be called anywhere in a query chain - eager loaded relation lists can be correctly empty without throwing exceptions - repeating relations to be eagerloaded in various permutations doesn't cause issues - eagerloading doesn't negatively impact chunkedFetch functionality squash
Don't fetch eager-loaded has_many or many_many unless we've validated that the relation IS actually a has_many or many_many relation. Also clear eager-loaded relations when flushing the record.
75e50a0
to
ed4c34b
Compare
Conflict resolved - it was just the addition of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Multiple commits
I'm doing this all in one PR because it's going to all be affecting the same two files, and I don't want to have to keep rebasing each time a PR is merged for this. It's all related to each other anyway.
Anything which needed a fix in the code has an appropriately prefixed commit. There's one commit that's just a bunch of new tests that didn't need any changes to the code.
Feel free to squash if you want to when merging, though - if you do so, use the PR name as the commit message.
Commit 1 - minor refactor
Refactors the unit tests related to eagerloading into their own class, and moves their associated DataObjects into a new eagerloading namespace.
This is just to make it easier to find, when someone needs to check or add a test for eagerloading it's immediately clear where that goes.
Commit 2 -
many_many_extraFields
and join records formany_many
throughAdd tests for
many_many_extraFields
and join records formany_many
through.Fix functionality for both. Previously, both of these new tests would fail.
I'm using the actual
ManyManyList
andManyManyThroughList
classes here because they already know how to correctly handle the extraFields and join records. This does end up being ~50ms slower than$relationArray = DataObject::get($relationDataClass)->where('ID in ('. implode(',', $relationItemIDs) . ')')->map('ID', 'Me')->toArray();
(which is the faster version ofbyIDs
, see #10860) - but it has the following benefits:DataList::createDataObject()
and therefore part of the record's hydrated dataCommit 3 - iterating over the same list multiple times
Fix bug where iterating over the same
DataList
more than once would duplicate the records in eager-loaded relation lists.The test is in the following commit, since this bug was found while I was working on filtering and had to be fixed for the filtering test to work.
Also fixes some related issues, such as
toArray()
not using the normal iterator logic (which meant it would separately exhibit this bug), and no longer carrying over eager-loaded data to a clone of the list.Commit 4 - Add tests for
filter
,sort
,limit
, and other thingsTests that the underlying
DataList
can be filtered, sorted, limited, iterated over multiple times. also checks thateagerLoad()
can be called at any stage in a query chain and a couple of other things (see the commit message).There's also a slight refactor of the test class here - instead of using a static set of expected results, we're relying on the lazy-loaded data which we know will be correct if all other framework ORM tests pass. Other small changes were also made as necessary to accomodate the new tests
Commit 5 - DataObject protections against bad relations
DataObject
had some protections when fetching has_many or many_many relations using the components methods - we were bypassing those protections but now we're not. Also make sure we clear eager-loaded relation data when flushing a record, which is very useful for memory management.Issue