Optimisations
This page shows how to optimise the performance of the Prime ORM.
Some queries are more optimised than the default Query implementation.
If you simply want to load an entity by its primary key, without any additional filters, use RepositoryQueryFactory::findById() instead of Query::get() or manually applying primary key filters. The query instance, and also its prepared statement, will be reused on future calls, and the KeyValueQuery implementation is used, which provides better compilation performance.
Note: this method cannot be used if you need extra filters on the query.
Example:
// Ok
User::findById(42);
User::repository()->findById(42);
User::findById(42)?->load('groups'); // You can load relations using load() method
// Not optimised
User::builder()->get(42);
User::get(42);
User::where('id', 42)->first();
User::with('groups')->get(42); // with() is no more optimised than load() for a single entity
// Cannot be optimised properly
// Uses an extra filter
User::where('roles', (new Like(5))->searchableArray())->get(42);
// The filter can be performed in PHP, but this will perform an unnecessary entity load if the filter does not match
// Also, filtering on a relation this way is counterproductive
$user = ($user = User::findById(42)) && in_array(5, $user->roles()) ? $user : null;
// There is no optimised equivalent implemented natively
User::getOrFail(42);
// You can manually throw the exception
User::findById(42) ?? throw new EntityNotFoundException();
For simple filter operations, KeyValueQuery will perform better than the base Query implementation.
This query can be used if:
- The entity has no constraints
- Filters use equality on entity fields (or there are no filters at all)
- No filtering through relations is needed (i.e. no join is performed)
- No filtering on NULL values is needed
- Only AND filter combinations are used
Unlike Query, calling where() multiple times on the same field will replace the compared value (i.e. change the bound value) instead of adding a new filter. So this query can be reused multiple times with different bound values.
Note: Because of the implementation of Walker, iterating over a large number of entities is less performant on KeyValueQuery than on Query. So, even if the query can be executed by KeyValueQuery, prefer using Query in this specific case (i.e. walking over a very large table).
// Optimised
User::keyValue()->all(); // No filters at all
User::keyValue('login', 'robert')->first(); // Single filter
User::keyValue(['login' => 'robert', 'enabled' => true])->first(); // Also works with multiple AND filters
Order::keyValue()->where('userId', 42)->sum('price'); // Aggregation methods are available
User::keyValue('enabled', false)->update(['password' => null]); // Update with simple filter
User::keyValue('enabled', false)->delete(); // Delete with simple filter
User::keyValue()->with('groups')->where('enabled', true)->getOrFail(42); // query extension methods are also available
// Reuse a query instance, by just changing a bound value
$query = Post::keyValue('threadId', 42)->where('postNumber', 26);
$posts = [];
while ($post = $query->first()) {
$posts[] = $post;
$query->where('postNumber', $post->quotedNumber());
}
// Not optimised
User::builder()->all();
User::where('login', 'robert')->first();
User::where(['login' => 'robert', 'enabled' => true])->first();
User::where('enabled', false)->update(['password' => null]);
User::with('groups')->where('enabled', true)->getOrFail(42);
// Default query builder instance cannot be reused
$posts = [];
$postNumber = 26;
while ($post = Post::where('threadId', 42)->where('postNumber', $postNumber)->first()) {
$posts[] = $post;
$postNumber = $post->quotedNumber();
}
// Cannot be optimised properly
foreach (User::walk() as $user) { /* ... */ } // Walker is not optimised on KeyValueQuery for large tables. Prefer the default query when the table has more than 10k rows
User::builder()->where('updatedAt', null)->all(); // null comparison is not possible
User::builder()->where('createdAt', '>=', new DateTime('2022-10-05'))->all(); // operators other than equality are not supported
User::builder()->where('groups.id', 5)->all(); // Join and relations are not supported
User::builder()->where('login', 'robert')->orWhere('name', 'robert')->all(); // OR is also not supported
EntityRepository and RepositoryQueryFactory provide some utility methods which use optimised queries and reuse prepared statements:
- EntityRepository::exists(object $entity): Check the existence of an entity in the database
- EntityRepository::refresh(object $entity): Retrieve the database value of the given entity. Internally uses findById. Note: the optimised query is not used when extra filters are given as the second parameter.
- RepositoryQueryFactory::countKeyValue(): Count the number of entities using KeyValueQuery. The generated query will be reused.
// Optimised
User::countKeyValue(); // Count all users
User::countKeyValue('login', 'robert') > 0; // Check existence of user with login = robert
User::exists($user); // Check existence of user in database
User::refresh($user); // Retrieve user from database
// Not optimised
User::count(); // Does not use the optimised query
User::where('login', 'robert')->count() > 0; // Uses the default Query
User::keyValue('login', 'robert')->count() > 0; // Does not cache the query instance and prepared statement, so performance is lower on multiple calls
User::where('id', $user->id())->count() > 0; // Uses the default Query
User::keyValue('id', $user->id())->count() > 0; // Does not cache the query instance and prepared statement
User::get($user->id()); // Does not use the optimised query
// Cannot be optimised properly
User::where('groups.id', 5)->count(); // Uses a relation filter
User::count(['name :like' => 'foo%']); // Uses a custom operator
User::refresh($user, ['enabled' => true]); // A custom constraint disables the optimisation
The N+1 problem occurs when one (or more) query is executed for each result of a first query. More generally, the N+1 problem occurs when a query is executed an arbitrary number of times inside a loop.
Example:
// Execute a first query
foreach (User::where('name', ':like', 'a%') as $user) {
// Load the groups of the user, so one query is executed for each user
$user->load('groups');
}
// Execute a query for each role of the user
$user = User::findById(42);
$roles = array_map(fn (int $id) => Role::findById($id), $user->roles());
This problem cannot be completely solved, but some utilities are provided to mitigate common cases.
The simplest way of resolving the N+1 problem with relation loading is to call the with() method on the query, with all required relations, when requesting multiple entities.
Note: this may need some changes to the entity structure to allow attaching relations.
// Ok
foreach (User::with(['groups', 'administrated'])->where($filters) as $user) {
// Note: be sure that the called function does not load relations
$this->doSomeJob($user);
}
// Bad
foreach (User::where($filters) as $user) {
$this->doSomeJob($user->load('groups'), $user->relation('administrated')->all());
}
Sometimes you need to load multiple entities by their primary key, or simply refresh them. To perform this operation in one query you can use:
- RepositoryQueryFactory::entities(), which refreshes a list of entities
- where() on the primary key with an array of values (converted to an "IN" filter)
$user = User::findById(42);
// Ok
$roles = Role::where('id', $user->roles())->all(); // equality operator with an array will be automatically converted to "IN" filter
// Do some long work...
// Refresh all entities in one query
$refreshed = Role::entities($roles);
$hasChanged = $roles != $refreshed; // So we can check if there is a change
// Bad
$roles = array_map(fn (int $id) => Role::findById($id), $user->roles());
$hasChanged = false;
foreach ($roles as $role) {
if ($role != Role::refresh($role)) {
$hasChanged = true;
break;
}
}
EntityCollection allows performing some bulk operations with a minimal number of queries:
- EntityCollection::load(): Load relations on each entity of the collection. Behaves like the Query::with() method.
- EntityCollection::link(): Get a Query object requesting the given relation for all entities of the collection. This is equivalent to calling EntityRelation::query() on each entity and merging the results.
- EntityCollection::query(): Get a Query object pointing to all entities of the collection. This is equivalent to calling MyEntity::where('pk', 'in', $ids).
- EntityCollection::refresh(): Refresh all entities of the collection. This is equivalent to calling RepositoryQueryFactory::entities(), but on a collection object.
- EntityCollection::update(): Perform a bulk update on each entity. Note: this method performs a low level update query, so the updated event will not be triggered.
This collection can be created by calling Query::wrapAs(EntityCollection::class) or by passing an array of entities to EntityRepository::collection().
$users = User::where($filters)->wrapAs(EntityCollection::class)->all();
$users = User::collection(User::where($filters)->all());
// Ok
$users->load('groups'); // If you don't have access to the query builder instance, you can still load relations without N+1 by using EntityCollection after the base query execution.
$groups = $users->link('administrated')->where('name', ':like', 'a%')->all(); // Query all relations in a single query.
$hasChanged = clone $users != $users->refresh(); // Retrieve the database version of all entities and check if there is a change.
$users->update(['password' => null]); // Perform an update query on all entities. Be aware: behaviors will be skipped, because no events will be triggered!
// Bad
foreach ($users as $user) { $user->load('groups'); }
$groups = array_merge(...array_map(fn ($user) => $user->relation('administrated')->where('name', ':like', 'a%')->all(), $users));
See How MySQL Uses Indexes for more information. Using table indexes is a powerful mechanism for optimising search operations. Indexes are declared on the mapper.
Consider adding an index when:
- The table is large and queries are slow.
- The same filters are applied often. In this case you can add a compound index on the fields used by all filters.
- The requested value is mostly unique (adding an index on a boolean field is counterproductive).
- A field is often used for sorting results.
- In case of compound indexes, put the most selective field first.
Do not add an index when:
- Only a few values are possible for the given field (e.g. boolean or enum).
- The string search operation is anything other than "starts with" (i.e. LIKE '%xxx%' is not optimised).
- The filter uses the primary key, or any other unique field (the unique index will be used).
- There are already too many indexes. Keep only the most efficient ones.
Note: using an index will decrease write performance but increase read performance, so you need to find the best balance between the two.
You can also use an external indexer for querying, and synchronize entities asynchronously, to get the best read and write performance. See b2pweb/bdf-prime-indexer for Elasticsearch usage with Prime.
// "login" is already marked as unique, so explicitly adding an index is unnecessary
User::keyValue('login', $login)->first();
// "name" field is almost unique, and a "starts with" filter is performed, so adding an index can improve performance
// Adding an index on "enabled" is not recommended because has only two distinct values
User::where('name', ':like', 'a%')->where('enabled', true)->all();
// Here an index on "name" cannot be used, so declaring one is not useful
// But results should be sorted by "createdAt", so you can add an index on this field
User::where('name', ':like', '%a%')->order('createdAt')->all();
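To complement the query examples above, here is a minimal sketch of how such indexes could be declared on the mapper. The UserMapper class, the buildFields()/buildIndexes() hooks and the builder methods shown below are assumptions for illustration only; check the schema documentation of your Prime version for the exact API.
<?php
use Bdf\Prime\Mapper\Mapper;
// Hypothetical mapper sketch: the buildFields()/buildIndexes() hooks and the builder
// methods below are assumptions; the exact API may differ depending on your Prime version.
class UserMapper extends Mapper
{
    public function schema(): array
    {
        return [
            'connection' => 'DB',
            'table' => 'users',
        ];
    }

    public function buildFields($builder): void
    {
        $builder
            ->integer('id')->autoincrement()
            ->string('name')
            ->boolean('enabled')
            ->dateTime('createdAt');
    }

    public function buildIndexes($builder): void
    {
        $builder->add('idx_name')->on('name'); // would speed up the "name LIKE 'a%'" filter above
        $builder->add('idx_created_at')->on('createdAt'); // would speed up sorting by "createdAt"
    }
}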
Prime has two caches:
- The metadata cache, which stores the built metadata of mapper classes. When used, mapper builders will be configured only once, and all built metadata, like fields, indexes or relations, will be stored into the cache. This cache uses a PSR-16 simple cache.
- The query cache, which stores query results. When used on a given query, if the query generates the same SQL with the same bindings multiple times, the cached result will be returned. This cache uses the CacheInterface type.
All cache instances should be passed to the MapperFactory constructor.
Once declared, the metadata cache will be automatically used.
<?php
use Bdf\Prime\ConnectionManager;
use Bdf\Prime\ServiceLocator;
use Bdf\Prime\Mapper\MapperFactory;
use Bdf\Prime\Cache\ArrayCache;
use Symfony\Component\Cache\Simple\FilesystemCache; // assumed import for the FilesystemCache used below; any PSR-16 implementation works
$connections = new ConnectionManager();
$connections->declareConnection('DB', 'mysql://root@localhost/DB');
$prime = new ServiceLocator($connections, new MapperFactory(
null,
new FilesystemCache(__DIR__.'/var/cache'), // Metadata cache. You can use any implementation of PSR-16 simple cache.
new ArrayCache() // Query cache. It's recommended to use a memory cache (i.e. non persistent cache)
));
The query cache has to be enabled manually by calling Cachable::useCache() on the query.
Once activated, when the same query is executed multiple times, the result will be returned from the cache.
Note: when a write operation is performed on a table, all cache entries related to this table will be cleared.
Because this cache handles dynamic values, it must have a short lifetime and a limited scope. So it's recommended not to enable this cache in long-running scripts like workers, and to always use a memory cache to limit the scope.
Note: the query cache will only slightly improve read performance, so enable it only if you are in a situation where the same query is called multiple times and you cannot remove those calls.
User::where('name', ':like', 'a%')->useCache()->all(); // Get the actual database result and store it into the cache
User::where('name', ':like', 'a%')->all(); // Get actual database result
User::where('name', ':like', 'a%')->useCache()->all(); // Get cached result
User::where('name', ':like', 'b%')->useCache()->all(); // Different binding, so get from database
User::findById(42)->setPassword(null)->save(); // Trigger a write operation, caches will be cleared
User::where('name', ':like', 'a%')->useCache()->all(); // Cache is cleared, so get from database
Prime uses hydrators to fill entity properties from database results, and also for the Model::import() and Model::export() methods.
The default implementation uses the reflection API for the database-to-model transformation, and array casts plus getter/setter calls for the Model::import() and Model::export() methods.
So the model structure is resolved dynamically at runtime, which adds an overhead when loading the hydrator and on each hydration.
To improve performance, hydrators can be generated, so each model will have its own hydrator class with entirely static resolution of properties.
To generate those hydrators, you can call the prime:hydrators command.
Generated hydrators can fill property values directly, without using a getter or setter, if the properties are declared as public or protected.
Getters and setters will be called for private properties.
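For illustration, here is a minimal entity sketch showing which properties a generated hydrator can fill directly and which go through accessors. The Customer class is hypothetical, and the Bdf\Prime\Entity\Model base class name is an assumption; adapt it to your own entities.
<?php
use Bdf\Prime\Entity\Model; // assumed base class providing import() / export()

// Hypothetical entity, for illustration only
class Customer extends Model
{
    // Public and protected properties can be written directly by the generated hydrator
    public ?int $id = null;
    protected ?string $name = null;

    // A private property forces the generated hydrator to go through the setter/getter
    private ?string $password = null;

    public function setPassword(?string $password): self
    {
        $this->password = $password;

        return $this;
    }

    public function password(): ?string
    {
        return $this->password;
    }
}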
Note: if you use the Symfony bundle, you simply have to generate the hydrators using the prime:hydrators command. The generated loader will be automatically used once it exists.
First you have to generate the hydrators:
bin/console prime:hydrators -l var/generated/hydrators/loader.php -i src/Entity
Once executed, the file var/generated/hydrators/loader.php and the hydrator classes will be generated.
Now you can use them by including the loader file:
<?php
use Bdf\Prime\ConnectionManager;
use Bdf\Prime\ServiceLocator;
use Bdf\Prime\Mapper\MapperFactory;
use Bdf\Prime\Cache\ArrayCache;
$connections = new ConnectionManager();
$connections->declareConnection('DB', 'mysql://root@localhost/DB');
$prime = new ServiceLocator($connections);
(function () use ($prime) {
// The loader file uses the $registry variable name to register the generated hydrators
$registry = $prime->hydrators();
// Include the loader to register the hydrators
include __DIR__ . '/var/generated/hydrators/loader.php';
})();