Skip to content

Optimisations

Vincent QUATREVIEUX edited this page Nov 29, 2022 · 1 revision

Optimise prime

This page shows how to optimise performance of the prime ORM.

Use optimised queries

Some queries are more optimised than the default Query implementation.

Find by primary key

If you simply want to load an entity by its primary key without any additional filters, use RepositoryQueryFactory::findById() instead of Query::get(), or manually apply primary key filters. The query instance, and also it's prepared statement be reused on future call, and use KeyValueQuery implementation, providing better compiler performance.

Note: This method cannot be used if you need extra filters on the query.

Example:

// Ok
User::findById(42);
User::repository()->findById(42);
User::findById(42)?->load('groups'); // You can load relations using load() method

// Not optimised
User::builder()->get(42);
User::get(42);
User::where('id', 42)->first();
User::with('groups')->get(42); // with() is not more optimised than load() in case of single entity

// Cannot be optimised properly

// Use extra filter
User::where('roles', (new Like(5))->searchableArray())->get(42); 
// Filter can be performed in PHP, but this way will perform unnecessary load of an entity if the filter do not match
// Also performing filter on a relation is counterproductive
$user = ($user = User::findById(42)) && in_array($user->roles(), 5) ? $user : null;

// There is no optimised equivalent implemented natively
User::getOrFail(42);
// You can manually throw the exception
User::findById(42) ?? throw new EntityNotFoundException();

Use KeyValueQuery when possible

For perform simply filter operations, KeyValueQuery will perform better than base Query implementation. This query can be used if :

  • Entity has no constraints
  • Filter with equality on entity fields (or use filters at all)
  • Do not filter through relations (i.e. do not perform join)
  • Do not filter on NULL values
  • Only use AND filters combination

Unlike Query, calling where() multiple times on the same field will replace the compared value (i.e. change bind value) instead of adding a new filter. So this query can be reused multiple times with different bound values.

Note: Because of the implementation of Walker, iterate over large number of entities if less performant on KeyValueQuery than on Query. So, even if the query can be executed by KeyValueQuery, prefer use Query in this specific case (i.e. walk on very large table).

// Optimised
User::keyValue()->all(); // No filters at all
User::keyValue('login', 'robert')->first(); // Single filter
User::keyValue(['login' => 'robert', 'enabled' => true])->first(); // Also works with multiple AND filters
Order::keyValue()->where('userId', 42)->sum('price'); // Aggregation methods are available
User::keyValue('enabled', false)->update(['password' => null]); // Update with simple filter
User::keyValue('enabled', false)->delete(); // Delete with simple filter
User::keyValue()->with('groups')->where('enabled', true)->getOrFail(42); // query extensions methods are also available

// Reuse a query instance, by just changing a bound value
$query = Post::keyValue('threadId', 42)->where('postNumber', 26);
$posts = []

while ($post = $query->first()) {
    $posts[] = $post;
    $query->where('postNumber', $post->quotedNumber());
}

// Not optimised
User::builder()->all();
User::where('login', 'robert')->first();
User::where(['login' => 'robert', 'enabled' => true])->first();
User::where('enabled', false)->update(['password' => null]);
User::with('groups')->where('enabled', true)->getOrFail(42);

// Default query builder instance cannot be reused
$posts = [];
$postNumber = 26;

while ($post = Post::where('threadId', 42)->where('postNumber', $postNumber)->first()) {
    $posts[] = $post;
    $postNumber = $post->quotedNumber();
}

// Cannot be optimised properly
foreach (User::walk() as $user) { /* ... */ } // Walker is not optimised on KeyValueQuery on large table. Prefer use default query when table has more than 10k rows
User::builder()->where('updatedAt', null)->all(); // null comparison is not possible
User::builder()->where('createdAt', '>=', new DateTime('2022-10-05'))->all(); // any operator other than equality are not supported
User::builder()->where('groups.id', 5)->all(); // Join and relations are not supported
User::builder()->where('login', 'robert')->orWhere('name', 'robert')->all(); // OR is also not supported

Use EntityRepository or RepositoryQueryFactory utility methods

EntityRepository and RepositoryQueryFactory provides some utility methods which use optimised queries and reuse prepared statements :

  • EntityRepository::exists(object $entity) : Check existence of an entity from database
  • EntityRepository::refresh(object $entity) : Retrieve database value of the given entity. Internally use findById Note: optimised query is not use when extra filters is given at second parameter.
  • RepositoryQueryFactory::countKeyValue() : Count number of entities using KeyValueQuery. Generated query will be reused.
// Optimised
User::countKeyValue(); // Count all users
User::countKeyValue('login', 'robert') > 0; // Check existence of user with login = robert
User::exists($user); // Check existence of user in database
User::refresh($user); // Retrieve user from database

// Not optimised
User::count(); // Do not use optimised query
User::where('login', 'robert')->count() > 0; // Use default Query
User::keyValue('login', 'robert')->count() > 0; // Do not cache query instance and prepared statement, so performance is lower on multiple calls
User::where('id', $user->id())->count() > 0; // Use default Query
User::keyValue('id', $user->id())->count() > 0; // Do not cache query instance and prepared statement
User::get($user->id()); // Do not use optimised query

// Cannot be optimised properly
User::where('groups.id', 5)->count(); // Use relation filter
User::count(['name :like' => 'foo%']); // Use custom operator
User::refresh($user, ['enabled' => true]); // Custom constraint will disable optimisation

N+1 problem

The N+1 problem occurs when one (or more) query is executed on each result of a first query. More generally, N+1 problem occurs when a query is executed an arbitrary number of times into a loop.

Example:

// Execute a first query
foreach (User::where('name', ':like', 'a%') as $user) {
    // Load groups of the user, so execute one query on each user
    $user->load('groups');
}

// Execute a query for each role of the user
$user = User::findById(42);
$roles = array_map(fn (int $id) => Role::findById($id), $user->roles());

This problem cannot be completely solved, but some utilities are provided to mitigate some cases.

Loading relations

The simplest way of resolving N+1 problem with relation loading is to call with() method on the query, with all required relations when requesting multiple entities.

Note: this may need some changed on the entity structure to allow attached relations.

// Ok
foreach (User::with(['groups', 'administrated'])->where($filters) as $user) {
    // Note: be sure that the called function do not load relations
    $this->doSomeJob($user);
}

// Bad
foreach (User::where($filters) as $user) {
    $this->doSomeJob($user->load('groups'), $user->relation('administrated')->all());
}

Refreshing or loading entities

Sometimes you need to load multiple entities by their primary key, or simply refreshing them. To perform this operation in one query you can use :

  • RepositoryQueryFactory::entities() which refresh a list of entities
  • where() primary key in filter
$user = User::findById(42);

// Ok
$roles = Role::where('id', $user->roles())->all(); // equality operator with an array will be automatically converted to "IN" filter

// Do some long work...

// Refresh all entities in one query
$refreshed = Role::entities($roles);
$hasChanged = $roles == $refreshed; // So we can check if there is a change

// Bad
$roles = array_map(fn (int $id) => Role::findById($id), $user->roles());
$hasChanged = false;

foreach ($roles as $role) {
    if ($role != Role::refresh($role)) {
        $hasChanged = true;
        break;
    }
}

EntityCollection

EntityCollection allows to perform some bulk operations with a minimal number of queries :

  • EntityCollection::load() Load relations on each entity of the collection. Will perform like Query::with() method.
  • EntityCollection::link() Get a Query object requesting the given relation for all entities on the collection. This is equivalent of calling EntityRelation::query() on each entity, and merging results.
  • EntityCollection::query() Get a Query object pointing to all entities of the collection. This is equivalent of calling: MyEntity::where('pk', 'in', $ids).
  • EntityCollection::refresh() Refresh all entities of the collections. This is equivalent of calling RepositoryQueryFactory::entities() but on a collection object.
  • EntityCollection::update() Perform a bulk update on each entity. Note: this method will perform a low level update query, so updated event will not be triggered.

This collection can be created by calling Query::wrapAs(EntityCollection::class) or passing array of entities as parameter of method EntityRepository::collection().

$users = User::where($filters)->wrapAs(EntityCollection::class)->all();
$users = User::collection(User::where($filters)->all());

// Ok
$users->load('groups'); // If you don't have access to query builder instance, you can load relations without N+1 by using EntityCollection after base query execution.
$groups = $users->link('administrated')->where('name', ':like', 'a%')->all(); // Query all relations in a single query.
$hasChanged = clone $users == $users->refresh(); // Retrieve database version of all entities and check if there is a change.
$users->update(['password' => null]); // Perform update query on all entities. Be aware: behaviors will be skipped, because no events will be triggered !

// Bad
foreach ($users as $user) { $user->load('groups'); }
$groups = array_merge(...array_map(fn ($user) => $user->relation('administrated')->where('name', ':like', 'a%')->all(), $users));

Declare indexes on tables

See How MySQL Uses Indexes for more information. Using an index of table is a powerful mechanism for optimise perform search operations. Indexes are declared on mapper.

Consider adding an index when :

  • Table is large and queries are slow.
  • Same filters are applied often. In this case you can add compound index on fields used by all filters.
  • Requested value is mostly unique (adding an index on a boolean field is counterproductive).
  • A field is often used for sort results.
  • In case of compound indexes, use the most unique field first.

Do not add an index when :

  • Only few values are available for the given field (ex: boolean or enum).
  • Perform search string operation other than starts with (i.e. LIKE %xxx% is not optimised).
  • Filter with primary key, or any other unique field (unique index will be used).
  • Too many indexes. Keep only most efficient ones.

Note: Use index will decrease write performance, but increase read one. So you need to find the best balance between read and write performance.

You can also use an external indexer for querying, and synchronize entities asynchronously to get the best read and write performances. See b2pweb/bdf-prime-indexer for elasticsearch usage on prime.

// "login" is already marked as unique, so explicitly adding an index is unnecessary
User::keyValue('login', $login)->first();

// "name" field is almost unique, and a "starts with" filter is performed, so adding an index can improve performance
// Adding an index on "enabled" is not recommended because has only two distinct values
User::where('name', ':like', 'a%')->where('enabled', true)->all();

// Here index on "name" cannot be used, so declaring it is not useful
// But result should be sorted bby "createdAt", so you can add an index on this field
User::where('name', ':like', '%a%')->order('createdAt')->all();

Use cache

Prime has two caches :

  • Metadata cache, which stores built metadata of mapper classes. So when used, mapper builders will be configured once, and all built metadata, like fields, indexes or relations will be stored into the cache. This cache use PSR-16 simple cache.
  • Query cache, which stores query results. So when used on a given query, and if the query generates same SQL and has same binding multiple times, the cached result will be returned. This cache use CacheInterface type.

Declaration

All cache instances should be passed at MapperFactory constructor. Once declared, metadata cache will be automatically used.

<?php
use Bdf\Prime\ConnectionManager;
use Bdf\Prime\ServiceLocator;
use Bdf\Prime\Mapper\MapperFactory;
use Bdf\Prime\Cache\ArrayCache;

$connections = new ConnectionManager();
$connections->declareConnection('DB', 'mysql://root@localhost/DB');

$prime = new ServiceLocator($connections, new MapperFactory(
    null,
    new FilesystemCache(__DIR__.'/var/cache'), // Metadata cache. You can use any implementation of PSR-16 simple cache.
    new ArrayCache() // Query cache. It's recommended to use a memory cache (i.e. non persistent cache)
));

Query cache

Query cache has to be enabled manually by calling Cachable::useCache() on query. Once activated, when same query will be executed multiple times, the same result will be returned from cache.

Note: when a write operation is performed on a table, all cache related to this table will be cleared.

Because this cache handle dynamic values, it must have a short lifetime and have a limited scope. So it's recommended to not enable this cache on long-life scripts like workers, and always use memory cache to limit the scope.

Note: The query cache will only slightly improve read performance. So enable query cache only if you are in situation where the same query is called multiple times, and you can't remove those calls.

User::where('name', ':like', 'a%')->useCache()->all(); // Get actual database result abd store into cache
User::where('name', ':like', 'a%')->all(); // Get actual database result
User::where('name', ':like', 'a%')->useCache()->all(); // Get cached result
User::where('name', ':like', 'b%')->useCache()->all(); // Different binding, so get from database
User::findById(42)->setPassword(null)->save(); // Trigger a write operation, caches will be cleared
User::where('name', ':like', 'a%')->useCache()->all(); // Cache is cleared, so get from database

Generated hydrators

Prime uses hydrators for fill entities properties from database results, and also for methods Model::import() and Model::export(). Default implementation use reflection API for database to model transformation, and array cast + getter/setter call for Model::import() and Model::export() methods. So model structure is resolved dynamically at runtime which adds an overhead when loading hydrator and on each hydration.

To improve performance, hydrators can be generated, so each model will have its own hydrator class with entirely static resolution of properties. To generates those hydrators, you can call command prime:hydrator.

Generated hydrators can directly fill properties value without using a getter or setter if there are declared as public or protected. Getters and setters will be called for private properties.

Usage

Note: if you use symfony bundle, you simply have to generate hydrators using prime:hydrators command. Generated loader will be automatically used once it exists.

First you have to generate hydrators :

bin/console prime:hydrators -l var/generated/hydrators/loader.php -i src/Entity

Once executed, the file var/generated/hydrators/loader.php and hydrator classes will be generated.

Now, you can use it, by including the loader file :

<?php
use Bdf\Prime\ConnectionManager;
use Bdf\Prime\ServiceLocator;
use Bdf\Prime\Mapper\MapperFactory;
use Bdf\Prime\Cache\ArrayCache;

$connections = new ConnectionManager();
$connections->declareConnection('DB', 'mysql://root@localhost/DB');

$prime = new ServiceLocator($connections);

(function () use($prime) {
    // loader file will use $registry var name for register generated hydrators
    $registry = $prime->hydrators();
    
    // Include load to register hydrators
    include __DIR__ . '/var/generated/hydrators/loader.php';
})();