WIP: Snapshot create and restore support for projections #277

robertlemke · 2020-08-21T12:06:32Z

interface for createSnapshot() and restoreSnapshot()
CLI commands for create snapshot and restore snapshot
trait for Doctrine based Projectors helping with creating and restoring of snapshots
persist Snapshot objects so that ProjectionManager and others can look up which snapshots (should) exist and retrieve metadata
implemente deleting of snapshots
provide unit tests for all important parts

Ideas for later:

automatically create snapshots after each x events, configurable per projection
automatic clean up of old / not needed snapshots
create / restore snapshots for multiple projections at once
dump snapshots to and restore from file system / cloud storage, for quick disaster recovery

This is basically working, but the snapshot metadata is not persisted yet and thus one very concrete snapshot is currently hardcoded in ProjectionManager

robertlemke · 2020-08-21T12:19:44Z

Instructions:

choose a Doctrine-based projector
add implements SnapshotAwareProjectorInterface
implement the getTableNames() method
adjust the hardcoded Snapshot object in ProjectionManager->restoreSnapshot()
./flow projection:replay --maximum-sequence-number xxx yourprojection
./flow projection:createsnapshot yourprojection
see the new table(s) in the database, prefixed with "snap_"
./flow projection:catchup
check that your projection is up to date. See also the applied events table
./flow projection:restoresnapshot uuid-of-your-snapshot
your projection should be back to the xx sequence number

bwaidelich · 2020-08-21T14:56:06Z

Sorry for being the sceptic voice.. I probably over-react a little but I removed a lot of code from this package in order to keep it focused and scaleable so please bear with me and hear me out:

In event sourcing snapshotting is usually applied to aggregates as performance optimization, i.e. in order to speed up reconstitution that is needed for every mutation.
It adds a layer of complexity and potential bugs though, so it is generally advised to only be used if there are no better alternatives.
From http://cqrs.nu/Faq/event-sourcing:

An optimization where a snapshot of the aggregate's state is also saved (conceptually) in the event queue every so often, so that event application can start from the snapshot instead of from scratch. This can speed things up. [...]

Snapshotting has a number of drawbacks related to re-introducing current state in the database. Rather than assume you will need it, start without snapshotting, and add it only after profiling shows you that it will help.

Having said that I still think that it can be quite a useful tool but I wonder if the framework is a good place for the implementation.
I.e. it's totally possible to implement aggregate snapshots today like:

<?php
class SomeAggregateRepository {

  public function load(SomeAggregateId $id): SomeAggregate
  {
    $snapshot = $this->loadSnapshot($id);
    $eventStream = $this->eventStore->load($this->streamName($id), $snapshot->sequenceNumber());
    return SomeAggregate::reconstituteFromSnapshotAndEventStream($snapshot, $eventStream);
  }

  public function save(SomeAggregate $aggregate): void
  {
    $this->eventStore->commit($this->streamName($id), $aggregate->pullUncommittedEvents(), $aggregate->getReconstitutionVersion());
    if ($this->requiresSnapshot($aggregate)) {
      $this->saveSnapshot($aggregate);
    }
  }

  // ...
}

We should add a proper example to the documentation and/or provide a base implementation, but IMO this could be done in a separate package because most of the code will be specific to the aggregate implementation.

Now this PR is about snapshots for projections and – if I get it right – it serves a different scenario.
Maybe you thought it all through but I got the feeling that we might need to discuss the scenario a bit more before we can agree on a solution.

A snapshot as described above doesn't make sense for projections IMO because the events are usually only applied once and the projection is only replayed if its implementation changed (which renders any previous snapshot invalid most probably).
From looking at the code (and from what we discussed the other day) I think, this one is more about a backup mechanism that allows to restore a previous projection state for debugging purposes.

With #269 this should already be possible via

./flow projection:replay projection --maximumSequenceNumber 123

and this PR is "merely" there to speed up that process, right?

bwaidelich · 2020-08-21T15:02:27Z

Sorry if the above didn't sound constructive, I was mainly trying to turn my initial thoughts into writing.

TL;DR: I'm not sure about the new interfaces and traits and would prefer a first implementation in the affected model. E.g. for the event-sourced Content Repository two new CLI commands

./flow contentsubgraph:backup
./flow contentsubgraph:restore

..at least until we learned more about it :)

albe · 2020-08-23T12:45:09Z

@bwaidelich that were exactly my thoughts when I saw the PR and thought about it :)

robertlemke · 2020-08-24T07:17:09Z

@bwaidelich thanks for your comments. I guess it's mostly a misunderstanding.

The motivation for this feature is mostly an optimization for debugging projections. It can be implemented as a separate package, if you like. It has nothing to do and does not aim for providing something like snapshots for aggregates.

Consider having an event store full of 1.000.000 events and trying to debug / improve a specific projection. You'll end up in a replay - check database - catch up - adjust code - replay -loop. In my actual case a replay of the respective projection takes a few minutes. Now, if I could create a snapshot at event 990.000 and reset to / catch from there, this would be a big time-saver.

If that is a too exotic feature, just let me know. @skurfuerst asked me to implement it for the ESCR development.

skurfuerst · 2021-01-19T12:35:42Z

diff --git a/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Projection/GraphProjector.php b/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Projection/GraphProjector.php
index 57b5196..80b6429 100644
--- a/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Projection/GraphProjector.php
+++ b/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Projection/GraphProjector.php
@@ -37,6 +37,8 @@ use Neos\EventSourcedContentRepository\Domain\Context\NodeAggregate\Event\RootNo
 use Neos\EventSourcedContentRepository\Domain\Context\NodeAggregate\NodeAggregateClassification;
 use Neos\EventSourcedContentRepository\Domain\ValueObject\SerializedPropertyValues;
 use Neos\EventSourcedContentRepository\Infrastructure\Projection\AbstractProcessedEventsAwareProjector;
+use Neos\EventSourcing\Projection\DoctrineProjectorSnapshotTrait;
+use Neos\EventSourcing\Projection\SnapshotAwareProjectorInterface;
 use Neos\Flow\Annotations as Flow;

 /**
@@ -44,11 +46,12 @@ use Neos\Flow\Annotations as Flow;
  *
  * @Flow\Scope("singleton")
  */
-class GraphProjector extends AbstractProcessedEventsAwareProjector
+class GraphProjector extends AbstractProcessedEventsAwareProjector implements SnapshotAwareProjectorInterface
 {
     use RestrictionRelations;
     use NodeRemoval;
     use NodeMove;
+    use DoctrineProjectorSnapshotTrait;

     const RELATION_DEFAULT_OFFSET = 128;

@@ -58,6 +61,16 @@ class GraphProjector extends AbstractProcessedEventsAwareProjector
      */
     protected $projectionContentGraph;

+    protected static function getTableNames(): array
+    {
+        return [
+            'neos_contentgraph_node',
+            'neos_contentgraph_hierarchyrelation',
+            'neos_contentgraph_referencerelation',
+            'neos_contentgraph_restrictionrelation'
+        ];
+    }
+
     /**
      * @throws \Throwable
      */
diff --git a/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Repository/ContentGraph.php b/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Repository/ContentGraph.php
index 1b247b1..b61dbd1 100644
--- a/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Repository/ContentGraph.php
+++ b/Neos.ContentGraph.DoctrineDbalAdapter/Classes/Domain/Repository/ContentGraph.php
@@ -169,7 +169,7 @@ final class ContentGraph implements ContentGraphInterface
      */
     public function findNodeAggregateByIdentifier(
         ContentStreamIdentifier $contentStreamIdentifier,
-        NodeAggregateIdentifier $nodeAggregateIdentifier
+            NodeAggregateIdentifier $nodeAggregateIdentifier
     ): ?NodeAggregate {
         $connection = $this->client->getConnection();

bwaidelich · 2021-07-23T14:47:18Z

This one got stuck after I shared my concerns. Now, almost one year later, I still think that "snapshots" should be avoided if possible (and apparently I'm not the only one, see https://twitter.com/mat_mcloughlin/status/1418191418309451780) and certainly don't be a part of the core functionality.

Having said that, of course that's just my personal opinion and – more importantly – it doesn't mean that we should not make debugging large projections easier and faster!
I'm just convinced that it is much easier to solve this for a concrete scenario rather than with a generic pattern.

btw: I shared the twitter link above not to "prove my point" (and it doesn't for all cases) but it contains an interesting idea: How can we prevent (too) long living streams?

WIP: Snapshot create and restore support for projections

467f584

This is basically working, but the snapshot metadata is not persisted yet and thus one very concrete snapshot is currently hardcoded in ProjectionManager

robertlemke marked this pull request as draft August 21, 2020 12:06

skurfuerst mentioned this pull request Aug 21, 2020

Debug Tools for Projections neos/contentrepository-development-collection#148

Open

4 tasks

robertlemke added the ⭐ Feature label Aug 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Snapshot create and restore support for projections #277

WIP: Snapshot create and restore support for projections #277

robertlemke commented Aug 21, 2020 •

edited

Loading

robertlemke commented Aug 21, 2020

bwaidelich commented Aug 21, 2020

bwaidelich commented Aug 21, 2020

albe commented Aug 23, 2020

robertlemke commented Aug 24, 2020

skurfuerst commented Jan 19, 2021

bwaidelich commented Jul 23, 2021 •

edited

Loading

WIP: Snapshot create and restore support for projections #277

Are you sure you want to change the base?

WIP: Snapshot create and restore support for projections #277

Conversation

robertlemke commented Aug 21, 2020 • edited Loading

robertlemke commented Aug 21, 2020

bwaidelich commented Aug 21, 2020

bwaidelich commented Aug 21, 2020

albe commented Aug 23, 2020

robertlemke commented Aug 24, 2020

skurfuerst commented Jan 19, 2021

bwaidelich commented Jul 23, 2021 • edited Loading

robertlemke commented Aug 21, 2020 •

edited

Loading

bwaidelich commented Jul 23, 2021 •

edited

Loading