Skip to content

Consider FLEncoder based approach for setting document properties #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
blaugold opened this issue Mar 24, 2022 · 4 comments
Open

Consider FLEncoder based approach for setting document properties #289

blaugold opened this issue Mar 24, 2022 · 4 comments
Labels
💪 enhancement New feature or request

Comments

@blaugold
Copy link
Contributor

Actually, there is something else that I have been meaning to bring up relating to document properties and Fleece.

I have basically ported the Fleece integration layer (MValue, MCollection, etc.) to Dart to efficiently read and write document properties from Dart. This means that I don't use of mutable Fleece collections and instead pass a FLEncoder to MRoot to serialize the current state of the document properties into Fleece data.

The current CBL C API does not support this workflow in an optimal way because there is no way to directly set document properties from Fleece data. I work around this by creating a FLDoc from the Fleece data and setting its root value as the new document properties. The downside of this approach is that the document properties are encoded a second time when the document is actually saved.

I would be great if a FLEncoder based workflow for setting document properties was fully supported.

Originally posted by @blaugold in #286 (comment)

@blaugold
Copy link
Contributor Author

blaugold commented Mar 24, 2022

@pasin wrote in #286 (comment):

I don't know dart and limitation or inefficiency around wrapping around FLDict or FLMutableDict versus implementing mutable fleece in dart. Does this also mean you need to serialize FLDict to return from C to Dart when calling CBLDocument_Properties(CBLDocument*)?

I don't make the conversion from the FLDict to Dart objects in one step, and not all at once. I pass the FLDict pointer from CBLDocument_Properties to Dart and then use the Fleece API to lazily read Fleece data and convert it into Dart objects when specific dict entries or array elements are accessed.

I could imagine an API similar to this one:

FLEncoder* CBLDatabase_DocumentEncoder(CBLDatabase *db);
void CBLDocument_SetPropertiesFromEncoder(CBLDocument *doc, FLEncoder *enc);

And this is how it would be used (for brevity I omitted error checks of FLEncoder calls):

auto enc = CBLDatabase_DocumentEncoder(db);
FLEncoder_Reset(enc);
FLEncoder_BeginDict(enc, 1);
FLEncoder_WriteKey(enc, FLSTR("A"));
FLEncoder_WriteString(enc, FLSTR("B"));
FLEncoder_EndDict(enc);
auto doc = CBLDocument_Create();
CBLDocument_SetPropertiesFromEncoder(doc, enc);

@pasin pasin added the 💪 enhancement New feature or request label Mar 24, 2022
@blaugold
Copy link
Contributor Author

From looking at the implementation, the API below seems realistic to me:

/** The callback of a \ref CBLDocumentPropertiesEncoder that is invoked when the properties
    of the document need to be encoded to Fleece.
    The callback must write the document properties to the provided encoder but must not
    call \ref FLEncoder_Finish. If for any reason, the callback is unable to encode the
    document properties, it must return false.
    @note  The callback might be invoked from any thread.
    @note  The encoded root value must be a dictionary.
    @note  The callback may only use the Fleece API and must not use the CBL API.
    @note  The callback must not hold onto the \ref encoder.
    @note  The callback must not use \ref FLEncoder_Free, \ref FLEncoder_GetExtraInfo,
           \ref FLEncoder_SetExtraInfo or \ref FLEncoder_SetSharedKeys.
    @param context  The arbitrary value provided in \ref CBLDocumentPropertiesEncoder.
    @param encoder  The encoder to use for encoding the document properties to Fleece.
    @return  Whether the document properties could be fully encoded. */
typedef bool (*CBLDocumentPropertiesEncoderCallback)(void* context, FLEncoder encoder);

/** Encodes an external representation of a document's properties by writing to a \ref FLEncoder.

    This provides an alternative to building up a document's properties through mutable Fleece
    collections.

    If a properties encoder has been set for a document (\ref CBLDocument_SetPropertiesEncoder),
    the properties encoder's callback (\ref CBLDocumentPropertiesEncoderCallback) will be invoked
    as part of saving the document, to encode the current state of the document's properties.
    The callback receives an arbitrary context value that should be used to give the callback access
    to the external representation of the document's properties.

    While a properties encoder is set for a document, changes to the document's properties
    through other APIs will be overridden by the properties encoder when the document is saved.
    Directly after a document has been saved it's properties reflect the state of the external
    representation of the document's properties.
    */
struct CBLDocumentPropertiesEncoder {
    /** Callback that is invoked when the properties of the document need to be encoded to Fleece. */
    CBLDocumentPropertiesEncoderCallback callback;
    /** Arbitrary value that is passed to the callback. */
    void* context;
};

/** Sets a document's \ref CBLDocumentPropertiesEncoder.
    @param encoder  The properties encoder to use for the document, or NULL to remove the current
                    one. The provided properties encoder is copied. */
void CBLDocument_SetPropertiesEncoder(CBLDocument*,
                                      CBLDocumentPropertiesEncoder* _cbl_nullable encoder) CBLAPI;

/** Returns a document's current \ref CBLDocumentPropertiesEncoder.
    May be NULL, meaning that the document has no properties encoder.
    @warning  When the document is released or a new properties encoder is set, this reference
              becomes invalid. */
CBLDocumentPropertiesEncoder* _cbl_nullable CBLDocument_GetPropertiesEncoder(CBLDocument*) CBLAPI;

/** Writes a \ref CBLBlob to the encoder */
void FLEncoder_WriteBlob(FLEncoder, CBLBlob *blob) CBLAPI;

/** Writes a \ref CBLEncryptable to the encoder */
void FLEncoder_WriteEncryptable(FLEncoder, CBLEncryptable *encryptable) CBLAPI;

This API allows users to pass an FLEncoder through data structures to encode them to Fleece. That in itself is from my experience ~2x faster than building up mutable Fleece collections. Additionally, the mutable Fleece collections don't have to be written to Fleece binary in a separate step.

Especially for large documents, this API would allow for significantly more efficient save operations.

@pasin @borrrden I'd be happy to work on this if you think this is a reasonable addition to the API.

@pasin
Copy link
Contributor

pasin commented Sep 27, 2022

@blaugold First of all , thanks for the suggestion. I have a few comments on this.

  1. The suggested API could introduce inconsistency to the document properties after setting the encoder as the doc properties do not represent what will get saved any more. The actual properties of the document will be the result of the encoder callback when the document is saved.

  2. Will it work the same if having the API like :

// Note: data will be converted to properties' dict if properties is accessed after:
bool CBLDocument_SetFleeceData(CBLDocument*, FLSlice data, CBLError* _cbl_nullable outError) CBLAPI;
  1. With both suggested APIs, I'm seeing a problem around the process of checking blobs and validating Encryptables in an array. I'm talking about the process here. Installing blobs and validating Encryptables in arrays might be done when encoding I guessed, but setting the attachment flag is questionable. Because of this, it makes me worry that the suggested APIs might have the problem in the future, if we (really) need to have additional validation of the content of the document before save.

@blaugold
Copy link
Contributor Author

blaugold commented Sep 30, 2022

Thanks for taking a look @pasin.

The suggested API could introduce inconsistency to the document properties after setting the encoder as the doc properties do not represent what will get saved any more. The actual properties of the document will be the result of the encoder callback when the document is saved.

If someone uses a model (external representation) to access the document properties, I would expect that they usually decode their model from Fleece after loading a document, set a CBLDocumentPropertiesEncoder for the document, with the model as the context, and continue using the model instead of reading from the document properties through the Fleece API.

To ensure consistency, if a CBLDocumentPropertiesEncoder has been set, it could be used to encode the model when the document properties are accessed and the CBLDocumentPropertiesEncoder could be removed subsequently, putting the document back into the normal mode.

CBLDocumentPropertiesEncoder should probably also have the option to specify a dispose function that is called when the document removes a CBLDocumentPropertiesEncoder or the document is released and has a CBLDocumentPropertiesEncoder set.

  1. Will it work the same if having the API like :
// Note: data will be converted to properties' dict if properties is accessed after:
bool CBLDocument_SetFleeceData(CBLDocument*, FLSlice data, CBLError* _cbl_nullable outError) CBLAPI;

The goal of the suggested API is to avoid unnecessary work when encoding model data to Fleece. The fastest way to do this is to pass a FLEncoder through the model data structure. What complicates this approach is that we really want to use the shared encoder of the database so that the database shared keys are used during encoding. This also requires that the encoding happens in the database transaction of the save operation because only during that phase can shared keys be added. That's why the alternative API won't help. We would have to reencode the Fleece data when saving the document to make sure it uses the shared keys of the database.

  1. With both suggested APIs, I'm seeing a problem around the process of checking blobs and validating Encryptables in an array. I'm talking about the process here. Installing blobs and validating Encryptables in arrays might be done when encoding I guessed, but setting the attachment flag is questionable. Because of this, it makes me worry that the suggested APIs might have the problem in the future, if we (really) need to have additional validation of the content of the document before save.

The suggested FLEncoder_WriteBlob and FLEncoder_WriteEncryptable functions are just there for convenience. After the CBLDocumentPropertiesEncoderCallback is done encoding, the resulting Fleece data would always have to be traversed to do the checks you mentioned. Users could write Blobs and Encryptables shaped data to the encoder instead of using FLEncoder_WriteBlob and FLEncoder_WriteEncryptable but we still need to detect those Blobs for the attachment flag and validate Encryptables.

Ideally, the Encoder could be extended to do those checks on the fly, but I'm not too concerned about performance here because reading Fleece is so fast.

As a side note, my idea was that FLEncoder_SetExtraInfo could be used to provide FLEncoder_WriteBlob with the context necessary to install blobs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💪 enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants