
Add multi table transactions #77

Merged (7 commits) Oct 16, 2024

Conversation

purkhusid
Contributor


## What?
Changes the transaction API to allow multi table transactions

## How?
Instead of letting `TransactWrite` accept items with a specific record type I added a `TransactWrite` member to the table context.

This `TransactWrite` member has the `Put`/`Update`/`Delete`/`Check` operations and replaces the operations on the `TransactWrite` module that previously existed.
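For illustration only, a hedged sketch of how such multi-table usage might read (the contexts, record values, and member signatures here are hypothetical, and the design was reworked later in this thread):

```fsharp
// Hypothetical sketch: operations built from differently-typed table
// contexts are combined into a single all-or-nothing transaction.
let put = ordersContext.TransactWrite.Put(order)               // an 'Order record
let update = usersContext.TransactWrite.Update(userKey, expr)  // a 'User record
// both operations are submitted together and commit atomically, or neither does
```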
@purkhusid
Contributor Author

@samritchie @bartelink The changes currently expose the TransactWriteItem that is part of the AWS SDK. Would you prefer not to expose it, and instead have a wrapper type that holds the AWS SDK type as an internal member?

@purkhusid
Contributor Author

The tests are also not done. Just wanted feedback on the approach before spending more time on this.

@bartelink
Member

Hi Daniel. Firstly, thanks for contributing; I hope to get some time to look at it properly later.

The main thing is that you have tests.

I punted on doing multi-table originally, and didn't have any particular ideas on how it should be shaped as such.

Breaking changes for the Transaction support isn't a problem from my point of view - I'll be able to do an update to Equinox.DynamoStore pretty quickly if anything needs to be moved (as opposed to bending over backwards and leaving a mess in the name of binary compatibility).

@purkhusid
Contributor Author

purkhusid commented Aug 29, 2024

> Hi Daniel. Firstly, thanks for contributing; I hope to get some time to look at it properly later.
>
> The main thing is that you have tests.
>
> I punted on doing multi-table originally, and didn't have any particular ideas on how it should be shaped as such.
>
> Breaking changes for the Transaction support isn't a problem from my point of view - I'll be able to do an update to Equinox.DynamoStore pretty quickly if anything needs to be moved (as opposed to bending over backwards and leaving a mess in the name of binary compatibility).

There is a test that shows the new API. This is of course a breaking change, but I figured it might be acceptable since the transaction API in this library didn't cover the full functionality anyway.

A similar approach could be used for the BatchWriteItem as well.

@samritchie
Collaborator

It would be great to support this functionality. @bartelink probably has more to contribute here as he did the TransactWrite work.

Probably my main concern with the API is it being an instance method hanging off an arbitrary table (which hopefully has the same account/credentials as the items). I’m not sure how to resolve this without introducing a top level DB type though.

@purkhusid
Contributor Author

> It would be great to support this functionality. @bartelink probably has more to contribute here as he did the TransactWrite work.
>
> Probably my main concern with the API is it being an instance method hanging off an arbitrary table (which hopefully has the same account/credentials as the items). I’m not sure how to resolve this without introducing a top level DB type though.

This was the least intrusive approach I found. Adding a new type that only holds a DynamoDB client would also make the UX worse, since you will already have a TableContext with a DynamoDB client.

@samritchie
Collaborator

samritchie commented Aug 29, 2024

You will already have multiple TableContexts with their own DynamoDB client, I’m wondering if it’s neater/more consistent/more obvious to just have an extra non-generic MultiTableContext that implements the Transact (& multi-table batch?) methods.

@purkhusid
Contributor Author

> You will already have multiple TableContexts with their own DynamoDB client, I’m wondering if it’s neater/more consistent/more obvious to just have an extra non-generic MultiTableContext that implements the Transact (& multi-table batch?) methods.

I'm not sure I like that approach, since it would just add one more thing that the end user has to create without any real value. The end user has already configured a context for each of their types, and I think it would be a step backwards if they had to configure one more thing to use transactions.

I still agree that it is kind of wonky that you can use any TableContext to do the actual transaction, but I think that is rooted in the fact that the DynamoDB client is request based while this library is designed around a configured context.

Any concrete suggestions on how this API might work better are of course greatly appreciated.

@samritchie
Collaborator

We could certainly add a convenience TableContext.MultiTableContext getter to minimise the ceremony for users that want to do ad hoc multi table transactions, but creating an additional context would not be too big a deal for people already managing a larger number of tables.

Probably another alternative would be creating a TransactionBuilder type and threading it through a fluent (or piped) chain.

For batch operations I wouldn’t want to lose the existing strongly typed single call methods so multi-table batch should be a new interface rather than a modification of the existing. Mixing generic & non-generic methods on a single context is a bit redundant & likely to confuse.

@purkhusid
Contributor Author

purkhusid commented Aug 30, 2024

What about something like this:

namespace FSharp.AWS.DynamoDB

[<RequireQualifiedAccess>]
module TransactionBuilder =
    open Amazon.DynamoDBv2.Model
    open ExprCommon
    open Amazon.DynamoDBv2

    type Transaction = { RequestItems: (TransactWriteItem * IAmazonDynamoDB) seq }

    let createTransaction () : Transaction = { RequestItems = [] }

    let put
        (tableContext: TableContext<'TRecord>)
        (item: 'TRecord)
        (precondition: option<ConditionExpression<'TRecord>>)
        (transaction: Transaction)
        : Transaction =
        let req = Put(TableName = tableContext.TableName, Item = tableContext.Template.ToAttributeValues item)
        precondition
        |> Option.iter (fun cond ->
            let writer = AttributeWriter(req.ExpressionAttributeNames, req.ExpressionAttributeValues)
            req.ConditionExpression <- cond.Conditional.Write writer)
        { transaction with
            RequestItems = (Seq.append [ (TransactWriteItem(Put = req), tableContext.Client) ] transaction.RequestItems) }

    let check
        (tableContext: TableContext<'TRecord>)
        (key: TableKey)
        (condition: ConditionExpression<'TRecord>)
        (transaction: Transaction)
        : Transaction =
        let req = ConditionCheck(TableName = tableContext.TableName, Key = tableContext.Template.ToAttributeValues key)
        let writer = AttributeWriter(req.ExpressionAttributeNames, req.ExpressionAttributeValues)
        req.ConditionExpression <- condition.Conditional.Write writer
        { transaction with
            RequestItems = (Seq.append [ (TransactWriteItem(ConditionCheck = req), tableContext.Client) ] transaction.RequestItems) }

    let update
        (tableContext: TableContext<'TRecord>)
        (key: TableKey)
        (precondition: option<ConditionExpression<'TRecord>>)
        (updater: UpdateExpression<'TRecord>)
        (transaction: Transaction)
        : Transaction =
        let req = Update(TableName = tableContext.TableName, Key = tableContext.Template.ToAttributeValues key)
        let writer = AttributeWriter(req.ExpressionAttributeNames, req.ExpressionAttributeValues)
        req.UpdateExpression <- updater.UpdateOps.Write(writer)
        precondition |> Option.iter (fun cond -> req.ConditionExpression <- cond.Conditional.Write writer)
        { transaction with
            RequestItems = (Seq.append [ (TransactWriteItem(Update = req), tableContext.Client) ] transaction.RequestItems) }

    let delete
        (tableContext: TableContext<'TRecord>)
        (key: TableKey)
        (precondition: option<ConditionExpression<'TRecord>>)
        (transaction: Transaction)
        : Transaction =
        let req = Delete(TableName = tableContext.TableName, Key = tableContext.Template.ToAttributeValues key)
        precondition
        |> Option.iter (fun cond ->
            let writer = AttributeWriter(req.ExpressionAttributeNames, req.ExpressionAttributeValues)
            req.ConditionExpression <- cond.Conditional.Write writer)
        { transaction with
            RequestItems = (Seq.append [ (TransactWriteItem(Delete = req), tableContext.Client) ] transaction.RequestItems) }

    let writeItems (clientRequestToken: string option) (transaction: Transaction) : Async<unit> = async {
        let itemCount = Seq.length transaction.RequestItems
        if itemCount = 0 || itemCount > 100 then
            raise
            <| System.ArgumentOutOfRangeException(nameof transaction.RequestItems, "must be between 1 and 100 items.")
        let req =
            TransactWriteItemsRequest(TransactItems = (transaction.RequestItems |> Seq.map fst |> ResizeArray))
        clientRequestToken |> Option.iter (fun x -> req.ClientRequestToken <- x)
        let! ct = Async.CancellationToken
        // uses the client attached to the first request item to execute the transaction
        let! response = (snd (Seq.head transaction.RequestItems)).TransactWriteItemsAsync(req, ct) |> Async.AwaitTaskCorrect
        // NB consumed-capacity reporting / metrics collection elided in this sketch
        if response.HttpStatusCode <> System.Net.HttpStatusCode.OK then
            failwithf "TransactWriteItems request returned error %O" response.HttpStatusCode
    }

module Usage =
    let request =
        TransactionBuilder.createTransaction ()
        |> TransactionBuilder.put ctx1 { Name = "John"; Age = 30 } None
        |> TransactionBuilder.put ctx1 { Name = "Jane"; Age = 25 } None
        |> TransactionBuilder.put ctx2 { SomethingElse = "Hello" } None
        |> TransactionBuilder.writeItems None
        |> Async.RunSynchronously

@samritchie
Collaborator

I think TransactionBuilder is definitely better than methods on the context. My main concern here is the rest of the public API is method-based rather than modules & functions - there’s value in keeping it consistent/predictable.

@purkhusid
Contributor Author

> I think TransactionBuilder is definitely better than methods on the context. My main concern here is the rest of the public API is method-based rather than modules & functions - there’s value in keeping it consistent/predictable.

Yeah, I figured that using modules and functions was not really consistent with how the rest of the library operates. I'll create a new class that has methods instead and the class can keep the state internally.

@purkhusid
Contributor Author

@samritchie I changed the API to use a class instead of a module. Please take a look when you have the time.

@purkhusid
Contributor Author

@samritchie @bartelink Friendly reminder, would appreciate a review if you have some spare time for it.

@samritchie
Collaborator

Thanks @purkhusid for your efforts here.

I’m happy with this design, just a bit unsure about naming (not wanting to start an endless 'naming things' discussion but would like to look at options). And yes, I know I suggested the name originally - it’s relevant that I felt this was the best name to communicate the concept.

However in .NET I’m expecting a type {Something}Builder to have a Build() method at the end of the chain (even though it’s unnecessary here). In F# I’m expecting a type {Something}Builder to be a CE.

A couple of other options off the top of my head:

    Transaction()
        .AddCheck(table1, key, existsConditionTable1)
        .AddPut(table2, item2) // or .WithCheck(), .WithPut()?
        .WriteItems()
    TransactWriteItems()
        .AppendCheck(table1, key, existsConditionTable1)
        .AppendPut(table2, item2)
        .Write()
    table1.BeginTransaction()
        .Check(table1, key, existsConditionTable1)
        .Put(table2, item2)
        .WriteTransaction()

I may be overthinking it, open to any sort of feedback here.

@purkhusid
Contributor Author

I think just renaming the builder should be enough. The methods on the builder all follow the DynamoDB client naming, which I think is preferable.
So I think I like it like this:

    table1.Transaction()
        .Check(table1, key, existsConditionTable1)
        .Put(table2, item2)
        .TransactWriteItems()

What do you think?

@samritchie
Collaborator

@purkhusid yes - that looks good.

@purkhusid
Contributor Author

@samritchie Updated the API

Collaborator

@samritchie samritchie left a comment

This is looking good - are you able to update the readme showing the new usage? I will do the release notes with those couple of other changes.

Comment on lines 260 to 268
  let Transaction = Transaction()

  let requests =
-     [ transactionBuilder.Update(table1, key, compileUpdateTable1 <@ fun t -> { t with Value = 42 } @>, existsConditionTable1)
-       transactionBuilder.Put(table1, item2)
-       transactionBuilder.Put(table1, item3, doesntExistConditionTable1)
-       transactionBuilder.Delete(table1, table1.Template.ExtractKey item4, Some doesntExistConditionTable1)
-       transactionBuilder.Delete(table1, table1.Template.ExtractKey item5, None)
-       transactionBuilder.Check(
+     [ Transaction.Update(table1, key, compileUpdateTable1 <@ fun t -> { t with Value = 42 } @>, existsConditionTable1)
+       Transaction.Put(table1, item2)
+       Transaction.Put(table1, item3, doesntExistConditionTable1)
+       Transaction.Delete(table1, table1.Template.ExtractKey item4, Some doesntExistConditionTable1)
+       Transaction.Delete(table1, table1.Template.ExtractKey item5, None)
+       Transaction.Check(
Collaborator

The value naming threw me here a little, should this be transaction instead of Transaction?

@@ -2023,7 +2023,7 @@ module TransactWriteItemsRequest =
| _ -> None


- type TransactionBuilder(?metricsCollector: (RequestMetrics -> unit)) =
+ type Transaction(?metricsCollector: (RequestMetrics -> unit)) =
Collaborator

Should there be a convenience .Transaction factory method on TableContext? It should be more discoverable, downside would be it may be less obvious why the table needs to be supplied...

@purkhusid
Copy link
Contributor Author

@samritchie Added docs and addressed your comments.

@samritchie samritchie merged commit f31ff7b into fsprojects:master Oct 16, 2024
4 checks passed
Member

@bartelink bartelink left a comment

Apologies for being way late to the game here (I'm still on the busy side, but will try to be a bit more responsive to any follow-ups).
I'd ideally like to see my suggestions considered as the rough edges (TransactWriteItemsRequest module name coming out of nowhere, verbing of Transaction, mysterious derivation of the client and linkage to the metrics collector) make for a confusing API.

Having the wiring of metrics and client be magic and silent (especially arbitrarily picking the client from the first operation added) is not helpful as nobody that's just using intellisense will even think about it

I'd ideally like to see the readme make it clear that the moving parts are:

  1. it needs a connection - either passed directly, or borrowed from a table
  2. it can consistently convey metrics just like Table does

And then show two usage patterns:

  1. let xact = table.Transact() - borrows the client and collector from table. The XMLDOC should make that clear
  2. let xact = Transaction(client[, collector = emitStats])

The second one should not be the preferred API as it means people go off and write islands of code that don't get the RU costs logged. That's doubly ironic given the TransactWriteItems API costs double the charges. In my opinion, the xmldoc warning that moved to TransactWriteItems should be replicated in part on the constructor or docs for the Transaction type. Yes, I get the fact that only an idiot would not consider this stuff, but over time the amount of tab-tabbing going on to spin up writes to tables will go up, and reading of the docs will go down.

@@ -2096,3 +2149,12 @@ module Scripting =
member t.UpdateProvisionedThroughput(provisionedThroughput: ProvisionedThroughput) : unit =
let spec = Throughput.Provisioned provisionedThroughput
t.UpdateTableIfRequiredAsync(spec) |> Async.Ignore |> Async.RunSynchronously

/// Helpers for working with <c>TransactWriteItemsRequest</c>
module TransactWriteItemsRequest =
Member

I think this should be either renamed or namespaced within Transaction to align with the top level API that can cause it to happen

i.e. make this

module Transaction =

/// </summary>
/// <param name="clientRequestToken">The <c>ClientRequestToken</c> to supply as an idempotency key (10 minute window).</param>
member _.TransactWriteItems(?clientRequestToken) : Async<unit> = async {
if (Seq.length transactionItems) = 0 || (Seq.length transactionItems) > 100 then
Member

transactionItems.Count

if (Seq.length transactionItems) = 0 || (Seq.length transactionItems) > 100 then
raise
<| System.ArgumentOutOfRangeException(nameof transactionItems, "must be between 1 and 100 items.")
let req = TransactWriteItemsRequest(ReturnConsumedCapacity = returnConsumedCapacity, TransactItems = (ResizeArray transactionItems))
Member

does this need to be a ResizeArray clone? I'd have used it directly, or transactionItems.ToArray()

let writer = AttributeWriter(req.ExpressionAttributeNames, req.ExpressionAttributeValues)
req.ConditionExpression <- cond.Conditional.Write writer)
transactionItems.Add(TransactWriteItem(Put = req))
this
Member

unless there is an actual immutable builder thing going on, it's confusing to return values that have to be ignored

Collaborator

I think it’s mostly there because that’s what people generally do with C# builders. We haven’t called it a builder so happy to drop this.

let mutable (dynamoDbClient: IAmazonDynamoDB) = null

let setClient client =
if dynamoDbClient = null then
Member

I think this is bad news - better to be explicit that a transaction is going to need a client.
Force people to either pick a relevant one,
or expose it on the Table so they can get it from there,
or have a table.TransactWriteItems(transaction) that makes it clear where it's coming from.

Collaborator

@samritchie samritchie Oct 25, 2024

We went backwards and forwards on this a lot. I have struggled to get a good grasp of when people would use multiple clients (and how we would be able to check multiple clients are being used properly). If we need to make this explicit I would have preferred to go down the MultiTableContext path, but it seems like a lot of difficult-to-discover ceremony for a (presumably) vanishingly small number of API consumers.

Perhaps the best middle ground is new Transaction(client, metricsCollector) as well as table.NewTransaction()? The latter would work fine for the bulk of consumers that only have one client & don’t want to pass it around separately to the tables.

Member

Yes, I think both mechanisms are fine, as the 90% case is to do table.Transact/NewTransaction.

It's more about Explicit Is Better than Implicit wrt the mechanism of how the client is derived when you create it independently.

Don't like New, and I think FDG outlaws it, but I guess Transact is a bit much if it's not actually going to Transact directly - maybe CreateTransaction?
I was thinking about a Transaction.Attach(table) or a Transaction(table), but ultimately that coupling/semantic is confusing too.

Similarly, BeginTransaction would be confusing as nothing is really beginning (and it's not beginning from the Table end).

I guess this is why I punted on it all in the first instance.

For my own use, as long as there's a Transaction ctor with a mandatory client arg and an optional collector arg, I'll pass that around as a Func<Transaction> and wire it up in the same way that I have a Func<TableContext> now.

I guess if there was a ClientContext that held the client and the collector, then you could pass that common thing to the TableContext and TransactionContext ctors, but that's a bigger job and it's a big jump from a library that had one abstraction (TableContext) to suddenly having 3. (Though the chances are it does make sense overall - in Equinox I have DynamoStoreClient that wraps the IAmazonDynamoDb, but it's definitely at the other end of the separating-things-out vs whacking-it-all-into-one-type spectrum.)

Collaborator

CreateTransaction sounds good to me
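Put together, the two construction patterns discussed above would look something like this (a sketch only; `CreateTransaction` follows the naming just agreed rather than the merged API, and `client` / `emitStats` are placeholder names):

```fsharp
// Pattern 1 (preferred): borrow the client and metrics collector
// from an existing table context, so RU costs keep getting logged
let xact = table.CreateTransaction()

// Pattern 2: construct independently with an explicit client;
// the metrics collector must be wired up by hand
let xact' = Transaction(client, metricsCollector = emitStats)
```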


let transaction = Transaction()

transaction.Check(table, key, doesntExistCondition)
Member

the current API is returning this, which means this needs lots of ugly ignores

@@ -1402,6 +1318,141 @@ type TableContext<'TRecord>
else
t.VerifyTableAsync()

member t.Transaction() =
match metricsCollector with
| Some metricsCollector -> Transaction(metricsCollector = metricsCollector)
Member

Transaction(?metricsCollector = metricsCollector)
also I would call this CreateTransaction or maybe Transact - verbing a noun is not very intuitive?

let ``Empty request list is rejected with AORE`` () =
shouldBeRejectedWithArgumentOutOfRangeException (Transaction())
|> Async.RunSynchronously
|> ignore
Member

is this ignore needed? you can make the signature on L312 : unit as a safeguard

(
tableContext: TableContext<'TRecord>,
key: TableKey,
precondition: option<ConditionExpression<'TRecord>>
Member

better to make this be ?precondition: ConditionExpression<'TRecord> and add an OptionalAttribute as that means
a) F# users can omit the argument
b) C# users can do the same

Same for the other args

@purkhusid
Contributor Author

@bartelink Thanks for the review. These changes sound good to me if @samritchie likes them as well.

@samritchie
Collaborator

Yes, that seems okay, we don’t have particularly rapid uptake of the new version. Re: metricsCollector I’d like to see this go eventually & expose the same data via OpenTelemetry - it was always a bit of a stop-gap solution.

@bartelink
Member

Yeah, I've been hiding from the need to go OT in Equinox too - the core and the MessageDb piece are wired, but the Cosmos, Dynamo and ESDB drivers are not yet. Using that does nicely sidestep my earlier non-proposal of having something to wrap the client and collector.

@samritchie
Collaborator

There’s AWS SDK OpenTelemetry instrumentation in beta, but I’ve had a look and it’s not very useful; one of the perils of a codegen SDK. Having something implemented here would be more useful to more people. I’ll have a look & see if any other dynamo clients have good naming conventions to follow.
