
Pagination of long transaction list responses #348

Closed

colatkinson opened this issue Jan 12, 2018 · 42 comments

@colatkinson

Currently, if the server generates an extremely long response, it simply returns an error message, such as

{ message: 'response too large (at least 1021611 bytes)', code: -32600 }

If the client receives this message, it will just hang indefinitely while attempting to sync.

Proposed Solution

Implement pagination in the relevant methods, using block heights as a range. This has a couple of advantages:

  1. It allows handling of very active addresses.
  2. During sync, it allows clients to fetch only the transactions after where they left off, instead of fetching their entire history. The same goes for getting tx inputs. This could conceivably lead to issues during a reorg, but the client could simply fetch transactions starting from last height - n, where n is e.g. 6.

Affected RPC Methods

  • blockchain.address.get_history
  • blockchain.address.get_mempool
  • blockchain.address.listunspent
  • blockchain.scripthash.get_history
  • blockchain.scripthash.listunspent

It seems to me like the simplest solution would be to bump the protocol version number, and add optional pagination parameters to each of these methods. The version bump would allow clients to detect support for this, and thus be able to opt to get only the new txs during sync.
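On the wire, the proposal above could look something like this. A minimal sketch in Python: the method name is real, but the pagination parameter names (from_height, to_height) are purely illustrative, since nothing was finalized in this thread.

```python
import json

# Hypothetical paginated request. "from_height"/"to_height" are placeholder
# names for the optional pagination parameters proposed above.
request = {
    "id": 1,
    "method": "blockchain.scripthash.get_history",
    "params": {
        "scripthash": "00" * 32,   # placeholder scripthash
        "from_height": 500000,     # resume where the last sync left off
        "to_height": 500500,       # cap the page by block-height range
    },
}
encoded = json.dumps(request)
```

A client that had detected server support via the bumped protocol version would send from_height equal to its last synced height; older clients would simply omit both parameters.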

@racquemis

Just here to say I support this idea. I'm currently running into problems because of addresses with thousands of transactions. The ability to fetch get_history in pages would also be easier on mobile wallets during import.

@joesixpack
Contributor

joesixpack commented Jan 15, 2018

This needs to be implemented ASAP. Blockchain isn't getting any smaller.

@AdamSEY

AdamSEY commented Feb 1, 2018

I highly support this feature.

@luggs-co
Contributor

luggs-co commented Feb 6, 2018

I also support the implementation of this feature.

@kyuupichan
Owner

I've had some thoughts about doing history sensibly. Once I've fleshed them out I'll revisit this. I would want to require the new API for large histories and simply reject any such requests to the old API.

@kyuupichan
Owner

@erasmospunk @ecdsa @SomberNight @colatkinson @racquemis @joesixpack @AdamSEY @luggs-co

Please leave any comments you have on my proposal to solve this, which I have started to implement:

http://electrumx.readthedocs.io/en/latest/protocol-changes.html#version-1-5

@SomberNight
Contributor

As @bauerj mentioned on slack, using the last confirmed tx hash as status has the downside of it being very hard for clients to detect lying by omission. Currently, if a server does not tell a client about a tx, the client would at least realise when it connects to another server, as the address statuses (which is a hash of the whole history) would mismatch. However, if there is a confirmed tx "after" the missing tx, then the new style statuses would no longer mismatch.

I understand the main motivation for changing how status is defined is that it has a linear time complexity (in terms of size of the whole history for the scripthash), and that using the last confirmed tx hash only requires constant time.

An alternative construction could be status := H(status || H(tx || height)), where for each confirmed tx for a given scripthash, in blockchain order, you iteratively hash the current status concatenated with some hash that describes the tx. (To bootstrap, you could set status to all zeroes, for example.)

This too would have constant time complexity to update from the previous status to the new status when a new txn arrives. However, you would need extra constant-size storage to save the status itself for each scripthash.
There are further space-time tradeoffs one can make, such as:

  • To save space, set a threshold k of minimum txns associated with a scripthash before you start saving the status for that scripthash (below that, you always recompute).
  • Reorgs are expensive with a naive implementation: you might need to recompute the whole status. To avoid this, you could delay saving the status for a given scripthash by m blocks -- but then you also need to store some extra info to know which tx/height the stored status is for.
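A minimal sketch of the iterative construction above, assuming SHA-256 and a 4-byte little-endian height encoding (both arbitrary choices for illustration, not part of the proposal):

```python
import hashlib

def update_status(prev_status: bytes, tx_hash: bytes, height: int) -> bytes:
    # One step of status := H(status || H(tx || height)).
    tx_digest = hashlib.sha256(tx_hash + height.to_bytes(4, "little")).digest()
    return hashlib.sha256(prev_status + tx_digest).digest()

# Bootstrap with all zeroes, then fold in each confirmed tx in blockchain order.
status = bytes(32)
for tx_hash, height in [(b"\x11" * 32, 100), (b"\x22" * 32, 105)]:
    status = update_status(status, tx_hash, height)
```

Updating for a new transaction touches only the stored 32-byte status, which is the constant-time property discussed above.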

@kyuupichan
Owner

Even those ideas require a server to read the whole history of any client connecting to it at least once, and as you say maintain a cache. I think the whole approach is suboptimal and unnecessary. Its only redeeming feature is that it's easier for client devs to work with.

I want the client to bear more of the burden of it being SPV. It is going to have to become a bit smarter, but that is putting the burden in the right place - it is the SPV, and so it is responsible for convincing itself it has got the truth. I'd like to see more ideas on improving this client-side without burdening servers with cumbersome protocols.

@SomberNight
Contributor

Even those ideas require a server to read the whole history of any client connecting to it at least once, and as you say maintain a cache.

No, you can persist it into the DB. You don't need to load the whole history when the client connects. During initial sync, you process the transaction anyway; at that time you can compute the status.
If you set e.g. k=5, the extra storage requirements should be negligible (I would guess 1 GB max for current chain, maybe a lot less).

I think the whole approach is suboptimal

I don't think there is an optimal solution as there are multiple non-independent things to optimise for.

@kyuupichan
Owner

kyuupichan commented Aug 8, 2018

I think you misunderstood me

  • I don't want to persist caches to the DB. Caches are a great source of cache invalidation bugs, plus they take up space. Space will be at a premium for blockchain indices; they will be huge.
  • to get a cache in the first place, persisted or not, you need to read the whole history.

I don't want that to be the default behaviour the client requires of any server it connects to just to find out what happened since last week. It should have a confidence level to some height and only be requesting data from that height. It will increase its height of confidence based on results it receives from various sources to various queries, based on a statistical analysis of data it collects. Clients will be motivated to not make expensive requests, because they will become ... expensive. The user will have an option to force-refresh one or more addresses if they "appear wrong".

There is a lot of scope client-side for being smart and figuring out strategies in having confidence in served data. Lying isn't easy or free. I have several ideas but I'm sure others can come up with better ones. At present no one is even trying because the status quo is offloading the costs to server operators and it's easy to argue for no change. Also today running a server is fairly inexpensive.

I'm planning for data served becoming big, not staying small. It will inevitably become commercial and very likely be costed if used beyond a certain introductory level, because it won't be cheap to maintain indices of everything and serve them for free.

@erasmospunk
Contributor

@kyuupichan @SomberNight what if the status was a hash of the transaction id plus the number of transactions for that address (something like status := H(last_tx_id || num_of_txs))?

It would still need to read the transaction numbers, but it doesn't need to resolve the tx hashes (which would cause read amplification). Something like this would do the job:

def get_tx_len(self, hashX):
    # Count the transactions touching hashX by summing the byte lengths of
    # its history entries; each tx number is stored as 4 bytes.
    size = 0
    for key, hist in self.db.iterator(prefix=hashX):
        size += len(hist)
    assert size % 4 == 0
    return size // 4
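The proposed status itself could then be sketched as below; SHA-256 and the 8-byte count encoding are assumptions for illustration, not part of the proposal.

```python
import hashlib

def scripthash_status(last_tx_id: bytes, num_of_txs: int) -> bytes:
    # status := H(last_tx_id || num_of_txs); serialization is illustrative.
    return hashlib.sha256(last_tx_id + num_of_txs.to_bytes(8, "little")).digest()
```

Note that omitting any transaction changes num_of_txs and therefore the status, which is the naive-omission-detection property discussed below.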

@kyuupichan
Owner

@erasmospunk let me think about it. I think my new history DB format that I'm intending to roll out soon actually stores the total TX count in the main history entry of a hashX if the history "overflows" about 10-ish transactions. If so determining length would be a single read, and another single read for the final tx number, and then a filesystem-lookup for the tx_hash. I might consider storing the final tx number in large histories in the main entry too....

@kyuupichan
Owner

@SomberNight what do you think of erasmospunk's suggestion?

@kyuupichan
Owner

@erasmospunk any reason to not simply return a pair: (tx_hash, tx_count)? I'm not sure if the height of the tx is interesting or useful for a client

@SomberNight
Contributor

@erasmospunk @kyuupichan I think it's a good idea. I would prefer it over the original suggestion, as having the tx count there too would resolve the naive case of lying by omission (regardless of it being intentional or accidental omission).

Not sure if the final tx_height really matters; I think it could be omitted.
(tx_hash, tx_count) looks ok

@sergeylappo

@kyuupichan, but the current implementation would not solve the initial problem.
If I wanted to sync a client from 0, I would most probably be unable to, as it would exceed the limit. Furthermore, a user might have more than the limit in a single block.
Maybe it would be better to allow clients to download not since a block, but n transactions since tx_hash (optional). If no tx_hash is passed, download from the first transaction.

@SomberNight
Contributor

Furthermore, a user might have more than the limit in a single block.

@sergeylappo I don't think it's realistic or practical to care about the case where a single scripthash (not all scripthashes of user!) has more than ~10k transactions in a single block... ScriptPubKeys are not meant to be reused anyway.

If I wanted to sync a client from 0, I would most probably be unable to, as it would exceed the limit.

Which limit? The per session one? No problem; the client would just reconnect, or connect to a different server and continue syncing there. The client needs to get the same amount of information in the end; there is not much to do about that.

@sergeylappo

@SomberNight, this limit: { message: 'response too large (at least 1021611 bytes)', code: -32600 }.
The initially suggested implementation does not solve the issue in any way.
It would allow syncing "only the latest", which is certainly an improvement over what we have now, but syncing from scratch remains impossible.
Just a couple of addresses for testing:
3D2oetdNuZUqQHPJmcMDDHYoqkyNVsFk9r
1NDyJtNTjmwk5xPNhjgAMu4HDHigtobu1s - 296143 transactions
These are also good for testing 'blockchain.transaction.get' in verbose mode with default environment settings.
The limit is also not enough.

@kyuupichan
Owner

kyuupichan commented Aug 19, 2018

@sergeylappo @SomberNight was correct. I'm not sure you're understanding the logic. It wouldn't send you more than 1MB of tx hashes and heights at a time, but that's not a problem - if there are more, you just get them on the next request, starting where the previous one left off. You can sync Satoshi Dice's busiest address from scratch.

No single block has over 10,000 txs touching one address, and 10,000 txs fit in a single reply.
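The sync loop being described could be sketched client-side like this. Everything here is hypothetical: the client object, the parameter names, and the response shape (a "more" flag plus a page of history entries); it only illustrates "start where the previous request left off". It also assumes pages end on block boundaries, as stated above, so resuming at last height + 1 never skips transactions.

```python
def fetch_full_history(client, scripthash):
    # Page through the history until the server reports nothing further.
    history, from_height = [], 0
    while True:
        resp = client.request(
            "blockchain.scripthash.get_history",
            scripthash=scripthash, from_height=from_height)
        history.extend(resp["history"])
        if not resp.get("more"):
            return history
        # Resume just past the last height received (pages are assumed to
        # end on block boundaries).
        from_height = resp["history"][-1]["height"] + 1
```

With this shape, syncing from scratch is just repeated bounded requests, so the per-response size limit is never hit.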

@sergeylappo

@kyuupichan, yes, the misunderstanding was on my side. Everything is fine, sorry.

@erasmospunk
Contributor

One thing that came up: a client that does not watch the block headers would lose the ability to detect reorgs in which a previous transaction is replaced by a lower-value one.

@kyuupichan
Owner

kyuupichan commented Aug 21, 2018

Yes - but I consider that part of the wallet's job. A wallet will have the state as of a height / block hash. It will need to detect if that block is no longer on the main chain, figure out the reorg height, and remove part of the state accordingly, and then rebuild state from the reorg height. This is not very hard nor resource intensive. It doesn't even require a headers subscription.

We should be designing a protocol that puts computation and responsibilities on the right side of the connection; offloading everything to the server is not going to work as users and blockchains grow.
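The wallet-side reorg handling described above might be sketched like so; stored_hashes and get_main_chain_hash are hypothetical stand-ins for the wallet's saved checkpoints and a server query, not any real Electrum/ElectrumX API.

```python
def find_resume_height(stored_hashes, get_main_chain_hash):
    # stored_hashes: {height: block_hash} checkpoints the wallet saved while
    # syncing. Walk back from the tip to the highest checkpoint still on the
    # main chain; wallet state above it is discarded and rebuilt from there.
    for height in sorted(stored_hashes, reverse=True):
        if get_main_chain_hash(height) == stored_hashes[height]:
            return height
    return 0  # no common checkpoint: resync from genesis
```

As noted, this needs no headers subscription: a handful of hash lookups on reconnect is enough to detect the reorg and find the fork point.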

@markblundeberg

markblundeberg commented Sep 15, 2018

With the more-pagination approach, it seems to me that the client has a race condition: it doesn't know for sure whether the transaction history corresponds to the current chaintip (suppose a reorg happens between getting the history and getting the chaintip).

I would suggest the more parameter be replaced with something like up_to_blockhash. Instead of more="false" you would put in the current chaintip; instead of more="true" you would put the block hash of the last included transaction.

@kyuupichan
Owner

A smart client has sufficient information to resolve all races itself. ElectrumX has the same issue with bitcoind and its mempool APIs, and doesn't get confused. My inclination is to put this burden on the client.

That said, I'm happy to replace heights with tip hashes in various APIs if you can be more specific and convincing that it would be helpful.

@Giszmo

Giszmo commented Mar 13, 2019

What is the status of this? https://electrumx.readthedocs.io/en/latest/protocol-methods.html#blockchain-scripthash-history sounds like it should work already but ...

$ (echo '[{"id":"1.5","method":"server.version","params":{"protocol_version":"1.5"}},{"id":"1.4","method":"server.version","params":{"protocol_version":"1.4"}}]'; sleep 1) | ncat --ssl $myserver 9335 | jq
[
  {
    "jsonrpc": "2.0",
    "error": {
      "code": 1,
      "message": "unsupported protocol version: 1.5"
    },
    "id": "1.5"
  },
  {
    "jsonrpc": "2.0",
    "result": [
      "ElectrumX 1.9.5",
      "1.4"
    ],
    "id": "1.4"
  }
]

I would really need this to work. :/

@SomberNight
Contributor

What is the status of this?

https://electrumx.readthedocs.io/en/latest/protocol-changes.html#version-1-5

This is a draft of ideas for protocol 1.5; they are not implemented

PROTOCOL_MIN = (1, 2)
PROTOCOL_MAX = (1, 4, 1)

@Giszmo

Giszmo commented Mar 13, 2019

Thanks for answering, but it would be really helpful to know when to expect a fix. I need to tackle this somehow asap and could work on it, but I would much rather help on a branch that others are already working on. Where is the magic happening?

@Giszmo

Giszmo commented Jun 24, 2019

For the record, the "draft of ideas for protocol 1.5" moved to ideas for protocol 2.0. And my question stands: Is anybody working on this? Would it make sense to start implementing this? Doesn't look too complicated but I'm reluctant to maintain my own version of electrumx if upstream is not going to include it before next year or so.

@SomberNight
Contributor

I don't think anyone is working on this atm. Electrum does not need it. The benefits are nice but marginal when compared to the work needed on both the client and server side to adapt. I think @kyuupichan is planning to do it at some point.

@kyuupichan
Owner

Protocol will change going forwards

@Giszmo

Giszmo commented May 27, 2020

Protocol will change going forwards

What does that mean? Is the issue resolved? If it will be resolved going forward, why did you close it already? Where can I help implement and test if the solution works for us?

@kyuupichan
Owner

ElectrumX is now only for Bitcoin and the protocol will be quite different

@jlopp

jlopp commented May 28, 2020

Good night, sweet prince.

@SomberNight
Contributor

SomberNight commented May 28, 2020

We will maintain a fork of ElectrumX that supports Bitcoin (BTC). Help is welcome.

@janoside

Ok, I'm confused, probably because I'm missing some context. @kyuupichan @SomberNight - are you both referring to the same "Bitcoin"? Is someone pulling some "Bitcoin Cash is Bitcoin" nonsense? I see the line "For a future network with bigger blocks." in the project readme, so... I'm guessing I need to move over to @SomberNight for continued BTC support, correct?

@ecdsa
Contributor

ecdsa commented May 28, 2020

indeed, @kyuupichan means "Bitcoin SV"

@rockstardev

h/t @SomberNight @ecdsa @bauerj

@mooleshacat

indeed, @kyuupichan means "Bitcoin SV"

To be clear, in other words, it is NOT bitcoin.

@SomberNight
Contributor

For anyone interested, we have a proposed protocol version 1.5 that includes pagination of long histories at spesmilo/electrumx#90

Feedback welcome there.

@mkpuig

mkpuig commented Sep 1, 2022

Hello, is there any progress on including this in protocol 1.5? It's becoming a must-have to avoid losing access to utxo/history data.
Thank you in advance!

@ecdsa
Contributor

ecdsa commented Sep 4, 2022

@mkpuig this project was moved to https://github.com/spesmilo/electrumx
we are planning to add pagination to the next major release.

@Tmanaccountig

What would be wrong with instructing the server to break the response down into manageable chunks, and reassembling them on the client after receipt?
