
Pagination of long transaction list responses #348

Closed

colatkinson opened this issue Jan 12, 2018 · 42 comments

@colatkinson

Currently, if the server generates an extremely long response, it simply returns an error message, such as

{ message: 'response too large (at least 1021611 bytes)', code: -32600 }

If the client receives this message, it will just hang indefinitely while attempting to sync.

Proposed Solution

Implement pagination in the relevant methods, using block heights as a range. This has a couple of advantages:

  1. It allows handling of very active addresses.
  2. During sync, it allows clients to fetch only the transactions after where they left off, instead of fetching their entire history. The same goes for getting tx inputs. This could conceivably lead to issues during a reorg, but the client could simply fetch transactions starting from last height - n, where n is e.g. 6.

Affected RPC Methods

  • blockchain.address.get_history
  • blockchain.address.get_mempool
  • blockchain.address.listunspent
  • blockchain.scripthash.get_history
  • blockchain.scripthash.listunspent

It seems to me like the simplest solution would be to bump the protocol version number, and add optional pagination parameters to each of these methods. The version bump would allow clients to detect support for this, and thus be able to opt to get only the new txs during sync.
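On the wire, the proposal above could look something like this. A minimal sketch in Python: the method name is real, but the pagination parameter names (from_height, to_height) are purely illustrative, since nothing was finalized in this thread.

```python
import json

# Hypothetical paginated request. "from_height"/"to_height" are placeholder
# names for the optional pagination parameters proposed above.
request = {
    "id": 1,
    "method": "blockchain.scripthash.get_history",
    "params": {
        "scripthash": "00" * 32,   # placeholder scripthash
        "from_height": 500000,     # resume where the last sync left off
        "to_height": 500500,       # cap the page by block-height range
    },
}
encoded = json.dumps(request)
```

A client that had detected server support via the bumped protocol version would send from_height equal to its last synced height; older clients would simply omit both parameters.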

@racquemis

Just here to say I support this idea. I'm currently running into problems because of addresses with thousands of transactions. The ability to fetch get_history in pages would also be easier on mobile wallets during import.

@joesixpack
Contributor

joesixpack commented Jan 15, 2018

This needs to be implemented ASAP. Blockchain isn't getting any smaller.

@AdamSEY

AdamSEY commented Feb 1, 2018

I highly support this feature.

@luggs-co
Contributor

luggs-co commented Feb 6, 2018

I also support the implementation of this feature.

@kyuupichan
Owner

I've had some thoughts about doing history sensibly. Once I've fleshed them out I'll revisit this. I would want to require the new API for large histories and simply reject any such requests to the old API.

@kyuupichan
Owner

@erasmospunk @ecdsa @SomberNight @colatkinson @racquemis @joesixpack @AdamSEY @luggs-co

Please leave any comments you have on my proposal to solve this, which I have started to implement:

http://electrumx.readthedocs.io/en/latest/protocol-changes.html#version-1-5

@SomberNight
Contributor

As @bauerj mentioned on slack, using the last confirmed tx hash as status has the downside of it being very hard for clients to detect lying by omission. Currently, if a server does not tell a client about a tx, the client would at least realise when it connects to another server, as the address statuses (which is a hash of the whole history) would mismatch. However, if there is a confirmed tx "after" the missing tx, then the new style statuses would no longer mismatch.

I understand the main motivation for changing how status is defined is that it has a linear time complexity (in terms of size of the whole history for the scripthash), and that using the last confirmed tx hash only requires constant time.

An alternative construction could be status := H(status || H(tx || height)), where for each confirmed tx for a given scripthash, in blockchain order, you iteratively hash the current status concatenated with some hash that describes the tx. (To bootstrap, you could set status to all zeroes, for example.)

This too would have constant time complexity to update from the previous status to the new status when a new txn arrives. However, you would need extra constant-size storage to save the status itself for each scripthash.
There are further space-time tradeoffs one can make, such as:

  • To save space, set a threshold k of minimum txns associated with a scripthash before you start saving the status for that scripthash (below that, you always recompute).
  • Reorgs are expensive with a naive implementation: you might need to recompute the whole status. To avoid this, you could delay saving the status for a given scripthash by m blocks -- but then you also need to store some extra info to know which tx/height the stored status is for.
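A minimal sketch of the iterative construction above, assuming SHA-256 and a 4-byte little-endian height encoding (both arbitrary choices for illustration, not part of the proposal):

```python
import hashlib

def update_status(prev_status: bytes, tx_hash: bytes, height: int) -> bytes:
    # One step of status := H(status || H(tx || height)).
    tx_digest = hashlib.sha256(tx_hash + height.to_bytes(4, "little")).digest()
    return hashlib.sha256(prev_status + tx_digest).digest()

# Bootstrap with all zeroes, then fold in each confirmed tx in blockchain order.
status = bytes(32)
for tx_hash, height in [(b"\x11" * 32, 100), (b"\x22" * 32, 105)]:
    status = update_status(status, tx_hash, height)
```

Updating for a new transaction touches only the stored 32-byte status, which is the constant-time property discussed above.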

@kyuupichan
Owner

Even those ideas require a server to read the whole history of any client connecting to it at least once, and as you say maintain a cache. I think the whole approach is suboptimal and unnecessary. Its only redeeming feature is that it's easier for client devs to work with.

I want the client to bear more of the burden of it being SPV. It is going to have to become a bit smarter, but that is putting the burden in the right place - it is the SPV, and so it is responsible for convincing itself it has got the truth. I'd like to see more ideas on improving this client-side without burdening servers with cumbersome protocols.

@SomberNight
Contributor

Even those ideas require a server to read the whole history of any client connecting to it at least once, and as you say maintain a cache.

No, you can persist it into the DB. You don't need to load the whole history when the client connects. During initial sync, you process the transaction anyway; at that time you can compute the status.
If you set e.g. k=5, the extra storage requirements should be negligible (I would guess 1 GB max for current chain, maybe a lot less).

I think the whole approach is suboptimal

I don't think there is an optimal solution as there are multiple non-independent things to optimise for.

@kyuupichan
Owner

kyuupichan commented Aug 8, 2018

I think you misunderstood me

  • I don't want to persist caches to the DB. Caches are a great source of cache invalidation bugs, plus they take up space. Space will be at a premium for blockchain indices; they will be huge.
  • to get a cache in the first place, persisted or not, you need to read the whole history.

I don't want that to be the default behaviour the client requires of any server it connects to just to find out what happened since last week. It should have a confidence level to some height and only be requesting data from that height. It will increase its height of confidence based on results it receives from various sources to various queries, based on a statistical analysis of data it collects. Clients will be motivated to not make expensive requests, because they will become ... expensive. The user will have an option to force-refresh one or more addresses if they "appear wrong".

There is a lot of scope client-side for being smart and figuring out strategies in having confidence in served data. Lying isn't easy or free. I have several ideas but I'm sure others can come up with better ones. At present no one is even trying because the status quo is offloading the costs to server operators and it's easy to argue for no change. Also today running a server is fairly inexpensive.

I'm planning for data served becoming big, not staying small. It will inevitably become commercial and very likely be costed if used beyond a certain introductory level, because it won't be cheap to maintain indices of everything and serve them for free.

@erasmospunk
Contributor

@kyuupichan @SomberNight what if the status was a hash of the transaction id plus the number of transactions for that address (something like status := H(last_tx_id || num_of_txs))?

It would still need to read the transaction numbers, but it doesn't need to resolve the tx hashes (which would cause read amplification). Something like this would do the job:

def get_tx_len(self, hashX):
    # Count the transactions touching hashX by summing the byte lengths of
    # its history entries; each tx number is stored as 4 bytes.
    size = 0
    for key, hist in self.db.iterator(prefix=hashX):
        size += len(hist)
    assert size % 4 == 0
    return size // 4
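The proposed status itself could then be sketched as below; SHA-256 and the 8-byte count encoding are assumptions for illustration, not part of the proposal.

```python
import hashlib

def scripthash_status(last_tx_id: bytes, num_of_txs: int) -> bytes:
    # status := H(last_tx_id || num_of_txs); serialization is illustrative.
    return hashlib.sha256(last_tx_id + num_of_txs.to_bytes(8, "little")).digest()
```

Note that omitting any transaction changes num_of_txs and therefore the status, which is the naive-omission-detection property discussed below.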

@kyuupichan
Owner

@erasmospunk let me think about it. I think my new history DB format that I'm intending to roll out soon actually stores the total TX count in the main history entry of a hashX if the history "overflows" about 10-ish transactions. If so determining length would be a single read, and another single read for the final tx number, and then a filesystem-lookup for the tx_hash. I might consider storing the final tx number in large histories in the main entry too....

@kyuupichan
Owner

@SomberNight what do you think of erasmospunk's suggestion?

@kyuupichan
Owner

@erasmospunk any reason to not simply return a pair: (tx_hash, tx_count)? I'm not sure if the height of the tx is interesting or useful for a client

@SomberNight
Contributor

@erasmospunk @kyuupichan I think it's a good idea. I would prefer it over the original suggestion, as having the tx count there too would resolve the naive case of lying by omission (regardless of it being intentional or accidental omission).

Not sure if the final tx_height really matters; I think it could be omitted.
(tx_hash, tx_count) looks ok

@sergeylappo

@kyuupichan, but the current implementation would not solve the initial problem.
If I wanted to sync a client from 0, I would most probably be unable to, as it would exceed the limit. Furthermore, a user might have more than the limit in a single block.
Maybe it would be better to allow clients to download not since a block, but n transactions since tx_hash (optional). If no tx_hash is passed, download from the first transaction.

@SomberNight
Contributor

Furthermore, a user might have more than the limit in a single block.

@sergeylappo I don't think it's realistic or practical to care about the case where a single scripthash (not all scripthashes of user!) has more than ~10k transactions in a single block... ScriptPubKeys are not meant to be reused anyway.

If I wanted to sync a client from 0, I would most probably be unable to, as it would exceed the limit.

Which limit? The per session one? No problem; the client would just reconnect, or connect to a different server and continue syncing there. The client needs to get the same amount of information in the end; there is not much to do about that.

@sergeylappo

@SomberNight, this limit: { message: 'response too large (at least 1021611 bytes)', code: -32600 }.
The initially suggested implementation does not solve the issue in any way.
It would allow syncing "only the latest", which is certainly an improvement over what we have now, but syncing from scratch remains impossible.
Just a couple of addresses for testing:
3D2oetdNuZUqQHPJmcMDDHYoqkyNVsFk9r
1NDyJtNTjmwk5xPNhjgAMu4HDHigtobu1s - 296143 transactions
These are also good for testing 'blockchain.transaction.get' in verbose mode with default environment settings.
The limit is also not enough.

@kyuupichan
Owner

kyuupichan commented Aug 19, 2018

@sergeylappo @SomberNight was correct. I'm not sure you're understanding the logic. It wouldn't send you more than 1MB of tx hashes and heights at a time, but that's not a problem - if there are more, you just get them on the next request, starting where the previous one left off. You can sync Satoshi Dice's busiest address from scratch.

No single block has over 10,000 txs touching one address, and 10,000 txs fit in a single reply.
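The sync loop being described could be sketched client-side like this. Everything here is hypothetical: the client object, the parameter names, and the response shape (a "more" flag plus a page of history entries); it only illustrates "start where the previous request left off". It also assumes pages end on block boundaries, as stated above, so resuming at last height + 1 never skips transactions.

```python
def fetch_full_history(client, scripthash):
    # Page through the history until the server reports nothing further.
    history, from_height = [], 0
    while True:
        resp = client.request(
            "blockchain.scripthash.get_history",
            scripthash=scripthash, from_height=from_height)
        history.extend(resp["history"])
        if not resp.get("more"):
            return history
        # Resume just past the last height received (pages are assumed to
        # end on block boundaries).
        from_height = resp["history"][-1]["height"] + 1
```

With this shape, syncing from scratch is just repeated bounded requests, so the per-response size limit is never hit.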

@sergeylappo

@kyuupichan, yes, the misunderstanding was on my side. Everything is fine, sorry.

@erasmospunk
Contributor

One thing that came up: a client that does not watch the block headers would lose the ability to detect reorgs in which a previous transaction is replaced by a lower-value one.

@kyuupichan
Owner

kyuupichan commented Aug 21, 2018

Yes - but I consider that part of the wallet's job. A wallet will have the state as of a height / block hash. It will need to detect if that block is no longer on the main chain, figure out the reorg height, and remove part of the state accordingly, and then rebuild state from the reorg height. This is not very hard nor resource intensive. It doesn't even require a headers subscription.

We should be designing a protocol that puts computation and responsibilities on the right side of the connection; offloading everything to the server is not going to work as users and blockchains grow.
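The wallet-side reorg handling described above might be sketched like so; stored_hashes and get_main_chain_hash are hypothetical stand-ins for the wallet's saved checkpoints and a server query, not any real Electrum/ElectrumX API.

```python
def find_resume_height(stored_hashes, get_main_chain_hash):
    # stored_hashes: {height: block_hash} checkpoints the wallet saved while
    # syncing. Walk back from the tip to the highest checkpoint still on the
    # main chain; wallet state above it is discarded and rebuilt from there.
    for height in sorted(stored_hashes, reverse=True):
        if get_main_chain_hash(height) == stored_hashes[height]:
            return height
    return 0  # no common checkpoint: resync from genesis
```

As noted, this needs no headers subscription: a handful of hash lookups on reconnect is enough to detect the reorg and find the fork point.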

@markblundeberg

markblundeberg commented Sep 15, 2018

With the more-pagination approach, it seems to me that the client has a race condition: it doesn't know for sure whether the transaction history corresponds to the current chaintip (suppose a reorg happens between getting the history and getting the chaintip).

I would suggest the more parameter be replaced with something like up_to_blockhash. Instead of more="false" you would put in the current chaintip; instead of more="true" you would put the block hash of the last included transaction.

@kyuupichan
Owner

A smart client has sufficient information to resolve all races itself. ElectrumX has the same issue with bitcoind and its mempool APIs, and doesn't get confused. My inclination is to put this burden on the client.

That said, I'm happy to replace heights with tip hashes in various APIs if you can be more specific and convincing that it would be helpful.

@Giszmo

Giszmo commented Mar 13, 2019

What is the status of this? https://electrumx.readthedocs.io/en/latest/protocol-methods.html#blockchain-scripthash-history sounds like it should work already but ...

$ (echo '[{"id":"1.5","method":"server.version","params":{"protocol_version":"1.5"}},{"id":"1.4","method":"server.version","params":{"protocol_version":"1.4"}}]'; sleep 1) | ncat --ssl $myserver 9335 | jq
[
  {
    "jsonrpc": "2.0",
    "error": {
      "code": 1,
      "message": "unsupported protocol version: 1.5"
    },
    "id": "1.5"
  },
  {
    "jsonrpc": "2.0",
    "result": [
      "ElectrumX 1.9.5",
      "1.4"
    ],
    "id": "1.4"
  }
]

I would really need this to work. :/

@SomberNight
Contributor

What is the status of this?

https://electrumx.readthedocs.io/en/latest/protocol-changes.html#version-1-5

This is a draft of ideas for protocol 1.5; they are not implemented

PROTOCOL_MIN = (1, 2)
PROTOCOL_MAX = (1, 4, 1)

@Giszmo

Giszmo commented Mar 13, 2019

Thanks for answering, but it would be really helpful to know when to expect a fix. I need to tackle this somehow asap and could work on it, but I would much rather help on a branch that others are already working on. Where is the magic happening?

@Giszmo

Giszmo commented Jun 24, 2019

For the record, the "draft of ideas for protocol 1.5" moved to ideas for protocol 2.0. And my question stands: Is anybody working on this? Would it make sense to start implementing this? Doesn't look too complicated but I'm reluctant to maintain my own version of electrumx if upstream is not going to include it before next year or so.

@SomberNight
Contributor

I don't think anyone is working on this atm. Electrum does not need it. The benefits are nice but marginal when compared to the work needed on both the client and server side to adapt. I think @kyuupichan is planning to do it at some point.

@kyuupichan
Owner

Protocol will change going forwards

@Giszmo

Giszmo commented May 27, 2020

Protocol will change going forwards

What does that mean? Is the issue resolved? If it will be resolved going forward, why did you close it already? Where can I help implement and test if the solution works for us?

@kyuupichan
Owner

ElectrumX is now only for Bitcoin and the protocol will be quite different

@jlopp

jlopp commented May 28, 2020

Good night, sweet prince.

@SomberNight
Contributor

SomberNight commented May 28, 2020

We will maintain a fork of ElectrumX that supports Bitcoin (BTC). Help is welcome.

@janoside

Ok, I'm confused, probably because I'm missing some context. @kyuupichan @SomberNight - are you both referring to the same "Bitcoin"? Is someone pulling some "Bitcoin Cash is Bitcoin" nonsense? I see the line "For a future network with bigger blocks." in the project readme, so... I'm guessing I need to move over to @SomberNight for continued BTC support, correct?

@ecdsa
Contributor

ecdsa commented May 28, 2020

indeed, @kyuupichan means "Bitcoin SV"

@rockstardev

h/t @SomberNight @ecdsa @bauerj

@mooleshacat

indeed, @kyuupichan means "Bitcoin SV"

To be clear, in other words, it is NOT bitcoin.

@SomberNight
Contributor

For anyone interested, we have a proposed protocol version 1.5 that includes pagination of long histories at spesmilo/electrumx#90

Feedback welcome there.

@mkpuig

mkpuig commented Sep 1, 2022

Hello, is there any progress on including this in protocol 1.5? It's becoming a must-have to avoid losing access to utxo/history data.
Thank you in advance!

@ecdsa
Contributor

ecdsa commented Sep 4, 2022

@mkpuig this project was moved to https://github.com/spesmilo/electrumx
we are planning to add pagination to the next major release.

@Tmanaccountig

What would be wrong with instructing the server to break the response down into manageable chunks, and reassembling them on the client after receipt?
