-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing information about indexes required for indexer's endpoints #79
Comments
Thanks for the in-depth analysis! We do have some code for setting up indexes. I think we can add these there. |
There are two remarks.
|
How does the |
Adding an index or changing its type should not change the semantics of a SQL query. Standard indexes support fast searching of open-ended patterns
Without the index endpoint takes around 60 second to complete. With the index it takes about 3-6 seconds. Transctions for block height 2210429 - query results with no index:
Transctions for block height 2210429 - query results with the
That difference between results is acceptable because the query execution with an index may deliver results in different order than no index query execution and there is no additional constrain on the results order. |
Hi @blazesal, thank you very much for opening this ticket with valuable analyses, insights and recommendations. I wanted to let you know that this is something we've been discussing internally. As you've suggested the GIN index is a very big aid to the query planner for running a search query like the one you're focusing on here. But unfortunately, that index can hurt as much as it helps for search queries with different characteristics. Let me expand; Consider the long-running query you've identified in your issue description. In particular the However, imagine the opposite case of a search query that will pick millions of rows. Like "give me any transaction that has "transfer" in it". Currently, such a query runs fast, because the query planner will just walk backwards over the height index of the transactions table and it will quickly find, say 100 transactions (for a request with Just wanted to let you know that this problem with the opposite case is what held us back from applying your insightful suggestion right away. |
As identified by @blazesal in #79, our `/txs/tx` endpoint requires a requestkey index on the events and transactions tables, otherwise each request results in a table scan. I've just tried creating these indexes manually and checking their sizes; As of now, the requestkey index on the transactions table takes up 603MB of space while the one on events takes up 879MB. For comparison, these total space occupied by these two tables is ~46GB.
As identified by @blazesal in #79, our `/txs/tx` endpoint requires a requestkey index on the events and transactions tables, otherwise each request results in a table scan. I've just tried creating these indexes manually and checking their sizes; As of now, the requestkey index on the transactions table takes up 603MB of space while the one on events takes up 879MB. For comparison, these total space occupied by these two tables is ~46GB.
@blazesal Note that the |
Indexer performance problems
Investigation
Naming
In the description below following names are being used for:
postgres
,chainweb-data
,db_user
.Increase debug query size
Increase the query size, so it is possible to display large queries:
Intercept long-running query
Invoke the indexer's endpoint, which fails to return the data quickly:
Connect to the database and display the running SQL queries:
Then analyze the query:
Analysis gives the following query plan:
The line
-> Parallel Seq Scan on transactions t0 (cost=0.00..554104.98 rows=213 width=1056) (actual time=521.416..2078.174 rows=27 loops=3)"
indicates that the whole table needs to get sequentially scanned (
Seq Scan
) because of the lack of appropriate index on thecode
column.After adding the necessary index the query plan is the following:
The
Seq Scan
is now gone in favour of scanning the indexBitmap Index Scan on ""transactions-code""
.Solution
The discussion about the right index supporting generic
LIKE
queries is here:https://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations
In summary, the use of GIN or GiST trigram index with the special operator classes provided by the additional module
pg_trgm
is the solution.Install additional Postgres module
To install the needed module, log into database with super-user:
Identify missing indexes
The analysis of the queries resulted in the following list of the indexes that need to get created:
btree
btree
gin_trgm_ops
gin_trgm_ops
gin_trgm_ops
Create appropriate indexes
The commands to create the above indexes:
Should this section get included in the README?
The text was updated successfully, but these errors were encountered: