Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize timestamp filters for transactions #6

Merged
merged 1 commit into from
Sep 16, 2024

Conversation

agodnic
Copy link

@agodnic agodnic commented Sep 9, 2024

Summary

This pull requests optimizes the filters after-time and before-time in the GET /v2/transactions endpoint.

The rationale behind this optimization is to convert timestamps into rounds (since transactions have lookup indexes based in their round number). This hopefully will avoid an inefficient query execution plan.

Test Plan

Tested in a local environment with a CRDB light indexer. The resulting execution plan seems to be more efficient, but the amount of data in a light indexer is too small to see a performance improvement.

This change would need to be tested in a full indexer (i.e. full history) to verify that CRDB is picking the right query execution plan.

@agodnic agodnic requested a review from urtho September 9, 2024 20:50
@agodnic
Copy link
Author

agodnic commented Sep 15, 2024

Performance tests on Postgres

Old queries
  • The query associated to the after-time filter timed out
  • The query associated to the before-time filter took ~40ms
explain analyze
SELECT t.round, t.intra, t.txn, h.realtime
FROM txn t JOIN block_header h ON t.round = h.round
WHERE h.realtime > TIMESTAMP '2024-01-01 00:00:00' AND t.txid IS NOT NULL
ORDER BY t.round, t.intra
LIMIT 10;
(timed out)

explain analyze
SELECT t.round, t.intra, t.txn, h.realtime
FROM txn t JOIN block_header h ON t.round = h.round
WHERE h.realtime < TIMESTAMP '2024-01-01 00:00:00' AND t.txid IS NOT NULL
ORDER BY t.round, t.intra
LIMIT 10;

    QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4.58..9.00 rows=10 width=700) (actual time=0.101..0.132 rows=10 loops=1)
   ->  Nested Loop  (cost=4.58..594946673.12 rows=1343551727 width=700) (actual time=0.101..0.131 rows=10 loops=1)
         ->  Merge Append  (cost=4.11..353530250.69 rows=1637833565 width=692) (actual time=0.087..0.092 rows=10 loops=1)
               Sort Key: t.round, t.intra
               ->  Index Scan using txn_0_pkey on txn_0 t_1  (cost=0.57..43704800.07 rows=233817219 width=692) (actual time=0.016..0.017 rows=5 loops=1)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_1_pkey on txn_1 t_2  (cost=0.57..43722775.60 rows=234121464 width=691) (actual time=0.011..0.011 rows=2 loops=1)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_2_pkey on txn_2 t_3  (cost=0.57..43740177.29 rows=234120722 width=692) (actual time=0.009..0.010 rows=3 loops=1)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_3_pkey on txn_3 t_4  (cost=0.57..43738728.03 rows=234067445 width=692) (actual time=0.009..0.010 rows=3 loops=1)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_4_pkey on txn_4 t_5  (cost=0.57..43730400.77 rows=234151294 width=692) (actual time=0.021..0.021 rows=1 loops=1)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_5_pkey on txn_5 t_6  (cost=0.57..43704161.99 rows=234074709 width=692) (actual time=0.009..0.009 rows=1 loops=1)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_6_pkey on txn_6 t_7  (cost=0.57..43631802.76 rows=233480712 width=692) (actual time=0.009..0.009 rows=1 loops=1)
                     Filter: (txid IS NOT NULL)
         ->  Memoize  (cost=0.47..0.70 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=10)
               Cache Key: t.round
               Cache Mode: logical
               Hits: 0  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
               ->  Index Scan using block_header_pkey on block_header h  (cost=0.44..0.67 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=10)
                     Index Cond: (round = t.round)
                     Filter: (realtime < '2024-01-01 00:00:00'::timestamp without time zone)
 Planning Time: 0.798 ms
 Execution Time: 41.493 ms
(27 rows)
New queries
  • The query associated to the after-time filter took ~40ms
  • The query associated to the before-time filter took ~38ms
                                                                              
explain analyze
SELECT t.round, t.intra, h.realtime
FROM txn t JOIN block_header h ON t.round = h.round                                                                                                                        
WHERE t.round >= (SELECT bh.round FROM block_header bh WHERE bh.realtime > TIMESTAMP '2024-01-01 01:00:00' ORDER BY bh.realtime ASC, bh.round ASC LIMIT 1) AND t.txid IS NOT NULL
ORDER BY t.round, t.intra
LIMIT 10;
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.18..8.87 rows=10 width=20) (actual time=2.814..3.052 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..0.60 rows=1 width=16) (actual time=0.016..0.016 rows=1 loops=1)
           ->  Index Only Scan using block_header_time_ir on block_header bh  (cost=0.56..291850.15 rows=7647523 width=16) (actual time=0.016..0.016 rows=1 loops=1)
                 Index Cond: (realtime > '2024-01-01 01:00:00'::timestamp without time zone)
                 Heap Fetches: 0
   ->  Nested Loop  (cost=4.58..201379282.24 rows=545944386 width=20) (actual time=2.814..3.050 rows=10 loops=1)
         ->  Merge Append  (cost=4.11..119769931.61 rows=545944386 width=12) (actual time=2.625..2.858 rows=10 loops=1)
               Sort Key: t.round, t.intra
               ->  Index Scan using txn_0_pkey on txn_0 t_1  (cost=0.57..14843459.75 rows=77939068 width=12) (actual time=0.472..0.472 rows=1 loops=1)
                     Index Cond: (round >= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_1_pkey on txn_1 t_2  (cost=0.57..14849576.70 rows=78040488 width=12) (actual time=0.274..0.274 rows=1 loops=1)
                     Index Cond: (round >= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_2_pkey on txn_2 t_3  (cost=0.57..14855480.50 rows=78040203 width=12) (actual time=0.438..0.438 rows=1 loops=1)
                     Index Cond: (round >= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_3_pkey on txn_3 t_4  (cost=0.57..14854967.77 rows=78022475 width=12) (actual time=0.334..0.334 rows=1 loops=1)
                     Index Cond: (round >= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_4_pkey on txn_4 t_5  (cost=0.57..14852137.34 rows=78050405 width=12) (actual time=0.337..0.568 rows=10 loops=1)
                     Index Cond: (round >= $0)
                     Filter: (txid IS NOT NULL)
                     Rows Removed by Filter: 1
               ->  Index Scan using txn_5_pkey on txn_5 t_6  (cost=0.57..14843294.29 rows=78024886 width=12) (actual time=0.371..0.371 rows=1 loops=1)
                     Index Cond: (round >= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_6_pkey on txn_6 t_7  (cost=0.57..14818551.08 rows=77826861 width=12) (actual time=0.397..0.397 rows=1 loops=1)
                     Index Cond: (round >= $0)
                     Filter: (txid IS NOT NULL)
         ->  Memoize  (cost=0.47..0.70 rows=1 width=16) (actual time=0.019..0.019 rows=1 loops=10)
               Cache Key: t.round
               Cache Mode: logical
               Hits: 9  Misses: 1  Evictions: 0  Overflows: 0  Memory Usage: 1kB
               ->  Index Scan using block_header_pkey on block_header h  (cost=0.44..0.67 rows=1 width=16) (actual time=0.184..0.184 rows=1 loops=1)
                     Index Cond: (round = t.round)
 Planning Time: 0.844 ms
 Execution Time: 40.050 ms
(39 rows)

explain analyze
SELECT t.round, t.intra, h.realtime
FROM txn t JOIN block_header h ON t.round = h.round                                                                                                                        
WHERE t.round <= (SELECT bh.round FROM block_header bh WHERE bh.realtime < TIMESTAMP '2024-01-01 01:00:00' ORDER BY bh.realtime DESC, bh.round DESC LIMIT 1) AND t.txid IS NOT NULL
ORDER BY t.round, t.intra
LIMIT 10;

                                                                                   QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.18..8.87 rows=10 width=20) (actual time=0.650..0.704 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..0.60 rows=1 width=16) (actual time=0.364..0.364 rows=1 loops=1)
           ->  Index Only Scan Backward using block_header_time_ir on block_header bh  (cost=0.56..1330465.04 rows=34921213 width=16) (actual time=0.364..0.364 rows=1 loops=1)
                 Index Cond: (realtime < '2024-01-01 01:00:00'::timestamp without time zone)
                 Heap Fetches: 0
   ->  Nested Loop  (cost=4.58..201379219.99 rows=545944218 width=20) (actual time=0.649..0.703 rows=10 loops=1)
         ->  Merge Append  (cost=4.11..119769894.04 rows=545944218 width=12) (actual time=0.627..0.632 rows=10 loops=1)
               Sort Key: t.round, t.intra
               ->  Index Scan using txn_0_pkey on txn_0 t_1  (cost=0.57..14843454.25 rows=77939040 width=12) (actual time=0.403..0.405 rows=5 loops=1)
                     Index Cond: (round <= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_1_pkey on txn_1 t_2  (cost=0.57..14849571.59 rows=78040462 width=12) (actual time=0.036..0.036 rows=2 loops=1)
                     Index Cond: (round <= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_2_pkey on txn_2 t_3  (cost=0.57..14855477.75 rows=78040188 width=12) (actual time=0.031..0.032 rows=3 loops=1)
                     Index Cond: (round <= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_3_pkey on txn_3 t_4  (cost=0.57..14854961.68 rows=78022441 width=12) (actual time=0.037..0.037 rows=3 loops=1)
                     Index Cond: (round <= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_4_pkey on txn_4 t_5  (cost=0.57..14852133.22 rows=78050384 width=12) (actual time=0.044..0.044 rows=1 loops=1)
                     Index Cond: (round <= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_5_pkey on txn_5 t_6  (cost=0.57..14843287.67 rows=78024853 width=12) (actual time=0.036..0.036 rows=1 loops=1)
                     Index Cond: (round <= $0)
                     Filter: (txid IS NOT NULL)
               ->  Index Scan using txn_6_pkey on txn_6 t_7  (cost=0.57..14818548.58 rows=77826850 width=12) (actual time=0.038..0.038 rows=1 loops=1)
                     Index Cond: (round <= $0)
                     Filter: (txid IS NOT NULL)
         ->  Memoize  (cost=0.47..0.70 rows=1 width=16) (actual time=0.007..0.007 rows=1 loops=10)
               Cache Key: t.round
               Cache Mode: logical
               Hits: 0  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
               ->  Index Scan using block_header_pkey on block_header h  (cost=0.44..0.67 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=10)
                     Index Cond: (round = t.round)
 Planning Time: 7.779 ms
 Execution Time: 38.747 ms
(38 rows)

Conclusion

  • The before-time had an acceptable performance even before the optimization.
  • The after-time parameter had unnaceptable performance before the optimization. Now it takes around ~40ms to execute.

@agodnic
Copy link
Author

agodnic commented Sep 15, 2024

Performance tests on CockroachDB

I could only test the new queries in a light indexer. Unfortunately, the amount of data in a light indexer database is not large enough to be certain that the execution plans are efficient.

@urtho urtho merged commit 0753471 into main Sep 16, 2024
4 checks passed
@agodnic agodnic deleted the optimize-timestamp-filters branch September 16, 2024 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants