docs: ADR 001 general parser architecture #111

dadamu · 2024-03-13T07:06:11Z

Description

This PR introduces a new general parser architecture for any chains.

Checklist

Targeted PR against correct branch.
Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
Wrote unit tests.
Re-reviewed Files changed in the Github PR explorer.

0x7u

I believe we should use a tool like Kafka or RabbitMQ for the queuing. It has several benefits the most important one being that we could support multiple version of chains and also extend the architecture to non cosmos chain.

There's so much more advantages I can think of like: topics, parsers parallelisation, telemetry, fault tolerance, separation of concerns...etc

0x7u · 2024-03-14T06:17:25Z

.docs/architecture/adr-001-general-parser-architecture.md

+
+The provided image offer a comprehensive overview of the Juno parser, detailing its core components and functionalities.
+
+![Architecture](../.img/adr-001-general-architecture.png)


I had something like this in mind.

Yeah, it is great, we can make RabbitMQ or Kafka or other good queue services implement BlockQueue interface if needed.

Besides, cosmos.Tx is different from solana.Tx and ethereum.TX, in addition, transactions must be included inside a block, so I would avoid from transaction stuff inside Juno, make it only as block parser.

Yeah, also I think that the Indexer and Parser part should be separated into their program/process. It will allow for less complexity in the code and some freedom in terms of the technologies used.

For instance, later we find that the Indexer wrote in Go has latency spikes due to garbage collection. Then it will be far easier to replace only that part with Haskell (just kidding obviously 😂). Another use-case would be if, I don't know, Bitcoin has SDKs written in C++, we would be able to write an Indexer just for that and re-use all the rest of the architecture.

Yeah, also I think that the Indexer and Parser part should be separated into their program/process.

I agree with you. If we apply RabbitMQ or Kafka, we can make them sperately. That's why I starts from interfaces, and it brings us more possibilities to try and scale.

For instance, later we find that the Indexer wrote in Go has latency spikes due to garbage collection.

The Juno task would be Network IO bounding, most of time would be waiting for RPC server response, which takes about 200ms if Juno is not the same instance as RPC server. Therefore, gc spikes can be ignores in the case from my side.

Another use-case would be if, I don't know, Bitcoin has SDKs written in C++, we would be able to write an Indexer just for that and re-use all the rest of the architecture.

We can use other languages if it is worth than Go, especially as you said, the native language of the chain provides SDK then we don't rewrite anything.

0x7u

I have play a little with the RPC nodes and I had some instances where transactions were not attached to a particular Block.

I think they are transaction broadcasted on the network but not yet processed in a Block. I think we should also index thoses as mintscan and ethscan also does it. This also means that for one transaction hash we can have multiple state/status.

0x7u · 2024-03-15T01:36:02Z

.docs/architecture/adr-001-general-parser-architecture.md

+    context.Context
+
+    // Database returns the database used to operate blocks by workers
+    Database() Database


Is there a reason not to use a singleton for the Database client?

Yeah, we should use singleton on Context, Node and Database, inherited the previous design.

In golang, Singleton is in the implementation level, interface just a way to abstract the methods.
https://thedevelopercafe.com/articles/singleton-in-golang-839d8610958b

Oh understood 👍.

Why do we need the Context interface though? Can't we just use a function to get the database client like in the article you linked?

Context is mainly a concept to manage concurrency program, and it is shared to every goroutine tasks. Then we can manage all the workers in a easy way. Here is the brief reference.

In addition, putting Database, Node and etc. it follows the original concept from current Juno that including the shared singleton objects among workers.

Do you think using Start(context.Context, Database, Node) would be better?

I don't think Start(context.Context, Database, Node) is any better 😄.

I suggested retrieving the database client by the Singleton getter method like GetCache in the article you linked. I mean from within the function body.

But I think your approach is the recommended one using Golang. I am more used to doing things like that https://medium.com/learnfazz/flawed-singleton-pattern-for-mocking-go-e90794891faf.

Again I think you have the right approach 👍.

0x7u · 2024-03-15T01:43:59Z

.docs/architecture/adr-001-general-parser-architecture.md

+type Database interface {
+    // SaveBlock saves a block into the database.
+    SaveBlock(block Block) error
+
+    // LatestBlock returns the latest block from the database.
+    LatestBlock() (Block, error)
+}


Are we planning on supporting multiple databases?

Going Hexagonal/Clean architecture is sometimes detrimental in terms of performance and code maintainability. In my past experiences, I have seen multiple instances where generalization/composition makes this 10 times harder to reason and maintain.

I am not saying that what you propose is not good. I'm just trying to understand the benefit of an interface rather than the concrete instance.

Are we planning on supporting multiple databases?

This interface can let allow us try different database from my side. As we highly may not use PostgresSQL, and possible try with ScyllaDB or other NoSQL database.

In my past experiences, I have seen multiple instances where generalization/composition makes this 10 times harder to reason and maintain.

I agree with you, however, we need a abstract layer as we gonna support different chains, so to trade off it, I think we should make interfaces as simple as possible.

Besides, I would introduce another benefit. Currently, we cannot test worker, enqueuer and many factors since Juno does not apply dependency injection. Having these interfaces would help Juno to be more testable as well.

This interface can let allow us try different database from my side. As we highly may not use PostgresSQL, and possible try with ScyllaDB or other NoSQL database.

That's my concern. If we plan on supporting multiple databases we have to acknowledge that they might be fundamentally different. Apart from writing a good amount of data mapping code and the performance overhead, we will also prevent ourselves from using some features of each specific database. I believe that's why there's no popular ORM that can do SQL and NoSQL.

I agree with you, however, we need a abstract layer as we gonna support different chains, so to trade off it, I think we should make interfaces as simple as possible.

I think we want to support different inputs and conform them to the same output. I think your approach works for Blocks but not for chain-specific elements. Let's say chains A and B have different proposal data structures, how can we accommodate the Database interface to persist the different shapes of the proposal?

Besides, I would introduce another benefit. Currently, we cannot test worker, enqueuer and many factors since Juno does not apply dependency injection. Having these interfaces would help Juno to be more testable as well.

That's actually a great point, are we not able to mock/spy/stub the Database client?

That's my concern. If we plan on supporting multiple databases we have to acknowledge that they might be fundamentally different. Apart from writing a good amount of data mapping code and the performance overhead, we will also prevent ourselves from using some features of each specific database. I believe that's why there's no popular ORM that can do SQL and NoSQL.

Could you give me samples to make me on the same channel?

I think interfaces would not hurt so much, say if we are using postgresql, then the code implementing the interface may be like:

type PostgresDB struct { pool pq.Pool } func (db *PostgresDB) SaveBlock(block Block) error { err := db.pool.Execute('Save Block query', block) // other operations return err } ...other implementations

And then, the DB implementation for ScyllaDB may be:

type ScyllaDB struct { connection scylla.Connection } func (db *ScyllaDB) SaveBlock(block Block) error { // some operation saving block into ScyllaDB return err }

Both of them shared the same interface so we can reuse the code in our implementation codes without any changes:

func (w Woker) Start(ctx Context, queue BlockQueue) { for block := range queue.Listen() { ...operations err := ctx.SaveBlock(block) // ..other operations } }

Therefore, from my side, I only see the benefit rather than the disadvantages using interface.

I think we want to support different inputs and conform them to the same output. I think your approach works for Blocks but not for chain-specific elements. Let's say chains A and B have different proposal data structures, how can we accommodate the Database interface to persist the different shapes of the proposal?

That is the reasons why the design only serves Block to interface and make Juno only responsible for block indexer. The example from you, Chains A and B haves different proposal data, then we should build chain A gov module in A Callisto and in B chain B gov module to serve two cases from my side.

I have an example how Soljuno define a custom db interface and custom client proxy interface for stake module only for Solana.

So, basically, it would be better to implement module in Callisto to handle the custom things you mentioned.

That's actually a great point, are we not able to mock/spy/stub the Database client?

Yeah, it is not easy to do that since currently worker does not have interface and everything is only for Cosmos. Say, the returning block is *types.ResultBlock inside node.Block method and its interface is too complex as well. And now enqueuer is a function that depends on the *parsing.Context instance, then start function includes many instance structure so we can not test it in a easy way.

Regarding the interface/generalization of the datastore/underlying database.

Let's say you now want your worker to be able to bulk/batch persist your parsed transaction.

In Posgresql there is no such thing (there's COPY but it doesn't fit the example use case). You will end up with either a saving function like saveBulk(tx []Tx) or repeating the call to save(tx Tx).

Now with MongoDB use could have used the bulkWrite methods so your function signature will be save(tx []Tx, bulk bool), so you could conditionally call bulkWrite for your MongoDB implementation of your save method.

But we now have this bulk bool param that is only relevant to MongoDB's implementation, therefore we hindered the abstraction of the save method.

Another example would be Postgres prepared statements, which can help a bit with performances. Same thing if you want to control the underling implementation from the interface you will have to hinder the abstraction layer you've just created.

For the customization of the database repository.

The industry standard for many years now when it comes to the repository pattern is to have one repository per entity. I believe it has passed the test of time.

So we should have BlockRepository, TransactionRepository, etc with each of them having a save, load, etc.

I would refrain from having a Database with methods saveBlock, saveTransaction, etc.

From the research I have conducted online. Our Context struct should only have a reference to the database client.

Yeah, you are right. It would be to better to be renamed into BlockRepository rather than Database name used by legacy Juno to make it clear.

0x7u · 2024-03-15T01:46:52Z

.docs/architecture/adr-001-general-parser-architecture.md

+	// RegisterPeriodicOperations allows to register all the operations that will be run on a periodic basis.
+	// The given scheduler can be used to define the periodicity of each task.
+	// NOTE. This method will only be run ONCE during the module initialization.
+	RegisterPeriodicOperations(scheduler Scheduler) error


What kind of process did we have to launch on a periodic basis, in the last iteration of Juno?

It is actually for Callisto for custom module like gov module, I think It would be better to remove it from Juno interfaces as it is actually related to business logic.

Thanks for pointing it out.

0x7u · 2024-03-15T02:05:03Z

Is this repo public? maybe we should take our ADR to a private one to avoid exposing the company's intellectual properties.

dadamu · 2024-03-18T08:36:29Z

Close this PR because it would be another project.

chore: add adr 001: general parser architecture

2116ee5

dadamu changed the title ~~docs: add adr 001: general parser architecture~~ docs: ADR 001 general parser architecture Mar 13, 2024

docs: finish adr general parser architecture writing

2dbf301

dadamu requested review from 0x7u and icfor March 13, 2024 14:07

dadamu marked this pull request as ready for review March 13, 2024 14:07

dadamu requested a review from leobragaz March 13, 2024 14:07

dadamu added 3 commits March 13, 2024 22:15

chore: fix typo

a94152d

docs: fix Client interface name to Node

600c28f

docs: remove unused generic from block

6d0f1f9

0x7u reviewed Mar 14, 2024

View reviewed changes

fix: fix additional operation module interface

ddc5e73

dadamu requested a review from 0x7u March 14, 2024 07:25

dadamu added 3 commits March 14, 2024 15:49

fix: fix Node and Database interfaces

2cd2f6c

docs: adjust interfaces for shceduler and worker

bc47afa

docs: adjust worker interface by removing unnecessary extension param

9414fe1

0x7u reviewed Mar 15, 2024

View reviewed changes

dadamu closed this Mar 18, 2024

dadamu deleted the paul/prepare-adr-001 branch June 20, 2024 08:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: ADR 001 general parser architecture #111

docs: ADR 001 general parser architecture #111

dadamu commented Mar 13, 2024 •

edited

Loading

0x7u left a comment

0x7u Mar 14, 2024

dadamu Mar 14, 2024 •

edited

Loading

0x7u Mar 15, 2024

dadamu Mar 15, 2024

0x7u left a comment

0x7u Mar 15, 2024

dadamu Mar 15, 2024

0x7u Mar 15, 2024

dadamu Mar 15, 2024 •

edited

Loading

0x7u Mar 18, 2024

0x7u Mar 15, 2024

dadamu Mar 15, 2024

0x7u Mar 15, 2024

dadamu Mar 15, 2024

0x7u Mar 18, 2024

dadamu Mar 18, 2024

0x7u Mar 15, 2024

dadamu Mar 15, 2024

0x7u commented Mar 15, 2024

dadamu commented Mar 18, 2024


		The provided image offer a comprehensive overview of the Juno parser, detailing its core components and functionalities.

		![Architecture](../.img/adr-001-general-architecture.png)

docs: ADR 001 general parser architecture #111

docs: ADR 001 general parser architecture #111

Conversation

dadamu commented Mar 13, 2024 • edited Loading

Description

Checklist

0x7u left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dadamu Mar 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

0x7u left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dadamu Mar 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

0x7u commented Mar 15, 2024

dadamu commented Mar 18, 2024

dadamu commented Mar 13, 2024 •

edited

Loading

dadamu Mar 14, 2024 •

edited

Loading

dadamu Mar 15, 2024 •

edited

Loading