Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: ADR 001 general parser architecture #111

Closed
wants to merge 9 commits into from

Conversation

dadamu
Copy link
Contributor

@dadamu dadamu commented Mar 13, 2024

Description

This PR introduces a new general parser architecture for any chains.

Checklist

  • Targeted PR against correct branch.
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Wrote unit tests.
  • Re-reviewed Files changed in the Github PR explorer.

@dadamu dadamu changed the title docs: add adr 001: general parser architecture docs: ADR 001 general parser architecture Mar 13, 2024
@dadamu dadamu requested review from 0x7u and icfor March 13, 2024 14:07
@dadamu dadamu marked this pull request as ready for review March 13, 2024 14:07
@dadamu dadamu requested a review from leobragaz March 13, 2024 14:07
Copy link

@0x7u 0x7u left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should use a tool like Kafka or RabbitMQ for the queuing. It has several benefits the most important one being that we could support multiple version of chains and also extend the architecture to non cosmos chain.

There's so much more advantages I can think of like: topics, parsers parallelisation, telemetry, fault tolerance, separation of concerns...etc


The provided image offer a comprehensive overview of the Juno parser, detailing its core components and functionalities.

![Architecture](../.img/adr-001-general-architecture.png)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had something like this in mind.

Indexer

Copy link
Contributor Author

@dadamu dadamu Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is great, we can make RabbitMQ or Kafka or other good queue services implement BlockQueue interface if needed.

Besides, cosmos.Tx is different from solana.Tx and ethereum.TX, in addition, transactions must be included inside a block, so I would avoid from transaction stuff inside Juno, make it only as block parser.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, also I think that the Indexer and Parser part should be separated into their program/process. It will allow for less complexity in the code and some freedom in terms of the technologies used.

For instance, later we find that the Indexer wrote in Go has latency spikes due to garbage collection. Then it will be far easier to replace only that part with Haskell (just kidding obviously 😂). Another use-case would be if, I don't know, Bitcoin has SDKs written in C++, we would be able to write an Indexer just for that and re-use all the rest of the architecture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, also I think that the Indexer and Parser part should be separated into their program/process.

I agree with you. If we apply RabbitMQ or Kafka, we can make them sperately. That's why I starts from interfaces, and it brings us more possibilities to try and scale.

For instance, later we find that the Indexer wrote in Go has latency spikes due to garbage collection.

The Juno task would be Network IO bounding, most of time would be waiting for RPC server response, which takes about 200ms if Juno is not the same instance as RPC server. Therefore, gc spikes can be ignores in the case from my side.

Another use-case would be if, I don't know, Bitcoin has SDKs written in C++, we would be able to write an Indexer just for that and re-use all the rest of the architecture.

We can use other languages if it is worth than Go, especially as you said, the native language of the chain provides SDK then we don't rewrite anything.

@dadamu dadamu requested a review from 0x7u March 14, 2024 07:25
Copy link

@0x7u 0x7u left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have play a little with the RPC nodes and I had some instances where transactions were not attached to a particular Block.

I think they are transaction broadcasted on the network but not yet processed in a Block. I think we should also index thoses as mintscan and ethscan also does it. This also means that for one transaction hash we can have multiple state/status.

context.Context

// Database returns the database used to operate blocks by workers
Database() Database
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use a singleton for the Database client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should use singleton on Context, Node and Database, inherited the previous design.

In golang, Singleton is in the implementation level, interface just a way to abstract the methods.
https://thedevelopercafe.com/articles/singleton-in-golang-839d8610958b

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh understood 👍.

Why do we need the Context interface though? Can't we just use a function to get the database client like in the article you linked?

Copy link
Contributor Author

@dadamu dadamu Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context is mainly a concept to manage concurrency program, and it is shared to every goroutine tasks. Then we can manage all the workers in a easy way. Here is the brief reference.

In addition, putting Database, Node and etc. it follows the original concept from current Juno that including the shared singleton objects among workers.

Do you think using Start(context.Context, Database, Node) would be better?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Start(context.Context, Database, Node) is any better 😄.

I suggested retrieving the database client by the Singleton getter method like GetCache in the article you linked. I mean from within the function body.

But I think your approach is the recommended one using Golang. I am more used to doing things like that https://medium.com/learnfazz/flawed-singleton-pattern-for-mocking-go-e90794891faf.

Again I think you have the right approach 👍.

Comment on lines +123 to +129
type Database interface {
// SaveBlock saves a block into the database.
SaveBlock(block Block) error

// LatestBlock returns the latest block from the database.
LatestBlock() (Block, error)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning on supporting multiple databases?

Going Hexagonal/Clean architecture is sometimes detrimental in terms of performance and code maintainability. In my past experiences, I have seen multiple instances where generalization/composition makes this 10 times harder to reason and maintain.

I am not saying that what you propose is not good. I'm just trying to understand the benefit of an interface rather than the concrete instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning on supporting multiple databases?

This interface can let allow us try different database from my side. As we highly may not use PostgresSQL, and possible try with ScyllaDB or other NoSQL database.

In my past experiences, I have seen multiple instances where generalization/composition makes this 10 times harder to reason and maintain.

I agree with you, however, we need a abstract layer as we gonna support different chains, so to trade off it, I think we should make interfaces as simple as possible.

Besides, I would introduce another benefit. Currently, we cannot test worker, enqueuer and many factors since Juno does not apply dependency injection. Having these interfaces would help Juno to be more testable as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface can let allow us try different database from my side. As we highly may not use PostgresSQL, and possible try with ScyllaDB or other NoSQL database.

That's my concern. If we plan on supporting multiple databases we have to acknowledge that they might be fundamentally different. Apart from writing a good amount of data mapping code and the performance overhead, we will also prevent ourselves from using some features of each specific database. I believe that's why there's no popular ORM that can do SQL and NoSQL.

I agree with you, however, we need a abstract layer as we gonna support different chains, so to trade off it, I think we should make interfaces as simple as possible.

I think we want to support different inputs and conform them to the same output. I think your approach works for Blocks but not for chain-specific elements. Let's say chains A and B have different proposal data structures, how can we accommodate the Database interface to persist the different shapes of the proposal?

Besides, I would introduce another benefit. Currently, we cannot test worker, enqueuer and many factors since Juno does not apply dependency injection. Having these interfaces would help Juno to be more testable as well.

That's actually a great point, are we not able to mock/spy/stub the Database client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my concern. If we plan on supporting multiple databases we have to acknowledge that they might be fundamentally different. Apart from writing a good amount of data mapping code and the performance overhead, we will also prevent ourselves from using some features of each specific database. I believe that's why there's no popular ORM that can do SQL and NoSQL.

Could you give me samples to make me on the same channel?

I think interfaces would not hurt so much, say if we are using postgresql, then the code implementing the interface may be like:

type PostgresDB struct {
  pool pq.Pool
}

func (db *PostgresDB) SaveBlock(block Block) error {
  err := db.pool.Execute('Save Block query', block)
  // other operations
  return err
}

...other implementations

And then, the DB implementation for ScyllaDB may be:

type ScyllaDB struct {
  connection scylla.Connection
}

func (db *ScyllaDB) SaveBlock(block Block) error {
  // some operation saving block into ScyllaDB
  return err
}

Both of them shared the same interface so we can reuse the code in our implementation codes without any changes:

func (w Woker) Start(ctx Context, queue BlockQueue) {
  for block := range queue.Listen() {
    ...operations
    err := ctx.SaveBlock(block)
    // ..other operations
  }
} 

Therefore, from my side, I only see the benefit rather than the disadvantages using interface.

I think we want to support different inputs and conform them to the same output. I think your approach works for Blocks but not for chain-specific elements. Let's say chains A and B have different proposal data structures, how can we accommodate the Database interface to persist the different shapes of the proposal?

That is the reasons why the design only serves Block to interface and make Juno only responsible for block indexer. The example from you, Chains A and B haves different proposal data, then we should build chain A gov module in A Callisto and in B chain B gov module to serve two cases from my side.

I have an example how Soljuno define a custom db interface and custom client proxy interface for stake module only for Solana.

So, basically, it would be better to implement module in Callisto to handle the custom things you mentioned.

That's actually a great point, are we not able to mock/spy/stub the Database client?

Yeah, it is not easy to do that since currently worker does not have interface and everything is only for Cosmos. Say, the returning block is *types.ResultBlock inside node.Block method and its interface is too complex as well. And now enqueuer is a function that depends on the *parsing.Context instance, then start function includes many instance structure so we can not test it in a easy way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the interface/generalization of the datastore/underlying database.

Let's say you now want your worker to be able to bulk/batch persist your parsed transaction.

In Posgresql there is no such thing (there's COPY but it doesn't fit the example use case). You will end up with either a saving function like saveBulk(tx []Tx) or repeating the call to save(tx Tx).

Now with MongoDB use could have used the bulkWrite methods so your function signature will be save(tx []Tx, bulk bool), so you could conditionally call bulkWrite for your MongoDB implementation of your save method.

But we now have this bulk bool param that is only relevant to MongoDB's implementation, therefore we hindered the abstraction of the save method.

Another example would be Postgres prepared statements, which can help a bit with performances. Same thing if you want to control the underling implementation from the interface you will have to hinder the abstraction layer you've just created.


For the customization of the database repository.

The industry standard for many years now when it comes to the repository pattern is to have one repository per entity. I believe it has passed the test of time.

So we should have BlockRepository, TransactionRepository, etc with each of them having a save, load, etc.

I would refrain from having a Database with methods saveBlock, saveTransaction, etc.


From the research I have conducted online. Our Context struct should only have a reference to the database client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right. It would be to better to be renamed into BlockRepository rather than Database name used by legacy Juno to make it clear.

Comment on lines +149 to +152
// RegisterPeriodicOperations allows to register all the operations that will be run on a periodic basis.
// The given scheduler can be used to define the periodicity of each task.
// NOTE. This method will only be run ONCE during the module initialization.
RegisterPeriodicOperations(scheduler Scheduler) error
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of process did we have to launch on a periodic basis, in the last iteration of Juno?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually for Callisto for custom module like gov module, I think It would be better to remove it from Juno interfaces as it is actually related to business logic.

Thanks for pointing it out.

@0x7u
Copy link

0x7u commented Mar 15, 2024

Is this repo public? maybe we should take our ADR to a private one to avoid exposing the company's intellectual properties.

@dadamu
Copy link
Contributor Author

dadamu commented Mar 18, 2024

Close this PR because it would be another project.

@dadamu dadamu closed this Mar 18, 2024
@dadamu dadamu deleted the paul/prepare-adr-001 branch June 20, 2024 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants