docs: ADR 001 general parser architecture #111

Closed
Wants to merge 9 commits
Binary file added .docs/.img/adr-001-general-architecture.png
56 changes: 56 additions & 0 deletions .docs/architecture/PROCESS.md
# ADR Creation Process

1. Copy the `adr-template.md` file. Use the following filename pattern: `adr-next_number-title.md`
2. Create a draft Pull Request if you want to get early feedback.
3. Make sure the context and the solution are clear and well documented.
4. Add an entry to the list in the [README](./README.md) file.
5. Create a Pull Request to propose a new ADR.

## ADR life cycle

ADR creation is an **iterative** process. Instead of trying to settle all decisions in a single ADR pull request, we MUST first understand the problem and collect feedback through a GitHub Issue.

1. Every proposal SHOULD start with a new GitHub Issue or be a result of existing Issues. The Issue should contain just a brief proposal summary.

2. Once the motivation is validated, a GitHub Pull Request (PR) is created with a new document based on the `adr-template.md`.

3. An ADR doesn't have to arrive at `master` with an _accepted_ status in a single PR. If the motivation is clear and the solution is sound, we SHOULD be able to merge it and keep a _proposed_ status. It's preferable to take an iterative approach rather than maintain long-lived, unmerged Pull Requests.

4. If a _proposed_ ADR is merged, then it should clearly document outstanding issues, either in the ADR document's notes or in a GitHub Issue.

5. The PR SHOULD always be merged. In the case of a faulty ADR, we still prefer to merge it with a _rejected_ status. The only time the ADR SHOULD NOT be merged is if the author abandons it.

6. Merged ADRs SHOULD NOT be pruned.

### ADR status

Status has two components:

```
{CONSENSUS STATUS} {IMPLEMENTATION STATUS}
```

IMPLEMENTATION STATUS is either `Implemented` or `Not Implemented`.
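For example, a freshly merged ADR that has been accepted but whose implementation has not landed yet would carry the status:

```
ACCEPTED Not Implemented
```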

#### Consensus Status

```
DRAFT -> PROPOSED -> LAST CALL yyyy-mm-dd -> ACCEPTED | REJECTED -> SUPERSEDED by ADR-xxx
\ |
\ |
v v
ABANDONED
```

+ `DRAFT`: [optional] an ADR which is a work in progress, not yet ready for general review. It is used to present early work and gather early feedback through a Draft Pull Request.
+ `PROPOSED`: an ADR covering a full solution architecture that is still under review; project stakeholders haven't reached an agreement yet.
+ `LAST CALL <date for the last call>`: [optional] a clear notice that we are close to accepting updates. Changing a status to `LAST CALL` means that social consensus (of Juno maintainers) has been reached, but we still want to give the community time to react or analyze.
+ `ACCEPTED`: an ADR representing a currently implemented, or to-be-implemented, architecture design.
+ `REJECTED`: an ADR can go from PROPOSED or ACCEPTED to REJECTED if project stakeholders reach consensus to reject it.
+ `SUPERSEDED by ADR-xxx`: an ADR which has been superseded by a new ADR.
+ `ABANDONED`: the ADR is no longer pursued by the original authors.

## Language used in ADR

+ The context/background should be written in the present tense.
+ Avoid using the first person.
43 changes: 43 additions & 0 deletions .docs/architecture/README.md
# Architecture Decision Records (ADR)

This is a location to record all high-level architecture decisions in Juno.

An Architectural Decision (**AD**) is a software design choice that addresses a functional or non-functional requirement that is architecturally significant.
An Architecturally Significant Requirement (**ASR**) is a requirement that has a measurable effect on a software system’s architecture and quality.
An Architectural Decision Record (**ADR**) captures a single AD, such as often done when writing personal notes or meeting minutes; the collection of ADRs created and maintained in a project constitute its decision log. All these are within the topic of Architectural Knowledge Management (AKM).

You can read more about the ADR concept in this [blog post](https://product.reverb.com/documenting-architecture-decisions-the-reverb-way-a3563bb24bd0#.78xhdix6t).

## Rationale

ADRs are intended to be the primary mechanism for proposing new feature designs and new processes, for collecting community input on an issue, and for documenting the design decisions.
An ADR should provide:

- Context on the relevant goals and the current state
- Proposed changes to achieve the goals
- Summary of pros and cons
- References
- Changelog

Note the distinction between an ADR and a spec. The ADR provides the context, intuition, reasoning, and
justification for a change in architecture, or for the architecture of something
new. The spec is a much more compressed and streamlined summary of everything as
it stands today.

If recorded decisions turn out to be lacking, convene a discussion, record the new decisions here, and then modify the code to match.

## Creating a new ADR

Read about the [PROCESS](./PROCESS.md).

#### Use RFC 2119 Keywords

When writing ADRs, follow the same best practices used for writing RFCs. In RFCs, key words are used to signify the requirements in the specification. These words are often capitalized: "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL". They are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).

## ADR Table of Contents

### Accepted

### Proposed

### Draft
234 changes: 234 additions & 0 deletions .docs/architecture/adr-001-general-parser-architecture.md
# ADR-001: General parser architecture


## Changelog

- March 11th, 2024: Initial draft;
- March 13th, 2024: First draft;

## Status

PROPOSED

## Abstract

This ADR outlines the general parser architecture designed for Juno, addressing the need for enhanced interfaces to support various chains.

## Context

Juno currently lacks the necessary architecture documents and interfaces to effectively support multiple chains. Additionally, it faces challenges in handling messages and transactions, especially when dealing with different blockchain protocols and versions. There is a pressing need to establish a robust and adaptable architecture that can accommodate the diverse requirements of various chains while maintaining efficiency and reliability.

## Decision

To address these challenges, the decision is made to redesign Juno's parser architecture. The architecture will prioritize modularity and extensibility, allowing seamless integration with different blockchain networks and protocols. Key components such as enqueuers, workers, and modules will be defined with clear responsibilities and interfaces to facilitate easy extension and customization. Moreover, the architecture will incorporate mechanisms to handle transactions, messages, and other blockchain operations efficiently.

The provided image offers a comprehensive overview of the Juno parser, detailing its core components and functionalities.

![Architecture](../.img/adr-001-general-architecture.png)

Let's briefly break down each element in the following topics.

### Context

The Context interface acts as a central hub shared among all elements within the parser, including enqueuers, workers, and other components.

```go
type Context interface {
	// Context represents the manager for the concurrent jobs
	context.Context

	// Database returns the database used to operate blocks by workers
	Database() Database

	// Node returns the client used to listen to blocks by enqueuers
	Node() Node

	// Modules returns a set of the extensional operations that should be executed inside a worker
	Modules() []Module
}
```

### Block

The Block interface represents an aggregation of operations on the blockchain and is the fundamental element transferred through the queue for parsing by workers within Juno.

```go
type Block interface {
	// Height returns the height of the block.
	Height() uint64

	// Hash returns the hash of the block.
	Hash() string

	// Timestamp returns the timestamp of the block.
	Timestamp() time.Time

	// Proposer returns the address of the block proposer.
	Proposer() string
}
```
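As an illustration, a minimal in-memory implementation of this interface might look like the following sketch (the `rawBlock` struct and its field values are hypothetical, not part of the proposal):

```go
package main

import (
	"fmt"
	"time"
)

// Block is the interface proposed above.
type Block interface {
	Height() uint64
	Hash() string
	Timestamp() time.Time
	Proposer() string
}

// rawBlock is a hypothetical concrete implementation backed by plain fields.
type rawBlock struct {
	height    uint64
	hash      string
	timestamp time.Time
	proposer  string
}

func (b rawBlock) Height() uint64       { return b.height }
func (b rawBlock) Hash() string         { return b.hash }
func (b rawBlock) Timestamp() time.Time { return b.timestamp }
func (b rawBlock) Proposer() string     { return b.proposer }

func main() {
	var b Block = rawBlock{height: 42, hash: "0xabc", timestamp: time.Now(), proposer: "junovaloper1..."}
	fmt.Println(b.Height(), b.Hash())
}
```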

### Enqueuer

The Enqueuer interface orchestrates the process of listening for available blocks and enqueueing them into the queue for parsing by workers.

In the current version, two types of enqueuers are implemented:

* Missing Blocks Enqueuer: This enqueuer queries missing blocks from the given start block to the current latest block and enqueues them into the queue.
* New Blocks Enqueuer: This enqueuer listens for new blocks from the node and enqueues them into the queue.

```go
type Enqueuer interface {
	// ListenAndEnqueueBlocks listens for available blocks and enqueues them into the queue.
	ListenAndEnqueueBlocks(ctx Context, queue BlockQueue) (err error)
}
```
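To make the contract concrete, here is a simplified sketch of what the missing-blocks enqueuer could look like; it deliberately narrows the interfaces above to the few methods it needs, and the `missingBlocksEnqueuer`, `fakeNode`, and `sliceQueue` names are illustrative only:

```go
package main

import "fmt"

// heightBlock is a minimal stand-in for the Block interface, for illustration only.
type heightBlock uint64

func (b heightBlock) Height() uint64 { return uint64(b) }

// node is a narrowed stand-in for the Node interface.
type node interface {
	Block(height uint64) (heightBlock, error)
	LatestBlock() (heightBlock, error)
}

// queue is a narrowed stand-in for the BlockQueue interface.
type queue interface {
	Enqueue(block heightBlock)
}

// missingBlocksEnqueuer enqueues every block from start up to the node's latest height.
type missingBlocksEnqueuer struct {
	start uint64
	node  node
}

func (e missingBlocksEnqueuer) ListenAndEnqueueBlocks(q queue) error {
	latest, err := e.node.LatestBlock()
	if err != nil {
		return err
	}
	for h := e.start; h <= latest.Height(); h++ {
		block, err := e.node.Block(h)
		if err != nil {
			return err
		}
		q.Enqueue(block)
	}
	return nil
}

// fakeNode and sliceQueue are toy implementations used to exercise the enqueuer.
type fakeNode struct{ latest uint64 }

func (n fakeNode) Block(h uint64) (heightBlock, error) { return heightBlock(h), nil }
func (n fakeNode) LatestBlock() (heightBlock, error)   { return heightBlock(n.latest), nil }

type sliceQueue struct{ blocks []heightBlock }

func (q *sliceQueue) Enqueue(b heightBlock) { q.blocks = append(q.blocks, b) }

func main() {
	q := &sliceQueue{}
	e := missingBlocksEnqueuer{start: 3, node: fakeNode{latest: 5}}
	if err := e.ListenAndEnqueueBlocks(q); err != nil {
		panic(err)
	}
	fmt.Println(len(q.blocks)) // heights 3, 4, 5 were enqueued
}
```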

### Block Queue

The BlockQueue interface serves as a communication channel between enqueuers and workers within the Juno architecture.

```go
type BlockQueue interface {
	// Enqueue enqueues a block into the queue.
	Enqueue(block Block)

	// Listen returns a channel to listen for blocks.
	Listen() <-chan Block
}
```
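For an in-process deployment, this interface could be satisfied by a plain buffered channel, as in the hypothetical `chanQueue` sketch below; a production setup could instead implement the same interface on top of RabbitMQ, Kafka, or another queueing service:

```go
package main

import "fmt"

// Block is narrowed to the single method this sketch needs.
type Block interface {
	Height() uint64
}

type simpleBlock uint64

func (b simpleBlock) Height() uint64 { return uint64(b) }

// chanQueue is a hypothetical in-process BlockQueue backed by a buffered channel.
type chanQueue struct {
	ch chan Block
}

func newChanQueue(size int) *chanQueue {
	return &chanQueue{ch: make(chan Block, size)}
}

// Enqueue pushes a block onto the channel (blocking when the buffer is full).
func (q *chanQueue) Enqueue(block Block) { q.ch <- block }

// Listen exposes the channel read-side to workers.
func (q *chanQueue) Listen() <-chan Block { return q.ch }

func main() {
	q := newChanQueue(10)
	q.Enqueue(simpleBlock(7))
	block := <-q.Listen()
	fmt.Println(block.Height()) // 7
}
```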

### Node

The Node interface serves as an abstraction layer for interacting with RPC servers or other clients to obtain block data efficiently.

```go
type Node interface {
	// LatestBlock retrieves the latest block from the connected node.
	LatestBlock() (Block, error)

	// Block retrieves the block at the given height.
	Block(height uint64) (Block, error)

	// SubscribeBlocks returns a channel to subscribe to new blocks from the connected node.
	SubscribeBlocks() (<-chan Block, error)
}
```

### Database

The Database interface encapsulates the necessary operations for interacting with the underlying storage system.

```go
type Database interface {
	// SaveBlock saves a block into the database.
	SaveBlock(block Block) error

	// LatestBlock returns the latest block from the database.
	LatestBlock() (Block, error)
}
```
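As a sketch of how this interface could be satisfied, here is a hypothetical in-memory implementation (`memDatabase`, useful mainly for tests); a real implementation would wrap PostgreSQL, ScyllaDB, or another store behind the same two methods:

```go
package main

import (
	"errors"
	"fmt"
)

// Block is narrowed to the single method this sketch needs.
type Block interface {
	Height() uint64
}

type simpleBlock uint64

func (b simpleBlock) Height() uint64 { return uint64(b) }

// memDatabase is a hypothetical in-memory Database implementation.
type memDatabase struct {
	blocks map[uint64]Block
	latest uint64
}

func newMemDatabase() *memDatabase {
	return &memDatabase{blocks: make(map[uint64]Block)}
}

// SaveBlock stores the block and tracks the highest height seen so far.
func (db *memDatabase) SaveBlock(block Block) error {
	db.blocks[block.Height()] = block
	if block.Height() > db.latest {
		db.latest = block.Height()
	}
	return nil
}

// LatestBlock returns the block with the highest stored height.
func (db *memDatabase) LatestBlock() (Block, error) {
	block, ok := db.blocks[db.latest]
	if !ok {
		return nil, errors.New("empty database")
	}
	return block, nil
}

func main() {
	db := newMemDatabase()
	_ = db.SaveBlock(simpleBlock(10))
	_ = db.SaveBlock(simpleBlock(12))
	latest, _ := db.LatestBlock()
	fmt.Println(latest.Height()) // 12
}
```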

### Worker

The Worker interface defines the functionality for parsing blocks from the queue within Juno's parser architecture.

```go
type Worker interface {
	// Start starts the worker to parse blocks from the queue.
	Start(ctx Context, queue BlockQueue)
}
```
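A worker's main loop is essentially "drain the queue, hand each block to every module". The sketch below illustrates that shape with narrowed, hypothetical types (`worker`, `printModule`, and the plain channel standing in for BlockQueue are illustrative, not the proposed API):

```go
package main

import "fmt"

// Block and BlockModule are narrowed to what this sketch needs.
type Block interface {
	Height() uint64
}

type simpleBlock uint64

func (b simpleBlock) Height() uint64 { return uint64(b) }

type BlockModule interface {
	HandleBlock(block Block) error
}

// printModule is a hypothetical module that records each block it handles.
type printModule struct{ handled []uint64 }

func (m *printModule) HandleBlock(block Block) error {
	m.handled = append(m.handled, block.Height())
	return nil
}

// worker drains the queue channel and forwards each block to every module.
type worker struct {
	modules []BlockModule
}

func (w worker) Start(queue <-chan Block) {
	for block := range queue {
		for _, module := range w.modules {
			if err := module.HandleBlock(block); err != nil {
				fmt.Println("module error:", err)
			}
		}
	}
}

func main() {
	queue := make(chan Block, 2)
	queue <- simpleBlock(1)
	queue <- simpleBlock(2)
	close(queue)

	m := &printModule{}
	worker{modules: []BlockModule{m}}.Start(queue)
	fmt.Println(m.handled) // [1 2]
}
```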

### Scheduler

The Scheduler interface facilitates the execution of periodic operations within Juno's parser architecture.

```go
type PeriodicOperationsModule interface {
	// RegisterPeriodicOperations allows registering all the operations that will be run on a periodic basis.
	// The given scheduler can be used to define the periodicity of each task.
	// NOTE. This method will only be run ONCE during the module initialization.
	RegisterPeriodicOperations(scheduler Scheduler) error
}

type Scheduler interface {
	// Register registers a periodic operations module.
	Register(module PeriodicOperationsModule)

	// RegisterTask registers a periodic task to the scheduler.
	RegisterTask(task func(), duration time.Duration)

	// Start starts the scheduler.
	Start(ctx Context)
}
```

### Module

The Module interface serves as the foundation for defining and extending functionality within Juno's parser architecture.

Within Juno, a fundamental module called BlockModule is provided, dedicated to handling blocks.

Additionally, developers have the flexibility to create custom modules tailored to specific business requirements, such as TransactionModule, GovModule, and others.

```go
type Module interface {
	// Name returns the name of the module.
	Name() string
}

type BlockModule interface {
	// HandleBlock handles a block parsed by workers.
	HandleBlock(block Block) error
}
```

### Additional operator

The AdditionalOperator interface provides functionality for executing additional, one-off operations within Juno's parser architecture.

Currently, we utilize the additional operation module to host a Prometheus server.

```go
type AdditionalOperationModule interface {
	// RunAdditionalOperations runs all the additional operations required by the module.
	// This is the perfect place where to initialize all the operations that subscribe to websockets or other
	// external sources.
	// NOTE. This method will only be run ONCE before starting the parsing of the blocks.
	RunAdditionalOperations() error
}

type AdditionalOperator interface {
	// Register registers an additional operation module.
	Register(module AdditionalOperationModule)

	// Start starts the additional operator.
	Start(ctx Context)
}
```

## Consequences

### Backwards Compatibility

The decision to redesign Juno's parser architecture is not backwards compatible, as the current implementation only supports Cosmos SDK chains. The introduction of a new architecture with enhanced interfaces may require adjustments to existing codebases and integrations. However, this trade-off is necessary to accommodate the future expansion of Juno to support a wider range of blockchain networks beyond Cosmos SDK chains. Despite the lack of backwards compatibility, the long-term benefits of the redesigned architecture justify the need for this transition.

### Positive

* Improved Flexibility: The redesigned architecture will enable Juno to support a wider range of blockchain networks and adapt to evolving protocols more effectively.
* Enhanced Scalability: With a modular and extensible design, Juno will be better equipped to handle increasing volumes of transactions and messages from multiple chains.


### Negative

* Increased Development Complexity: Implementing a more sophisticated architecture may require additional development effort and resources.
* Potential Performance Overhead: Depending on the implementation, there may be a slight performance overhead associated with the modular design. However, this is outweighed by the benefits of improved flexibility and scalability.

### Neutral

None identified.

## References

- {reference link}
60 changes: 60 additions & 0 deletions .docs/architecture/adr-template.md
# ADR {ADR-NUMBER}: {TITLE}

## Changelog

- {date}: {changelog}

## Status

{DRAFT | PROPOSED} Not Implemented

> Please have a look at the [PROCESS](./PROCESS.md#adr-status) page.
> Use DRAFT if the ADR is in a draft stage (draft PR) or PROPOSED if it's in review.

## Abstract

> "If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the ADR.
> A short (~200 word) description of the issue being addressed.

## Context

> This section describes the forces at play, including technological, political, social, and project local. These forces are probably in tension, and should be called out as such. The language in this section is value-neutral. It is simply describing facts. It should clearly explain the problem and motivation that the proposal aims to resolve.
> {context body}

## Decision

> This section describes our response to these forces. It is stated in full sentences, with active voice. "We will ..."
> {decision body}

## Consequences

> This section describes the resulting context, after applying the decision. All consequences should be listed here, not just the "positive" ones. A particular decision may have positive, negative, and neutral consequences, but all of them affect the team and project in the future.

### Backwards Compatibility

> All ADRs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The ADR must explain how the author proposes to deal with these incompatibilities. ADR submissions without a sufficient backwards compatibility treatise may be rejected outright.

### Positive

{positive consequences}

### Negative

{negative consequences}

### Neutral

{neutral consequences}

## Further Discussions

While an ADR is in the DRAFT or PROPOSED stage, this section should contain a summary of issues to be solved in future iterations (usually referencing comments from a pull-request discussion).
Later, this section can optionally list ideas or improvements the author or reviewers found during the analysis of this ADR.

## Test Cases [optional]

Test cases for an implementation are mandatory for ADRs that are affecting consensus changes. Other ADRs can choose to include links to test cases if applicable.

## References

- {reference link}