Analyzing the DAG reader

The DAG reader is the group of structures in the `unixfs.io` package that perform a DFS traversal of a file DAG. It reads (and performs similar operations) in the same order in which the DAG was built, so the `importer` is a good reference (once #5118 gets merged) for understanding it. I leave some notes here on what could be improved to make it more accessible to a new reader.

The recursive approach is hidden behind the `PBDagReader` (parent) and its `ReadSeekCloser` (child) attribute. The parent doesn't call the child directly; instead it interacts with the interface that the child implements. Although the interface provides a nice generalization, it obscures the very simple algorithm for traversing a file DAG that only has internal `ProtoNode`s and leaf `RawNode`s (represented at the UnixFS layer as `File` and `Raw` types). This could be made more explicit to leverage the concepts that someone reading the code may already have of what a file DAG is (e.g., if they are already familiar with the `importer`). Could an iterative algorithm help clarify how the DAG reader operates? Instead of the implicit call stack, with `PBDagReader`s at different depths of the tree making up the path to the leaf currently being read, we could have an explicit stack holding the information of each edge of that path (this stack would pretty much be the reader itself, plus a few control attributes).
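
To make the explicit-stack idea concrete, here is a minimal sketch. It does not use the real `unixfs.io` types: `node`, `frame` and `readAll` are hypothetical names invented for illustration, and all nodes are assumed to be already in memory (no fetching, no promises).

```go
package main

import "fmt"

// node is a hypothetical stand-in for a file DAG node: internal nodes
// only carry children, leaves only carry data.
type node struct {
	data     []byte
	children []*node
}

// frame records one edge of the path from the root to the current leaf:
// the node at that depth and the index of the next child to visit.
type frame struct {
	n    *node
	next int
}

// readAll walks the DAG depth-first, left to right, with an explicit
// stack instead of nested readers, and concatenates the leaf data.
func readAll(root *node) []byte {
	var out []byte
	stack := []frame{{n: root}}
	for len(stack) > 0 {
		top := &stack[len(stack)-1]
		if len(top.n.children) == 0 {
			// Leaf: emit its data and go back up one level.
			out = append(out, top.n.data...)
			stack = stack[:len(stack)-1]
			continue
		}
		if top.next == len(top.n.children) {
			// Internal node with every child visited: go back up.
			stack = stack[:len(stack)-1]
			continue
		}
		// Descend into the next unvisited child.
		child := top.n.children[top.next]
		top.next++
		stack = append(stack, frame{n: child})
	}
	return out
}

func main() {
	root := &node{children: []*node{
		{children: []*node{{data: []byte("hello")}, {data: []byte(", ")}}},
		{data: []byte("world")},
	}}
	fmt.Println(string(readAll(root)))
}
```

At any point the stack is exactly the path from the root to the leaf being read, which is the information the nested `PBDagReader`s hold implicitly today.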

Node promises. Many nodes are requested at the same time (preloaded) to streamline the graph traversal. This mechanism is abstracted through the `ipld.NodePromise` structure, and it should be made very clear to the reader what a promise is and *why* it is used (see https://github.com/ipfs/go-ipfs/pull/5162#issuecomment-402213069). The `promises` attribute of the `PBDagReader` should be renamed to reflect that those are actually the child nodes (that will be visited in the future), to focus on *what* they represent instead of *how* those nodes are fetched (through promises).

The direct manipulation of the protobuf `unixfs.pb.Data` structure in `PBDagReader.pbdata` should be replaced with the more refined `unixfs.FSNode` structure (which contains `unixfs.pb.Data`).

Remove the unused `node` attribute (`*mdag.ProtoNode`) of the `PBDagReader`.

As said before, the role of the buffer (`buf` attribute) may be misleading: it should be clear that this is the child node one level below, accessed through another `PBDagReader` (or a `BufDagReader` if we've reached a leaf). Using the `ReadSeekCloser` interface instead of `DagReader` (which contains it) is correct but may be confusing if not clarified well enough. (Revisit this point.)

The `WriteTo` method has almost the same logic as `CtxReadFull`, but its code is structured differently (especially the error handling part), which makes it harder to assimilate what it does (or more precisely, *how* it does it).

`Read` and `Seek` (from `io.SeekStart`) do very similar traversal operations, the first reading file content and the second counting sizes (actually both are performing read operations on UnixFS nodes). Could that part of the code be merged? Maybe a new structure could be introduced (at the `unixfs` level) that would handle *how* to traverse a DAG; this structure could receive a function describing *what* to do in each node it visits during the DFS (something like `DFSTraversalFileDAG` or `DFSFileDAG`).
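
A rough sketch of that separation follows, with hypothetical names throughout (`node`, `walkLeaves`, `readAll`, `locate`) and in-memory nodes only. One simplification to keep in mind: this sketch visits every leaf, whereas a real `Seek` would presumably use the sizes recorded in the parent node to skip whole subtrees without fetching them.

```go
package main

import "fmt"

// node is a hypothetical in-memory file DAG node.
type node struct {
	data     []byte
	children []*node
}

// walkLeaves encodes *how* to traverse: depth-first, left to right.
// visit encodes *what* to do at each leaf; returning true stops the walk.
func walkLeaves(n *node, visit func(leaf *node) bool) bool {
	if len(n.children) == 0 {
		return visit(n)
	}
	for _, c := range n.children {
		if walkLeaves(c, visit) {
			return true
		}
	}
	return false
}

// readAll is the Read-like operation: it collects the file content.
func readAll(root *node) []byte {
	var out []byte
	walkLeaves(root, func(leaf *node) bool {
		out = append(out, leaf.data...)
		return false
	})
	return out
}

// locate is the Seek-like operation: it only counts sizes to find which
// leaf holds the given offset and where inside that leaf it falls.
func locate(root *node, offset int) (leaf, within int, found bool) {
	walkLeaves(root, func(n *node) bool {
		if offset < len(n.data) {
			within, found = offset, true
			return true
		}
		offset -= len(n.data)
		leaf++
		return false
	})
	return leaf, within, found
}

func main() {
	root := &node{children: []*node{
		{children: []*node{{data: []byte("abc")}, {data: []byte("de")}}},
		{data: []byte("fgh")},
	}}
	fmt.Println(string(readAll(root)))
	fmt.Println(locate(root, 4))
}
```

Both operations share `walkLeaves` and differ only in the function they pass to it, which is the kind of merge suggested above.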

The name `io` of the package that contains it is a bit misleading: the DAG modifier, which would normally also be considered standard I/O, has its own `mod` package. Maybe the DAG reader could be moved to a separate `reader` package, and we could rethink what the `io` package should contain (if anything).


Analyzing the DAG reader #5192
