Skip to content

Analyzing the DAG reader #5192

@schomatis

Description

@schomatis

It is comprised of the group of structures in the unixfs.io package capable of doing a DFS graph traversal of a file DAG. It reads (and performs similar operations) in the same sequence the DAG was built, so the importer is a good reference (once #5118 gets merged) to better understand it. I leave some notes here on what could be improved to make it more accessible to the new reader.

The recursive approach is hidden behind the PBDagReader (parent) and its ReadSeekCloser (child) attribute. The first doesn't call the second, instead it interacts with the interface which the second implements. The interface, although it provides a nice generalization, it obscures the very simple algorithm to traverse a file DAG that only has internal ProtoNodes and leaf RawNodes (represented at the UnixFS layer as File and Raw types). This could be made more explicit to leverage the concepts that the reader could already have of what a file DAG is (e.g., if it's already familiarized with the importer). Could an iterative algorithm help clarify how the reader operates? Instead of the implicit call stack with the PBDagReaders at different depths of the tree (making up the path to the current leaf being read) we could have an explicit stack with the information of each edge of the path at the different depths (this stack would pretty much be the reader itself with a few other attributes of control).

Node promises. Many nodes are being requested at the same time (preloaded) to streamline the graph traversal, this mechanism is abstracted through the ipld.NodePromise structure and it should be made very clear to the reader what a promise is, why is it used (see #5162 (comment)). The name of the promises attribute of the PBDagReader should be changed to reflect that those are actually the child nodes (that will be visited in the future), to focus on what they represent instead of how those nodes are fetched (through promises).

The direct manipulation of the protobuf unixfs.pb.Data structure in PBDagReader.pbdata should be replaced with the more refined unixfs.FSNode structure (which contains unixfs.pb.Data).

Remove unused node attribute (*mdag.ProtoNode) of the PBDagReader.

As said before the role of the buffer (buf attribute) may be misleading, it should be clear that this is the child node one level below being accessed by another PBDagReader (or BufDagReader in case we've reached the leaf). Using the ReadSeekCloser interface instead of DagReader (which contains it) is correct but may be confusing if not clarified well enough. (Revisit this point.)

The WriteTo method has almost an identical logic as CtxReadFull but its code is structured differently (especially the error handling part) which makes it harder to assimilate what it does (or more precisely, how it does it).

Read and Seek (from io.SeekStart) do very similar traversal operations, the first one counting sizes and the second reading file content (actually both are performing reading operations on UnixFS nodes), could that part of the code be merged? Maybe a new structure could be introduced (at the unixfs level) that would handle how to traverse a DAG, this structure could receive a method of what to do in each node it visits during the DFS (something like DFSTraversalFileDAG or DFSFileDAG).

The name io of the package that contains it is a bit misleading, the DAG modifier that would normally be also considered as standard I/O has its own mod package, maybe the DAG reader could be moved to a separate reader package and rethink what the io package should contain (if anything).

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions