Go lang transformers of content tree #51

Draft: wants to merge 5 commits into base: main

Conversation

epavlova

This is another draft iteration on the topic of Go transformers of a content tree. Any feedback is welcome.

Implementation of content tree with Go structs:

  • The structs in content_tree.go are generated semi-automatically with https://github.com/a-h/generate from the JSON schema of the full content tree.
  • The current implementation tries to solve the problem of having the whole content tree unmarshalled in memory as Go objects, so that subsequent transformers can work with it.
  • Representing a tree whose nodes have different types led me to two possible solutions. One is to create a mega type that contains every possible field a node can have and use it as the content tree node (possibly much simpler, but not so pretty). The other is a common interface; I chose this one as it is stricter.
  • The other challenging bit was having heterogeneous children, which I solved by introducing artificial node types with embedded types.
  • This big content_tree.go file can be split; the receivers just need to be in the same package as the structs.
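
To make the interface and embedded-type idea above concrete, here is a minimal sketch. It is not the code from content_tree.go: the names `Node`, `ParagraphChild` and `ParseParagraph`, the reduced field sets, and the choice of "text" and "break" as the only paragraph children are all assumptions made for illustration. It also returns an error for an unrecognised node type rather than panicking, which is one possible way to treat unexpected nodes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is the common interface every node type satisfies (hypothetical name).
type Node interface {
	GetType() string
}

// Text and Break are two leaf node types, reduced to a few fields.
type Text struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

func (t *Text) GetType() string { return t.Type }

type Break struct {
	Type string `json:"type"`
}

func (b *Break) GetType() string { return b.Type }

// ParagraphChild is an artificial node type: it embeds every type a paragraph
// may contain, and after decoding exactly one embedded pointer is non-nil.
type ParagraphChild struct {
	*Text
	*Break
}

// GetType resolves the otherwise ambiguous promoted GetType methods.
func (c *ParagraphChild) GetType() string {
	switch {
	case c.Text != nil:
		return c.Text.GetType()
	case c.Break != nil:
		return c.Break.GetType()
	}
	return ""
}

// UnmarshalJSON peeks at "type" and decodes into the matching embedded struct.
func (c *ParagraphChild) UnmarshalJSON(data []byte) error {
	var head struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(data, &head); err != nil {
		return err
	}
	switch head.Type {
	case "text":
		c.Text = &Text{}
		return json.Unmarshal(data, c.Text)
	case "break":
		c.Break = &Break{}
		return json.Unmarshal(data, c.Break)
	default:
		// Report unexpected node types as an error instead of panicking.
		return fmt.Errorf("unsupported paragraph child type %q", head.Type)
	}
}

type Paragraph struct {
	Type     string            `json:"type"`
	Children []*ParagraphChild `json:"children"`
}

func (p *Paragraph) GetType() string { return p.Type }

// Compile-time checks that the types satisfy Node.
var (
	_ Node = (*ParagraphChild)(nil)
	_ Node = (*Paragraph)(nil)
)

// ParseParagraph unmarshals a JSON paragraph node into Go objects.
func ParseParagraph(data string) (*Paragraph, error) {
	var p Paragraph
	if err := json.Unmarshal([]byte(data), &p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	p, err := ParseParagraph(`{"type":"paragraph","children":[{"type":"text","value":"Hello"},{"type":"break"}]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, child := range p.Children {
		fmt.Println(child.GetType())
	}
}
```

The wrapper keeps each `Children` slice restricted to the node types allowed in that position, at the cost of one artificial type and one custom `UnmarshalJSON` per kind of parent.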

Potential improvements:

  • There are other options for how the different language files can co-exist in the folder structure. I don't have strong feelings and am happy to change the current implementation.
  • No CI mechanism is implemented for the Go code.
  • A lot more...

Questions and cries for help:

  • Can a content tree really contain "undefined" nodes? This PR doesn't handle them; the code panics.
  • Updated a test case that contained only the body of a tree so that it starts from the root. Is it safe to assume that the transformers will always work on the root?
  • Removed double spaces from the outputs of the stringify test cases. Is this okay?
  • One of the old post test cases contains a <pull-quote-image> HTML tag, which doesn't have a content tree node representation?
  • I gave up on getting tags from the content tree; I have no idea how this works. Any help is welcome. The content tree to external body XML is in progress.

The Go structs allow representation of full content tree
and converting a JSON content tree to Go objects (unmarshalling).
Add Go mod definition in the main dir of the project.

The implementation does not handle "undefined" content tree nodes.
The Go transformer (JSON tree -> plain text) uses the contenttree
package and its base content tree representation. The whole content
tree is unmarshalled into Go objects, thus the input of the
transformer needs to comply with the contenttree package
implementation.
The update is moving from content tree body as input of the test
to content tree root. This assumes that the transformer should
receive the whole content tree, starting from the root.
Remove double spaces from the output of the test cases for
content tree to plain text transformers.
The transformer relies on the content tree representation in the
main contenttree package.
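
As a rough illustration of the JSON tree -> plain text flow described in these commit messages, the sketch below unmarshals a tree that starts at the root and concatenates the values of its text nodes. It does not use the contenttree package: the single generic `node` struct (closer to the "mega type" alternative than to the interface design), the `ToPlainText` and `collectText` names, and the assumption that a root node holds a `body` node with `children` are all illustrative. How the real transformer separates block-level nodes is not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// node is a simplified stand-in for the contenttree types (hypothetical).
type node struct {
	Type     string  `json:"type"`
	Value    string  `json:"value"`
	Body     *node   `json:"body"`
	Children []*node `json:"children"`
}

// collectText walks the tree depth-first and appends every text value.
func collectText(n *node, sb *strings.Builder) {
	if n == nil {
		return
	}
	if n.Type == "text" {
		sb.WriteString(n.Value)
	}
	collectText(n.Body, sb)
	for _, child := range n.Children {
		collectText(child, sb)
	}
}

// ToPlainText unmarshals a JSON content tree and returns its text content.
// It rejects input that does not start at a root node.
func ToPlainText(tree string) (string, error) {
	var root node
	if err := json.Unmarshal([]byte(tree), &root); err != nil {
		return "", err
	}
	if root.Type != "root" {
		return "", fmt.Errorf("expected a root node, got %q", root.Type)
	}
	var sb strings.Builder
	collectText(&root, &sb)
	return sb.String(), nil
}

func main() {
	tree := `{"type":"root","body":{"type":"body","children":[
		{"type":"paragraph","children":[
			{"type":"text","value":"Hello, "},
			{"type":"strong","children":[{"type":"text","value":"world"}]}
		]}
	]}}`
	text, err := ToPlainText(tree)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```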
@chee
Member

chee commented Apr 3, 2024

Can a content tree really contain "undefined" nodes? This PR doesn't handle them, the code is panicking.

no! that isn't really allowed. if you mean the __unknown__ nodes from that other lib, that is just there so the code can carry on if it finds something unexpected, to help the developer understand and debug the tree. they shouldn't ever be published

Updated a test case containing only body of a tree. Made it starting from the root. Is it safe to assume that the transformers will always work on the root?

yes, that's a good idea.

Removed double spaces from the outputs of stringify test cases, is this okay?

yeah, i can't remember why those were there.

One of the old post test cases contains <pull-quote-image> html tag which don't have content tree node representation?
I gave up on the getting tags from content tree, no idea how this works. Any help is welcome. The content tree to external body xml is in progress.

ah yes. so one of the rules of content-tree was "if it can't be done in spark it can't be done at all" and in spark (and ft.com) there is not actually such a thing as pull-quote-image? i don't know what it was historically. so i think i'm just throwing it away, i don't know what else to do!
