Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: glue table creation with some docs on testing #59

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

wolfeidau
Copy link
Contributor

@wolfeidau wolfeidau commented Feb 2, 2024

This change adds table creation which for the most part replicates the functionality in the iceberg python library.

I have verified this works with the glue service now by using spark, and the python iceberg client to interact with the table.

@wolfeidau wolfeidau force-pushed the feat_create_table_v2 branch from 5768681 to 6cb048b Compare February 20, 2024 19:51
@wolfeidau wolfeidau marked this pull request as ready for review February 20, 2024 19:52
Comment on lines 201 to 212
func getMetadataPath(locationPath string, newVersion int) (string, error) {
if newVersion < 0 {
return "", fmt.Errorf("invalid table version: %d must be a non-negative integer", newVersion)
}

metaDataPath, err := url.JoinPath(strings.TrimLeft(locationPath, "/"), "metadata", fmt.Sprintf("%05d-%s.metadata.json", newVersion, uuid.New().String()))
if err != nil {
return "", fmt.Errorf("failed to build metadata path: %w", err)
}

return metaDataPath, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these specific to glue?

And wouldn't generating the uuid and version etc. be something that should be in the table package rather than in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think there are a few couple of things like this that came out of me "using it in anger", I was interested in your thoughts but wasn't super stressed as it is private.

Hopefully we can push some of this down to the table.

Comment on lines 214 to 213
func getLocationForTable(location, defaultLocation, database, tableName string) (*url.URL, error) {
if location != "" {
return url.Parse(location)
}

if defaultLocation == "" {
return nil, fmt.Errorf("no default path is set, please specify a location when creating a table")
}

u, err := url.Parse(defaultLocation)
if err != nil {
return nil, fmt.Errorf("failed to parse location URL: %w", err)
}

return u.JoinPath(fmt.Sprintf("%s.db", database), tableName), nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the *.db syntax path specific to glue? I don't think it's used in the REST catalog at all. Should this go into glue.go instead of here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeroshade I found this pattern in the python library, but I have seen it when I created a table using spark as well I think.

I thought it was quite neat, and wanted to dig into where it came from but didn't get a chance.

-->

# AWS integration testing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why /docs/cfn/ ? What is cfn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeroshade good question, it was currently a bit adhoc, and mostly my "notes" and code related to manaul testing so I put it in docs.

Happy to move it up a level, it was my view that it would probably form some examples or reference infrastructure at some point.

Again interested in your thoughts, I like to provide lots of resources to consumers, giving them as much as possible to ensure they succeed.

@@ -399,3 +400,32 @@ func (m *MetadataV2) UnmarshalJSON(b []byte) error {
m.preValidate()
return m.validate()
}

func NewMetadataV2(schema *iceberg.Schema, partitionSpec iceberg.PartitionSpec, sortOrder SortOrder, location string, tableUUID uuid.UUID, properties iceberg.Properties) (*MetadataV2, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a builder pattern like we do for manifests? Instead of a single func with a ton of params?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeroshade Yeah i thought about that, the only "optional" thing is properties, some have "empty values" like sort order and partition spec so it may work OK.

I will try out a builder and post an update.

@zeroshade
Copy link
Member

any updates @wolfeidau?

@wolfeidau
Copy link
Contributor Author

@zeroshade I will dig into this in the next few days, I have implemented the builder for tables but need a way to have a generic path builder.

Given I have time off next week I can dig into it more.

wolfeidau added 4 commits June 9, 2024 13:15
* moved cfn to root of the project and ditched the docs folder
* moved table meta data name generation to table package
* added a builder for the table
* isolated and generified the metadata related writing
* remove the default location as it isn't neccessary at the moment
@wolfeidau wolfeidau force-pushed the feat_create_table_v2 branch from 16ac16d to 0495687 Compare June 9, 2024 03:20
@wolfeidau
Copy link
Contributor Author

wolfeidau commented Jun 9, 2024

@zeroshade I have had a shot at tidying this feature and addressing your feedback.

  • moved cfn to root of the project and ditched the docs folder
  • moved table meta data name generation to table package
  • added a builder for the table
  • isolated and generified the metadata related writing
  • remove the default location as it isn't necessary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants