-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: glue table creation with some docs on testing #59
base: main
Are you sure you want to change the base?
Conversation
5768681
to
6cb048b
Compare
catalog/catalog.go
Outdated
func getMetadataPath(locationPath string, newVersion int) (string, error) { | ||
if newVersion < 0 { | ||
return "", fmt.Errorf("invalid table version: %d must be a non-negative integer", newVersion) | ||
} | ||
|
||
metaDataPath, err := url.JoinPath(strings.TrimLeft(locationPath, "/"), "metadata", fmt.Sprintf("%05d-%s.metadata.json", newVersion, uuid.New().String())) | ||
if err != nil { | ||
return "", fmt.Errorf("failed to build metadata path: %w", err) | ||
} | ||
|
||
return metaDataPath, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these specific to glue?
And wouldn't generating the uuid and version etc. be something that should be in the table package rather than in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think there are a few couple of things like this that came out of me "using it in anger", I was interested in your thoughts but wasn't super stressed as it is private.
Hopefully we can push some of this down to the table.
catalog/catalog.go
Outdated
func getLocationForTable(location, defaultLocation, database, tableName string) (*url.URL, error) { | ||
if location != "" { | ||
return url.Parse(location) | ||
} | ||
|
||
if defaultLocation == "" { | ||
return nil, fmt.Errorf("no default path is set, please specify a location when creating a table") | ||
} | ||
|
||
u, err := url.Parse(defaultLocation) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to parse location URL: %w", err) | ||
} | ||
|
||
return u.JoinPath(fmt.Sprintf("%s.db", database), tableName), nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the *.db
syntax path specific to glue? I don't think it's used in the REST catalog at all. Should this go into glue.go
instead of here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeroshade I found this pattern in the python library, but I have seen it when I created a table using spark as well I think.
I thought it was quite neat, and wanted to dig into where it came from but didn't get a chance.
docs/cfn/AWS_TESTING.md
Outdated
--> | ||
|
||
# AWS integration testing | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why /docs/cfn/
? What is cfn
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeroshade good question, it was currently a bit adhoc, and mostly my "notes" and code related to manaul testing so I put it in docs.
Happy to move it up a level, it was my view that it would probably form some examples or reference infrastructure at some point.
Again interested in your thoughts, I like to provide lots of resources to consumers, giving them as much as possible to ensure they succeed.
@@ -399,3 +400,32 @@ func (m *MetadataV2) UnmarshalJSON(b []byte) error { | |||
m.preValidate() | |||
return m.validate() | |||
} | |||
|
|||
func NewMetadataV2(schema *iceberg.Schema, partitionSpec iceberg.PartitionSpec, sortOrder SortOrder, location string, tableUUID uuid.UUID, properties iceberg.Properties) (*MetadataV2, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a builder pattern like we do for manifests? Instead of a single func with a ton of params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeroshade Yeah i thought about that, the only "optional" thing is properties, some have "empty values" like sort order and partition spec so it may work OK.
I will try out a builder and post an update.
any updates @wolfeidau? |
@zeroshade I will dig into this in the next few days, I have implemented the builder for tables but need a way to have a generic path builder. Given I have time off next week I can dig into it more. |
* moved cfn to root of the project and ditched the docs folder * moved table meta data name generation to table package * added a builder for the table * isolated and generified the metadata related writing * remove the default location as it isn't neccessary at the moment
16ac16d
to
0495687
Compare
@zeroshade I have had a shot at tidying this feature and addressing your feedback.
|
This change adds table creation which for the most part replicates the functionality in the iceberg python library.
I have verified this works with the glue service now by using spark, and the python iceberg client to interact with the table.