HA HMS support #752

Merged
merged 11 commits into from
Sep 25, 2024

Conversation

awdavidson
Contributor

@awdavidson awdavidson commented May 20, 2024

Changes Proposed in this PR

Support an HA HMS
A URI such as uri: thrift://hms-1:9083,thrift://hms-2:9083 is currently not supported. This change adds support for an HA HMS, where each entry is tried in turn until a connection succeeds.

Closes #135
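
The fallback behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `connect_first_available` and the `connect` callable are hypothetical names standing in for the real client construction.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def connect_first_available(uri_property: str, connect: Callable[[str], T]) -> T:
    """Try each comma-separated URI in order; return the first client that connects.

    `connect` is a hypothetical factory (an assumption of this sketch). It is
    expected to raise an exception when a metastore cannot be reached.
    """
    uris: List[str] = [u.strip() for u in uri_property.split(",") if u.strip()]
    last_exception: Optional[Exception] = None
    for uri in uris:
        try:
            return connect(uri)
        except Exception as exc:
            # Remember the failure and move on to the next URI.
            last_exception = exc
    if last_exception is not None:
        raise last_exception
    raise ValueError(f"No URI given: {uri_property!r}")


def fake_connect(uri: str) -> str:
    # Illustration only: pretend hms-1 is down and hms-2 is up.
    if "hms-1" in uri:
        raise ConnectionError(f"cannot reach {uri}")
    return f"client for {uri}"


client = connect_first_available("thrift://hms-1:9083,thrift://hms-2:9083", fake_connect)
```

Here the first entry fails and the second is used; if every entry fails, the last failure is the one that surfaces.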

@awdavidson awdavidson changed the title HA and kerberos HMS support HA HMS support Jun 27, 2024
@awdavidson awdavidson marked this pull request as ready for review June 27, 2024 08:34
@awdavidson
Contributor Author

@Fokko @yothinix are you interested in this too?

@Fokko
Contributor

Fokko commented Jun 28, 2024

@awdavidson This looks like a nice addition. Could you add a test as well?

Comment on lines 274 to 284
@staticmethod
def _create_hive_client(properties: Dict[str, str]) -> _HiveClient:
    uris = properties["uri"].split(",")
    for idx, uri in enumerate(uris):
        try:
            return _HiveClient(uri, properties.get("ugi"))
        except BaseException as e:
            if idx + 1 == len(uris):
                raise e
            else:
                continue
Contributor

Can I make a suggestion to make it a bit more Pythonic (and easier to follow IMHO):

Suggested change
- @staticmethod
- def _create_hive_client(properties: Dict[str, str]) -> _HiveClient:
-     uris = properties["uri"].split(",")
-     for idx, uri in enumerate(uris):
-         try:
-             return _HiveClient(uri, properties.get("ugi"))
-         except BaseException as e:
-             if idx + 1 == len(uris):
-                 raise e
-             else:
-                 continue
+ @staticmethod
+ def _create_hive_client(properties: Dict[str, str]) -> _HiveClient:
+     uris = properties["uri"]
+     e = ValueError(f"Invalid URI: {uris}")
+     for uri in uris.split(","):
+         try:
+             return _HiveClient(uri, properties.get("ugi"))
+         except BaseException as e:
+             pass
+     raise e

Contributor Author

Rather than throwing a generic ValueError, I have updated it to capture and re-raise the last exception if that code path is reached. Let me know what you think.
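
A language detail that is relevant when comparing the variants in this thread: in Python 3, the name bound by `except ... as e` is deleted when the except block ends, so a `raise e` placed after the loop no longer sees the caught exception. The sketch below is illustrative only; both function names are hypothetical and neither is taken from the PR.

```python
def raise_after_loop() -> None:
    """Rely on `e` after the except block has finished."""
    for attempt in ("first", "second"):
        try:
            raise ConnectionError(f"{attempt} attempt failed")
        except ConnectionError as e:
            pass
    # `e` was unbound when the except block exited, so this line raises
    # UnboundLocalError rather than the ConnectionError.
    raise e


def raise_last_exception() -> None:
    """Copy the exception to a separate name that outlives the except block."""
    last_exception = None
    for attempt in ("first", "second"):
        try:
            raise ConnectionError(f"{attempt} attempt failed")
        except ConnectionError as e:
            last_exception = e
    if last_exception is not None:
        raise last_exception
```

Storing the exception under a separate name, as in the second function, keeps the original error available after the loop.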

@awdavidson
Contributor Author

@awdavidson This looks like a nice addition. Could you add a test as well?

Unit tests have been added

@awdavidson awdavidson requested a review from Fokko July 5, 2024 07:25
Contributor

@kevinjqliu kevinjqliu left a comment

Thanks for adding this!

Regarding tests, I think we should cover the following scenarios:

  • 1 uri
  • 2 uris; both pass
  • 2 uris; first fails, fall back to the second
  • 2 uris; both fail, error
  • (optional) 2 uris; first passes, second one not tried

}

with patch('pyiceberg.catalog.hive._HiveClient') as mock_hive_client:
    mock_hive_client.side_effect = [Exception("Connection failed"), MagicMock()]
Contributor

does this mean the first connection will fail?

Contributor Author

Thank you for your comments. Yes, this means the first connection will fail.

The scenario "2 uris; both pass" will never be a valid case: if the first URI connects, the second is never tried. So the only additional test would be one covering the optional scenario, "2 uris; first passes, second one not tried".

Contributor

thanks! i think that should cover all the cases
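
The scenarios agreed on in this thread could be exercised along these lines. This is a self-contained sketch: `create_client` is a hypothetical stand-in for the catalog's client construction, and a plain `MagicMock` factory replaces the patched `_HiveClient`, so none of it is the PR's actual test code.

```python
from unittest.mock import MagicMock


def create_client(uris: str, client_factory):
    """Hypothetical stand-in: try each URI in order, re-raise the last failure."""
    last_exception = None
    for uri in uris.split(","):
        try:
            return client_factory(uri)
        except Exception as exc:
            last_exception = exc
    if last_exception is not None:
        raise last_exception
    raise ValueError(f"Unable to connect using uri: {uris}")


TWO_URIS = "thrift://localhost:10000,thrift://localhost:10001"


def test_single_uri() -> None:
    factory = MagicMock(return_value="client")
    assert create_client("thrift://localhost:10000", factory) == "client"


def test_first_fails_falls_back_to_second() -> None:
    second_client = MagicMock()
    # side_effect list: the first call raises, the second call returns the mock.
    factory = MagicMock(side_effect=[Exception("Connection failed"), second_client])
    assert create_client(TWO_URIS, factory) is second_client
    assert factory.call_count == 2


def test_both_fail_raises_last_error() -> None:
    factory = MagicMock(side_effect=[Exception("first"), Exception("second")])
    try:
        create_client(TWO_URIS, factory)
    except Exception as err:
        assert str(err) == "second"
    else:
        raise AssertionError("expected an exception")


def test_first_passes_second_not_tried() -> None:
    factory = MagicMock(return_value="client")
    create_client(TWO_URIS, factory)
    factory.assert_called_once_with("thrift://localhost:10000")
```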

Comment on lines 1221 to 1222
mock_hive_client.assert_any_call("thrift://localhost:10000", "user")
mock_hive_client.assert_any_call("thrift://localhost:10001", "user")
Contributor

@kevinjqliu kevinjqliu Aug 11, 2024


nit: assert that both 10000 and 10001 are called once, assert_called_once_with
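
A note on the mock API for this kind of check: `assert_called_once_with` passes only when the mock has been called exactly once in total, so if a single mock receives both URIs, comparing `call_args_list` is one way to assert that each URI was tried once and in order. A minimal sketch, calling the mock directly rather than going through the catalog:

```python
from unittest.mock import MagicMock, call

mock_hive_client = MagicMock()

# Simulate the two connection attempts made during fallback.
mock_hive_client("thrift://localhost:10000", "user")
mock_hive_client("thrift://localhost:10001", "user")

# Exactly two calls, each URI once, in the order they were tried.
expected_calls = [
    call("thrift://localhost:10000", "user"),
    call("thrift://localhost:10001", "user"),
]
calls_match = mock_hive_client.call_args_list == expected_calls

# assert_called_once_with fails here because the mock was called twice overall.
try:
    mock_hive_client.assert_called_once_with("thrift://localhost:10000", "user")
    once_check_passed = True
except AssertionError:
    once_check_passed = False
```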

Contributor

@kevinjqliu kevinjqliu left a comment


LGTM! Thank you for covering the test scenarios

Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
if last_exception is not None:
    raise e
else:
    raise ValueError(f"Unable to connect to hive using uri: {properties["uri"]}")
Contributor

another lint issue with f-string!

Contributor Author

🤦 Sorry, I will set up and double-check from my personal laptop tomorrow (should have done this to start with).
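
The comment does not say which lint rule fired. One common cause with this pattern is reusing the outer quote character inside an f-string replacement field, which only parses on Python 3.12 and later (PEP 701). A small sketch of the portable spelling, with an example `properties` dict assumed for illustration:

```python
import sys

# Example value, assumed for this sketch.
properties = {"uri": "thrift://hms-1:9083,thrift://hms-2:9083"}

# Single quotes inside the double-quoted f-string parse on every Python
# version that supports f-strings.
message = f"Unable to connect to hive using uri: {properties['uri']}"

# The nested-double-quote form is checked via compile() so that this file
# itself still parses on interpreters older than 3.12.
nested_source = 'f"uri: {properties["uri"]}"'
try:
    compile(nested_source, "<example>", "eval")
    nested_quotes_ok = True
except SyntaxError:
    nested_quotes_ok = False
```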

Contributor

@kevinjqliu kevinjqliu left a comment


LGTM!

@kevinjqliu kevinjqliu merged commit e5a58b3 into apache:main Sep 25, 2024
8 checks passed
@awdavidson awdavidson deleted the hive-improvements branch September 25, 2024 17:26
Development

Successfully merging this pull request may close these issues.

using pyiceberg with kerberized hive metastore
3 participants