Skip to content

Conversation

@dlmarion
Copy link
Contributor

As part of #5775 ThriftTransportPool.getAnyCachedTransport was modified to perform a ZooKeeper lookup to get the ResourceGroup for a server because that information is not available on the ThriftTransportKey. ThriftTransportKey contains the HostAndPort only. The reason for this change is to make the ResourceGroup available for the ResourceGroupPredicate that is passed into ThriftTransportPool.getAnyCachedTransport so that the ZooKeeper lookup can be removed.

The server address is currently represented by either a String or a HostAndPort object in the code. Additionally the addresses are persisted in file paths in HDFS and in the root and metadata table. I opted to use the ServerId object as a replacement as it already contains the necessary information and is part of the public API, but making the change touched a lot of code.

Closes #5848

As part of apache#5775 ThriftTransportPool.getAnyCachedTransport was modified
to perform a ZooKeeper lookup to get the ResourceGroup for a server
because that information is not available on the ThriftTransportKey.
ThriftTransportKey contains the HostAndPort only. The reason for this
change is to make the ResourceGroup available for the ResourceGroupPredicate
that is passed into `ThriftTransportPool.getAnyCachedTransport` so that
the ZooKeeper lookup can be removed.

The server address is currently represented by either a String or a
HostAndPort object in the code. Additionally the addresses are persisted
in file paths in HDFS and in the root and metadata table. I opted to use
the ServerId object as a replacement as it already contains the necessary
information and is part of the public API, but making the change touched
a lot of code.

Closes apache#5848
@dlmarion dlmarion added this to the 4.0.0 milestone Sep 10, 2025
@dlmarion dlmarion self-assigned this Sep 10, 2025
MANAGER, MINI, MONITOR, GARBAGE_COLLECTOR, COMPACTOR, SCAN_SERVER, TABLET_SERVER;
}

public static record ServerIdInfo(String type, String resourceGroup, String host, int port) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServerIdInfo seems like its needed for implementation of caching and serialization, but maybe not intended to be in the public API? Could move it outside f the public API or make it package private if that is the case.

No idea if this makes sense or would be workable, but looking at this wonder if ServerId should be a record type itself w/ supporting static methods some that are public API and some that are not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured some of my changes to ServerId might need to be re-worked (this, as well as HostAndPort). I'm trying to get all of the ITs to pass ( I had 6 fail last night).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServerIdInfo seems like its needed for implementation of caching and serialization, but maybe not intended to be in the public API?

I moved this and other internal implementation methods to a new class, ServerIdUtil, in fb6acbb.

No idea if this makes sense or would be workable, but looking at this wonder if ServerId should be a record

I looked at this, and I don't think it can be done right now due to the transient field hostPort.

private transient HostAndPort hostPort;

public ServerId(Type type, ResourceGroupId resourceGroup, String host, int port) {
private ServerId(Type type, ResourceGroupId resourceGroup, String host, int port) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could also get the session id added to this type then we would know everything about a server. Not a change for this PR, but I suspect this PR would make that easier to do. May be able to replace TServerInstance type w/ ServerId if that were done.

+ ", port= " + port + "]";
}

public String toWalFileName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an implementation thing that should be moved out of the public API, maybe to a static method somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to break ServerId into a client interface and an Impl object, but it causes a ton of changes. I'm going to take your suggestion and create a utility class with static methods to get the implementation details out of the client class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this and other internal implementation methods to a new class, ServerIdUtil, in fb6acbb.

return port;
}

public synchronized HostAndPort getHostPort() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May not want to introduce the HostAndPort type into the public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need an API class and an internal class. There is a lot of server-side code that uses HostAndPort and we don't want to be constantly translating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed HostAndPort from the ServerId public methods in 7f11006

public interface TabletServerId extends Comparable<TabletServerId> {
String getHost();

int getPort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had the session in the ServerId, then this whole type in the spi could go away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TServerInstance could go away as well.

Iterator<TServerInstance> find = tLists.destinations
.tailMap(new TServerInstance(tm.getSuspend().server, " ")).keySet().iterator();
Iterator<TServerInstance> find =
tLists.destinations.tailMap(tm.getSuspend().server).keySet().iterator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably still need to create the TServerInstance w/ the empty session string. This code is trying to find any server where the host and port match, but the sesssion id does not matter. The empty session string sorts before all. This code needs to be refactored so it can be unit tested in another PR. Wonder if the existing code works as it depends heavily on how those sort, which could have changed after this code was written.

Suggested change
tLists.destinations.tailMap(tm.getSuspend().server).keySet().iterator();
tLists.destinations.tailMap(new TServerInstance(tm.getSuspend().server, " ")).keySet().iterator();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

started working on a little unit test for this code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the code to use zero when the session is unknown. I think I have done that in this branch, but maybe not in all places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #5885

I think we should change the code to use zero when the session is unknown

hmm, I don't know if the session can be negative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into handling of negative and zero for that code, it sorts using unsigned hex so zero would be minimum even if there are negative numbers. In #5885 updated the impl to use zero and the test use a negative session and it all works. The test should catch any changes in sorting of zero relative to negative values.

*
*/
public Collection<String> getErrorServers() {
public Collection<ServerId> getErrorServers() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is public API. Its in a dark corner of the API, could change it. May be easiest to just leave itas string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, my concern about leaving these types of things as String, is that there is not enough information to reconstitute a ServerId in the event that it's needed.

}

public Set<String> getTimedOutSevers() {
public Set<ServerId> getTimedOutSevers() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could leave this as string since this is a public API method

* @return A tablet server location in the form of {@code <host>:<port>}
*/
String getTabletLocation(TabletId tabletId);
ServerId getTabletLocation(TabletId tabletId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could leave this as string also since its public API

}

private final Map<String,ServerQueue> serverQueues;
private final Map<ServerId,ServerQueue> serverQueues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice, changing from string to ServerId. These changes can be tricky though because some of the map methods take Object so it will still compile when a string is passed. My IDE warns on these types of mismatches. Wonder if anything in the maven build will do that.

"Invalid server id in wal file: " + name);
if (parts.length == 2) {
// return an uncached tserver object
return tserver(parts[0], Integer.parseInt(parts[1]));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called from LogEntry.validatedLogEntry which returns a LogEntry object. The log path is stored in the tablet metadata in LogColumnFamily. This code here handles the current (not new) serialization of the server being just host+port. Creating the ServerId object here assumes the default ResourceGroup because we don't have any further information. This assumption could be dangerous if the LogEntry's ResourceGroup is used for comparison purposes in code that uses LogEntry.

Comment on lines +58 to +61
// TODO: WAL marker serializations using the old format could present a
// problem. If we change the code here to handle the old format, then
// we have to make a guess at the resource group, which could affect
// callers of this method (GcWalsFilter, WalStateManager, LiveTServerSet)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another area of concern for dealing with serialized information.

@dlmarion dlmarion marked this pull request as ready for review September 19, 2025 15:01
@dlmarion dlmarion marked this pull request as draft September 19, 2025 16:20
@dlmarion
Copy link
Contributor Author

#5899 renders the need for these changes obsolete. I'm going to leave this here in case we decide to use some of these changes w/r/t replacing server address (String) with ServerId.

@keith-turner
Copy link
Contributor

#5899 renders the need for these changes obsolete. I'm going to leave this here in case we decide to use some of these changes w/r/t replacing server address (String) with ServerId.

Using a more specific type than string for that code would be nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize filtering of cached transports for resource group

2 participants