Skip to content

Conversation

Ngone51
Copy link
Member

@Ngone51 Ngone51 commented Sep 18, 2025

What changes were proposed in this pull request?

This PR fixes the thread-safetye issue in SortShuffleManager.unregisterShuffle by enforcing synchronous lock on mapTaskIds's iteration. Besides, this PR also addresses the concern to enfore the type of taskIdMapsForShuffle as ConcurrentHashMap to ensure its thread-safety.

Why are the changes needed?

Fix the potential thread-safety issue as pointed at #52337 (comment) and also the concern.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A

Was this patch authored or co-authored using generative AI tooling?

No.

// var for testing
var _blockManager: BlockManager,
val taskIdMapsForShuffle: JMap[Int, OpenHashSet[Long]])
val taskIdMapsForShuffle: ConcurrentHashMap[Int, OpenHashSet[Long]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the type be the ConcurrentMap interface?


def this(conf: SparkConf) = {
this(conf, null, Collections.emptyMap())
this(conf, null, new ConcurrentHashMap[Int, OpenHashSet[Long]]())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    class EmptyConcurrentMap<K, V> implements ConcurrentMap<K, V> {
        private EmptyConcurrentMap() {
        }

        public V putIfAbsent(K key, V value) {
            return null;
        }

        public boolean remove(Object key, Object value) {
            return false;
        }

        public boolean replace(K key, V oldValue, V newValue) {
            return false;
        }

        public V replace(K key, V value) {
            return null;
        }

        public int size() {
            return 0;
        }

        public boolean isEmpty() {
            return true;
        }

        public boolean containsKey(Object key) {
            return false;
        }

        public boolean containsValue(Object value) {
            return false;
        }

        public V get(Object key) {
            return null;
        }

        public V put(K key, V value) {
            return null;
        }

        public V remove(Object key) {
            return null;
        }

        public void putAll(Map<? extends K, ? extends V> m) {
        }

        public void clear() {
        }

        public Set<K> keySet() {
            return Collections.emptySet();
        }

        public Collection<V> values() {
            return Collections.emptySet();
        }

        public Set<Map.Entry<K, V>> entrySet() {
            return Collections.emptySet();
        }
    }

Is it possible to define a similar EmptyConcurrentMap and also use a singleton pattern for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems to be overkill to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants