Read SFP temperature from TRANSCEIVER_DOM_TEMPERATURE table #60

Merged
judyjoseph merged 1 commit into Azure:202412 from vvolam:get_sfp_thermals
Feb 25, 2026

vvolam commented Feb 24, 2026

Cherry-pick: sonic-net/sonic-platform-daemons#747

Description

This change modifies thermalctld to read SFP temperature and threshold data from Redis tables instead of making direct platform API calls to the hardware.

Changes:

  • Read SFP temperature from TRANSCEIVER_DOM_TEMPERATURE table first, falling back to TRANSCEIVER_DOM_SENSOR table if not present
  • Read SFP thresholds from TRANSCEIVER_DOM_THRESHOLD table first, falling back to TRANSCEIVER_DOM_SENSOR table if not present
  • Use SfpUtilHelper.get_physical_to_logical() API to map SFP physical index to logical port name for Redis table lookup
  • Add new _init_sfp_util_helper() method to initialize port mappings
  • Add new _get_sfp_temperature_from_db() method to read temperature from Redis
  • Add new _refresh_sfp_temperature_status() method to update SFP thermal status from Redis data
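
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: STATE_DB is modeled as a plain dict keyed by "TABLE|port", the physical-to-logical mapping is a plain dict standing in for what SfpUtilHelper.get_physical_to_logical() provides, and the function name read_sfp_temperature is hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for STATE_DB: "TABLE|port" -> field dict.
StateDb = Dict[str, Dict[str, str]]

TEMPERATURE_TABLE = "TRANSCEIVER_DOM_TEMPERATURE"
SENSOR_TABLE = "TRANSCEIVER_DOM_SENSOR"


def read_sfp_temperature(db: StateDb,
                         physical_to_logical: Dict[int, List[str]],
                         sfp_index: int) -> Optional[float]:
    """Return the cached temperature for an SFP, or None if unavailable."""
    logical_ports = physical_to_logical.get(sfp_index) or []
    if not logical_ports:
        return None  # no logical port known for this physical index

    # Assumption: the first logical port is used as the Redis key.
    port = logical_ports[0]

    # Try the new table first, then fall back to the legacy sensor table.
    for table in (TEMPERATURE_TABLE, SENSOR_TABLE):
        raw = db.get(f"{table}|{port}", {}).get("temperature")
        if raw is None:
            continue
        try:
            return float(raw)
        except ValueError:
            continue  # non-numeric value such as "N/A"
    return None
```

With this shape, a module whose port only has a TRANSCEIVER_DOM_SENSOR entry still yields a reading, and a module with no entry in either table yields None, so the caller can decide how to report it.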

Motivation and Context

The xcvrd daemon already reads SFP DOM sensor data from hardware and populates Redis tables in STATE_DB. Having thermalctld also read directly from hardware causes:

  1. Duplicate hardware access - Both xcvrd and thermalctld poll the same I2C devices
  2. I2C bus contention - Multiple processes accessing the same bus can cause delays and errors
  3. Inconsistent data - Different polling intervals can result in different temperature readings

This change makes thermalctld consume the cached data from xcvrd instead of polling hardware directly, reducing I2C traffic and ensuring consistent temperature data across SONiC components.

Related HLD: https://github.com/sonic-net/SONiC/blob/master/doc/nvidia-thermal-algorithm/improve-sonic-thermal-algo.md

How Has This Been Tested?

Tested on an Nvidia SN5640 platform with 66 SFP modules:

  1. Verified temperature values match Redis source:

    redis-cli -n 6 hgetall "TRANSCEIVER_DOM_SENSOR|Ethernet0" | grep -A1 temperature
    # Returns: 53.809

    show platform temperature | grep "xSFP module 1"
    # Shows: 53.809

  2. Verified thresholds are read from TRANSCEIVER_DOM_THRESHOLD table:

    redis-cli -n 6 hgetall "TRANSCEIVER_DOM_THRESHOLD|Ethernet0"
    # Contains: temphighalarm, templowalarm, temphighwarning, templowwarning

    show platform temperature
    # Shows correct thresholds: High TH=75.0, Low TH=-5.0, Crit High=80.0, Crit Low=-10.0

  3. Verified all 66 SFP modules show temperature data in show platform temperature

  4. Verified TEMPERATURE_INFO table is correctly populated:

    redis-cli -n 6 hgetall "TEMPERATURE_INFO|xSFP module 1 Temp"
    # Contains: temperature, high_threshold, low_threshold, critical_high_threshold, etc.

Additional Information (Optional)

The implementation follows the HLD design which specifies:

  • Temperature: Read from STATE_DB::TRANSCEIVER_DOM_TEMPERATURE|Ethernet*.temperature
  • Thresholds: Read from STATE_DB::TRANSCEIVER_DOM_THRESHOLD|Ethernet*.temphighwarning/temphighalarm

Fallback to TRANSCEIVER_DOM_SENSOR table is provided for backward compatibility with platforms that don't have the new tables populated yet.
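
As a rough illustration of the threshold path, the sketch below applies the same fallback order per field. It is not the PR's implementation: STATE_DB is again modeled as a plain dict, read_sfp_thresholds is a hypothetical name, and the mapping of warning fields to high/low thresholds and alarm fields to critical thresholds is an assumption (as is the critical_low_threshold output name).

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for STATE_DB: "TABLE|port" -> field dict.
StateDb = Dict[str, Dict[str, str]]

THRESHOLD_TABLES = ("TRANSCEIVER_DOM_THRESHOLD", "TRANSCEIVER_DOM_SENSOR")

# Assumed mapping from thermal fields to DOM threshold fields:
# warning -> high/low threshold, alarm -> critical threshold.
THRESHOLD_FIELDS = {
    "high_threshold": "temphighwarning",
    "low_threshold": "templowwarning",
    "critical_high_threshold": "temphighalarm",
    "critical_low_threshold": "templowalarm",
}


def read_sfp_thresholds(db: StateDb, port: str) -> Dict[str, Optional[float]]:
    """Return thresholds for a logical port; missing values are None."""
    result: Dict[str, Optional[float]] = {}
    for info_field, dom_field in THRESHOLD_FIELDS.items():
        value = None
        # Prefer the threshold table, fall back to the legacy sensor table.
        for table in THRESHOLD_TABLES:
            raw = db.get(f"{table}|{port}", {}).get(dom_field)
            try:
                value = float(raw)
                break
            except (TypeError, ValueError):
                continue  # field absent (None) or non-numeric
        result[info_field] = value
    return result
```

Resolving each field independently means a platform that populates only some fields in the new table can still pick up the rest from TRANSCEIVER_DOM_SENSOR.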

Command output after the changes:

$ show plat temp
                Sensor    Temperature    High TH    Low TH    Crit High TH    Crit Low TH    Warning          Timestamp
----------------------  -------------  ---------  --------  --------------  -------------  ---------  -----------------
                  ASIC           80.0        105       N/A             120            N/A      False  20260211 19:13:38
 Ambient Fan Side Temp         41.937        N/A       N/A             N/A            N/A      False  20260211 19:13:38
Ambient Port Side Temp         42.187        N/A       N/A             N/A            N/A      False  20260211 19:13:38
         CPU Pack Temp          43.25       95.0       N/A           100.0            N/A      False  20260211 19:13:38
            PSU-1 Temp            N/A        N/A       N/A             N/A            N/A      False  20260211 19:13:38
            PSU-2 Temp           42.5       63.0       N/A             N/A            N/A      False  20260211 19:13:38
            PSU-3 Temp            N/A        N/A       N/A             N/A            N/A      False  20260211 19:13:38
            PSU-4 Temp           41.0       63.0       N/A             N/A            N/A      False  20260211 19:13:38
         SODIMM 2 Temp          43.25       85.0       N/A            95.0            N/A      False  20260211 19:13:38
    xSFP module 1 Temp         53.809       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 2 Temp         59.086       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 3 Temp         56.512       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 4 Temp         53.934       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 5 Temp          54.75       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 6 Temp         64.375       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 7 Temp         63.812       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 8 Temp         55.645       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
    xSFP module 9 Temp         56.277       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38
   xSFP module 10 Temp         64.066       75.0      -5.0            80.0          -10.0      False  20260211 19:13:38

The xcvrd daemon reads SFP DOM sensor data from hardware and populates
Redis tables in STATE_DB. This change modifies thermalctld to read SFP
temperature data from these Redis tables instead of making direct
platform API calls to the hardware.

Temperature reading:
- First tries TRANSCEIVER_DOM_TEMPERATURE table
- Falls back to TRANSCEIVER_DOM_SENSOR table if not present

Threshold reading:
- First tries TRANSCEIVER_DOM_THRESHOLD table
- Falls back to TRANSCEIVER_DOM_SENSOR table if not present

Port mapping:
- Uses SfpUtilHelper.get_physical_to_logical() API to map SFP index
  to logical port name for Redis table lookup

Benefits:
- Avoids duplicate hardware access (xcvrd already reads this data)
- Reduces I2C bus contention
- Uses cached data from xcvrd which is already available

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
vvolam requested review from judyjoseph and r12f February 24, 2026 22:45
judyjoseph left a comment

LGTM

judyjoseph merged commit 9ad4c7f into Azure:202412 Feb 25, 2026
2 checks passed