🔴 CRITICAL CRASH: VPP aborts on ping to local address - ip4-load-balance reads wrong pool causing assertion failure #3667

@killerstemp

Description

🚨 CRITICAL BUG - VPP Process Termination

CONFIRMED: VPP aborts when it receives an ICMP echo request addressed to one of its own interface addresses. The crash reproduces on every attempt and can be triggered remotely, so it is a DoS vector.

Crash Evidence

💥 VPP Process Aborted with Assertion Failure

Full stack trace:

#0  0x00007ffff6a66eef in ?? () from /usr/lib64/libc.so.6
#1  0x00007ffff6a1ad36 in raise () from /usr/lib64/libc.so.6
#2  0x00007ffff6a06177 in abort () from /usr/lib64/libc.so.6
#3  0x0000000000407763 in os_panic () at /test/fdio-vpp/src/vpp/vnet/main.c:456
#4  0x00007ffff6cf4ed9 in debugger () at /test/fdio-vpp/src/vppinfra/error.c:84
#5  0x00007ffff6cf4c92 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, 
    fmt=0x7ffff6ec519c "%s:%d (%s) assertion `%s' fails")
    at /test/fdio-vpp/src/vppinfra/error.c:143
#6  0x00007ffff6e14e65 in vlib_node_runtime_get_next_frame (vm=0x7fffb68958c0, 
    n=0x7fffb7570d00, next_index=14) at /test/fdio-vpp/src/vlib/node_funcs.h:370
                                      ^^^^^^^^^^^^^^^^ INVALID (garbage from wrong pool)
#7  0x00007ffff6e14be6 in vlib_get_next_frame_internal (vm=0x7fffb68958c0, 
    node=0x7fffb7570d00, next_index=14, allocate_new_next_frame=0)
    at /test/fdio-vpp/src/vlib/main.c:341
...
#12 0x00007ffff782c44f in ip4_load_balance_node_fn_hsw (vm=0x7fffb68958c0, 
    node=0x7fffb7570d00, frame=0x7fffb8e6d300) 
    at /test/fdio-vpp/src/vnet/ip/ip4_forward.c:261
                                                ^^^ CRASH HERE

Crash Point: src/vnet/ip/ip4_forward.c:261 in ip4_load_balance_node_fn_hsw()
Assertion: next_index=14 is invalid; it was read from the wrong load_balance object

Root Cause: Type Confusion Between DPO Pools

The Bug

vnet_buffer(b)->ip.adj_index[VLIB_TX] is used to store two different types of indices in different contexts:

  1. In ip4-lookup: Stores dpo0->dpoi_index which may be an index to receive_dpo_pool (for local addresses)
  2. In ip4-load-balance: Reads it as an index to load_balance_pool

These are completely different memory pools, so the index selects an unrelated load_balance element → garbage next index → assertion failure → crash.

Detailed Flow

Step 1: Incoming Ping Request (ip4-lookup)
─────────────────────────────────────────────────
dst = 192.168.1.1 (local interface)
lbi0 = ip4_fib_forwarding_lookup() → returns 50 (load_balance_pool index)
lb0 = load_balance_get(50)         → valid load_balance object
dpo0 = lb0->buckets[0]             → {type=DPO_RECEIVE, index=7}
                                            ^^^^^^^^^^^^^^^^^^
                                            index to receive_dpo_pool

❌ BUG: Store wrong index type
vnet_buffer(b)->ip.adj_index[VLIB_TX] = 7;  ← receive_dpo_pool index!
                                              NOT load_balance_pool index!
Source: src/vnet/ip/ip4_forward.h:355

Step 2: ICMP Echo Request Processing
─────────────────────────────────────────────────
ip4-icmp-echo-request node:
  - Swaps src/dst IP addresses
  - Does NOT update adj_index[VLIB_TX] (still = 7)
  - next_node = "ip4-load-balance"
Source: src/plugins/ping/ping.c:506-508

Step 3: 💥 CRASH in ip4-load-balance
─────────────────────────────────────────────────
lbi0 = vnet_buffer(b)->ip.adj_index[VLIB_TX];  // = 7
lb0 = load_balance_get(7);  // ❌ WRONG! Treats 7 as load_balance_pool index!
                            // Gets random/invalid memory

// lb0 now contains garbage data
dpo0 = load_balance_get_bucket_i(lb0, 0);  // Read garbage
next[0] = dpo0->dpoi_next_node;            // next_index = 14 (invalid)

vlib_buffer_enqueue_to_next(...);  // 💥 ASSERTION FAILURE → abort()
Source: src/vnet/ip/ip4_forward.c:225-227, 261
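
The three steps above can be modelled in a few lines of plain C. This is only a toy sketch of the index mix-up, not VPP code: every `toy_` name, the pool size, and the number of next nodes are assumptions made for illustration, and the values 50, 7, and 14 are copied from the flow described in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the DPO types; the real VPP structures are far richer. */
typedef enum { TOY_DPO_LOAD_BALANCE, TOY_DPO_RECEIVE } toy_dpo_type_t;

typedef struct {
  toy_dpo_type_t type; /* which pool 'index' refers to */
  uint16_t next_node;  /* next graph node to dispatch to */
  uint32_t index;      /* index into the pool selected by 'type' */
} toy_dpo_t;

typedef struct { toy_dpo_t bucket0; } toy_load_balance_t;

#define TOY_POOL_SIZE 64
#define TOY_N_NEXT_NODES 4 /* assumed: valid next indices are 0..3 */

static toy_load_balance_t toy_load_balance_pool[TOY_POOL_SIZE];

/* Entry 50 is the real route to the local address; entry 7 is unrelated. */
static void toy_setup(void) {
  toy_load_balance_pool[50].bucket0 = (toy_dpo_t){ TOY_DPO_RECEIVE, 1, 7 };
  toy_load_balance_pool[7].bucket0 = (toy_dpo_t){ TOY_DPO_LOAD_BALANCE, 14, 0 };
}

/* Step 1: the lookup stores the bucket's index, i.e. a receive-pool index. */
static uint32_t toy_lookup_store(uint32_t lbi) {
  assert(lbi < TOY_POOL_SIZE);
  return toy_load_balance_pool[lbi].bucket0.index;
}

/* Step 3: the load-balance node treats the stored value as its own index. */
static uint16_t toy_load_balance_next(uint32_t stored) {
  assert(stored < TOY_POOL_SIZE);
  return toy_load_balance_pool[stored].bucket0.next_node;
}

/* Mirrors the bounds check that fails in the stack trace. */
static int toy_next_is_valid(uint16_t next) {
  return next < TOY_N_NEXT_NODES;
}
```

Calling `toy_setup()`, then `toy_lookup_store(50)`, yields 7; feeding that 7 into `toy_load_balance_next()` returns the unrelated entry's next node, which `toy_next_is_valid()` rejects, while feeding it the original index 50 gives a valid next node.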

Visual Representation

Memory Layout:
┌───────────────────────────────┐
│ receive_dpo_pool              │
│   [0] = {...}                 │
│   [1] = {...}                 │
│   ...                         │
│   [7] = {sw_if_index=1, ...}  │ ← Valid receive_dpo object
│   ...                         │
└───────────────────────────────┘

┌───────────────────────────────┐
│ load_balance_pool             │
│   [0] = {...}                 │
│   [1] = {...}                 │
│   ...                         │
│   [7] = GARBAGE or unrelated  │ ← ❌ Wrong interpretation!
│   ...                         │
│   [50] = {n_buckets=1, ...}   │ ← Should use THIS one
└───────────────────────────────┘

Bug: index 7 is valid for receive_dpo_pool
     but INVALID/GARBAGE for load_balance_pool!

Impact Assessment

Aspect            Severity
Crash Severity    🔴 CRITICAL - Process termination (abort)
Reproducibility   🔴 100% - Every ping to local address
Affected Feature  ICMP echo reply (ping) to local interfaces
Security Impact   🔴 DoS Attack Vector - Remote crash trigger
Data Loss         All in-flight packets dropped
Service Impact    Complete VPP outage requires restart

Reproduction Steps

# 1. Start VPP with local interface
vpp# create loopback interface
vpp# set interface ip address loop0 192.168.1.1/24
vpp# set interface state loop0 up

# 2. Ping the local address
$ ping 192.168.1.1

# Result: 💥 VPP crashes immediately with assertion failure

GDB Debug Evidence

Before crash:

(gdb) p *dpo0
$15 = {{{dpoi_type = DPO_RECEIVE,      ← Correct: this is a receive DPO
         dpoi_proto = DPO_PROTO_IP4, 
         dpoi_next_node = 12, 
         dpoi_index = 7},                ← This goes to adj_index[VLIB_TX]
       as_u64 = 30065557516}}

VPP then crashes when ip4-load-balance reads load_balance_pool[7], even though index 7 refers to receive_dpo_pool[7].
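
As a cross-check on the gdb output, the packed `as_u64` value can be split back into its fields. The sketch below assumes the fields are packed in the order gdb prints them (8-bit type, 8-bit proto, 16-bit next node, 32-bit index) on a little-endian machine; that layout is an assumption here, and `toy_dpo_fields_t` and `toy_dpo_unpack` are hypothetical helper names, not VPP APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked view of the 64-bit DPO identifier. */
typedef struct {
  uint8_t type;       /* bits 0..7   */
  uint8_t proto;      /* bits 8..15  */
  uint16_t next_node; /* bits 16..31 */
  uint32_t index;     /* bits 32..63 */
} toy_dpo_fields_t;

/* Split as_u64 with shifts so the result does not depend on host endianness. */
static toy_dpo_fields_t toy_dpo_unpack(uint64_t as_u64) {
  toy_dpo_fields_t f;
  f.type = (uint8_t)(as_u64 & 0xff);
  f.proto = (uint8_t)((as_u64 >> 8) & 0xff);
  f.next_node = (uint16_t)((as_u64 >> 16) & 0xffff);
  f.index = (uint32_t)(as_u64 >> 32);
  return f;
}
```

Under that assumed layout, `toy_dpo_unpack(30065557516ULL)` gives index 7 and next node 12, matching `dpoi_index = 7` and `dpoi_next_node = 12` in the dump above, with a type byte of 12 and a proto byte of 0.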

Proposed Fix (Immediate Patch Needed)

Solution: Re-lookup in FIB

File: src/plugins/ping/ping.c
Function: ip4_icmp_echo_request()
Location: After swapping addresses (around line 454), before vlib_put_next_frame

/* After swapping src/dst addresses */
ip0->src_address.data_u32 = dst0;
ip0->dst_address.data_u32 = src0;

/* ✅ FIX: Perform FIB lookup for reply packet */
ip_lookup_set_buffer_fib_index (i4m->fib_index_by_sw_if_index, p0);
u32 lbi0 = ip4_fib_forwarding_lookup (vnet_buffer (p0)->ip.fib_index,
                                       &ip0->dst_address);
vnet_buffer (p0)->ip.adj_index[VLIB_TX] = lbi0;  // Store correct load_balance index

/* Update checksums... */
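
The intent of this fix can be illustrated with a toy model: after the addresses are swapped, the reply gets a fresh lookup on its new destination, so the stored value is a load-balance index again. Everything below is hypothetical (the `toy_` names, the exact-match table standing in for the FIB, the peer address 192.168.1.2, and index 51); it sketches the idea only and is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_INDEX (~0u)

typedef struct { uint32_t dst; uint32_t lbi; } toy_route_t;

/* Hypothetical exact-match table standing in for the FIB. */
static const toy_route_t toy_fib[] = {
  { 0xC0A80101u, 50 }, /* 192.168.1.1, the local address */
  { 0xC0A80102u, 51 }, /* 192.168.1.2, an assumed pinging host */
};

/* Return the load-balance index for dst, or TOY_INVALID_INDEX if absent. */
static uint32_t toy_fib_lookup(uint32_t dst) {
  for (size_t i = 0; i < sizeof(toy_fib) / sizeof(toy_fib[0]); i++)
    if (toy_fib[i].dst == dst)
      return toy_fib[i].lbi;
  return TOY_INVALID_INDEX;
}

typedef struct { uint32_t src, dst, adj_index_tx; } toy_pkt_t;

/* Turn a request into a reply: swap addresses, then re-resolve the route. */
static void toy_echo_reply(toy_pkt_t *p) {
  uint32_t tmp = p->src;
  p->src = p->dst;
  p->dst = tmp;
  /* Overwrite the stale receive-pool index with a load-balance index. */
  p->adj_index_tx = toy_fib_lookup(p->dst);
}
```

A request arriving with the stale value 7 in `adj_index_tx` leaves `toy_echo_reply()` with 51, the index that belongs to the reply's destination, which is the invariant the next node relies on.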

Alternative: Change Next Node

// In src/plugins/ping/ping.c:506-508
.n_next_nodes = 1,
.next_nodes = {
  [0] = "ip4-lookup",  // Changed from "ip4-load-balance"
},

Code References

File                          Line      Description
src/vnet/ip/ip4_forward.c     261       💥 Crash location
src/vnet/ip/ip4_forward.c     225-227   Wrong pool access
src/vnet/ip/ip4_forward.h     355       Stores wrong index type
src/plugins/ping/ping.c       506-508   Wrong next node config
src/vlib/node_funcs.h         370       Assertion failure
src/vnet/dpo/receive_dpo.h    62        receive_dpo_pool definition
src/vnet/dpo/load_balance.h   172       load_balance definition

Request for Urgent Action

⚠️ This is a critical bug requiring immediate attention:

  1. Confirmed crash with 100% reproducibility
  2. DoS attack vector (external trigger)
  3. Process termination (complete service outage)
  4. Root cause identified with fix proposal

Questions for Maintainers:

  • Can you reproduce on your setup?
  • Which branch should the fix target?
  • Should we also fix IPv6 (similar pattern)?
  • Any existing test infrastructure for local ping?

Environment: Linux x86_64
Reporter: Available for testing patches
Priority: 🔴 CRITICAL
Workaround: Do not ping local interface addresses
