Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 152 additions & 0 deletions docs/perf-analysis/ANALYSIS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
# Performance Analysis - Issue #19132

## Problem Statement

Building a synthetic F# project with 5,000+ modules takes excessive time and memory.

Related Issue: https://github.com/dotnet/fsharp/issues/19132

## Test Environment

- **F# Compiler**: Built from main branch
- **Compiler Path**: `/home/runner/work/fsharp/fsharp/artifacts/bin/fsc/Release/net10.0/fsc.dll`
- **FSharp.Core**: `/home/runner/work/fsharp/fsharp/artifacts/bin/FSharp.Core/Release/netstandard2.0/FSharp.Core.dll`
- **.NET SDK**: 10.0.100-rc.2
- **Platform**: Linux (Ubuntu)

## Test Project Structure

Each module (`FooN.fs`) contains:
```fsharp
namespace TestProject

[<RequireQualifiedAccess>]
type FooN = Foo of int | Bar

[<RequireQualifiedAccess>]
module FooN =
let foo: FooN = FooN.Bar
```

## 5000 Module Build - Detailed Analysis

### Build Result
- **Total Time**: 14 minutes 11 seconds (851.19s)
- **Configuration**: Release, ParallelCompilation=true
- **Result**: Build succeeded

### Memory Growth Over Time (Measured Every Minute)

| Elapsed Time | CPU % | Memory % | RSS (MB) |
|--------------|-------|----------|----------|
| 1 min | 380% | 6.0% | 969 |
| 2 min | 387% | 6.5% | 1,050 |
| 3 min | 336% | 14.2% | 2,287 |
| 4 min | 286% | 23.7% | 3,805 |
| 5 min | 255% | 32.1% | 5,144 |
| 6 min | 234% | 39.5% | 6,331 |
| 7 min | 218% | 46.9% | 7,513 |
| 8 min | 207% | 53.5% | 8,561 |
| 9 min | 197% | 60.4% | 9,664 |
| 10 min | 189% | 67.1% | 10,746 |
| 11 min | 183% | 79.6% | 12,748 |
| 12 min | 175% | 90.4% | 14,473 |
| 13 min | 165% | 90.6% | 14,498 |
| 14 min | - | - | Build completed |

### Key Observations From Measured Data

1. **Memory growth is linear**: ~1.1 GB per minute for first 12 minutes
2. **Peak memory**: 14.5 GB (90.6% of system RAM)
3. **CPU utilization decreases over time**: 380% → 165%
4. **Memory plateaus at ~90%**: Possible GC pressure at 12-13 min mark

### Build Log Evidence
```
Determining projects to restore...
Restored /tmp/perf-testing/fsharp-5000/src/TestProject.fsproj (in 78 ms).
TestProject -> /tmp/perf-testing/fsharp-5000/src/bin/Release/net8.0/TestProject.dll

Build succeeded.
0 Warning(s)
0 Error(s)

Time Elapsed 00:14:11.19
```

### Compiler Process Evidence
```
runner 35321 ... /usr/share/dotnet/dotnet /home/runner/work/fsharp/fsharp/artifacts/bin/fsc/Release/net10.0/fsc.dll @/tmp/MSBuildTempQZbd6p/tmp24fcc0624ca6474f8fc7ddd8ab0874ef.rsp
```

## Trace Collection

A 2-minute trace was collected during the build using:
```bash
dotnet-trace collect --process-id <FSC_PID> --format speedscope --output fsc-trace --duration 00:02:00
```

Trace file: `fsc-trace` (44,537 bytes)
Converted to speedscope format: `fsc-trace.speedscope.speedscope.json` (92,797 bytes)

Note: Trace shows high proportion of unmanaged code time, indicating native code execution or JIT compilation overhead.

## Experiment Reproduction Steps

### Using Compiler Timing Output

The F# compiler supports detailed timing output via project properties:

1. **Add to project file** (or pass via command line):
```xml
<PropertyGroup>
<OtherFlags>--times --times:compilationTiming.csv</OtherFlags>
</PropertyGroup>
```

Or via command line:
```bash
dotnet build -c Release -p:OtherFlags="--times --times:compilationTiming.csv"
```

2. **Collect traces and dumps**:
```bash
# Start build with tracing
dotnet-trace collect --output build-trace.nettrace -- dotnet build -c Release -p:OtherFlags="--times --times:compilationTiming.csv"

# Or attach to running fsc.dll process
FSC_PID=$(pgrep -f fsc.dll)
dotnet-trace collect -p $FSC_PID --output fsc-trace.nettrace --duration 00:05:00 &
dotnet-gcdump collect -p $FSC_PID --output heap.gcdump

# For memory dump at specific time (e.g., 15 min mark)
sleep 900 && dotnet-dump collect -p $FSC_PID --output memory.dmp
```

3. **Analyze results**:
```bash
# Convert trace to speedscope format
dotnet-trace convert fsc-trace.nettrace --format speedscope

# Report on GC dump
dotnet-gcdump report heap.gcdump

# Analyze timing CSV
python3 -c "
import csv
with open('compilationTiming.csv') as f:
reader = csv.DictReader(f)
for row in sorted(reader, key=lambda x: float(x.get('Duration', 0)), reverse=True)[:20]:
print(row)
"
```

### Validation After Memory Leak Fix

After CI builds the fixed compiler, repeat the 5000-module experiment with:

1. Same build configuration (Release, ParallelCompilation=true)
2. Same measurement methodology (memory/CPU every minute)
3. Enable `--times` output for detailed phase breakdown
4. Collect GC dumps at 1, 5, and 10 minute marks
5. Compare `ImportILTypeDef@*` closure counts with previous run
148 changes: 148 additions & 0 deletions docs/perf-analysis/GC_ANALYSIS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# GC Analysis - Issue #19132

## Overview

GC heap analysis during 5000-module F# project build using `dotnet-gcdump`.

Related Issue: https://github.com/dotnet/fsharp/issues/19132

## Collection Methodology

```bash
dotnet-gcdump collect -p <FSC_PID> -o <output>.gcdump
dotnet-gcdump report <gcdump_file>
```

## GC Dump Timeline

| Dump | Elapsed Time | RSS Memory | GC Heap Size | GC Heap Objects |
|------|--------------|------------|--------------|-----------------|
| 1 | ~1 min | 1,058 MB | 361 MB | 10,003,470 |
| 2 | ~4 min | 3,828 MB | 291 MB | 10,000,000 |
| 3 | ~10 min | 10,406 MB | 302 MB | 10,000,000 |

## Key Observation

**RSS grows from 1 GB to 10 GB but GC Heap remains ~300 MB**

This indicates memory is being consumed outside the managed GC heap, likely in:
- Native allocations
- Memory-mapped files
- Large Object Heap (LOH) that was collected between dumps
- Unmanaged code buffers

## GC Dump 1 - Top Types (1 min mark, 1 GB RSS)

```
361,256,014 GC Heap bytes
10,003,470 GC Heap objects

Object Bytes Count Type
2,376,216 1 System.Byte[] (Bytes > 1M)
131,096 11 System.Byte[] (Bytes > 100K)
80,848 1 VolatileNode<System.String,FSharp.Compiler.Parser+token>[]
64,808 16 Entry<FSharp.Compiler.ParseAndCheckInputs+NodeToTypeCheck>[]
49,200 1,848 System.Int32[]
40,048 1 System.Tuple<FSharp.Compiler.Syntax.ParsedInput,...>[]
40,048 1 FSharp.Compiler.Syntax.ParsedInput[]
40,048 1 FSharp.Compiler.GraphChecking.FileInProject[]
```

## GC Dump 2 - Top Types (4 min mark, 3.8 GB RSS)

```
290,770,269 GC Heap bytes
10,000,000 GC Heap objects

Object Bytes Count Type
12,140 824 System.Int32[]
11,288 2,237 NodeToTypeCheck[]
136 713 TcEnv
136 702 NameResolutionEnv
136 677 ILTypeDef
128 5,671 ValOptionalData
128 2,431 Entity
120 729 FSharp.Compiler.Syntax.SynBinding
112 2,446 ModuleOrNamespaceType
96 1,183 EntityOptionalData
72 28,691 Val
```

## GC Dump 3 - Top Types (10 min mark, 10.4 GB RSS)

```
301,948,795 GC Heap bytes
10,000,000 GC Heap objects

Object Bytes Count Type
45,080 1 System.String[]
35,200 2 System.UInt16[]
15,656 2,258 NodeToTypeCheck[]
10,016 1,416 System.Int32[]
136 1,225 ILTypeDef
136 686 TcEnv
136 674 NameResolutionEnv
128 5,480 ValOptionalData
128 2,941 Entity
120 1,144 FSharp.Compiler.Syntax.SynBinding
112 2,385 ModuleOrNamespaceType
88 5,890 Match
88 4,964 Lambda
88 2,659 TyconAugmentation
72 27,888 Val
```

## Type Growth Analysis

Comparing object counts between Dump 2 (4 min) and Dump 3 (10 min):

| Type | Dump 2 Count | Dump 3 Count | Change |
|------|--------------|--------------|--------|
| Val | 28,691 | 27,888 | -803 |
| Entity | 2,431 | 2,941 | +510 |
| ILTypeDef | 677 | 1,225 | +548 |
| TcEnv | 713 | 686 | -27 |
| ModuleOrNamespaceType | 2,446 | 2,385 | -61 |
| Match | 6,052 | 5,890 | -162 |
| Lambda | 5,100 | 4,964 | -136 |
| TyconAugmentation | 2,142 | 2,659 | +517 |
| SynBinding | 729 | 1,144 | +415 |
| ImportILTypeDef@712 | 674 | 983 | +309 |

## Memory Leak Identified

**ImportILTypeDef@712** (closures from ImportILTypeDef function) showing growth indicates a potential memory leak.

### Root Cause
In `src/Compiler/Checking/import.fs`, the `ImportILTypeDef` function was storing `tdef.CustomAttrsStored` reference in `AttributesFromIL`, which kept entire `ILTypeDef` objects alive via closure.

### Fix Applied
Modified `ImportILTypeDef` to:
1. Added type annotation `(amap: ImportMap)` to fix type inference
2. Check if nullness features are enabled (`amap.g.langFeatureNullness && amap.g.checkNullness`)
3. If enabled: immediately read attrs with `tdef.CustomAttrsStored.GetCustomAttrs(tdef.MetadataIndex)` and wrap in `Given`
4. If disabled: use empty attributes to avoid any reference

This prevents the closure from keeping large `ILTypeDef` objects alive.

### Validation Status
**Pending**: After CI builds the fixed compiler, the 5000-module experiment should be repeated to verify:
- Reduced memory growth rate
- Stable or reduced `ImportILTypeDef@712` closure count
- Improved CPU utilization over time

## Key Findings

1. **GC Heap is stable at ~300 MB** despite RSS growing to 10+ GB
2. **ImportILTypeDef closures growing** - memory leak via CustomAttrsStored references
3. **RSS growth not reflected in GC heap** - indicates native/unmanaged memory consumption
4. **Val type has 27-28k instances** - F# value definitions
5. **Entity and ILTypeDef counts grow** - IL metadata accumulation
6. **TcEnv and NameResolutionEnv** - type checking environments (~700 instances)

## Build Environment

- Compiler: `/home/runner/work/fsharp/fsharp/artifacts/bin/fsc/Release/net10.0/fsc.dll`
- Configuration: Release, ParallelCompilation=true
- Modules: 5000
- Platform: Linux (Ubuntu)
62 changes: 62 additions & 0 deletions docs/perf-analysis/HOT_PATHS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# Hot Paths Analysis - Issue #19132

## Overview

This document contains hot path analysis from performance traces collected during large F# project builds.

Related Issue: https://github.com/dotnet/fsharp/issues/19132

## Trace Collection

### Method
```bash
dotnet-trace collect --process-id <FSC_PID> --format speedscope --output fsc-trace --duration 00:02:00
```

### Collected Trace
- **Process**: fsc.dll (PID 35798)
- **Duration**: 2 minutes (captured during 5000-module build)
- **Trace file**: `fsc-trace` (44,537 bytes)
- **Speedscope file**: `fsc-trace.speedscope.speedscope.json` (92,797 bytes)

## Trace Analysis Results

### Thread Activity
The trace captured 28 active threads:
- Thread 35798 (main)
- Threads 35810-35838 (worker threads)

### Time Distribution
From speedscope conversion:
- Main thread (35798): 17.88ms - 267.12ms captured
- High proportion of `UNMANAGED_CODE_TIME` frames
- Many `?!?` (unresolved) stack frames

### Observations
1. **Unresolved symbols**: Many stack frames show as "?!?" indicating native code or missing debug symbols
2. **Multi-threaded**: 28 threads active confirms ParallelCompilation is engaged
3. **Unmanaged code**: Significant time in unmanaged code (possibly JIT, GC, or native runtime)

### Limitations
- Trace symbols not fully resolved
- 2-minute sample may not capture all phases
- Native/unmanaged code not fully visible

## Raw Trace Data

### Speedscope Frame Names (from trace)
```
"Process64 Process(35798) (35798) Args: "
"(Non-Activities)"
"Threads"
"Thread (35798)"
"?!?"
"UNMANAGED_CODE_TIME"
"Thread (35810)" ... "Thread (35838)"
"CPU_TIME"
```

## Next Steps for Deeper Analysis
1. Build with debug symbols for better stack resolution
2. Use longer trace duration to capture full build
3. Consider using PerfView or dotnet-gcdump for memory analysis
Loading
Loading