Skip to content
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,9 @@ node_modules/
.claude-session-id

squawk/
eugene/

# Auto generated treesitter files
crates/pgt_treesitter_grammar/src/grammar.json
crates/pgt_treesitter_grammar/src/node-types.json
crates/pgt_treesitter_grammar/src/parser.c
crates/pgt_treesitter_grammar/src/parser.c
378 changes: 378 additions & 0 deletions agentic/port_eugene_rules.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,378 @@
# Porting Rules from Eugene to postgres-language-server

This document tracks the progress of porting lint rules from [Eugene](https://github.com/kaaveland/eugene) to the postgres-language-server analyzer.

## Overview

Eugene is a PostgreSQL migration linter that detects dangerous operations. We are porting its rules to the postgres-language-server to provide similar safety checks within the language server environment.

**Eugene source location**: `eugene/eugene/src/lints/rules.rs`
**Hint metadata location**: `eugene/eugene/src/hint_data.rs`
**Example SQL files**: `eugene/eugene/examples/*/`

## Step-by-Step Porting Process

### 1. Understand the Rule

1. **Read Eugene's implementation** in `eugene/eugene/src/lints/rules.rs`
- Find the rule function (e.g., `added_serial_column`)
- Understand the AST patterns it matches
- Note any special logic (e.g., checking previous statements)

2. **Read the hint metadata** in `eugene/eugene/src/hint_data.rs`
- Get the ID (e.g., "E11")
- Get name, condition, effect, and workaround text
- This provides documentation content

3. **Review example SQL** in `eugene/eugene/examples/<ID>/`
- `bad.sql` - Invalid cases that should trigger the rule
- `good.sql` - Valid cases that should NOT trigger

### 2. Create the Rule

```bash
# Create rule with appropriate severity (error/warn)
just new-lintrule safety <ruleName> <severity>

# Example:
just new-lintrule safety addSerialColumn error
```

This generates:
- `crates/pgt_analyser/src/lint/safety/<rule_name>.rs`
- `crates/pgt_analyser/tests/specs/safety/<ruleName>/basic.sql`
- Updates to configuration files
- Diagnostic category registration

### 3. Implement the Rule

**File**: `crates/pgt_analyser/src/lint/safety/<rule_name>.rs`

#### Key Components:

```rust
use pgt_analyse::{context::RuleContext, declare_lint_rule, Rule, RuleDiagnostic, RuleSource};
use pgt_console::markup;
use pgt_diagnostics::Severity;

declare_lint_rule! {
/// Brief one-line description (shown in lists).
///
/// Detailed explanation of what the rule detects and why it's problematic.
/// Explain the PostgreSQL behavior and performance/safety implications.
///
/// ## Examples
///
/// ### Invalid
///
/// ```sql,expect_diagnostic
/// -- SQL that should trigger the rule
/// ALTER TABLE users ADD COLUMN id serial;
/// ```
///
/// ### Valid
///
/// ```sql
/// -- SQL that should NOT trigger
/// CREATE TABLE users (id serial PRIMARY KEY);
/// ```
///
pub RuleName {
version: "next",
name: "ruleName",
severity: Severity::Error, // or Warning
recommended: true, // or false
sources: &[RuleSource::Eugene("<ID>")], // e.g., "E11"
}
}

impl Rule for RuleName {
type Options = ();

fn run(ctx: &RuleContext<Self>) -> Vec<RuleDiagnostic> {
let mut diagnostics = Vec::new();

// Pattern match on the statement type
if let pgt_query::NodeEnum::AlterTableStmt(stmt) = &ctx.stmt() {
// Rule logic here

if /* condition */ {
diagnostics.push(
RuleDiagnostic::new(
rule_category!(),
None,
markup! {
"Error message with "<Emphasis>"formatting"</Emphasis>"."
},
)
.detail(None, "Additional context about the problem.")
.note("Suggested fix or workaround."),
);
}
}

diagnostics
}
}
```

#### Important Patterns:

**Accessing previous statements** (for rules like `multipleAlterTable`):
```rust
let file_ctx = ctx.file_context();
let previous = file_ctx.previous_stmts();
```

**Schema normalization** (treating empty schema as "public"):
```rust
let schema_normalized = if schema.is_empty() {
"public"
} else {
schema.as_str()
};
```

**Checking for specific ALTER TABLE actions**:
```rust
for cmd in &stmt.cmds {
if let Some(pgt_query::NodeEnum::AlterTableCmd(cmd)) = &cmd.node {
if cmd.subtype() == pgt_query::protobuf::AlterTableType::AtAddColumn {
// Handle ADD COLUMN
}
}
}
```

**Extracting column type**:
```rust
if let Some(pgt_query::NodeEnum::ColumnDef(col_def)) = &cmd.def.as_ref().and_then(|d| d.node.as_ref()) {
if let Some(type_name) = &col_def.type_name {
let type_str = get_type_name(type_name);
}
}

fn get_type_name(type_name: &pgt_query::protobuf::TypeName) -> String {
type_name
.names
.iter()
.filter_map(|n| {
if let Some(pgt_query::NodeEnum::String(s)) = &n.node {
Some(s.sval.as_str())
} else {
None
}
})
.collect::<Vec<_>>()
.join(".")
}
```

### 4. Create Comprehensive Tests

**Directory**: `crates/pgt_analyser/tests/specs/safety/<ruleName>/`

Create multiple test files covering:

#### Invalid Cases (should trigger):
```sql
-- expect_lint/safety/<ruleName>
-- Description of what this tests
<SQL that should trigger>
```

#### Valid Cases (should NOT trigger):
```sql
-- Description of what this tests
-- expect_no_diagnostics
<SQL that should NOT trigger>
```

#### Example Test Structure:
```
tests/specs/safety/addSerialColumn/
├── basic.sql # Basic case triggering the rule
├── bigserial.sql # Variant (bigserial type)
├── generated_stored.sql # Another variant (GENERATED)
├── valid_regular_column.sql # Valid: regular column
└── valid_create_table.sql # Valid: CREATE TABLE context
```

**Run tests and accept snapshots**:
```bash
cargo insta test -p pgt_analyser --accept
```

### 5. Verify and Generate Code

```bash
# Check compilation
cargo check

# Generate lint code and documentation
just gen-lint

# Run all tests
cargo test -p pgt_analyser --test rules_tests

# Final verification
just ready
```

### 6. Test the Rule Manually

Create a test SQL file:
```sql
-- test.sql
ALTER TABLE users ADD COLUMN id serial;
```

Run the CLI:
```bash
cargo run -p pgt_cli -- check /path/to/test.sql
```

## Common Pitfalls and Solutions

### 1. Schema Matching

**Problem**: Need to match tables across statements with different schema notations.

**Solution**: Normalize schema names:
```rust
let schema_normalized = if schema.is_empty() { "public" } else { schema.as_str() };
```

### 2. AST Navigation

**Problem**: Eugene uses a simplified AST (`StatementSummary`), but pgt uses the full PostgreSQL AST.

**Solution**: Use pattern matching and helper functions. Look at existing rules for examples.

### 3. File Context Rules

**Problem**: Rules that need to track state across statements (like `multipleAlterTable`).

**Solution**: Use `ctx.file_context()` to access `AnalysedFileContext`:
```rust
let file_ctx = ctx.file_context();
let previous_stmts = file_ctx.previous_stmts();
```

### 4. Transaction State

**Problem**: Some Eugene rules check transaction state (e.g., `RUNNING_STATEMENT_WHILE_HOLDING_ACCESS_EXCLUSIVE`).

**Solution**: This is more complex and may require extending the `AnalysedFileContext` to track transaction state. Consider implementing simpler rules first.

### 5. Test Expectations

**Problem**: `expect_lint` expects exactly ONE diagnostic, but rule generates multiple.

**Solution**: Either:
- Split into separate test files (one diagnostic each)
- Adjust the test to only trigger once
- Use expect_diagnostic for each occurrence (check existing tests)

## Rule Mapping Considerations

### Overlapping Rules

Some Eugene rules may overlap with existing pgt rules:

| Eugene Rule | Potential PGT Overlap | Action |
|-------------|----------------------|--------|
| `SET_COLUMN_TYPE_TO_JSON` | `preferJsonb` | Review both, may enhance existing |
| `CREATE_INDEX_NONCONCURRENTLY` | `requireConcurrentIndexCreation` | Review both |
| `CHANGE_COLUMN_TYPE` | `changingColumnType` | Review both |
| `ADD_NEW_UNIQUE_CONSTRAINT_WITHOUT_USING_INDEX` | `disallowUniqueConstraint` | Review both |

**When overlap exists**:
1. Compare implementations
2. If Eugene's is more comprehensive, consider updating the existing rule
3. If they cover different aspects, keep both
4. Document any differences

### Transaction-Aware Rules

These rules require tracking transaction state across multiple statements:

- `RUNNING_STATEMENT_WHILE_HOLDING_ACCESS_EXCLUSIVE` (E4)
- `LOCKTIMEOUT_WARNING` (E9)

**Approach**:
1. First implement simpler, statement-level rules
2. Design transaction state tracking in `AnalysedFileContext`
3. Add fields to track:
- Current transaction state (BEGIN/COMMIT/ROLLBACK)
- Locks held
- Lock timeout settings
4. Update state as statements are processed

## Eugene AST vs PostgreSQL AST

### Eugene's Simplified AST

Eugene uses `StatementSummary` enum with simplified representations:
```rust
enum StatementSummary {
AlterTable { schema: String, name: String, actions: Vec<AlterTableAction> },
CreateIndex { schema: String, idxname: String, concurrently: bool, target: String },
// ...
}

enum AlterTableAction {
AddColumn { column: String, type_name: String, stored_generated: bool, ... },
SetType { column: String, type_name: String },
// ...
}
```

### PostgreSQL AST (pgt_query)

We use the full PostgreSQL protobuf AST:
```rust
pgt_query::NodeEnum::AlterTableStmt(stmt)
-> stmt.cmds: Vec<Node>
-> NodeEnum::AlterTableCmd(cmd)
-> cmd.subtype: AlterTableType
-> cmd.def: Option<Node>
-> NodeEnum::ColumnDef(col_def)
```

**Translation Strategy**:
1. Look at Eugene's simplified logic
2. Map to corresponding PostgreSQL AST nodes
3. Use existing pgt rules as references
4. Check `pgt_query::protobuf` for available types/enums

## Useful References

- **Eugene source**: `eugene/eugene/src/lints/rules.rs`
- **Existing pgt rules**: `crates/pgt_analyser/src/lint/safety/`
- **Contributing guide**: `crates/pgt_analyser/CONTRIBUTING.md`
- **AST types**: `crates/pgt_query/src/lib.rs`
- **PostgreSQL protobuf**: `pgt_query::protobuf` module

## Next Steps

1. **Priority**: Port high-priority safety rules (E1, E2, E6, E7, E9)
2. **Review overlaps**: Check if E3, E5, E6, E7 overlap with existing rules
3. **Transaction tracking**: Design transaction state tracking for E4, E9
4. **Documentation**: Update Eugene source attribution in ported rules
5. **Testing**: Ensure comprehensive test coverage for all ported rules

## Template Checklist

When porting a new rule, ensure:

- [ ] Rule implementation in `src/lint/safety/<rule>.rs`
- [ ] Documentation with examples (invalid and valid cases)
- [ ] `sources: &[RuleSource::Eugene("<ID>")]` attribution
- [ ] At least 3-5 test files (mix of invalid and valid)
- [ ] Snapshot tests accepted with `cargo insta test --accept`
- [ ] All tests pass: `cargo test -p pgt_analyser --test rules_tests`
- [ ] Compilation clean: `cargo check`
- [ ] Code generation: `just gen-lint`
- [ ] Manual CLI test with sample SQL
- [ ] Update this document with completed rule
Loading