Skip to content

feat: enhance accuracy of validation and converter#26

Merged
ikjeong merged 6 commits intomainfrom
feauture/enhance-accuracy
Nov 19, 2025
Merged

feat: enhance accuracy of validation and converter#26
ikjeong merged 6 commits intomainfrom
feauture/enhance-accuracy

Conversation

@sehwan505
Copy link
Contributor

  • 컨버터 engine에 외부 linter가 들어가도록 변경
  • code-policy에 rbac enable 추가
  • mcp 검증 결과 저장하도록 변경
  • validation 병렬화 적용

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the converter and validator to use external linters with LLM-based routing, adds parallel validation processing, implements MCP validation result persistence, and enhances RBAC enforcement in code policies.

Key Changes:

  • Replaced inference-based converter with LLM-driven linter routing and parallel conversion
  • Added goroutine-based parallelization for LLM validation to improve performance
  • Implemented validation result history tracking in .sym/validation-results.json

Reviewed Changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
internal/validator/validator.go Added parallel processing for LLM validator using goroutines and sync primitives
internal/validator/llm_validator.go Parallelized rule validation with goroutines and mutex-protected result aggregation
internal/validator/llm_validator_test.go Formatting alignment changes (whitespace only)
internal/mcp/server.go Added validation result persistence, changed to OpenAI-only API key, enabled verbose mode
internal/server/server.go Updated to use new converter API with simplified file writing
internal/converter/converter.go Complete refactor to LLM-based linter routing with parallel conversion
internal/converter/linters/*.go New LLM-powered linter converters replacing inference-based approach
internal/llm/types.go Removed (inference types no longer needed)
internal/llm/inference.go Removed (replaced by direct LLM integration in converters)
internal/llm/inference_test.go Removed (tests for deleted functionality)
internal/cmd/convert.go Simplified to use new converter with language-based routing

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


// Create unified validator that handles all engines + RBAC
v := validator.NewValidator(validationPolicy, false) // verbose=false for MCP
v := validator.NewValidator(validationPolicy, true) // verbose=true for debugging
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing verbose from false to true for debugging will produce verbose output in production MCP usage. Consider making this configurable through an environment variable or configuration option rather than hardcoding it.

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +76
mu.Lock()
result.Checked++

violation, err := v.CheckRule(ctx, change, addedLines, rule)
if err != nil {
// Log error but continue
fmt.Printf("Warning: failed to check rule %s: %v\n", rule.ID, err)
continue
}

if violation != nil {
result.Failed++
result.Violations = append(result.Violations, *violation)
} else {
result.Passed++
}
mu.Unlock()
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result.Checked counter is incremented before launching goroutines (line 75), but if validation fails (line 86 returns early), this counter won't accurately reflect actual checks performed. Consider moving the counter increment inside the goroutine after successful validation, or adjust the counter on error.

Copilot uses AI. Check for mistakes.
Comment on lines +88 to +93
var errors []string

for result := range results {
if result.err != nil {
errors = append(errors, fmt.Sprintf("Rule %d: %v", result.index+1, result.err))
continue
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The error slice collection from parallel goroutines (line 92) is not ordered, which makes error messages less deterministic. Consider using a map with indices or sorting errors before reporting to provide consistent error output across runs.

Copilot uses AI. Check for mistakes.
Comment on lines +298 to +361
systemPrompt := fmt.Sprintf(`You are a code quality expert. Analyze the given coding rule and determine which linters can ACTUALLY enforce it using their NATIVE rules (without plugins).

type inferenceJob struct {
ruleWithIndex
intent *llm.RuleIntent
err error
warning string
}
Available linters and NATIVE capabilities:
- eslint: ONLY native ESLint rules (no-console, no-unused-vars, eqeqeq, no-var, camelcase, new-cap, max-len, max-lines, no-eval, etc.)
- CAN: Simple syntax checks, variable naming, console usage, basic patterns
- CANNOT: Complex business logic, context-aware rules, file naming, advanced async patterns
- prettier: Code formatting ONLY (quotes, semicolons, indentation, line length, trailing commas)
- tsc: TypeScript type checking ONLY (strict modes, noImplicitAny, strictNullChecks, type inference)
- checkstyle: Java style checks (naming, whitespace, imports, line length, complexity)
- pmd: Java code quality (unused code, empty blocks, naming conventions, design issues)

// Create worker pool with concurrency limit
maxWorkers := 5 // Limit concurrent LLM API calls
jobs := make(chan ruleWithIndex, len(applicableRules))
results := make(chan inferenceJob, len(applicableRules))
STRICT Rules for selection:
1. ONLY select if the linter has a NATIVE rule that can enforce this
2. If the rule requires understanding business logic or context → return []
3. If the rule requires custom plugins → return []
4. If the rule is about file naming → return []
5. If the rule requires deep semantic analysis → return []
6. When in doubt, return [] (better to use llm-validator than fail)

// Start workers
for w := 0; w < maxWorkers; w++ {
go func() {
for job := range jobs {
inferResult, err := c.inferencer.InferFromUserRule(ctx, &job.rule)
Available linters for this rule: %s

jobResult := inferenceJob{
ruleWithIndex: job,
}
Return ONLY a JSON array of linter names (no markdown):
["linter1", "linter2"] or []

if err != nil {
jobResult.err = err
jobResult.warning = fmt.Sprintf("Rule %d (%s): %v", job.index+1, job.rule.ID, err)
} else {
jobResult.intent = inferResult.Intent

// Check confidence threshold
if inferResult.Intent.Confidence < confidenceThreshold {
jobResult.warning = fmt.Sprintf("Rule %d (%s): low confidence %.2f",
job.index+1, job.rule.ID, inferResult.Intent.Confidence)
}
}
Examples:

results <- jobResult
}
}()
}
Input: "Use single quotes for strings"
Output: ["prettier"]

// Send jobs
for _, rule := range applicableRules {
jobs <- rule
}
close(jobs)
Input: "No console.log allowed"
Output: ["eslint"]

// Collect results
inferenceResults := make(map[int]inferenceJob)
for i := 0; i < len(applicableRules); i++ {
jobResult := <-results
inferenceResults[jobResult.index] = jobResult
}
close(results)
Input: "Classes start with capital letter"
Output: ["eslint"]

// Process results in original order
for _, ruleInfo := range applicableRules {
jobResult := inferenceResults[ruleInfo.index]
Input: "Maximum line length is 120"
Output: ["prettier"]

if jobResult.err != nil {
result.Warnings = append(result.Warnings, jobResult.warning)
continue
}
Input: "No implicit any types"
Output: ["tsc"]

if jobResult.warning != "" {
result.Warnings = append(result.Warnings, jobResult.warning)
}
Input: "All async functions must have try-catch"
Output: []
Reason: Requires semantic understanding of error handling

// Convert to linter-specific rule
linterRule, err := converter.Convert(&ruleInfo.rule, jobResult.intent)
if err != nil {
result.Errors = append(result.Errors, fmt.Errorf("rule %d: %w", ruleInfo.index+1, err))
continue
}
Input: "File names must be kebab-case"
Output: []
Reason: File naming requires plugin

result.Rules = append(result.Rules, linterRule)

// Track which engine will validate this rule
// Check if the rule has meaningful configuration content
hasContent := false
if len(linterRule.Config) > 0 {
// Check if config has actual rule content (not just empty nested structures)
for key, value := range linterRule.Config {
if key == "modules" && value != nil {
// For Checkstyle, check if modules slice is not empty
v := reflect.ValueOf(value)
if v.Kind() == reflect.Slice && v.Len() > 0 {
hasContent = true
break
}
} else if key == "rules" && value != nil {
// For PMD, check if rules array/slice is not empty
v := reflect.ValueOf(value)
if v.Kind() == reflect.Slice && v.Len() > 0 {
hasContent = true
break
}
} else if key != "" && value != nil {
// For ESLint and other formats with direct config
hasContent = true
break
}
}
}
Input: "API handlers must return proper status codes"
Output: []
Reason: Requires business logic understanding

if hasContent {
result.RuleEngineMap[ruleInfo.rule.ID] = converter.Name()
} else {
result.RuleEngineMap[ruleInfo.rule.ID] = "llm-validator"
}
}
Input: "Database queries must use parameterized queries"
Output: []
Reason: Requires understanding SQL injection context

// Generate final configuration
config, err := converter.GenerateConfig(result.Rules)
if err != nil {
return nil, fmt.Errorf("failed to generate config: %w", err)
}
Input: "No hardcoded API keys or passwords"
Output: []
Reason: Requires semantic analysis of what constitutes secrets

result.Config = config
Input: "Imports from large packages must be specific"
Output: []
Reason: Requires knowing which packages are "large"`, availableLinters)
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLM routing prompt is extremely long (lines 298-361) and duplicates logic that should be in the linter converters themselves. This creates a maintenance burden - if a linter's capabilities change, this prompt must be updated. Consider moving capability detection to the linter converters and having them return a confidence score, or use a more maintainable data structure for linter capabilities.

Copilot uses AI. Check for mistakes.
Comment on lines +71 to +82
for i, rule := range rules {
wg.Add(1)
go func(idx int, r schema.UserRule) {
defer wg.Done()

if err != nil {
return nil, fmt.Errorf("failed to convert rule: %w", err)
module, err := c.convertSingleRule(ctx, r, llmClient)
results <- moduleResult{
index: idx,
module: module,
err: err,
}
}(i, rule)
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goroutines for Checkstyle module conversion (lines 73-82) have no concurrency limit. For large policies, this creates unbounded parallelism with LLM API calls. The same issue exists in PMD converter (lines 69-78). Consider implementing a worker pool pattern to limit concurrent LLM API calls across all linter converters.

Copilot uses AI. Check for mistakes.
Comment on lines +502 to +518
// Increment counter and launch goroutine
mu.Lock()
result.Checked++
mu.Unlock()

wg.Add(1)
go func(ch GitChange, lines []string, r schema.PolicyRule) {
defer wg.Done()

violation, err := llmValidator.CheckRule(ctx, ch, lines, r)
if err != nil {
fmt.Printf("⚠️ Validation failed for rule %s: %v\n", r.ID, err)
return
}

mu.Lock()
defer mu.Unlock()
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result.Checked counter is incremented before launching the goroutine (line 504), but if the goroutine encounters an error and returns early (line 514), the counter will be inaccurate because it won't correspond to actual checks performed. Consider incrementing the counter only when the check completes successfully, or decrement it on error.

Suggested change
// Increment counter and launch goroutine
mu.Lock()
result.Checked++
mu.Unlock()
wg.Add(1)
go func(ch GitChange, lines []string, r schema.PolicyRule) {
defer wg.Done()
violation, err := llmValidator.CheckRule(ctx, ch, lines, r)
if err != nil {
fmt.Printf("⚠️ Validation failed for rule %s: %v\n", r.ID, err)
return
}
mu.Lock()
defer mu.Unlock()
// Launch goroutine for check
wg.Add(1)
go func(ch GitChange, lines []string, r schema.PolicyRule) {
defer wg.Done()
violation, err := llmValidator.CheckRule(ctx, ch, lines, r)
mu.Lock()
defer mu.Unlock()
result.Checked++
if err != nil {
fmt.Printf("⚠️ Validation failed for rule %s: %v\n", r.ID, err)
return
}

Copilot uses AI. Check for mistakes.
Comment on lines +1048 to +1052
// Add to history (keep last 50 results)
history.Results = append(history.Results, record)
if len(history.Results) > 50 {
history.Results = history.Results[len(history.Results)-50:]
}
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The validation history is capped at 50 results (lines 1048-1052), but this magic number is not configurable. Consider making this limit configurable through an environment variable or configuration file, or document why 50 is the appropriate limit.

Copilot uses AI. Check for mistakes.
Comment on lines +366 to 370
response, err := c.llmClient.Complete(ctx, systemPrompt, userPrompt)
if err != nil {
fmt.Printf("Warning: LLM routing failed for rule %s: %v\n", rule.ID, err)
return []string{} // Will fall back to llm-validator
}
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] When LLM routing fails (line 368), an empty array is returned, which silently falls back to llm-validator. While this is handled correctly, the warning message is only printed to stdout and not logged or returned as part of the result warnings. Consider collecting these warnings so they can be reported to the user in the final conversion result.

Copilot uses AI. Check for mistakes.
Comment on lines +76 to 90
for result := range results {
if result.err != nil {
errors = append(errors, fmt.Sprintf("Rule %d: %v", result.index+1, result.err))
fmt.Printf("⚠️ ESLint rule %d conversion error: %v\n", result.index+1, result.err)
continue
}

// Merge all rule configs
for _, rule := range rules {
for ruleID, ruleConfig := range rule.Config {
rulesMap[ruleID] = ruleConfig
if result.ruleName != "" {
eslintRules[result.ruleName] = result.config
fmt.Printf("✓ ESLint rule %d → %s\n", result.index+1, result.ruleName)
} else {
skippedCount++
fmt.Printf("⊘ ESLint rule %d skipped (cannot be enforced by ESLint)\n", result.index+1)
}
}
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The results are collected from a channel (lines 76-89) but the order is not preserved due to concurrent execution. The error messages include the rule index (line 78), but the final output may not be in order. Consider using a slice with indices or sorting the results to provide deterministic output.

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +102
if len(pmdRules) == 0 {
return nil, fmt.Errorf("no rules converted: %v", errors)
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The error slice is not used when returning the error (line 102). The error message just stringifies all errors, but this information is lost. Consider either including the detailed errors in the returned error, or storing them in a field that can be accessed by callers.

Copilot uses AI. Check for mistakes.
@ikjeong ikjeong merged commit b789d9f into main Nov 19, 2025
1 of 2 checks passed
@sehwan505 sehwan505 deleted the feauture/enhance-accuracy branch December 10, 2025 04:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants