feat: enhance accuracy of validation and converter#26
Conversation
sehwan505
commented
Nov 18, 2025
- 컨버터 engine에 외부 linter가 들어가도록 변경
- code-policy에 rbac enable 추가
- mcp 검증 결과 저장하도록 변경
- validation 병렬화 적용
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the converter and validator to use external linters with LLM-based routing, adds parallel validation processing, implements MCP validation result persistence, and enhances RBAC enforcement in code policies.
Key Changes:
- Replaced inference-based converter with LLM-driven linter routing and parallel conversion
- Added goroutine-based parallelization for LLM validation to improve performance
- Implemented validation result history tracking in
.sym/validation-results.json
Reviewed Changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/validator/validator.go | Added parallel processing for LLM validator using goroutines and sync primitives |
| internal/validator/llm_validator.go | Parallelized rule validation with goroutines and mutex-protected result aggregation |
| internal/validator/llm_validator_test.go | Formatting alignment changes (whitespace only) |
| internal/mcp/server.go | Added validation result persistence, changed to OpenAI-only API key, enabled verbose mode |
| internal/server/server.go | Updated to use new converter API with simplified file writing |
| internal/converter/converter.go | Complete refactor to LLM-based linter routing with parallel conversion |
| internal/converter/linters/*.go | New LLM-powered linter converters replacing inference-based approach |
| internal/llm/types.go | Removed (inference types no longer needed) |
| internal/llm/inference.go | Removed (replaced by direct LLM integration in converters) |
| internal/llm/inference_test.go | Removed (tests for deleted functionality) |
| internal/cmd/convert.go | Simplified to use new converter with language-based routing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/mcp/server.go
Outdated
|
|
||
| // Create unified validator that handles all engines + RBAC | ||
| v := validator.NewValidator(validationPolicy, false) // verbose=false for MCP | ||
| v := validator.NewValidator(validationPolicy, true) // verbose=true for debugging |
There was a problem hiding this comment.
Changing verbose from false to true for debugging will produce verbose output in production MCP usage. Consider making this configurable through an environment variable or configuration option rather than hardcoding it.
| mu.Lock() | ||
| result.Checked++ | ||
|
|
||
| violation, err := v.CheckRule(ctx, change, addedLines, rule) | ||
| if err != nil { | ||
| // Log error but continue | ||
| fmt.Printf("Warning: failed to check rule %s: %v\n", rule.ID, err) | ||
| continue | ||
| } | ||
|
|
||
| if violation != nil { | ||
| result.Failed++ | ||
| result.Violations = append(result.Violations, *violation) | ||
| } else { | ||
| result.Passed++ | ||
| } | ||
| mu.Unlock() |
There was a problem hiding this comment.
The result.Checked counter is incremented before launching goroutines (line 75), but if validation fails (line 86 returns early), this counter won't accurately reflect actual checks performed. Consider moving the counter increment inside the goroutine after successful validation, or adjust the counter on error.
| var errors []string | ||
|
|
||
| for result := range results { | ||
| if result.err != nil { | ||
| errors = append(errors, fmt.Sprintf("Rule %d: %v", result.index+1, result.err)) | ||
| continue |
There was a problem hiding this comment.
[nitpick] The error slice collection from parallel goroutines (line 92) is not ordered, which makes error messages less deterministic. Consider using a map with indices or sorting errors before reporting to provide consistent error output across runs.
| systemPrompt := fmt.Sprintf(`You are a code quality expert. Analyze the given coding rule and determine which linters can ACTUALLY enforce it using their NATIVE rules (without plugins). | ||
|
|
||
| type inferenceJob struct { | ||
| ruleWithIndex | ||
| intent *llm.RuleIntent | ||
| err error | ||
| warning string | ||
| } | ||
| Available linters and NATIVE capabilities: | ||
| - eslint: ONLY native ESLint rules (no-console, no-unused-vars, eqeqeq, no-var, camelcase, new-cap, max-len, max-lines, no-eval, etc.) | ||
| - CAN: Simple syntax checks, variable naming, console usage, basic patterns | ||
| - CANNOT: Complex business logic, context-aware rules, file naming, advanced async patterns | ||
| - prettier: Code formatting ONLY (quotes, semicolons, indentation, line length, trailing commas) | ||
| - tsc: TypeScript type checking ONLY (strict modes, noImplicitAny, strictNullChecks, type inference) | ||
| - checkstyle: Java style checks (naming, whitespace, imports, line length, complexity) | ||
| - pmd: Java code quality (unused code, empty blocks, naming conventions, design issues) | ||
|
|
||
| // Create worker pool with concurrency limit | ||
| maxWorkers := 5 // Limit concurrent LLM API calls | ||
| jobs := make(chan ruleWithIndex, len(applicableRules)) | ||
| results := make(chan inferenceJob, len(applicableRules)) | ||
| STRICT Rules for selection: | ||
| 1. ONLY select if the linter has a NATIVE rule that can enforce this | ||
| 2. If the rule requires understanding business logic or context → return [] | ||
| 3. If the rule requires custom plugins → return [] | ||
| 4. If the rule is about file naming → return [] | ||
| 5. If the rule requires deep semantic analysis → return [] | ||
| 6. When in doubt, return [] (better to use llm-validator than fail) | ||
|
|
||
| // Start workers | ||
| for w := 0; w < maxWorkers; w++ { | ||
| go func() { | ||
| for job := range jobs { | ||
| inferResult, err := c.inferencer.InferFromUserRule(ctx, &job.rule) | ||
| Available linters for this rule: %s | ||
|
|
||
| jobResult := inferenceJob{ | ||
| ruleWithIndex: job, | ||
| } | ||
| Return ONLY a JSON array of linter names (no markdown): | ||
| ["linter1", "linter2"] or [] | ||
|
|
||
| if err != nil { | ||
| jobResult.err = err | ||
| jobResult.warning = fmt.Sprintf("Rule %d (%s): %v", job.index+1, job.rule.ID, err) | ||
| } else { | ||
| jobResult.intent = inferResult.Intent | ||
|
|
||
| // Check confidence threshold | ||
| if inferResult.Intent.Confidence < confidenceThreshold { | ||
| jobResult.warning = fmt.Sprintf("Rule %d (%s): low confidence %.2f", | ||
| job.index+1, job.rule.ID, inferResult.Intent.Confidence) | ||
| } | ||
| } | ||
| Examples: | ||
|
|
||
| results <- jobResult | ||
| } | ||
| }() | ||
| } | ||
| Input: "Use single quotes for strings" | ||
| Output: ["prettier"] | ||
|
|
||
| // Send jobs | ||
| for _, rule := range applicableRules { | ||
| jobs <- rule | ||
| } | ||
| close(jobs) | ||
| Input: "No console.log allowed" | ||
| Output: ["eslint"] | ||
|
|
||
| // Collect results | ||
| inferenceResults := make(map[int]inferenceJob) | ||
| for i := 0; i < len(applicableRules); i++ { | ||
| jobResult := <-results | ||
| inferenceResults[jobResult.index] = jobResult | ||
| } | ||
| close(results) | ||
| Input: "Classes start with capital letter" | ||
| Output: ["eslint"] | ||
|
|
||
| // Process results in original order | ||
| for _, ruleInfo := range applicableRules { | ||
| jobResult := inferenceResults[ruleInfo.index] | ||
| Input: "Maximum line length is 120" | ||
| Output: ["prettier"] | ||
|
|
||
| if jobResult.err != nil { | ||
| result.Warnings = append(result.Warnings, jobResult.warning) | ||
| continue | ||
| } | ||
| Input: "No implicit any types" | ||
| Output: ["tsc"] | ||
|
|
||
| if jobResult.warning != "" { | ||
| result.Warnings = append(result.Warnings, jobResult.warning) | ||
| } | ||
| Input: "All async functions must have try-catch" | ||
| Output: [] | ||
| Reason: Requires semantic understanding of error handling | ||
|
|
||
| // Convert to linter-specific rule | ||
| linterRule, err := converter.Convert(&ruleInfo.rule, jobResult.intent) | ||
| if err != nil { | ||
| result.Errors = append(result.Errors, fmt.Errorf("rule %d: %w", ruleInfo.index+1, err)) | ||
| continue | ||
| } | ||
| Input: "File names must be kebab-case" | ||
| Output: [] | ||
| Reason: File naming requires plugin | ||
|
|
||
| result.Rules = append(result.Rules, linterRule) | ||
|
|
||
| // Track which engine will validate this rule | ||
| // Check if the rule has meaningful configuration content | ||
| hasContent := false | ||
| if len(linterRule.Config) > 0 { | ||
| // Check if config has actual rule content (not just empty nested structures) | ||
| for key, value := range linterRule.Config { | ||
| if key == "modules" && value != nil { | ||
| // For Checkstyle, check if modules slice is not empty | ||
| v := reflect.ValueOf(value) | ||
| if v.Kind() == reflect.Slice && v.Len() > 0 { | ||
| hasContent = true | ||
| break | ||
| } | ||
| } else if key == "rules" && value != nil { | ||
| // For PMD, check if rules array/slice is not empty | ||
| v := reflect.ValueOf(value) | ||
| if v.Kind() == reflect.Slice && v.Len() > 0 { | ||
| hasContent = true | ||
| break | ||
| } | ||
| } else if key != "" && value != nil { | ||
| // For ESLint and other formats with direct config | ||
| hasContent = true | ||
| break | ||
| } | ||
| } | ||
| } | ||
| Input: "API handlers must return proper status codes" | ||
| Output: [] | ||
| Reason: Requires business logic understanding | ||
|
|
||
| if hasContent { | ||
| result.RuleEngineMap[ruleInfo.rule.ID] = converter.Name() | ||
| } else { | ||
| result.RuleEngineMap[ruleInfo.rule.ID] = "llm-validator" | ||
| } | ||
| } | ||
| Input: "Database queries must use parameterized queries" | ||
| Output: [] | ||
| Reason: Requires understanding SQL injection context | ||
|
|
||
| // Generate final configuration | ||
| config, err := converter.GenerateConfig(result.Rules) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to generate config: %w", err) | ||
| } | ||
| Input: "No hardcoded API keys or passwords" | ||
| Output: [] | ||
| Reason: Requires semantic analysis of what constitutes secrets | ||
|
|
||
| result.Config = config | ||
| Input: "Imports from large packages must be specific" | ||
| Output: [] | ||
| Reason: Requires knowing which packages are "large"`, availableLinters) |
There was a problem hiding this comment.
The LLM routing prompt is extremely long (lines 298-361) and duplicates logic that should be in the linter converters themselves. This creates a maintenance burden - if a linter's capabilities change, this prompt must be updated. Consider moving capability detection to the linter converters and having them return a confidence score, or use a more maintainable data structure for linter capabilities.
| for i, rule := range rules { | ||
| wg.Add(1) | ||
| go func(idx int, r schema.UserRule) { | ||
| defer wg.Done() | ||
|
|
||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to convert rule: %w", err) | ||
| module, err := c.convertSingleRule(ctx, r, llmClient) | ||
| results <- moduleResult{ | ||
| index: idx, | ||
| module: module, | ||
| err: err, | ||
| } | ||
| }(i, rule) |
There was a problem hiding this comment.
The goroutines for Checkstyle module conversion (lines 73-82) have no concurrency limit. For large policies, this creates unbounded parallelism with LLM API calls. The same issue exists in PMD converter (lines 69-78). Consider implementing a worker pool pattern to limit concurrent LLM API calls across all linter converters.
| // Increment counter and launch goroutine | ||
| mu.Lock() | ||
| result.Checked++ | ||
| mu.Unlock() | ||
|
|
||
| wg.Add(1) | ||
| go func(ch GitChange, lines []string, r schema.PolicyRule) { | ||
| defer wg.Done() | ||
|
|
||
| violation, err := llmValidator.CheckRule(ctx, ch, lines, r) | ||
| if err != nil { | ||
| fmt.Printf("⚠️ Validation failed for rule %s: %v\n", r.ID, err) | ||
| return | ||
| } | ||
|
|
||
| mu.Lock() | ||
| defer mu.Unlock() |
There was a problem hiding this comment.
The result.Checked counter is incremented before launching the goroutine (line 504), but if the goroutine encounters an error and returns early (line 514), the counter will be inaccurate because it won't correspond to actual checks performed. Consider incrementing the counter only when the check completes successfully, or decrement it on error.
| // Increment counter and launch goroutine | |
| mu.Lock() | |
| result.Checked++ | |
| mu.Unlock() | |
| wg.Add(1) | |
| go func(ch GitChange, lines []string, r schema.PolicyRule) { | |
| defer wg.Done() | |
| violation, err := llmValidator.CheckRule(ctx, ch, lines, r) | |
| if err != nil { | |
| fmt.Printf("⚠️ Validation failed for rule %s: %v\n", r.ID, err) | |
| return | |
| } | |
| mu.Lock() | |
| defer mu.Unlock() | |
| // Launch goroutine for check | |
| wg.Add(1) | |
| go func(ch GitChange, lines []string, r schema.PolicyRule) { | |
| defer wg.Done() | |
| violation, err := llmValidator.CheckRule(ctx, ch, lines, r) | |
| mu.Lock() | |
| defer mu.Unlock() | |
| result.Checked++ | |
| if err != nil { | |
| fmt.Printf("⚠️ Validation failed for rule %s: %v\n", r.ID, err) | |
| return | |
| } |
| // Add to history (keep last 50 results) | ||
| history.Results = append(history.Results, record) | ||
| if len(history.Results) > 50 { | ||
| history.Results = history.Results[len(history.Results)-50:] | ||
| } |
There was a problem hiding this comment.
[nitpick] The validation history is capped at 50 results (lines 1048-1052), but this magic number is not configurable. Consider making this limit configurable through an environment variable or configuration file, or document why 50 is the appropriate limit.
| response, err := c.llmClient.Complete(ctx, systemPrompt, userPrompt) | ||
| if err != nil { | ||
| fmt.Printf("Warning: LLM routing failed for rule %s: %v\n", rule.ID, err) | ||
| return []string{} // Will fall back to llm-validator | ||
| } |
There was a problem hiding this comment.
[nitpick] When LLM routing fails (line 368), an empty array is returned, which silently falls back to llm-validator. While this is handled correctly, the warning message is only printed to stdout and not logged or returned as part of the result warnings. Consider collecting these warnings so they can be reported to the user in the final conversion result.
| for result := range results { | ||
| if result.err != nil { | ||
| errors = append(errors, fmt.Sprintf("Rule %d: %v", result.index+1, result.err)) | ||
| fmt.Printf("⚠️ ESLint rule %d conversion error: %v\n", result.index+1, result.err) | ||
| continue | ||
| } | ||
|
|
||
| // Merge all rule configs | ||
| for _, rule := range rules { | ||
| for ruleID, ruleConfig := range rule.Config { | ||
| rulesMap[ruleID] = ruleConfig | ||
| if result.ruleName != "" { | ||
| eslintRules[result.ruleName] = result.config | ||
| fmt.Printf("✓ ESLint rule %d → %s\n", result.index+1, result.ruleName) | ||
| } else { | ||
| skippedCount++ | ||
| fmt.Printf("⊘ ESLint rule %d skipped (cannot be enforced by ESLint)\n", result.index+1) | ||
| } | ||
| } |
There was a problem hiding this comment.
[nitpick] The results are collected from a channel (lines 76-89) but the order is not preserved due to concurrent execution. The error messages include the rule index (line 78), but the final output may not be in order. Consider using a slice with indices or sorting the results to provide deterministic output.
| if len(pmdRules) == 0 { | ||
| return nil, fmt.Errorf("no rules converted: %v", errors) |
There was a problem hiding this comment.
[nitpick] The error slice is not used when returning the error (line 102). The error message just stringifies all errors, but this information is lost. Consider either including the detailed errors in the returned error, or storing them in a field that can be accessed by callers.
…to-end workflow tests