# Bayesian Redesign
*github-actions[bot] edited this page Oct 15, 2025 · 1 revision*
The current system is failing because of a mathematical design flaw, not parameter tuning issues.
- JIRA_XRAY format has unique indicators: `testExecutions` (UNIQUE, weight 5), `xrayInfo` (STRONG, weight 3)
- Other formats have only generic indicators: ZEPHYR has `executionId`, `version` (STRONG, weight 3 each)
- Yet the likelihood ratios are only 1.69:1, far below the required 2.0:1
This indicates that the Bayesian scoring function is not properly weighting unique evidence.
```
likelihood = w₁·completeness^p₁ + w₂·quality^p₂ + w₃·uniqueness^p₃ + interactions
```

Problems:
- Power functions compress high values toward 1: 0.8^0.45 ≈ 0.90, 0.8^0.5 ≈ 0.89, so distinct inputs become nearly indistinguishable
- Linear combination dilutes discriminative power: 65% uniqueness still gets diluted by 35% other factors
- No proper probabilistic interpretation: This is an ad-hoc scoring function, not a likelihood
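The compression and dilution effects above can be checked numerically. This is a minimal sketch of the ad-hoc scorer; the weights and exponents (0.20/0.15/0.65 and 0.45/0.5/0.45) are illustrative assumptions, not the system's actual parameters:

```python
def adhoc_likelihood(completeness, quality, uniqueness,
                     w=(0.20, 0.15, 0.65), p=(0.45, 0.5, 0.45)):
    """Ad-hoc weighted power combination (illustrative parameters only)."""
    return (w[0] * completeness ** p[0]
            + w[1] * quality ** p[1]
            + w[2] * uniqueness ** p[2])

# Power functions compress the input range toward 1:
print(0.8 ** 0.45)  # ≈ 0.90

# A format with near-perfect uniqueness vs. a generic one differs far
# less than 2:1 once the other additive terms dilute the signal:
unique = adhoc_likelihood(0.8, 0.8, 1.0)
generic = adhoc_likelihood(0.8, 0.8, 0.3)
print(unique / generic)  # ≈ 1.39, below the 2.0:1 threshold
```

Even giving uniqueness 65% of the weight, the additive form cannot push the ratio past 2:1 here, which matches the observed 1.69:1 failure.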
```
P(is_test_data|evidence) = Bernoulli(θ)
θ ~ Beta(α, β)  # Prior on test data prevalence

P(evidence|is_test_data=True)  = Multinomial(evidence_counts | λ_test)
P(evidence|is_test_data=False) = Multinomial(evidence_counts | λ_not_test)

Posterior: P(is_test_data|evidence) = f(evidence_counts, λ_test, λ_not_test, α, β)
```
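A simplified sketch of this posterior, replacing the full multinomial with independent per-evidence-type Poisson rates; the rates `lam_test`/`lam_not` and the 0.5 prior are illustrative placeholders, not fitted values:

```python
import math

def posterior_is_test_data(evidence_counts, lam_test, lam_not, prior=0.5):
    """P(is_test_data | evidence) via Bayes' rule, assuming independent
    Poisson-distributed counts per evidence type (a simplification of
    the multinomial model above)."""
    log_lr = 0.0
    for etype, count in evidence_counts.items():
        lt, ln = lam_test[etype], lam_not[etype]
        # Poisson log-likelihood ratio: count·log(λ_t/λ_n) − (λ_t − λ_n)
        log_lr += count * math.log(lt / ln) - (lt - ln)
    log_odds = math.log(prior / (1 - prior)) + log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative rates: unique indicators are ~15x more common in test data
lam_test = {"unique": 1.5, "strong": 2.0}
lam_not = {"unique": 0.1, "strong": 0.8}
p = posterior_is_test_data({"unique": 2, "strong": 3}, lam_test, lam_not)
```

With two unique and three strong indicators observed, the posterior is decisively high; with zero evidence it falls below the prior, as expected.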
```
P(format|evidence, is_test_data=True) = Categorical(π)
π ~ Dirichlet(α₁, α₂, ..., αₖ)  # Format priors

P(evidence|format=i) = Product over evidence_types:
  - Unique indicators:   Bernoulli(p_unique_i)
  - Strong indicators:   Bernoulli(p_strong_i)
  - Moderate indicators: Bernoulli(p_moderate_i)

Posterior: P(format=i|evidence) ∝ P(evidence|format=i) * P(format=i)
```
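The format posterior can be sketched directly from this spec. All indicator probabilities below are illustrative placeholders; in practice they would be estimated from labeled samples per format:

```python
import math

def format_posterior(present, formats, prior=None):
    """P(format|evidence) ∝ P(evidence|format)·P(format), with each
    indicator modeled as an independent Bernoulli.
    `formats` maps format name -> {indicator: P(present|format)}."""
    names = list(formats)
    prior = prior or {n: 1.0 / len(names) for n in names}
    log_post = {}
    for name, probs in formats.items():
        lp = math.log(prior[name])
        for ind, p in probs.items():
            lp += math.log(p if ind in present else 1.0 - p)
        log_post[name] = lp
    # Normalize with log-sum-exp for numerical stability
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {n: math.exp(v - m) / z for n, v in log_post.items()}

formats = {  # placeholder probabilities, not measured values
    "JIRA_XRAY": {"testExecutions": 0.9, "xrayInfo": 0.8, "executionId": 0.3},
    "ZEPHYR":    {"testExecutions": 0.02, "xrayInfo": 0.02, "executionId": 0.85},
}
post = format_posterior({"testExecutions", "xrayInfo"}, formats)
```

Because the unique indicators multiply rather than average in, seeing `testExecutions` and `xrayInfo` drives the JIRA_XRAY posterior far past the 2:1 threshold the additive scorer could not reach.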
```
I(format; evidence) = Σ P(format, evidence) · log( P(format, evidence) / (P(format)·P(evidence)) )

Discriminative power = KL( P(evidence|format) || P(evidence|other_formats) )
```
The posterior probability is then derived from this information gain: indicators with high KL divergence from the pooled distribution of other formats carry the most discriminative weight.
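A minimal sketch of the discriminative-power measure; the evidence distributions over (unique, strong, moderate) indicator types are illustrative, not measured:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) over a shared discrete support. Used here to measure
    how far one format's evidence distribution sits from the pooled
    distribution of the other formats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions over (unique, strong, moderate) evidence:
p_xray = [0.5, 0.3, 0.2]     # a format rich in unique indicators
p_others = [0.05, 0.45, 0.5]  # pooled generic-indicator formats
print(kl_divergence(p_xray, p_others))
```

A large KL value (here ≈ 0.85 nats) signals a format that unique indicators can separate cleanly; KL of a distribution against itself is zero, i.e. no discriminative power.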
```
P(format=i|features) = softmax(Wᵢ · features + bᵢ)

features = [
    unique_indicator_count,
    strong_indicator_count,
    moderate_indicator_count,
    field_specificity_score,
    structural_complexity,
    evidence_quality_score,
]
```
Regularization: L1 for feature selection, L2 for stability
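The softmax head above can be sketched in a few lines. The weight rows and biases here are hand-picked illustrations; in practice they would come from L1/L2-regularized training, not be set by hand:

```python
import math

FEATURES = ["unique_indicator_count", "strong_indicator_count",
            "moderate_indicator_count", "field_specificity_score",
            "structural_complexity", "evidence_quality_score"]

def softmax_scores(feature_vec, weights, biases):
    """P(format=i|features) = softmax(W_i·features + b_i)."""
    logits = [sum(w * x for w, x in zip(row, feature_vec)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

weights = [  # illustrative, untrained rows (one per format)
    [1.2, 0.4, 0.1, 0.8, 0.2, 0.5],  # weights a unique-indicator-heavy format
    [0.1, 0.9, 0.4, 0.3, 0.6, 0.5],  # weights a generic-indicator format
]
biases = [0.0, 0.0]
probs = softmax_scores([2, 3, 1, 0.7, 0.4, 0.8], weights, biases)
```

Note the high weight on `unique_indicator_count` in the first row: L1 regularization would tend to discover exactly this kind of sparse emphasis on discriminative features.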
Immediate fix using proper probability theory:
- Evidence Independence Assumption

```
# Current: Arbitrary weighted combination
likelihood = w₁·C^p₁ + w₂·Q^p₂ + w₃·U^p₃

# Proper: Independent evidence multiplication
likelihood = P(completeness|evidence) * P(quality|evidence) * P(uniqueness|evidence)
```

- Log-Likelihood for Numerical Stability

```
log_likelihood = log P(completeness|evidence) + log P(quality|evidence) + log P(uniqueness|evidence)
```

- Proper P(evidence|format) Models

```
# For unique indicators (presence/absence)
P(unique_present|format) = Beta(α_unique_present, β_unique_present)

# For count-based evidence
P(count|format) = Poisson(λ_count) or NegativeBinomial

# For quality scores
P(quality|format) = Beta(α_quality, β_quality)
```

Stage 1: Test data detection using:
- Evidence count models (Poisson/NegativeBinomial)
- Structural quality assessment
- Field specificity scoring
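The Stage 1 components above combine naturally in log space. This sketch pairs a Poisson model for the evidence count with Beta-shaped log-densities for quality and specificity; all shape parameters (`lam`, `q_a`, `q_b`) are illustrative assumptions:

```python
import math

def stage1_log_likelihood(evidence_count, quality, specificity,
                          lam=2.0, q_a=4.0, q_b=2.0):
    """Stage-1 test-data log-likelihood: Poisson evidence-count term
    plus unnormalized Beta log-densities for quality and field
    specificity, summed in log space for numerical stability."""
    # Poisson log-pmf: k·log(λ) − λ − log(k!)
    ll = evidence_count * math.log(lam) - lam - math.lgamma(evidence_count + 1)
    # Unnormalized Beta(q_a, q_b) log-density for quality in (0, 1)
    ll += (q_a - 1) * math.log(quality) + (q_b - 1) * math.log(1 - quality)
    # Field specificity modeled with the same shape for brevity
    ll += (q_a - 1) * math.log(specificity) + (q_b - 1) * math.log(1 - specificity)
    return ll
```

Summing logs instead of multiplying raw densities avoids the underflow that a product of many small probabilities would cause.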
Stage 2: Format discrimination with:
- Dirichlet priors on format frequencies
- Bernoulli/Beta models for indicator presence
- Logarithmic scaling for rare indicators
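The Dirichlet priors for Stage 2 reduce to simple smoothed frequency estimates. The observed counts below are made-up examples; the point is that a format with zero training examples still keeps non-zero prior mass:

```python
def dirichlet_format_priors(observed_counts, alpha=1.0):
    """Posterior-mean format frequencies under a symmetric
    Dirichlet(alpha) prior. alpha=1 is add-one smoothing, so formats
    never seen in training are not ruled out a priori."""
    total = sum(observed_counts.values())
    k = len(observed_counts)
    return {f: (c + alpha) / (total + k * alpha)
            for f, c in observed_counts.items()}

# Illustrative training counts, including an unseen format:
priors = dirichlet_format_priors({"JIRA_XRAY": 40, "ZEPHYR": 25, "TESTRAIL": 0})
```

Raising `alpha` pulls the priors toward uniform, which is the standard mitigation when per-format training data is sparse.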
- Hierarchical Bayesian modeling with PyMC or NumPyro for full posterior inference.
- Mutual information feature selection to identify the most discriminative indicators.
- Hybrid Bayesian + logistic regression approach where Bayesian priors inform logistic regression features.
- Data sparsity: Some formats may lack sufficient training data. Mitigate with informative priors and synthetic augmentation.
- Model interpretability: Keep explanations grounded in evidence metrics and publish derived parameters.
- Computational overhead: Pre-compute format-specific parameters and cache likelihood components.
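One way to realize the caching mitigation is to memoize per-(format, indicator) log-likelihood terms. The probability table here is a hypothetical stand-in for fitted parameters, and `log_indicator_likelihood` is an illustrative helper, not an existing function:

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def log_indicator_likelihood(fmt, indicator, present):
    """Cached log P(indicator state | format) term, so repeated
    detections reuse the computation instead of redoing it.
    The table is an illustrative stand-in for fitted parameters."""
    table = {
        ("JIRA_XRAY", "testExecutions"): 0.9,
        ("ZEPHYR", "executionId"): 0.85,
    }
    p = table.get((fmt, indicator), 0.05)  # small default for unseen pairs
    return math.log(p if present else 1.0 - p)
```

Because the arguments are hashable, `lru_cache` turns every repeated lookup into a dictionary hit, keeping the per-document scoring cost proportional to the number of distinct (format, indicator) pairs.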
- Implement evidence independence in the current scorer.
- Replace ad-hoc likelihoods with Beta/Bernoulli models per evidence type.
- Establish a benchmarking suite aligned with `wiki/benchmarks/format_detection_benchmark.json`.