Skip to content

Conversation

@solresol
Copy link
Owner

@solresol solresol commented Nov 9, 2025

Summary

  • apply a logarithmic transform to the tag-rank axis for the non-zero coefficient chart on fold pages and refresh its labeling
  • update the fold chart caption to clarify the new log-log comparison
  • surface the strongest logistic-regression tag signals on the index taxonomy overview table

Testing

  • uv run -m pytest -q

Codex Task

@claude
Copy link

claude bot commented Nov 9, 2025

PR Review: Use log rank for p-adic fold chart and surface top tags

Summary

This PR makes two related improvements to the site's visualization and analysis capabilities:

  1. Applies a logarithmic transformation to the tag-rank axis in the non-zero coefficient chart
  2. Adds a new table on the index page to surface tags with the strongest logistic regression signals

Code Quality & Best Practices

Strengths:

  • Clean variable renaming (x_positions → log_rank_positions) accurately reflects the transformation
  • Clear, informative comments explaining the fallback logic for non-positive ranks (build_site.py:1774-1775)
  • Appropriate use of HTML escaping for user-generated content (build_site.py:2253)
  • Consistent formatting with existing codebase style

Areas for Improvement:

  1. Edge case handling (build_site.py:1776-1777): The fallback to i + 1 when current_rank <= 0 is good defensive programming, but this could mask data quality issues. Consider logging a warning when this fallback is used.

  2. Magic number: The limit of 10 tags on line 2251 ([:10]) should be extracted to a named constant for maintainability.

Potential Bugs & Issues

Critical: None identified - the code appears functionally correct.

Minor:

  1. Null safety (build_site.py:2253): The code uses .get() with defaults, which is good. However, if top_taxonomy_path is None (not just missing), it will display "None" instead of "Unknown". Consider using: top_path = row.get('top_taxonomy_path') or 'Unknown'

Performance Considerations

  1. Logarithm computation: The change from linear to logarithmic x-axis adds math.log() calls, but this is negligible - O(n) remains O(n) and the dataset size is small.

  2. Query performance: The new table pulls data from taxonomy_summary.get("top_tags") which is already loaded, so no additional database overhead.

Overall: No performance concerns.

Security Considerations

Good practices observed:

  • HTML escaping via html.escape() prevents XSS vulnerabilities (build_site.py:2253)
  • No SQL injection risks - this code uses pre-loaded data structures

No security issues identified.

Test Coverage

Concern: The _generate_log_nonzero_proportion_chart function lacks unit tests. Given the mathematical transformation involved (logarithms with edge cases), this would benefit from test coverage.

Suggested test cases:

  • test_generate_log_nonzero_proportion_chart_handles_zero_rank: Verify fallback to index+1 when rank is 0 or negative
  • test_generate_log_nonzero_proportion_chart_skips_zero_proportion: Verify points with 0 proportion are skipped (can't log 0)
  • test_generate_log_nonzero_proportion_chart_returns_none_insufficient_data: Verify None is returned when fewer than 2 valid points

The new HTML table rendering in _build_index_html could also use a test to verify proper escaping and formatting.

Additional Observations

  1. Documentation: The updated figure caption (build_site.py:2009) is excellent - it clearly explains the log-log nature of the visualization.

  2. Axis labeling: The x-axis label "log(Tag Rank)" (line 1789) is mathematically precise and clear.

  3. Consistency: The change from "Tag Position (ordered by battle rank)" to "log(Tag Rank)" aligns the visualization more closely with its actual data representation.

Recommendation

Approve with minor suggestions

The code is production-ready. The suggested improvements are enhancements rather than blockers:

  • Consider adding the logging for edge cases
  • Extract the magic number to a constant
  • Add unit tests for the chart generation function

Great work improving the data visualization clarity! The log-log transformation will make power-law relationships much more apparent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants