Use log rank for p-adic fold chart and surface top tags #69

solresol · 2025-11-09T20:57:21Z

Summary

apply a logarithmic transform to the tag-rank axis for the non-zero coefficient chart on fold pages and refresh its labeling
update the fold chart caption to clarify the new log-log comparison
surface the strongest logistic-regression tag signals on the index taxonomy overview table

Testing

uv run -m pytest -q

Codex Task

claude · 2025-11-09T20:59:37Z

PR Review: Use log rank for p-adic fold chart and surface top tags

Summary

This PR makes two related improvements to the site's visualization and analysis capabilities:

Applies a logarithmic transformation to the tag-rank axis in the non-zero coefficient chart
Adds a new table on the index page to surface tags with the strongest logistic regression signals

Code Quality & Best Practices

Strengths:

Clean variable renaming (x_positions → log_rank_positions) accurately reflects the transformation
Clear, informative comments explaining the fallback logic for non-positive ranks (build_site.py:1774-1775)
Appropriate use of HTML escaping for user-generated content (build_site.py:2253)
Consistent formatting with existing codebase style

Areas for Improvement:

Edge case handling (build_site.py:1776-1777): The fallback to i + 1 when current_rank <= 0 is good defensive programming, but this could mask data quality issues. Consider logging a warning when this fallback is used.
Magic number: The limit of 10 tags on line 2251 ([:10]) should be extracted to a named constant for maintainability.

Potential Bugs & Issues

Critical: None identified - the code appears functionally correct.

Minor:

Null safety (build_site.py:2253): The code uses .get() with defaults, which is good. However, if top_taxonomy_path is None (not just missing), it will display "None" instead of "Unknown". Consider using: top_path = row.get('top_taxonomy_path') or 'Unknown'

Performance Considerations

Logarithm computation: The change from linear to logarithmic x-axis adds math.log() calls, but this is negligible - O(n) remains O(n) and the dataset size is small.
Query performance: The new table pulls data from taxonomy_summary.get("top_tags") which is already loaded, so no additional database overhead.

Overall: No performance concerns.

Security Considerations

Good practices observed:

HTML escaping via html.escape() prevents XSS vulnerabilities (build_site.py:2253)
No SQL injection risks - this code uses pre-loaded data structures

No security issues identified.

Test Coverage

Concern: The _generate_log_nonzero_proportion_chart function lacks unit tests. Given the mathematical transformation involved (logarithms with edge cases), this would benefit from test coverage.

Suggested test cases:

test_generate_log_nonzero_proportion_chart_handles_zero_rank: Verify fallback to index+1 when rank is 0 or negative
test_generate_log_nonzero_proportion_chart_skips_zero_proportion: Verify points with 0 proportion are skipped (can't log 0)
test_generate_log_nonzero_proportion_chart_returns_none_insufficient_data: Verify None is returned when fewer than 2 valid points

The new HTML table rendering in _build_index_html could also use a test to verify proper escaping and formatting.

Additional Observations

Documentation: The updated figure caption (build_site.py:2009) is excellent - it clearly explains the log-log nature of the visualization.
Axis labeling: The x-axis label "log(Tag Rank)" (line 1789) is mathematically precise and clear.
Consistency: The change from "Tag Position (ordered by battle rank)" to "log(Tag Rank)" aligns the visualization more closely with its actual data representation.

Recommendation

Approve with minor suggestions

The code is production-ready. The suggested improvements are enhancements rather than blockers:

Consider adding the logging for edge cases
Extract the magic number to a constant
Add unit tests for the chart generation function

Great work improving the data visualization clarity! The log-log transformation will make power-law relationships much more apparent.

Make log proportion chart use log rank and surface top tags

64f4f30

solresol added the codex label Nov 9, 2025 — with ChatGPT Codex Connector

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use log rank for p-adic fold chart and surface top tags #69

Use log rank for p-adic fold chart and surface top tags #69

Uh oh!

solresol commented Nov 9, 2025

Uh oh!

claude bot commented Nov 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Use log rank for p-adic fold chart and surface top tags #69

Are you sure you want to change the base?

Use log rank for p-adic fold chart and surface top tags #69

Uh oh!

Conversation

solresol commented Nov 9, 2025

Summary

Testing

Uh oh!

claude bot commented Nov 9, 2025

PR Review: Use log rank for p-adic fold chart and surface top tags

Summary

Code Quality & Best Practices

Potential Bugs & Issues

Performance Considerations

Security Considerations

Test Coverage

Additional Observations

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants