@romankurnovskii (Owner) commented Dec 8, 2025

Summary

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Performance improvement
  • Documentation/Tests

Objective

For new features and performance improvements: Clearly describe the objective and rationale for this change.

Testing

  • Unit tests added/updated
  • Integration tests added/updated

Breaking Changes

  • This PR contains breaking changes

If this is a breaking change, describe:

  • What functionality is affected
  • Migration path for existing users

Checklist

  • Code follows project style guidelines
  • Documentation updated where necessary
  • No secrets or sensitive information committed

Related Issues

Closes #[issue number]

Summary by Sourcery

Adjust JSON normalization formatting to support multi-value-per-line numeric arrays with a configurable print width and propagate this setting through JSON value formatting.

Enhancements:

  • Change JSON number array formatting to pack multiple values per line based on a target print width instead of one per line.
  • Add a configurable print_width parameter to the JSON formatter and propagate it through nested formatting calls.
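The packing rule described above can be sketched as follows. This is a minimal illustration, not the code in scripts/normalize_json.py. The name pack_numeric_array is hypothetical, and the width budget (print_width - len(next_indent) - 2) is taken from the reviewer's guide. Items are serialized with json.dumps, which is what the review recommends instead of str().

```python
import json


def pack_numeric_array(values, indent_level=0, print_width=150):
    """Format a list of numbers with as many values per line as fit in print_width.

    Hypothetical helper: follows the packing rule described in this PR,
    not the exact code in scripts/normalize_json.py.
    """
    if not values:
        return "[]"
    indent = "  " * indent_level
    next_indent = "  " * (indent_level + 1)
    # Same budget as the reviewer's guide: print_width - len(next_indent) - 2.
    available_width = print_width - len(next_indent) - 2

    lines, current_line, current_length = [], [], 0
    for item in values:
        # Serialize with json.dumps, as the review recommends, rather than str().
        item_str = json.dumps(item, ensure_ascii=False)
        # The ", " separator costs 2 columns unless this is the first item on the line.
        item_length = len(item_str) + (2 if current_line else 0)
        if current_length + item_length > available_width and current_line:
            # Line is full: flush it and start a new one with this item.
            lines.append(", ".join(current_line))
            current_line, current_length = [item_str], len(item_str)
        else:
            current_line.append(item_str)
            current_length += item_length
    lines.append(", ".join(current_line))  # flush the last (non-empty) line

    # Each packed line sits at next_indent; lines are separated by ",\n".
    body = ",\n".join(next_indent + line for line in lines)
    return "[\n" + body + "\n" + indent + "]"
```

Because every emitted line holds at most available_width characters plus a trailing comma, lines stay under print_width unless a single number is longer than the budget.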

Summary by CodeRabbit

  • Chores
    • Updated internal JSON formatting utilities for improved layout handling.

…er line

- Updated format_json_value to format number arrays with multiple numbers per line
- Numbers wrap at printWidth (150) to match Prettier style
- Matches the formatting style in book-sets.json

@sourcery-ai sourcery-ai bot (Contributor) commented Dec 8, 2025

Reviewer's Guide

Updates JSON normalization to support configurable line-wrapping for numeric arrays and reapplies formatting to book-sets.json to match the new style.

Flow diagram for updated JSON normalization and numeric array wrapping

```mermaid
flowchart TD
  A["sort_json_by_numeric_keys called with input_file and output_file"] --> B["Read raw JSON from input_file"]
  B --> C["Parse JSON into data structure"]
  C --> D["Sort top level keys numerically into sorted_data"]
  D --> E["Initialize lines list with '{'"]
  E --> F["For each key,value in sorted_data"]

  subgraph G["Formatting each value"]
    direction TB
    F --> G1["Call format_json_value(value, indent_level 1, print_width 150)"]
    G1 --> H{Value type?}
    H --> I["None"]
    H --> J["bool"]
    H --> K["int or float"]
    H --> L["str"]
    H --> M["dict"]
    H --> N["list"]

    M --> M1["For each item in sorted dict: format_json_value(v, indent_level+1, print_width)"]
    M1 --> M2["Join formatted items with commas and newlines, wrap in '{ }'"]

    N --> N1{"List empty?"}
    N1 --> N2["Return '[]'"]
    N1 --> N3{"First element is number?"}

    N3 --> O["Numeric array formatting with wrapping"]

    subgraph P["Numeric array wrapping algorithm"]
      direction TB
      O --> P1["Compute available_width = print_width - len(next_indent) - 2"]
      P1 --> P2["Initialize lines, current_line, current_length = 0"]
      P2 --> P3["For each item in list"]
      P3 --> P4["Compute item_str and item_length (include comma+space if needed)"]
      P4 --> P5{"current_length + item_length > available_width and current_line not empty?"}
      P5 --> P6["Append current_line joined by ', ' to lines and start new line with item"]
      P5 --> P7["Append item to current_line and update current_length"]
      P6 --> P8["After loop, append remaining current_line to lines"]
      P7 --> P8
      P8 --> P9["Indent each line with next_indent and wrap with '[ ]'"]
    end

    N3 --> Q["Non numeric list formatting"]
    Q --> Q1["formatted_items = format_json_value(item, indent_level+1, print_width) for each item"]
    Q1 --> Q2["Compute total_length of formatted_items"]
    Q2 --> Q3{"total_length < 100 and len(list) <= 5?"}
    Q3 --> Q4["Render on single line"]
    Q3 --> Q5["Render one item per line with indentation"]
  end

  G --> R["Append '  key: formatted_value' to lines"]
  R --> S{"More keys?"}
  S --> F
  S --> T["Append '}' to lines"]
  T --> U["Join lines with newlines"]
  U --> V["Write normalized JSON to output_file"]
```
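The top-level steps in the flowchart (parse, sort keys numerically, emit one line per key, wrap in braces) could look roughly like the sketch below. It is hypothetical. build_normalized_text works on strings instead of files and assumes every top-level key is an integer string. It also uses json.dumps as a stand-in for format_json_value.

```python
import json


def _default_format(value):
    # Stand-in for the script's format_json_value(value, 1, print_width=150).
    return json.dumps(value, ensure_ascii=False)


def build_normalized_text(raw_json, format_value=_default_format):
    """Hypothetical sketch of the top-level flow of sort_json_by_numeric_keys.

    Assumes every top-level key is an integer string such as "10".
    """
    data = json.loads(raw_json)
    # Sort numerically so "9" < "10" < "100" (string sorting would give "10" < "100" < "9").
    sorted_items = sorted(data.items(), key=lambda kv: int(kv[0]))

    lines = ["{"]
    for i, (key, value) in enumerate(sorted_items):
        # Commas go between entries only, so the result stays valid JSON.
        comma = "," if i < len(sorted_items) - 1 else ""
        lines.append(f"  {json.dumps(key)}: {format_value(value)}{comma}")
    lines.append("}")
    return "\n".join(lines)
```

Passing a different format_value is how a real formatter, such as one that threads print_width, would plug in.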

File-Level Changes

Enhance JSON formatter to wrap numeric arrays across multiple elements per line based on a configurable print width (scripts/normalize_json.py):
  • Add a print_width parameter to format_json_value and propagate it through all recursive calls.
  • Change numeric list formatting from one-number-per-line to multi-number-per-line with width-based line breaking.
  • Introduce width-based packing logic that builds comma-separated lines without exceeding the configured print width.
  • Update sort_json_by_numeric_keys to call format_json_value with an explicit print width of 150 characters.

Regenerate normalized JSON data using the updated formatting rules for numeric arrays (data/book-sets.json):
  • Reformat JSON structure to use the new multi-number-per-line style for numeric arrays, maintaining key ordering and indentation.
  • Ensure the data file remains semantically equivalent while matching the updated normalization script output.

coderabbitai bot commented Dec 8, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

A new optional print_width parameter (default 150) was added to the format_json_value() function to control output width during JSON formatting. The parameter is threaded through recursive calls and applied to numeric list formatting for multi-line, width-constrained layouts.
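Threading matters because print_width has a default. A recursive call that forgets to pass it still runs, but it silently falls back to 150 at deeper levels. The two toy functions here are hypothetical and are not from the script. They only report which width each nested leaf ends up seeing.

```python
def widths_seen(value, print_width=150):
    """Return the print_width observed at each leaf; threads the argument correctly."""
    if isinstance(value, dict):
        return {k: widths_seen(v, print_width) for k, v in value.items()}
    if isinstance(value, list):
        return [widths_seen(v, print_width) for v in value]
    return print_width


def widths_seen_buggy(value, print_width=150):
    """Same traversal, but the recursive calls omit print_width."""
    if isinstance(value, dict):
        return {k: widths_seen_buggy(v) for k, v in value.items()}
    if isinstance(value, list):
        return [widths_seen_buggy(v) for v in value]
    return print_width
```

For example, widths_seen({"a": [1]}, print_width=80) returns {"a": [80]}. widths_seen_buggy with the same arguments returns {"a": [150]}, and no error points at the missing argument.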

Changes

JSON formatting enhancement (scripts/normalize_json.py):
  • Added print_width parameter to the format_json_value() function signature.
  • Threaded the parameter through recursive calls for dictionaries and lists.
  • Updated numeric list formatting logic to apply width constraints.
  • Updated sort_json_by_numeric_keys() to invoke format_json_value() with print_width=150.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Single-file change with consistent parameter-threading pattern across recursive calls
  • Review focus: validate that print_width is correctly propagated through all call paths and that list formatting logic properly applies width constraints without unintended side effects
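One way to make that review focus mechanical is a small check on the output. The formatted text should parse back to the original data, and no line should exceed print_width. check_formatting is a hypothetical helper and is not part of the repository. A single token longer than the width, such as a long string, can legitimately produce an over-long line.

```python
import json


def check_formatting(original, formatted_text, print_width=150):
    """Return a list of problems found in formatted_text (empty list means OK)."""
    problems = []
    try:
        # Semantic check: the formatted output must describe the same data.
        if json.loads(formatted_text) != original:
            problems.append("output does not round-trip to the original data")
    except json.JSONDecodeError as exc:
        problems.append(f"output is not valid JSON: {exc}")
    # Layout check: every physical line should respect the configured width.
    for lineno, line in enumerate(formatted_text.splitlines(), start=1):
        if len(line) > print_width:
            problems.append(f"line {lineno} is {len(line)} > {print_width} columns")
    return problems
```

Running this against the regenerated data file with the same width as the script would catch both kinds of unintended side effect.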

Poem

🐰 A width-aware whisker-twitch of delight!
The JSON now flows with precision just right,
One-fifty characters, perfectly spaced,
Lists are constrained—no more traces misplaced!
Small change, big format—the normalizer's might!

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1f41ed and 849b266.

📒 Files selected for processing (1)
  • scripts/normalize_json.py (3 hunks)

@romankurnovskii romankurnovskii merged commit 66c5a52 into main Dec 8, 2025
1 of 4 checks passed

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

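  • In the numeric list branch you switched from json.dumps to str(item). This can change the output or even make it invalid JSON. str() renders non-finite floats as nan/inf instead of NaN/Infinity. Also, because only the first element is checked for being a number, a later True, None, or string item would be written as True, None, or an unquoted string. Consider using json.dumps(item, ensure_ascii=False) consistently to avoid these differences.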
  • The print_width is currently hard-coded as 150 in sort_json_by_numeric_keys; if different widths might be useful, consider threading this through as a parameter rather than a constant.
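The difference behind the first comment is easy to see with the standard library alone. This sketch uses str_vs_json, a hypothetical helper. It shows that str() and json.dumps() agree for ordinary ints and floats, including large ints, but diverge for non-finite floats and for non-numeric values.

```python
import json


def str_vs_json(item):
    """Return (str(item), json.dumps(item)) so the two serializations can be compared."""
    return str(item), json.dumps(item, ensure_ascii=False)


samples = [42, 2.5, 10**20, float("nan"), float("inf"), True, None, "é"]
for item in samples:
    as_str, as_json = str_vs_json(item)
    marker = "same" if as_str == as_json else "DIFFERENT"
    print(f"{as_str:>22} | {as_json:>22} | {marker}")
```

Only the last five samples differ: nan/NaN, inf/Infinity, True/true, None/null, and é versus "é".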

Comment on lines +42 to +51
```python
            for i, item in enumerate(value):
                item_str = str(item)
                # Add comma and space length (2) if not first item on line
                item_length = len(item_str) + (2 if current_line else 0)

                if current_length + item_length > available_width and current_line:
                    # Start a new line
                    lines.append(", ".join(str(x) for x in current_line))
                    current_line = [item]
                    current_length = len(item_str)
```

suggestion (bug_risk): Consider using json.dumps instead of str for numeric items to keep JSON formatting consistent.

In this numeric-list branch you're using str(item)/str(x) for both measuring and emitting, whereas the previous code used json.dumps(item, ensure_ascii=False). That change can alter JSON formatting and make behavior diverge from other branches. Consider computing item_str = json.dumps(item, ensure_ascii=False) once and reusing it for both width calculation and output.

Suggested implementation:

```python
            for i, item in enumerate(value):
                item_str = json.dumps(item, ensure_ascii=False)
                # Add comma and space length (2) if not first item on line
                item_length = len(item_str) + (2 if current_line else 0)

                if current_length + item_length > available_width and current_line:
                    # Start a new line
                    lines.append(", ".join(current_line))
                    current_line = [item_str]
                    current_length = len(item_str)
                else:
```
  1. Ensure import json is present at the top of scripts/normalize_json.py if it's not already imported.
  2. In the else: branch (not shown), make sure you append item_str to current_line (not the raw item) and keep current_length updated with len(item_str) plus the comma/space when applicable.

Comment on lines +17 to 20
```python
def format_json_value(value, indent_level=0, print_width=150):
    """Format a JSON value with custom formatting following Prettier style."""
    indent = "  " * indent_level
    next_indent = "  " * (indent_level + 1)
```

suggestion: The print_width parameter isn’t applied consistently, e.g. the non-numeric list single-line threshold is still hardcoded.

print_width is used for numeric lists, but other arrays still rely on the hardcoded total_length < 100 check. Consider basing that threshold on print_width (optionally with a margin) so the width configuration is applied consistently across all list types.

Suggested implementation:

```python
def format_json_value(value, indent_level=0, print_width=150):
```

```python
    elif isinstance(value, list):
        if not value:
```

```python
        total_length = len(indent) + 2 + sum(len(item) + 2 for item in items) - 2
        max_width_for_list = max(print_width - len(indent), 0)
        if total_length <= max_width_for_list:
```

I assumed the non-numeric list single-line decision currently uses if total_length < 100: after building items. If the condition appears in multiple branches (e.g. for different list element types), apply the same replacement to each occurrence so all non-numeric array formatting respects print_width. Also ensure that all recursive calls to format_json_value in list handling pass the print_width argument through (as is already done in the dict branch shown). If the total_length calculation differs, keep it as-is and only adjust the width check to use print_width via max_width_for_list.
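This is a sketch of how the single-line check could use print_width, under some assumptions. fits_on_one_line is hypothetical. It takes the already-formatted item strings plus the number of columns already used on the line (prefix_width), and it keeps the existing len(list) <= 5 rule from the reviewer's guide. The suggested snippet adds len(indent) to total_length and also subtracts it from the limit. This version counts the prefix only once. The multi-line guard is an added assumption.

```python
def fits_on_one_line(items, prefix_width, print_width=150, max_items=5):
    """Decide whether formatted list items can be rendered on one line as [a, b, c].

    items: already-formatted element strings (hypothetical input shape).
    prefix_width: columns used on the line before "[" (indent plus any '"key": ').
    max_items: keeps the existing "at most 5 items" rule from the reviewer's guide.
    """
    if not items:
        return True  # "[]" is handled separately; treat it as fitting
    if any("\n" in item for item in items):
        return False  # a multi-line element cannot share a single line
    # "[" + items joined by ", " + "]" + a possible trailing comma.
    rendered_width = 2 + sum(len(item) for item in items) + 2 * (len(items) - 1) + 1
    return len(items) <= max_items and prefix_width + rendered_width <= print_width
```

With this shape, the same print_width governs both numeric and non-numeric arrays, which is the consistency the comment asks for.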

@romankurnovskii romankurnovskii deleted the problems-1925-3100-3133-3164-3197-3228-3260-3291-3320-3351-3380-3413-3444-3471 branch December 8, 2025 09:57