
Conversation

daquinteroflex
Collaborator

@daquinteroflex daquinteroflex commented Sep 23, 2025

Greptile Overview

Updated On: 2025-09-24 14:40:31 UTC

Summary

This PR improves error handling for ComponentModeler validation failures by adding detailed error reporting and extracting hardcoded constants to module level.

  • Extracted terminal_errors and run_statuses constants to module level to eliminate code duplication
  • Added validateErrors field to BatchDetail class to capture validation error details
  • Enhanced estimate_cost function to provide detailed logging for validation failures, parsing JSON error responses and showing specific subtask failures
  • Improved maintainability by removing duplicate constant definitions across multiple functions

The changes focus on better user experience when validation fails, providing more actionable error messages instead of generic failure notifications.
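For orientation, a minimal sketch of the shape of these changes, not the merged diff: module-level status constants plus an optional validateErrors field on BatchDetail. The pydantic base class and the constant members are assumptions; the field names match the snippets quoted later in this thread.

# Minimal sketch (assumptions: pydantic model, constant members); not the merged code.
from typing import Dict, Optional

import pydantic

# Module-level constants replacing per-function duplicates (names per the PR summary;
# the members listed here are illustrative guesses).
TERMINAL_ERRORS = ("error", "failed", "validate_fail")
RUN_STATUSES = ("queued", "preprocess", "running", "postprocess")


class BatchDetail(pydantic.BaseModel):
    """Subset of the batch-detail payload used by estimate_cost (sketch)."""

    totalStatus: str  # simplified; the real field is an enum accessed via .value
    totalTask: Optional[int] = None
    runSuccess: Optional[int] = None
    estFlexUnit: Optional[float] = None
    # New in this PR: per-subtask validation errors keyed by subtask name,
    # each value a JSON string containing a "validation_error" entry.
    validateErrors: Optional[Dict[str, str]] = None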

Confidence Score: 3/5

  • This PR has a critical control flow bug that prevents proper error handling
  • Score reflects a logical error in the control flow: the terminal-status check runs inside the waiting loop instead of after it, which breaks the intended error handling (see the sketch after this list). The code refactoring and new features are good, but the control flow bug needs to be fixed.
  • tidy3d/web/api/webapi.py requires immediate attention due to incorrect control flow in estimate_cost function
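For concreteness, a sketch of the control flow the review is asking for: poll only inside the loop, then branch on the terminal status once after it. The status strings and the refresh/detail helpers are taken from the snippets quoted below; everything else is illustrative, not the merged code.

import time

# Status strings as they appear in the quoted estimate_cost snippet.
TERMINAL_STATUSES = ("validate_success", "success", "error", "failed")


def wait_for_terminal_status(task_id, get_detail, refresh_time=1.0):
    """Poll batch details until a terminal status is reached, then return it (sketch)."""
    detail = get_detail(task_id)
    status = detail.totalStatus
    while status not in TERMINAL_STATUSES:
        # The loop only waits and refreshes; it does not branch on the outcome.
        time.sleep(refresh_time)
        detail = get_detail(task_id)
        status = detail.totalStatus
    # Success/failure handling happens once, after the loop has exited.
    return detail, status

In the quoted code the success and error branches live inside the while body, so a batch whose very first poll already reports a terminal status skips both branches entirely; hoisting the checks after the loop keeps the exit condition and the branching consistent.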

Important Files Changed

File Analysis

Filename                      Score  Overview
tidy3d/web/api/webapi.py      4/5    Extracted hardcoded constants to module level and improved error handling for validation failures with detailed logging
tidy3d/web/core/task_info.py  5/5    Added validateErrors field to BatchDetail class to support detailed error reporting

Sequence Diagram

sequenceDiagram
    participant Client
    participant estimate_cost
    participant _batch_detail
    participant Logger as log
    
    Client->>estimate_cost: estimate_cost(task_id, verbose)
    estimate_cost->>_batch_detail: get batch details
    _batch_detail-->>estimate_cost: BatchDetail with status
    
    loop Wait for termination status
        estimate_cost->>_batch_detail: poll for status update
        _batch_detail-->>estimate_cost: updated status
    end
    
    alt status in ["validate_success", "success"]
        estimate_cost->>Client: return est_flex_unit
    else status in terminal_errors
        estimate_cost->>Logger: log.error(failure message)
        
        alt status == "validate_fail"
            estimate_cost->>_batch_detail: get validateErrors
            _batch_detail-->>estimate_cost: validateErrors dict
            
            loop for each validation error
                estimate_cost->>estimate_cost: parse JSON error
                estimate_cost->>Logger: log.error(detailed validation error)
            end
        end
        
        estimate_cost->>estimate_cost: break from loop
    end

@daquinteroflex daquinteroflex force-pushed the dario/fix_cm_web_bug branch 4 times, most recently from 76b8c00 to e59ffd4 on September 24, 2025 at 14:37
@daquinteroflex
Collaborator Author

Screenshot from 2025-09-24 16-25-46

@daquinteroflex daquinteroflex marked this pull request as ready for review September 24, 2025 14:38

@greptile-apps greptile-apps bot left a comment


2 files reviewed, 4 comments


Contributor

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/web/api/webapi.py (8.3%): Missing lines 1098,1110,1177,1198,1358-1359,1364-1365,1367-1370,1375-1377,1379-1381,1383-1385,1391
  • tidy3d/web/core/task_info.py (100%)

Summary

  • Total: 25 lines
  • Missing: 22 lines
  • Coverage: 12% (3 of the 25 changed lines covered)

tidy3d/web/api/webapi.py

  1094             d = _batch_detail(batch_id)
  1095             s = d.totalStatus.value
  1096             total = d.totalTask or 0
  1097             r = d.runSuccess or 0
! 1098             if s in TERMINAL_ERRORS:
  1099                 raise WebError(f"Batch {batch_id} terminated: {s}")
  1100             if total and r >= total:
  1101                 break
  1102             time.sleep(REFRESH_TIME)

  1106             d = _batch_detail(batch_id)
  1107             postprocess_status = d.postprocessStatus
  1108             if postprocess_status == "success":
  1109                 break
! 1110             elif postprocess_status in TERMINAL_ERRORS:
  1111                 raise WebError(
  1112                     f"Batch {batch_id} terminated. Please contact customer support and provide this Component Modeler batch ID: '{batch_id}'"
  1113                 )
  1114             time.sleep(REFRESH_TIME)

  1173                     progress.update(pbar, completed=completed, description=desc, refresh=False)
  1174 
  1175             if total and r >= total:
  1176                 break
! 1177             if status in TERMINAL_ERRORS:
  1178                 raise WebError(f"Batch {batch_id} terminated: {status}")
  1179             progress.refresh()
  1180             time.sleep(REFRESH_TIME)

  1194             elif postprocess_status == "preprocess":
  1195                 progress.update(p_post, completed=0.33)
  1196             elif postprocess_status == "running":
  1197                 progress.update(p_post, completed=0.55)
! 1198             elif postprocess_status in TERMINAL_ERRORS:
  1199                 raise WebError(
  1200                     f"Batch {batch_id} terminated. Please contact customer support and provide this Component Modeler batch ID: '{batch_id}'"
  1201                 )
  1202             progress.refresh()

  1354 
  1355     console = get_logging_console() if verbose else None
  1356 
  1357     if _is_modeler_batch(task_id):
! 1358         d = _batch_detail(task_id)
! 1359         status = d.totalStatus.value
  1360 
  1361         # Wait for a termination status
  1362         while status not in ["validate_success", "success", "error", "failed"]:
  1363             time.sleep(REFRESH_TIME)
! 1364             d = _batch_detail(task_id)
! 1365             status = d.totalStatus.value
  1366 
! 1367             if status in ["validate_success", "success"]:
! 1368                 est_flex_unit = _batch_detail(task_id).estFlexUnit
! 1369                 if verbose:
! 1370                     console.log(
  1371                         f"Maximum FlexCredit cost: {est_flex_unit:1.3f}. Minimum cost depends on "
  1372                         "task execution details. Use 'web.real_cost(task_id)' to get the billed FlexCredit "
  1373                         "cost after a simulation run."
  1374                     )
! 1375                 return est_flex_unit
! 1376             elif status in TERMINAL_ERRORS:
! 1377                 log.error(f"The ComponentModeler '{task_id}' has failed: {status}")
  1378 
! 1379                 if status == "validate_fail":
! 1380                     assert d.validateErrors is not None
! 1381                     for key, error in d.validateErrors.items():
  1382                         # I don't like this ideally but would like to control the endpoint to make this better
! 1383                         error_dict = json.loads(error)
! 1384                         validation_error = error_dict["validation_error"]
! 1385                         log.error(
  1386                             f"Subtask '{key}' has failed to validate:"
  1387                             f" \n {validation_error} \n "
  1388                             f"Fix your component modeler configuration. "
  1389                             f"Generate subtask simulations locally using `ComponentModelerType.sim_dict`."
  1390                         )
! 1391                 break
  1392     else:
  1393         task = SimulationTask.get(task_id)
  1394 
  1395         if not task:

Contributor

@yuanshen-flexcompute yuanshen-flexcompute left a comment


I tested this with a dummy notebook with an intentional error (too many time steps) and got this output:

image

Is there a way to get a more descriptive error message?

@daquinteroflex
Collaborator Author

Are you running in dev? In dev, it has the new field that contains the error detail per task, as shown above.

@daquinteroflex daquinteroflex added this pull request to the merge queue Sep 30, 2025
Merged via the queue into develop with commit 8edc753 Sep 30, 2025
34 checks passed
@daquinteroflex daquinteroflex deleted the dario/fix_cm_web_bug branch September 30, 2025 07:09