
Conversation

@Dimios45

This PR addresses the issue reported in Keras issue #21700: nested output structures and corresponding nested losses are supported, but nested metrics are not, which caused failures when compiling models with nested metric configurations.

Changes

  • Implemented recursive handling of nested metric configurations within CompileMetrics.
  • Ensured compatibility with both nested and flat metric structures.
  • Updated internal methods to process nested metrics appropriately.

Testing

  • Added tests to verify correct handling of nested metrics during model compilation and training.
  • Tests cover nested dictionaries and lists to ensure the recursive metric construction behaves as expected and raises appropriate errors for invalid configurations.

Example Usage:

model.compile(
    optimizer='adam',
    loss={'a': 'mse', 'b': {'c': 'mse', 'd': 'mse'}},
    metrics={'a': ['mae'], 'b': {'c': 'mse', 'd': 'mae'}},
)
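
A rough sketch of how such a compiled model might then be trained (illustrative only; it assumes model has nested outputs {'a': ..., 'b': {'c': ..., 'd': ...}} with 32-dimensional inputs and 10-dimensional outputs, as in the test added by this PR):

import numpy as np

# Illustrative data; shapes mirror the test added in this PR.
x = np.random.rand(100, 32)
y = {
    'a': np.random.rand(100, 10),
    'b': {'c': np.random.rand(100, 10), 'd': np.random.rand(100, 10)},
}

# Targets are passed in the same nested structure as the model's outputs.
model.fit(x, y, epochs=1, batch_size=32)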

@google-cla

google-cla bot commented Oct 20, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Dimios45, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical limitation in Keras where models with nested output structures failed to compile when provided with corresponding nested metric configurations. By introducing a comprehensive recursive processing logic within the CompileMetrics class, the system can now seamlessly interpret and apply metrics to complex, hierarchical model outputs. This enhancement ensures that Keras users can define metrics in a structure that mirrors their model's output, providing greater flexibility and consistency in model definition and training.
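
For illustration only (this is not the PR's actual CompileMetrics code): a minimal, self-contained sketch of walking a nested metrics config and producing underscore-joined, path-prefixed names:

def flatten_nested_metrics(metrics_cfg, path=()):
    # Recursively walk dicts/lists of metric identifiers and yield
    # (flat output path, metric identifier) pairs.
    if isinstance(metrics_cfg, dict):
        for key, sub_cfg in metrics_cfg.items():
            yield from flatten_nested_metrics(sub_cfg, path + (key,))
    elif isinstance(metrics_cfg, (list, tuple)):
        for sub_cfg in metrics_cfg:
            yield from flatten_nested_metrics(sub_cfg, path)
    elif metrics_cfg is not None:
        yield "_".join(path), metrics_cfg

print(list(flatten_nested_metrics({'a': ['mae'], 'b': {'c': 'mse', 'd': 'mae'}})))
# [('a', 'mae'), ('b_c', 'mse'), ('b_d', 'mae')]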

Highlights

  • Nested Metrics Support: The CompileMetrics utility now fully supports nested dictionary and list structures for metrics, aligning with existing support for nested output structures and losses in Keras models.
  • Recursive Metric Processing: A new recursive mechanism (_build_nested_metrics) has been implemented to correctly parse and build metrics for complex, hierarchical output configurations, ensuring proper assignment and initialization.
  • New Helper Methods: Several new internal methods, including _has_nested_structure, _build_single_output_metrics, _build_flat_placeholders, _build_metrics_set_for_nested, and _flatten_dict_keys, were introduced to facilitate the robust handling of nested metric configurations.
  • Comprehensive Testing: New unit tests have been added to validate the correct behavior of nested metrics during model compilation and training, covering various nested dictionary and list scenarios, as well as compatibility with symbolic tensors.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature                Command               Description
Code Review            /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary   /gemini summary       Provides a summary of the current pull request in its current state.
Comment                @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                   /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for nested metrics in CompileMetrics, aligning its functionality with the existing support for nested losses. The implementation correctly uses recursion to handle complex nested structures. My review focuses on improving the robustness of the error handling, ensuring consistent behavior for user inputs, and strengthening the new tests to fully validate the feature. Overall, this is a valuable addition that enhances Keras's flexibility.

Comment on lines +610 to +634
def test_nested_dict_metrics():
    import numpy as np
    from keras.src import layers
    from keras.src import Input
    from keras.src import Model

    X = np.random.rand(100, 32)
    Y1 = np.random.rand(100, 10)
    Y2 = np.random.rand(100, 10)
    Y3 = np.random.rand(100, 10)

    def create_model():
        x = Input(shape=(32,))
        y1 = layers.Dense(10)(x)
        y2 = layers.Dense(10)(x)
        y3 = layers.Dense(10)(x)
        return Model(inputs=x, outputs={'a': y1, 'b': {'c': y2, 'd': y3}})

    model = create_model()
    model.compile(
        optimizer='adam',
        loss={'a': 'mse', 'b': {'c': 'mse', 'd': 'mse'}},
        metrics={'a': ['mae'], 'b': {'c': 'mse', 'd': 'mae'}},
    )
    model.train_on_batch(X, {'a': Y1, 'b': {'c': Y2, 'd': Y3}})

Severity: high

This test function test_nested_dict_metrics lacks assertions, which means it only verifies that the code runs without raising an exception. A test should validate the output or behavior. Please add assertions to check the metrics returned by train_on_batch or the model.metrics_names attribute after compilation.

Additionally, having a standalone function with the same name as a class method in the same file is confusing. It would be better to integrate this into a test class and give it a more descriptive name, such as test_nested_metrics_with_model_compile.
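
One possible shape for such assertions (a sketch only; it assumes the test is moved into a TestCase as suggested, that train_on_batch accepts return_dict=True, and that the metric names are path-prefixed as shown):

logs = model.train_on_batch(
    X, {'a': Y1, 'b': {'c': Y2, 'd': Y3}}, return_dict=True
)
# Assumed path-prefixed names; adjust to whatever the implementation emits.
self.assertIn("a_mean_absolute_error", logs)
self.assertIn("b_c_mean_squared_error", logs)
self.assertIn("b_d_mean_absolute_error", logs)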

elif isinstance(metrics_cfg, (list, tuple)) and isinstance(yp, (list, tuple)):
    flat_metrics = []
    for i, (m_cfg, y_t_elem, y_p_elem) in enumerate(zip(metrics_cfg, yt, yp)):

Severity: medium

When handling lists/tuples in build_recursive_metrics, using zip can silently truncate sequences if metrics_cfg, yt, and yp have different lengths. This is inconsistent with _build_metrics_set and _build_metrics_set_for_nested, which explicitly check for length mismatches and raise a ValueError. For a consistent and predictable user experience, you should add a length check here to ensure the structures are compatible and raise an informative error if they are not.
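
A sketch of the kind of guard being suggested (the helper name and error wording here are hypothetical; variable names follow the quoted snippet):

def check_lengths_match(metrics_cfg, yt, yp):
    # Hypothetical guard mirroring the explicit mismatch checks in
    # _build_metrics_set / _build_metrics_set_for_nested.
    if not (len(metrics_cfg) == len(yt) == len(yp)):
        raise ValueError(
            "Mismatched lengths in nested metrics: got "
            f"{len(metrics_cfg)} metric entries for {len(yp)} outputs."
        )

try:
    check_lengths_match(['mae', 'mse'], ['t1', 't2', 't3'], ['p1', 'p2', 'p3'])
except ValueError as e:
    print(e)  # an informative error instead of zip() silently truncating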

    value = compile_loss(y_true, y_pred)
    self.assertAllClose(value, 1.07666, atol=1e-5)

def test_nested_dict_metrics(self):

Severity: medium

The test method test_nested_dict_metrics is defined within the TestCompileLoss class, but it is testing the functionality of CompileMetrics. For better code organization and clarity, this test should be moved to the TestCompileMetrics class.

Comment on lines +585 to +591
expected_metric_names = []
for key in result.keys():
    if 'mean_squared_error' in key or 'mean_absolute_error' in key:
        expected_metric_names.append(key)

# At least some metrics should be computed
self.assertGreater(len(expected_metric_names), 0)

Severity: medium

The assertions in this test are too weak. It only checks that len(expected_metric_names) is greater than zero, which doesn't confirm that the correct metrics are created with the correct names for the nested structure. The test should be more specific by asserting the presence of expected metric names (e.g., a_mean_squared_error, b_c_mean_squared_error). This would provide a much stronger validation of the implementation.

Suggested change
-expected_metric_names = []
-for key in result.keys():
-    if 'mean_squared_error' in key or 'mean_absolute_error' in key:
-        expected_metric_names.append(key)
-# At least some metrics should be computed
-self.assertGreater(len(expected_metric_names), 0)
+self.assertIn("a_mean_squared_error", result)
+self.assertIn("b_c_mean_squared_error", result)
+self.assertIn("b_c_mean_absolute_error", result)
+self.assertIn("b_d_mean_squared_error", result)
+self.assertEqual(len(result), 4)

@codecov-commenter

codecov-commenter commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 6.03448% with 109 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.37%. Comparing base (f78fd8c) to head (017c2e3).

Files with missing lines               Patch %   Lines
keras/src/trainers/compile_utils.py    6.03%     109 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (f78fd8c) and HEAD (017c2e3). Click for more details.

HEAD has 8 uploads less than BASE
Flag               BASE (f78fd8c)   HEAD (017c2e3)
keras              5                1
keras-numpy        1                0
keras-tensorflow   1                0
keras-torch        1                0
keras-jax          1                0
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #21761       +/-   ##
===========================================
- Coverage   82.70%   34.37%   -48.33%     
===========================================
  Files         573      573               
  Lines       58817    58932      +115     
  Branches     9202     9243       +41     
===========================================
- Hits        48643    20257    -28386     
- Misses       7837    37768    +29931     
+ Partials     2337      907     -1430     
Flag               Coverage Δ
keras              34.37% <6.03%> (-48.14%) ⬇️
keras-jax          ?
keras-numpy        ?
keras-openvino     34.37% <6.03%> (-0.06%) ⬇️
keras-tensorflow   ?
keras-torch        ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

