Go from one step being an epoch to one step being a batch #1802

Closed
wants to merge 71 commits
Commits
2a49b60
Move replica loop into generate_nn function
APJansen Dec 8, 2023
9c23fa5
Simplify handling of dropout
APJansen Dec 8, 2023
683e354
Factor out layer_generator in generate_dense_network
APJansen Dec 8, 2023
d02e118
Refactor dense_per_flavor_network
APJansen Dec 8, 2023
f907d7f
Move setting of last nodes to generate_nn
APJansen Dec 8, 2023
aa76bc7
Add constant arguments
APJansen Dec 8, 2023
1ad7960
Add constant arguments
APJansen Dec 8, 2023
7758497
Move dropout to generate_nn
APJansen Dec 8, 2023
2d388d9
Move concatenation of per_flavor layers into generate_nn
APJansen Dec 8, 2023
1ef87dc
Make the two layer generators almost equal
APJansen Dec 8, 2023
806e2c1
remove separate dense and dense_per_flavor functions
APJansen Dec 8, 2023
bdbc3c3
Add documentation.
APJansen Dec 8, 2023
e3f9f0c
Simplify per_flavor layer concatenation
APJansen Dec 8, 2023
3d9070f
Reverse order of loops over replicas and layers
APJansen Dec 8, 2023
c8300c8
Fixes for dropout
APJansen Dec 8, 2023
0cf23f2
Fixes for per_flavour
APJansen Dec 8, 2023
b0a8e3b
Fix issue with copying over nodes for per_flavour layer
APJansen Dec 11, 2023
97d2efe
Fix seeds in per_flavour layer
APJansen Dec 11, 2023
4c4a2d5
Add error for combination of dropout with per_flavour layers
APJansen Dec 11, 2023
2287194
Add basis_size argument to per_flavour layer
APJansen Dec 11, 2023
2f68e3d
Fix model_gen tests to use new generate_nn in favor of now removed ge…
APJansen Dec 11, 2023
4dd1649
Allow for nodes to be a tuple
APJansen Dec 11, 2023
6bd6466
Move dropout, per_flavour check to checks
APJansen Dec 11, 2023
2cd9e52
Clarify layer type check
APJansen Dec 14, 2023
1ae1b84
Clarify naming in nn_generator
APJansen Dec 14, 2023
e7a7cb4
Remove initializer_name argument
APJansen Dec 14, 2023
07c1e7d
clarify comment
APJansen Dec 14, 2023
25b8308
Add comment on shared layers
APJansen Dec 14, 2023
692014b
Rewrite comprehension over replica seeds
APJansen Dec 14, 2023
903c75b
Add check on layer type
APJansen Dec 15, 2023
a8fcfd3
Merge prefactors into single layer
APJansen Oct 25, 2023
661d39a
Add replica dimension to preprocessing factor in test
APJansen Dec 5, 2023
6641253
Update preprocessing layer in vpinterface
APJansen Dec 5, 2023
633a5c4
Remove assigning of weight slices
APJansen Dec 13, 2023
019fd56
Simplify loading weights from file
APJansen Jan 8, 2024
3c7607e
Update regression data
APJansen Jan 8, 2024
6d8d6b2
Always return a single NNs model for all replicas, adjust weight gett…
APJansen Jan 9, 2024
4e18e92
Revert "Update regression data"
APJansen Jan 10, 2024
47d5cab
Change structure of regression weights
APJansen Jan 10, 2024
e3ef5f0
Remove now unused postfix
APJansen Jan 10, 2024
3ddf629
Update regression weights
APJansen Jan 10, 2024
023a8d2
Give explicit shape to scatter_to_one
APJansen Jan 10, 2024
25de95c
Update developing weights structure
APJansen Jan 11, 2024
2240560
fix prefix typo
APJansen Jan 22, 2024
e38f359
add double ticks
APJansen Jan 22, 2024
a40c216
rename layer name constants
APJansen Jan 22, 2024
c095381
use constants defined in metamodel.py for layer names
APJansen Jan 22, 2024
115e30c
Explain need for is_stacked_single_replicas
APJansen Jan 22, 2024
8f4a596
shorten line
APJansen Jan 22, 2024
3568bbb
fix constant loading
APJansen Jan 22, 2024
7de2b59
Simplify get_replica_weights
APJansen Jan 22, 2024
3fba42f
NNs -> all_NNs
APJansen Jan 22, 2024
a1c46ec
Clarify get_layer_replica_weights
APJansen Jan 22, 2024
3557ee6
Clarify set_layer_replica_weights
APJansen Jan 22, 2024
2001e63
Remove comment about python 3.11
APJansen Jan 22, 2024
be29387
Merge prefactors into single layer
APJansen Oct 25, 2023
92d21a3
Add MultiDense layer
APJansen Jan 8, 2024
efcfc6a
Add MultiDense layer improvements
APJansen Jan 9, 2024
0e3dd2c
Recreate initializer per replica to make sure seed is properly set
APJansen Jan 9, 2024
4e81bf6
Add tolerences to test
APJansen Jan 9, 2024
08a3a18
Add multi_dense path in generate_nn
APJansen Jan 9, 2024
a84d8ac
Add MultiDropout
APJansen Jan 11, 2024
ef336e1
Replace old dense layer everywhere
APJansen Jan 11, 2024
737ed2b
Remove MultiDropout, not necessary
APJansen Jan 11, 2024
a5c6b11
Update developing weights structure
APJansen Jan 11, 2024
93a5636
Remove MultiDropout once more
APJansen Jan 11, 2024
c07a49b
Fix naming inconsistency wrt parallel-prefactor
APJansen Jan 23, 2024
c77add6
Merge prefactors into single layer
APJansen Oct 25, 2023
304b400
Add MultiDense layer improvements
APJansen Jan 9, 2024
a658304
Replace old dense layer everywhere
APJansen Jan 11, 2024
0fa674d
Avoid TensorFlow overhead by making one step a batch rather than an epoch
APJansen Aug 30, 2023
Binary file modified n3fit/runcards/examples/developing_weights.h5
Binary file not shown.
25 changes: 14 additions & 11 deletions n3fit/src/n3fit/backends/__init__.py
@@ -1,20 +1,23 @@
-from n3fit.backends.keras_backend.internal_state import (
-    set_initial_state,
-    clear_backend_state,
-    set_eager
-)
+from n3fit.backends.keras_backend import callbacks, constraints, operations
 from n3fit.backends.keras_backend.MetaLayer import MetaLayer
-from n3fit.backends.keras_backend.MetaModel import MetaModel
+from n3fit.backends.keras_backend.MetaModel import (
+    NN_LAYER_ALL_REPLICAS,
+    NN_PREFIX,
+    PREPROCESSING_LAYER_ALL_REPLICAS,
+    MetaModel,
+)
 from n3fit.backends.keras_backend.base_layers import (
+    Concatenate,
     Input,
-    concatenate,
     Lambda,
     base_layer_selector,
+    concatenate,
     regularizer_selector,
-    Concatenate,
 )
-from n3fit.backends.keras_backend import operations
-from n3fit.backends.keras_backend import constraints
-from n3fit.backends.keras_backend import callbacks
+from n3fit.backends.keras_backend.internal_state import (
+    clear_backend_state,
+    set_eager,
+    set_initial_state,
+)
 
 print("Using Keras backend")
169 changes: 116 additions & 53 deletions n3fit/src/n3fit/backends/keras_backend/MetaModel.py
@@ -5,6 +5,7 @@
backend-dependent calls.
"""

import logging
import re

import h5py
@@ -16,6 +17,8 @@

import n3fit.backends.keras_backend.operations as op

log = logging.getLogger(__name__)

# Check the TF version to check if legacy-mode is needed (TF < 2.2)
tf_version = tf.__version__.split(".")
if int(tf_version[0]) == 2 and int(tf_version[1]) < 2:
@@ -46,7 +49,8 @@
}

NN_PREFIX = "NN"
PREPROCESSING_PREFIX = "preprocessing_factor"
NN_LAYER_ALL_REPLICAS = "all_NNs"
PREPROCESSING_LAYER_ALL_REPLICAS = "preprocessing_factor"

# Some keys need to work for everyone
for k, v in optimizers.items():
@@ -156,7 +160,7 @@ def perform_fit(self, x=None, y=None, epochs=1, **kwargs):
of the model (the loss functions) to the partial losses.

If the model was compiled with input and output data, they will not be passed through.
In this case by default the number of `epochs` will be set to 1
In this case by default the number of ``epochs`` will be set to 1

ex:
{'loss': [100], 'dataset_a_loss1' : [67], 'dataset_2_loss': [33]}
@@ -169,10 +173,36 @@ def perform_fit(self, x=None, y=None, epochs=1, **kwargs):
x_params = self._parse_input(x)
if y is None:
y = self.target_tensors
history = super().fit(x=x_params, y=y, epochs=epochs, **kwargs)

# Avoids TensorFlow overhead that happens at every epoch by putting multiple steps in an epoch
steps_per_epoch = self.determine_steps_per_epoch(epochs)

for k, v in x_params.items():
x_params[k] = tf.repeat(v, steps_per_epoch, axis=0)
y = [tf.repeat(yi, steps_per_epoch, axis=0) for yi in y]

history = super().fit(
x=x_params, y=y, epochs=epochs // steps_per_epoch, batch_size=1, **kwargs
)
loss_dict = history.history
return loss_dict

def determine_steps_per_epoch(self, epochs):
num_replicas = self.output_shape[0][0]
# in this case we're most likely running on the CPU and this is not worth it
if num_replicas == 1:
return 1

# On the GPU, run with as many steps per epoch as possible (up to 100), provided it divides the total number of epochs
for divisor in [10, 100]:
if epochs % divisor != 0:
steps_per_epoch = divisor // 10
log.warning(
f"Epochs {epochs} not divisible by {divisor}, using {steps_per_epoch} steps per epoch"
)
return steps_per_epoch
return 100

def predict(self, x=None, **kwargs):
"""Call super().predict with the right input arguments"""
x = self._parse_input(x)
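
For orientation (an editorial sketch, not part of the diff): the change above amortizes Keras' per-epoch overhead by repeating the single input along the batch axis steps_per_epoch times and training for epochs // steps_per_epoch epochs with batch_size=1, so the total number of gradient updates is unchanged. A minimal standalone illustration with a toy model (the model, data and the steps_per_epoch value are assumptions for the example, not the n3fit setup):

import tensorflow as tf

# one input "batch" and matching target; shapes are illustrative
x = tf.ones((1, 8))
y = tf.zeros((1, 1))
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

epochs = 100
steps_per_epoch = 10  # must divide epochs, as checked by determine_steps_per_epoch above
x_rep = tf.repeat(x, steps_per_epoch, axis=0)  # shape (10, 8)
y_rep = tf.repeat(y, steps_per_epoch, axis=0)  # shape (10, 1)

# batch_size=1 turns every repeated copy into its own optimizer step, so the
# 100 updates still happen, but Keras only does 10 epochs worth of bookkeeping
model.fit(x_rep, y_rep, epochs=epochs // steps_per_epoch, batch_size=1, verbose=0)
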
@@ -198,10 +228,15 @@ def compute_losses(self):
out_names = [f"{i}_loss" for i in self.output_names]
out_names.insert(0, "loss")

inputs = self._parse_input(None)
# get rid of the repetitions by number of epochs made in perform_fit
for k, v in inputs.items():
inputs[k] = v[:1]

# Compile an evaluation function
@tf.function
def losses_fun():
predictions = self(self._parse_input(None))
predictions = self(inputs)
# If we only have one dataset the output changes
if len(out_names) == 2:
predictions = [predictions]
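
A quick way to see the de-duplication performed in this hunk (editorial sketch, illustrative values only): after perform_fit has repeated each input along the batch axis, keeping only the first element recovers the original input for loss evaluation.

import tensorflow as tf

v = tf.constant([[1.0, 2.0, 3.0]])     # original input, batch size 1
v_repeated = tf.repeat(v, 10, axis=0)  # what perform_fit feeds to fit()
assert tf.reduce_all(v_repeated[:1] == v)  # slicing undoes the repetition
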
@@ -228,7 +263,7 @@ def compile(
):
"""
Compile the model given an optimizer and a list of loss functions.
The optimizer must be one of those implemented in the `optimizer` attribute of this class.
The optimizer must be one of those implemented in the ``optimizer`` attribute of this class.

Options:
- A learning rate and a list of target outputs can be defined.
@@ -353,14 +388,10 @@ def get_replica_weights(self, i_replica):
dict
dictionary with the weights of the replica
"""
NN_weights = [
tf.Variable(w, name=w.name) for w in self.get_layer(f"{NN_PREFIX}_{i_replica}").weights
]
prepro_weights = [
tf.Variable(w, name=w.name)
for w in self.get_layer(f"{PREPROCESSING_PREFIX}_{i_replica}").weights
]
weights = {NN_PREFIX: NN_weights, PREPROCESSING_PREFIX: prepro_weights}
weights = {}
for layer_type in [NN_LAYER_ALL_REPLICAS, PREPROCESSING_LAYER_ALL_REPLICAS]:
layer = self.get_layer(layer_type)
weights[layer_type] = get_layer_replica_weights(layer, i_replica)

return weights

@@ -378,10 +409,9 @@ def set_replica_weights(self, weights, i_replica=0):
i_replica: int
the replica number to set, defaulting to 0
"""
self.get_layer(f"{NN_PREFIX}_{i_replica}").set_weights(weights[NN_PREFIX])
self.get_layer(f"{PREPROCESSING_PREFIX}_{i_replica}").set_weights(
weights[PREPROCESSING_PREFIX]
)
for layer_type in [NN_LAYER_ALL_REPLICAS, PREPROCESSING_LAYER_ALL_REPLICAS]:
layer = self.get_layer(layer_type)
set_layer_replica_weights(layer=layer, weights=weights[layer_type], i_replica=i_replica)

def split_replicas(self):
"""
@@ -411,51 +441,84 @@ def load_identical_replicas(self, model_file):
"""
From a single replica model, load the same weights into all replicas.
"""
weights = self._format_weights_from_file(model_file)
single_replica = self.single_replica_generator()
single_replica.load_weights(model_file)
weights = single_replica.get_replica_weights(0)

for i_replica in range(self.num_replicas):
self.set_replica_weights(weights, i_replica)

def _format_weights_from_file(self, model_file):
"""Read weights from a .h5 file and format into a dictionary of tf.Variables"""
weights = {}

with h5py.File(model_file, 'r') as f:
# look at layers of the form NN_i and take the lowest i
i_replica = 0
while f"{NN_PREFIX}_{i_replica}" not in f:
i_replica += 1
def is_stacked_single_replicas(layer):
"""
Check if the layer consists of stacked single replicas (only happens for NN layers),
to determine how to extract single replica weights.

weights[NN_PREFIX] = self._extract_weights(
f[f"{NN_PREFIX}_{i_replica}"], NN_PREFIX, i_replica
)
weights[PREPROCESSING_PREFIX] = self._extract_weights(
f[f"{PREPROCESSING_PREFIX}_{i_replica}"], PREPROCESSING_PREFIX, i_replica
)
Parameters
----------
layer: MetaLayer
the layer to check

return weights
Returns
-------
bool
True if the layer consists of stacked single replicas
"""
if not isinstance(layer, MetaModel):
return False
return f"{NN_PREFIX}_0" in [sublayer.name for sublayer in layer.layers]

def _extract_weights(self, h5_group, weights_key, i_replica):
"""Extract weights from a h5py group, turning them into Tensorflow variables"""
weights = []

def append_weights(name, node):
if isinstance(node, h5py.Dataset):
weight_name = node.name.split("/", 2)[-1]
weight_name = weight_name.replace(f"{NN_PREFIX}_{i_replica}", f"{NN_PREFIX}_0")
weight_name = weight_name.replace(
f"{PREPROCESSING_PREFIX}_{i_replica}", f"{PREPROCESSING_PREFIX}_0"
)
weights.append(tf.Variable(node[()], name=weight_name))
def get_layer_replica_weights(layer, i_replica: int):
"""
Get the weights for the given single replica ``i_replica``,
from a ``layer`` that has weights for all replicas.

Note that the layer could be a complete NN with many separated sub_layers,
each of which contains weights for all replicas together.
This function separates the per-replica weights and returns the list of weights as if the
input ``layer`` were made of _only_ replica ``i_replica``.

Parameters
----------
layer: MetaLayer
the layer to get the weights from
i_replica: int
the replica number

Returns
-------
weights: list
list of weights for the replica
"""
if is_stacked_single_replicas(layer):
weights = layer.get_layer(f"{NN_PREFIX}_{i_replica}").weights
else:
weights = [tf.Variable(w[i_replica : i_replica + 1], name=w.name) for w in layer.weights]

h5_group.visititems(append_weights)
return weights


def set_layer_replica_weights(layer, weights, i_replica: int):
"""
Set the weights for the given single replica ``i_replica``.
When the input ``layer`` contains weights for many replicas, ensures that
only those corresponding to replica ``i_replica`` are updated.

Parameters
----------
layer: MetaLayer
the layer to set the weights for
weights: list
list of weights for the replica
i_replica: int
the replica number
"""
if is_stacked_single_replicas(layer):
layer.get_layer(f"{NN_PREFIX}_{i_replica}").set_weights(weights)
return

# have to put them in the same order
weights_ordered = []
weights_model_order = [w.name for w in self.get_replica_weights(0)[weights_key]]
for w in weights_model_order:
for w_h5 in weights:
if w_h5.name == w:
weights_ordered.append(w_h5)
full_weights = [w.numpy() for w in layer.weights]
for w_old, w_new in zip(full_weights, weights):
w_old[i_replica : i_replica + 1] = w_new

return weights_ordered
layer.set_weights(full_weights)
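
To make the slicing path of the two helpers above concrete (editorial sketch with a toy stacked-weight layout; the array shapes and helper names are assumptions, not the n3fit API): when a layer stores weights for all replicas along a leading replica axis, a single replica is read or overwritten by slicing that axis while keeping it as a length-1 dimension, mirroring what get_layer_replica_weights and set_layer_replica_weights do in the non-stacked case.

import numpy as np

n_replicas, n_in, n_out = 3, 4, 2
# stacked kernel: one slice per replica along axis 0
stacked_kernel = np.random.default_rng(0).normal(size=(n_replicas, n_in, n_out))

def get_replica_slice(weights, i_replica):
    # keep the leading axis (length 1) so the slice can be written back as-is
    return [w[i_replica : i_replica + 1] for w in weights]

def set_replica_slice(weights, replica_weights, i_replica):
    for w_all, w_new in zip(weights, replica_weights):
        w_all[i_replica : i_replica + 1] = w_new

replica_1 = get_replica_slice([stacked_kernel], i_replica=1)
set_replica_slice([stacked_kernel], replica_1, i_replica=0)  # copy replica 1 into slot 0
assert np.array_equal(stacked_kernel[0], stacked_kernel[1])
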