diff --git a/README.model_signing.md b/README.model_signing.md
index f7d7ef66..82624239 100644
--- a/README.model_signing.md
+++ b/README.model_signing.md
@@ -21,63 +21,109 @@ monitor](https://github.com/sigstore/rekor-monitor) that runs on GitHub Actions.
 
 ![Signing models with Sigstore](docs/images/sigstore-model-diagram.png)
 
-## Usage
+## Model Signing CLI
 
-You will need to install a few prerequisites to be able to run all of the
-examples below:
+The `sign.py` and `verify.py` scripts aim to provide the necessary functionality
+to sign and verify ML models. For signing and verification the following methods
+are supported:
+
+* Bring your own key pair
+* Bring your own PKI
+* Skip signing (only hash and create a bundle)
+
+The signing part creates a [sigstore bundle](https://github.com/sigstore/protobuf-specs/blob/main/protos/sigstore_bundle.proto)
+protobuf that is stored in JSON format. The bundle contains the verification
+material necessary to check the payload and a payload as a [DSSE envelope](https://github.com/sigstore/protobuf-specs/blob/main/protos/envelope.proto).
+Further, the DSSE envelope contains an in-toto statement and the signature over
+that statement. The signature format and how the signature is computed can
+be seen [here](https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md).
+
+Finally, the statement itself contains subjects which are a list of (file path,
+digest) pairs, a predicate type set to `model_signing/v1/model`, and a dictionary
+of predicates. The idea is to use the predicates to store (and therefore sign) model
+card information in the future.
+
+The verification part reads the sigstore bundle file and firstly verifies that the
+signature is valid and secondly recomputes the model's file hashes to compare
+against the signed ones.
+
+**Note**: The signature is stored as `./model.sig` by default and can be adjusted
+by setting the `--sig_out` flag.
+
+### Usage
+
+There are two scripts: one creates and signs a bundle, and the other verifies
+a bundle. Furthermore, the functionality can be used directly from other
+Python tools. The `sign.py` and `verify.py` scripts can be used as canonical
+how-to examples.
+
+The easiest way to use the scripts directly is from a virtual environment:
 
 ```bash
-sudo apt install git git-lfs python3-venv python3-pip unzip
-git lfs install
+$ python3 -m venv .venv
+$ source .venv/bin/activate
+(.venv) $ pip install -r install/requirements.in
 ```
 
-After this, you can clone the repository, create a Python virtual environment
-and install the dependencies needed by the project:
+## Sign
 
 ```bash
-git clone git@github.com:sigstore/model-transparency.git
-cd model-transparency/model_signing
-python3 -m venv test_env
-source test_env/bin/activate
-os=Linux # Supported: Linux, Windows, Darwin.
-python3 -m pip install --require-hashes -r "install/requirements_${os}".txt
+(.venv) $ python3 sign.py --model_path ${MODEL_PATH} --sig_out ${OUTPUT_PATH} {private-key,pki,skip} {additional parameters depending on method}
 ```
 
-After this point, you can use the project to sign and verify models and
-checkpoints. A help message with all arguments can be obtained by passing `-h`
-argument, either to the main driver or to the two subcommands:
+## Verify
 
 ```bash
-python3 main.py -h
-python3 main.py sign -h
-python3 main.py verify -h
+(.venv) $ python3 verify.py --sig_path ${SIG_PATH} --model_path ${MODEL_PATH} {private-key,pki,skip} {additional parameters depending on method}
 ```
 
-Signing a model requires passing an argument for the path to the model. This can
-be a path to a file or a directory (for large models, or model formats such as
-`SavedModel` which are stored as a directory of related files):
+### Examples
+
+#### Bring Your Own Key
 
 ```bash
-path=path/to/model
-python3 main.py sign --path "${path}"
+$ MODEL_PATH='/path/to/your/model'
+$ openssl ecparam -name secp256k1 -genkey -noout -out ec-secp256k1-priv-key.pem
+$ openssl ec -in ec-secp256k1-priv-key.pem -pubout > ec-secp256k1-pub-key.pem
+$ source .venv/bin/activate
+# SIGN
+(.venv) $ python3 sign.py --model_path ${MODEL_PATH} private-key --private_key ec-secp256k1-priv-key.pem
+...
+# VERIFY
+(.venv) $ python3 verify.py --sig_path ./model.sig --model_path ${MODEL_PATH} private-key --public_key ec-secp256k1-pub-key.pem
+...
 ```
 
-The sign process will start an OIDC workflow to generate a short lived
-certificate based on an identity provider. This will be relevant when verifying
-the signature, as shown below.
+#### Bring your own PKI
+In order to sign a model with your own PKI you need to create the following information:
 
-**Note**: The signature is stored as `.sig` for a model serialized as a
-single file, and `/model.sig` for a model in a folder-based format.
+ - The signing certificate
+ - The elliptic curve private key matching the signing certificate's public key
+ - Optionally, the certificate chain used for verification.
 
-For verification, we need to pass both the path to the model and identity
-related arguments:
 
 ```bash
-python3 main.py verify --path "${path}" \
-  --identity-provider https://accounts.google.com \
-  --identity myemail@gmail.com
+$ MODEL_PATH='/path/to/your/model'
+$ CERT_CHAIN='/path/to/cert_chain'
+$ SIGNING_CERT='/path/to/signing_certificate'
+$ PRIVATE_KEY='/path/to/private_key'
+# SIGN
+(.venv) $ python3 sign.py --model_path ${MODEL_PATH} \
+    pki \
+    --private_key ${PRIVATE_KEY} \
+    --signing_cert ${SIGNING_CERT} \
+    [--cert_chain ${CERT_CHAIN}]
+...
+# VERIFY
+$ ROOT_CERTS='/path/to/root/certs'
+(.venv) $ python3 verify.py --sig_path ./model.sig --model_path ${MODEL_PATH} \
+    pki \
+    --root_certs ${ROOT_CERTS}
+...
 ```
 
+## Sigstore ID providers
+
 For developers signing models, there are three identity providers that can
 be used at the moment:
 
diff --git a/src/model_signing/model.py b/src/model_signing/model.py
new file mode 100644
index 00000000..456ddedd
--- /dev/null
+++ b/src/model_signing/model.py
@@ -0,0 +1,79 @@
+# Copyright 2024 The Sigstore Authors
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import pathlib
+from typing import Callable, Iterable, TypeAlias
+
+from model_signing.manifest import manifest
+from model_signing.serialization import serialization
+from model_signing.signature import verifying
+from model_signing.signing import signing
+
+
+PayloadGeneratorFunc: TypeAlias = Callable[
+    [manifest.Manifest], signing.SigningPayload
+]
+
+
+def sign(
+    model_path: pathlib.Path,
+    signer: signing.Signer,
+    payload_generator: PayloadGeneratorFunc,
+    serializer: serialization.Serializer,
+    ignore_paths: Iterable[pathlib.Path] = frozenset(),
+) -> signing.Signature:
+    """Provides a wrapper function for the steps necessary to sign a model.
+
+    Args:
+        model_path: the model to be signed.
+        signer: the signer to be used.
+        payload_generator: function to generate the signing payload from the manifest.
+        serializer: the serializer to be used for the model.
+        ignore_paths: paths that should be ignored during serialization.
+            Defaults to an empty set.
+
+    Returns:
+        The model's signature.
+    """
+    manifest = serializer.serialize(model_path, ignore_paths=ignore_paths)
+    payload = payload_generator(manifest)
+    sig = signer.sign(payload)
+    return sig
+
+
+def verify(
+    sig: signing.Signature,
+    verifier: signing.Verifier,
+    model_path: pathlib.Path,
+    serializer: serialization.Serializer,
+    ignore_paths: Iterable[pathlib.Path] = frozenset(),
+):
+    """Provides a simple wrapper to verify models.
+
+    Args:
+        sig: the signature to be verified.
+        verifier: the verifier to verify the signature.
+        model_path: the path to the model to compare manifests.
+        serializer: the serializer used to generate the local manifest.
+        ignore_paths: paths that should be ignored during serialization.
+            Defaults to an empty set.
+
+    Raises:
+        verifying.VerificationError: on any verification error.
+    """
+    peer_manifest = verifier.verify(sig)
+    local_manifest = serializer.serialize(model_path, ignore_paths=ignore_paths)
+    if peer_manifest != local_manifest:
+        raise verifying.VerificationError("the manifests do not match")
diff --git a/src/model_signing/signing/in_toto_signature.py b/src/model_signing/signing/in_toto_signature.py
new file mode 100644
index 00000000..83e3bf01
--- /dev/null
+++ b/src/model_signing/signing/in_toto_signature.py
@@ -0,0 +1,71 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Support for signing intoto payloads into sigstore bundles."""
+
+import json
+import pathlib
+from typing import Self
+
+from sigstore_protobuf_specs.dev.sigstore.bundle import v1 as bundle_pb
+from typing_extensions import override
+
+from model_signing.manifest import manifest as manifest_module
+from model_signing.signature import signing as signature_signing
+from model_signing.signature import verifying as signature_verifying
+from model_signing.signing import in_toto
+from model_signing.signing import signing
+
+
+class IntotoSignature(signing.Signature):
+    def __init__(self, bundle: bundle_pb.Bundle):
+        self._bundle = bundle
+
+    @override
+    def write(self, path: pathlib.Path) -> None:
+        path.write_text(self._bundle.to_json())
+
+    @classmethod
+    @override
+    def read(cls, path: pathlib.Path) -> Self:
+        bundle = bundle_pb.Bundle().from_json(path.read_text())
+        return cls(bundle)
+
+    def to_manifest(self) -> manifest_module.Manifest:
+        payload = json.loads(self._bundle.dsse_envelope.payload)
+        return in_toto.IntotoPayload.manifest_from_payload(payload)
+
+
+class IntotoSigner(signing.Signer):
+    def __init__(self, sig_signer: signature_signing.Signer):
+        self._sig_signer = sig_signer
+
+    @override
+    def sign(self, payload: signing.SigningPayload) -> IntotoSignature:
+        if not isinstance(payload, in_toto.IntotoPayload):
+            raise TypeError("only IntotoPayloads are supported")
+        bundle = self._sig_signer.sign(payload.statement)
+        return IntotoSignature(bundle)
+
+
+class IntotoVerifier(signing.Verifier):
+    def __init__(self, sig_verifier: signature_verifying.Verifier):
+        self._sig_verifier = sig_verifier
+
+    @override
+    def verify(self, signature: signing.Signature) -> manifest_module.Manifest:
+        if not isinstance(signature, IntotoSignature):
+            raise TypeError("only IntotoSignature is supported")
+        self._sig_verifier.verify(signature._bundle)
+        return signature.to_manifest()
diff --git a/src/sign.py b/src/sign.py
new file mode 100644
index 00000000..ec1ae946
--- /dev/null
+++ b/src/sign.py
@@ -0,0 +1,168 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Script to sign models."""
+
+import argparse
+import logging
+import pathlib
+
+from model_signing import model
+from model_signing.hashing import file
+from model_signing.hashing import memory
+from model_signing.serialization import serialize_by_file
+from model_signing.signature import fake
+from model_signing.signature import key
+from model_signing.signature import pki
+from model_signing.signature import signing
+from model_signing.signing import in_toto
+from model_signing.signing import in_toto_signature
+
+
+log = logging.getLogger(__name__)
+
+
+def _arguments() -> argparse.Namespace:
+    parser = argparse.ArgumentParser("Script to sign models")
+    parser.add_argument(
+        "--model_path",
+        help="path to the model to sign",
+        required=True,
+        type=pathlib.Path,
+        dest="model_path",
+    )
+    parser.add_argument(
+        "--sig_out",
+        help="the output file, it defaults to ./model.sig",
+        required=False,
+        type=pathlib.Path,
+        default=pathlib.Path("./model.sig"),
+        dest="sig_out",
+    )
+
+    method_cmd = parser.add_subparsers(
+        required=True,
+        dest="method",
+        help="method to sign the model: [pki, private-key, skip]",
+    )
+    # PKI
+    pki = method_cmd.add_parser("pki")
+    pki.add_argument(
+        "--cert_chain",
+        help="paths to pem encoded certificate files or a single file"
+        + " containing a chain",
+        required=False,
+        type=pathlib.Path,
+        default=[],
+        nargs="+",
+        dest="cert_chain_path",
+    )
+    pki.add_argument(
+        "--signing_cert",
+        help="the pem encoded signing cert",
+        required=True,
+        type=pathlib.Path,
+        dest="signing_cert_path",
+    )
+    pki.add_argument(
+        "--private_key",
+        help="the path to the private key PEM file",
+        required=True,
+        type=pathlib.Path,
+        dest="key_path",
+    )
+    # private key
+    p_key = method_cmd.add_parser("private-key")
+    p_key.add_argument(
+        "--private_key",
+        help="the path to the private key PEM file",
+        required=True,
+        type=pathlib.Path,
+        dest="key_path",
+    )
+    # skip
+    method_cmd.add_parser("skip")
+
+    return parser.parse_args()
+
+
+def _get_payload_signer(args: argparse.Namespace) -> signing.Signer:
+    if args.method == "private-key":
+        _check_private_key_options(args)
+        return key.ECKeySigner.from_path(private_key_path=args.key_path)
+    elif args.method == "pki":
+        _check_pki_options(args)
+        return pki.PKISigner.from_path(
+            args.key_path, args.signing_cert_path, args.cert_chain_path
+        )
+    elif args.method == "skip":
+        return fake.FakeSigner()
+    else:
+        log.error(f"unsupported signing method {args.method}")
+        log.error('supported methods: ["pki", "private-key", "skip"]')
+        exit(-1)
+
+
+def _check_private_key_options(args: argparse.Namespace):
+    if args.key_path == "":
+        log.error("--private_key must be set to a valid private key PEM file")
+        exit(-1)
+
+
+def _check_pki_options(args: argparse.Namespace):
+    _check_private_key_options(args)
+    if args.signing_cert_path == "":
+        log.error(
+            (
+                "--signing_cert must be set to a valid "
+                "PEM encoded signing certificate"
+            )
+        )
+        exit(-1)
+    if not args.cert_chain_path:
+        log.warning("No certificate chain provided")
+
+
+def main():
+    logging.basicConfig(level=logging.INFO)
+    args = _arguments()
+
+    log.info(f"Creating signer for {args.method}")
+    payload_signer = _get_payload_signer(args)
+    log.info(f"Signing model at {args.model_path}")
+
+    def hasher_factory(file_path: pathlib.Path) -> file.FileHasher:
+        return file.SimpleFileHasher(
+            file=file_path, content_hasher=memory.SHA256()
+        )
+
+    serializer = serialize_by_file.ManifestSerializer(
+        file_hasher_factory=hasher_factory
+    )
+
+    intoto_signer = in_toto_signature.IntotoSigner(payload_signer)
+    sig = model.sign(
+        model_path=args.model_path,
+        signer=intoto_signer,
+        payload_generator=in_toto.DigestsIntotoPayload.from_manifest,
+        serializer=serializer,
+        ignore_paths=[args.sig_out],
+    )
+
+    log.info(f'Storing signature at "{args.sig_out}"')
+    sig.write(args.sig_out)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/verify.py b/src/verify.py
new file mode 100644
index 00000000..031354bc
--- /dev/null
+++ b/src/verify.py
@@ -0,0 +1,142 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""This script can be used to verify model signatures."""
+
+import argparse
+import logging
+import pathlib
+
+from model_signing import model
+from model_signing.hashing import file
+from model_signing.hashing import memory
+from model_signing.serialization import serialize_by_file
+from model_signing.signature import fake
+from model_signing.signature import key
+from model_signing.signature import pki
+from model_signing.signature import verifying
+from model_signing.signing import in_toto_signature
+
+
+log = logging.getLogger(__name__)
+
+
+def _arguments() -> argparse.Namespace:
+    parser = argparse.ArgumentParser("Script to verify models")
+    parser.add_argument(
+        "--sig_path",
+        help="the path to the signature",
+        required=True,
+        type=pathlib.Path,
+        dest="sig_path",
+    )
+    parser.add_argument(
+        "--model_path",
+        help="the path to the model's base folder",
+        type=pathlib.Path,
+        dest="model_path",
+    )
+
+    method_cmd = parser.add_subparsers(
+        required=True,
+        dest="method",
+        help="method to verify the model: [pki, private-key, skip]",
+    )
+    # pki subcommand
+    pki = method_cmd.add_parser("pki")
+    pki.add_argument(
+        "--root_certs",
+        help="paths to PEM encoded certificate files or a single file"
+        + " used as the root of trust",
+        nargs="+",
+        type=pathlib.Path,
+        default=[],
+        dest="root_certs",
+    )
+    # private key subcommand
+    p_key = method_cmd.add_parser("private-key")
+    p_key.add_argument(
+        "--public_key",
+        help="the path to the public key used for verification",
+        required=True,
+        type=pathlib.Path,
+        dest="key",
+    )
+
+    method_cmd.add_parser("skip")
+
+    return parser.parse_args()
+
+
+def _check_private_key_flags(args: argparse.Namespace):
+    if args.key == "":
+        log.error("--public_key must be defined")
+        exit(-1)
+
+
+def _check_pki_flags(args: argparse.Namespace):
+    if not args.root_certs:
+        log.warning("no root of trust is set using system default")
+
+
+def main():
+    logging.basicConfig(level=logging.INFO)
+    args = _arguments()
+
+    verifier: verifying.Verifier
+    log.info(f"Creating verifier for {args.method}")
+    if args.method == "private-key":
+        _check_private_key_flags(args)
+        verifier = key.ECKeyVerifier.from_path(args.key)
+    elif args.method == "pki":
+        _check_pki_flags(args)
+        verifier = pki.PKIVerifier.from_paths(args.root_certs)
+    elif args.method == "skip":
+        verifier = fake.FakeVerifier()
+    else:
+        log.error(f"unsupported verification method {args.method}")
+        log.error('supported methods: ["pki", "private-key", "skip"]')
+        exit(-1)
+
+    log.info(f"Verifying model signature from {args.sig_path}")
+
+    sig = in_toto_signature.IntotoSignature.read(args.sig_path)
+
+    def hasher_factory(file_path: pathlib.Path) -> file.FileHasher:
+        return file.SimpleFileHasher(
+            file=file_path, content_hasher=memory.SHA256()
+        )
+
+    serializer = serialize_by_file.ManifestSerializer(
+        file_hasher_factory=hasher_factory
+    )
+
+    intoto_verifier = in_toto_signature.IntotoVerifier(verifier)
+
+    try:
+        model.verify(
+            sig=sig,
+            verifier=intoto_verifier,
+            model_path=args.model_path,
+            serializer=serializer,
+            ignore_paths=[args.sig_path],
+        )
+    except verifying.VerificationError as err:
+        log.error(f"verification failed: {err}")
+        exit(-1)
+    log.info("all checks passed")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/signing/in_toto_signature_test.py b/tests/signing/in_toto_signature_test.py
new file mode 100644
index 00000000..86326f8f
--- /dev/null
+++ b/tests/signing/in_toto_signature_test.py
@@ -0,0 +1,99 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import pathlib
+
+from model_signing.hashing import file
+from model_signing.hashing import memory
+from model_signing.serialization import serialize_by_file
+from model_signing.serialization import serialize_by_file_shard
+from model_signing.signature import fake
+from model_signing.signing import in_toto
+from model_signing.signing import in_toto_signature
+
+
+class TestIntotoSignature:
+    def _shard_hasher_factory(
+        self, path: pathlib.Path, start: int, end: int
+    ) -> file.ShardedFileHasher:
+        return file.ShardedFileHasher(
+            path, memory.SHA256(), start=start, end=end
+        )
+
+    def _hasher_factory(self, path: pathlib.Path) -> file.FileHasher:
+        return file.SimpleFileHasher(path, memory.SHA256())
+
+    def test_sign_and_verify_sharded_manifest(self, sample_model_folder):
+        signer = in_toto_signature.IntotoSigner(fake.FakeSigner())
+        verifier = in_toto_signature.IntotoVerifier(fake.FakeVerifier())
+        shard_serializer = serialize_by_file_shard.ManifestSerializer(
+            self._shard_hasher_factory, allow_symlinks=True
+        )
+        shard_manifest = shard_serializer.serialize(sample_model_folder)
+
+        payload = in_toto.ShardDigestsIntotoPayload.from_manifest(
+            shard_manifest
+        )
+        sig = signer.sign(payload)
+        verifier.verify(sig)
+        manifest = sig.to_manifest()
+        assert shard_manifest == manifest
+
+    def test_sign_and_verify_digest_sharded_manifest(self, sample_model_folder):
+        signer = in_toto_signature.IntotoSigner(fake.FakeSigner())
+        verifier = in_toto_signature.IntotoVerifier(fake.FakeVerifier())
+        shard_serializer = serialize_by_file_shard.ManifestSerializer(
+            self._shard_hasher_factory, allow_symlinks=True
+        )
+        shard_manifest = shard_serializer.serialize(sample_model_folder)
+
+        payload = in_toto.DigestOfShardDigestsIntotoPayload.from_manifest(
+            shard_manifest
+        )
+        sig = signer.sign(payload)
+        verifier.verify(sig)
+        manifest = sig.to_manifest()
+        assert shard_manifest == manifest
+
+    def test_sign_and_verify_digest_of_digest_manifest(
+        self, sample_model_folder
+    ):
+        signer = in_toto_signature.IntotoSigner(fake.FakeSigner())
+        verifier = in_toto_signature.IntotoVerifier(fake.FakeVerifier())
+        file_serializer = serialize_by_file.ManifestSerializer(
+            self._hasher_factory, allow_symlinks=True
+        )
+        file_manifest = file_serializer.serialize(sample_model_folder)
+
+        payload = in_toto.DigestOfDigestsIntotoPayload.from_manifest(
+            file_manifest
+        )
+        sig = signer.sign(payload)
+        verifier.verify(sig)
+        manifest = sig.to_manifest()
+        assert file_manifest == manifest
+
+    def test_sign_and_verify_digest_manifest(self, sample_model_folder):
+        signer = in_toto_signature.IntotoSigner(fake.FakeSigner())
+        verifier = in_toto_signature.IntotoVerifier(fake.FakeVerifier())
+        file_serializer = serialize_by_file.ManifestSerializer(
+            self._hasher_factory, allow_symlinks=True
+        )
+        file_manifest = file_serializer.serialize(sample_model_folder)
+
+        payload = in_toto.DigestsIntotoPayload.from_manifest(file_manifest)
+        sig = signer.sign(payload)
+        verifier.verify(sig)
+        manifest = sig.to_manifest()
+        assert file_manifest == manifest
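
The flow that `model.sign` and `model.verify` wrap in this change (serialize the model into a manifest, sign it, then re-serialize locally and compare manifests) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the `model_signing` API: `build_manifest`, `toy_sign`, and `toy_verify` are hypothetical names, the manifest is a plain dict, and an HMAC tag stands in for the EC key and PKI signatures used by the real signers.

```python
import hashlib
import hmac
import json
import pathlib
import tempfile


def build_manifest(model_dir: pathlib.Path, ignore=frozenset()) -> dict:
    """Hypothetical serializer stand-in: relative path -> SHA-256 hex digest."""
    return {
        p.relative_to(model_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(model_dir.rglob("*"))
        if p.is_file() and p not in ignore
    }


def toy_sign(manifest: dict, key: bytes) -> dict:
    """Hypothetical signer stand-in: an HMAC tag over the JSON payload."""
    payload = json.dumps(manifest, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def toy_verify(sig: dict, key: bytes, model_dir: pathlib.Path, ignore=frozenset()) -> None:
    """Check the tag first, then compare the signed manifest with a fresh one."""
    expected = hmac.new(key, sig["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig["tag"]):
        raise ValueError("signature is not valid")
    if json.loads(sig["payload"]) != build_manifest(model_dir, ignore):
        raise ValueError("the manifests do not match")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = pathlib.Path(tmp)
        (model_dir / "weights.bin").write_bytes(b"\x00\x01")
        sig = toy_sign(build_manifest(model_dir), b"demo-key")
        toy_verify(sig, b"demo-key", model_dir)  # untouched model: no exception
        (model_dir / "weights.bin").write_bytes(b"tampered")
        try:
            toy_verify(sig, b"demo-key", model_dir)
        except ValueError as err:
            print(f"verification failed: {err}")
```

The two-step check mirrors the order used in `model.verify`: the signature over the payload is validated before the signed manifest is compared with the locally recomputed one, so a tampered file surfaces as a manifest mismatch rather than as a signature error.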