Feature: Support SSM Parameter Store for remote_env #1344

Open
cmaggiulli opened this issue Aug 3, 2024 · 3 comments

@cmaggiulli

Context

We have a suite of 15+ microservices using Django DRF and Zappa that make relatively heavy use of external configuration and settings (for microservices, at least). We cannot use Lambda environment variables because of the character limits. We don't want to use an S3 JSON file because we would be unable to segregate access or apply KMS to specific settings.

We decided to use SSM Parameter Store since it resolves the aforementioned issues. However, we had to roll our own implementation. I'm getting tired of supporting our own implementation and would instead like to port it to this library. I will raise a pull request, and if it's decided this feature shouldn't be added, the PR can be closed. Otherwise, I can make whatever design or implementation changes are requested as part of the merge process.

Outline of Proposed Changes

  1. Modify remote_arn to accept an SSM Parameter Store ARN (and also a global S3 ARN so I can generalize my code)
  2. A utility function in utilities.py for converting an s3:// scheme URI to an ARN and vice versa (for generalization; very small)
  3. A utility function in utilities.py for extracting the service from an ARN
  4. A utility function in utilities.py for generalizing the parsing of remote resource identifiers (ARN and simple S3 scheme)
  5. A small change to the LambdaHandler constructor and load_remote_settings
  6. Similar changes in cli.py
  7. A modification in core to create the relevant SSM client and make any policy changes needed (probably the biggest change, though it should still be only a few lines of code)
  8. Test changes/additions
  9. README/CONTRIBUTE/other doc changes
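
Items 2 through 4 could be sketched roughly as below. This is only an illustration of the idea, not code from Zappa or from the eventual PR: the function names (`s3_uri_to_arn`, `s3_arn_to_uri`, `arn_service`, `parse_remote_identifier`) are hypothetical, and the sketch assumes the standard `aws` partition and SSM parameter ARNs of the form `arn:aws:ssm:<region>:<account>:parameter/<path>`.

```python
from typing import Tuple
from urllib.parse import urlparse

S3_ARN_PREFIX = "arn:aws:s3:::"  # assumption: standard "aws" partition only


def s3_uri_to_arn(uri: str) -> str:
    """Convert s3://bucket/key into arn:aws:s3:::bucket/key."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return f"{S3_ARN_PREFIX}{parsed.netloc}{parsed.path}"


def s3_arn_to_uri(arn: str) -> str:
    """Convert arn:aws:s3:::bucket/key into s3://bucket/key."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"Not an S3 ARN: {arn!r}")
    return "s3://" + arn[len(S3_ARN_PREFIX):]


def arn_service(arn: str) -> str:
    """Return the service field of an ARN (arn:partition:service:...)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not an ARN: {arn!r}")
    return parts[2]


def parse_remote_identifier(identifier: str) -> Tuple[str, str]:
    """Return (service, resource) for an s3:// URI, S3 ARN, or SSM ARN."""
    if identifier.startswith("s3://"):
        identifier = s3_uri_to_arn(identifier)
    service = arn_service(identifier)
    resource = identifier.split(":", 5)[5]
    if service == "ssm":
        # The resource part looks like "parameter/my/path"; keep "/my/path".
        if not resource.startswith("parameter"):
            raise ValueError(f"Not an SSM parameter ARN: {identifier!r}")
        resource = resource[len("parameter"):]
    elif service != "s3":
        raise ValueError(f"Unsupported service: {service!r}")
    return service, resource
```

With helpers like these, the handler would no longer need to guess the identifier type from a failed S3 parse; it could branch on the returned service instead.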

Miscellaneous Code Change Samples

This is obviously pseudo-code but something like:

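import importlib
import json
import logging
import os
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse

import boto3

logger = logging.getLogger(__name__)


def parse_s3_url(url: str) -> Tuple[str, str]: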
    """
    Parses S3 URL.

    Returns:
        Tuple[str, str]: bucket (domain) and file (full path).
    """
    if not url.startswith('s3://'):
        return '', ''
    
    result = urlparse(url)
    bucket = result.netloc
    path = result.path.lstrip('/')
    return bucket, path

def set_nested_dict(d: Dict[str, Any], keys: list, value: Any):
    """
    Set a value in a nested dictionary.

    Args:
        d (Dict[str, Any]): The dictionary to update.
        keys (list): The list of keys representing the path to the value.
        value (Any): The value to set.
    """
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = value

def ssm_parameters_to_dict(parameters: list) -> Dict[str, Any]:
    """
    Convert a list of SSM parameters to a nested dictionary.

    Args:
        parameters (list): The list of SSM parameters.

    Returns:
        Dict[str, Any]: A nested dictionary representing the parameters.
    """
    settings_dict = {}
    for parameter in parameters:
        keys = parameter['Name'].split('/')[1:]  # Split and remove the first empty element
        set_nested_dict(settings_dict, keys, parameter['Value'])
    return settings_dict

class LambdaHandler:
    """
    Singleton for avoiding duplicate setup.
    """

    __instance = None
    settings = None
    settings_name = None
    session = None

    def __new__(cls, settings_name="zappa_settings", session=None):
        """Singleton instance to avoid repeat setup"""
        if LambdaHandler.__instance is None:
            print("Instancing..")
            LambdaHandler.__instance = object.__new__(cls)
        return LambdaHandler.__instance

    def __init__(self, settings_name="zappa_settings", session=None):
        if not self.settings:
            self.settings = importlib.import_module(settings_name)
            self.settings_name = settings_name
            self.session = session

            if self.settings.LOG_LEVEL:
                level = logging.getLevelName(self.settings.LOG_LEVEL)
                logger.setLevel(level)

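            remote_env = getattr(self.settings, "REMOTE_ENV", None)
            # Only attempt a remote load when REMOTE_ENV is configured;
            # parse_s3_url would fail on None.
            if remote_env:
                self.load_remote_settings(remote_env)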

    def load_remote_settings(self, remote_env: str, kms_key_id: Optional[str] = None):
        """
        Attempt to read settings from either S3 or SSM Parameter Store.
        Adds each key->value pair as environment variables.

        Args:
            remote_env (str): The identifier of the resource containing the settings.
            kms_key_id (Optional[str]): The KMS key ID to use for decryption (optional).
        """
        boto_session = self.session or boto3.Session()

        # Attempt to parse as S3 URL first
        bucket, key = parse_s3_url(remote_env)
        if bucket and key:
            self._load_from_s3(boto_session, bucket, key)
        else:
            # Treat it as an SSM ARN if not an S3 URL
            self._load_from_ssm(boto_session, remote_env, kms_key_id)

    def _load_from_s3(self, session: boto3.Session, bucket: str, key: str):
        s3 = session.resource('s3')
        try:
            remote_env_object = s3.Object(bucket, key).get()
            content = remote_env_object['Body'].read().decode('utf-8')
            settings_dict = json.loads(content)
            self._set_env_variables(settings_dict)
        except Exception as e:
            print('Could not load remote settings file from S3.', e)

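    def _load_from_ssm(self, session: boto3.Session, parameter_name: str, kms_key_id: Optional[str]):
        ssm = session.client('ssm')
        try:
            # get_parameters_by_path expects a hierarchy path, so strip the
            # "arn:aws:ssm:<region>:<account>:parameter" prefix from an ARN.
            path = parameter_name
            if path.startswith('arn:'):
                path = path.split(':', 5)[5][len('parameter'):]

            # The API has no KeyId argument; SSM decrypts SecureString values
            # with the key each parameter was stored under, so kms_key_id only
            # toggles decryption here.
            paginator = ssm.get_paginator('get_parameters_by_path')
            parameters = []
            # Results are paged, so collect every page rather than only the first.
            for page in paginator.paginate(Path=path, Recursive=True, WithDecryption=bool(kms_key_id)):
                parameters.extend(page['Parameters'])

            settings_dict = ssm_parameters_to_dict(parameters)
            self._set_env_variables(settings_dict)
        except Exception as e:
            print('Could not load remote settings from SSM Parameter Store.', e)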

    def _set_env_variables(self, settings: Dict[str, Any], prefix=''):
        for key, value in settings.items():
            if isinstance(value, dict):
                self._set_env_variables(value, prefix=prefix + key + '_')
            else:
                self._set_env_variable(prefix + key, value)

    def _set_env_variable(self, key: str, value: str):
        if self.settings.LOG_LEVEL == "DEBUG":
            print(f'Adding {key} -> {value} to environment')
        # Environment variable keys can't be Unicode
        # https://github.com/Miserlou/Zappa/issues/604
        try:
            # Values must be strings too (JSON settings may hold ints or bools)
            os.environ[str(key)] = str(value)
        except Exception:
            if self.settings.LOG_LEVEL == "DEBUG":
                print("Environment variable keys must be non-unicode!")
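
To make the naming consequence of the sample above concrete, here is a small standalone sketch that combines the nesting and flattening steps. The helper name `flatten_parameters` and the parameter names under `/myapp/prod` are made up for illustration. It shows that, as written, the full parameter path ends up in the environment variable name, so a real implementation might want to strip the base path first.

```python
from typing import Any, Dict, List


def flatten_parameters(parameters: List[Dict[str, str]]) -> Dict[str, str]:
    """Nest SSM parameters by path, then flatten to underscore-joined names."""
    nested: Dict[str, Any] = {}
    for parameter in parameters:
        # "/myapp/prod/DB/HOST" -> ["myapp", "prod", "DB", "HOST"]
        keys = parameter["Name"].split("/")[1:]
        node = nested
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = parameter["Value"]

    flat: Dict[str, str] = {}

    def walk(node: Dict[str, Any], prefix: str = "") -> None:
        # Mirrors _set_env_variables: join nested keys with "_".
        for key, value in node.items():
            if isinstance(value, dict):
                walk(value, prefix + key + "_")
            else:
                flat[prefix + key] = value

    walk(nested)
    return flat


# Hypothetical parameters, shaped like the "Parameters" list SSM returns.
sample = [
    {"Name": "/myapp/prod/DB/HOST", "Value": "db.internal"},
    {"Name": "/myapp/prod/DEBUG", "Value": "false"},
]
env = flatten_parameters(sample)
# env == {"myapp_prod_DB_HOST": "db.internal", "myapp_prod_DEBUG": "false"}
```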

General Code Quality

I was trying to avoid any proposals that aren't backwards compatible, but I do not like attempting to figure out whether remote_env is an S3 URI or an ARN for S3 or SSM. This entire thing would be cleaner with aws-cdk because there is an ARN class... but aws-cdk-lib is huge. Would anyone be opposed to more robust remote_* params, such that if only remote_env is provided it expects an S3 URI, and otherwise you can set remote_env_service='s3|ssm' and remote_env_identifier_type='uri|arn'? Is anyone actually using this feature with the S3 JSON file? There's no beautiful way to do this in an entirely backwards-compatible way.
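
One possible shape for that resolution logic, as a rough sketch only: the function name `resolve_remote_env` is hypothetical, and the defaults (`s3` and `uri` when only remote_env is set) are an assumption based on the backwards-compatibility goal described above.

```python
from typing import Mapping, Tuple

VALID_SERVICES = {"s3", "ssm"}
VALID_IDENTIFIER_TYPES = {"uri", "arn"}


def resolve_remote_env(settings: Mapping[str, str]) -> Tuple[str, str, str]:
    """Return (service, identifier_type, identifier) from stage settings."""
    identifier = settings["remote_env"]
    # Assumed defaults: existing configs that set only remote_env keep
    # their current meaning, an S3 URI.
    service = settings.get("remote_env_service", "s3")
    identifier_type = settings.get("remote_env_identifier_type", "uri")

    if service not in VALID_SERVICES:
        raise ValueError(f"remote_env_service must be one of {sorted(VALID_SERVICES)}")
    if identifier_type not in VALID_IDENTIFIER_TYPES:
        raise ValueError(
            f"remote_env_identifier_type must be one of {sorted(VALID_IDENTIFIER_TYPES)}"
        )
    return service, identifier_type, identifier
```

An existing config with only remote_env set would resolve to ("s3", "uri", ...), while an SSM user would set both extra keys explicitly, so no guessing from the identifier's shape is needed.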

@sean-abbott

I'm not ready to help yet, but I would really love the functionality.

@brunokloss

This is a must-have! Creds in S3 are the worst solution ever!

@sean-abbott

sean-abbott commented Sep 4, 2024

Also, just a note for anyone trying to stay in the free tier: The s3 solution will break your free tier s3 usage. You get 2000 requests per month. Do the math on a request every 4 minutes.

The s3 solution also prints your creds in your logs, so there's that too.

You're gonna wanna roll your own parameter store read. If/when I get around to it, I'll try and post any related code, including IaC if I write it.
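
For anyone who wants to "do the math" on the free-tier point, here is a back-of-the-envelope sketch. It takes the figures from the comment at face value (2,000 requests per month, one request every 4 minutes) and assumes a 30-day month; the function name is made up.

```python
MINUTES_PER_DAY = 24 * 60
FREE_TIER_REQUESTS = 2000  # figure quoted in the comment above


def requests_per_month(interval_minutes: int, days: int = 30) -> int:
    """Number of requests made at a fixed interval over `days` days."""
    return MINUTES_PER_DAY * days // interval_minutes


monthly = requests_per_month(4)          # 15 per hour * 24 hours * 30 days
overage = monthly - FREE_TIER_REQUESTS   # requests beyond the quoted allowance
print(monthly, overage)
```

Under those assumptions that is 10,800 requests a month, more than five times the quoted allowance.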
