Dmap is a free and open-source toolkit to assess your data security posture in the cloud. It allows you to quickly find information about your data repositories from different cloud environments, and then perform data discovery and classification by scanning those repositories for sensitive data patterns.
Dmap provides a hosted web service, command line interface (CLI), and Go library API to scan cloud environments to discover data repositories, and scan data repositories to discover and classify sensitive data.
We define a data repository as a collection of data that is stored in a specific location in the cloud. An Amazon RDS database, a Redshift cluster, and a DynamoDB table are all examples of Dmap data repositories. Think of it as a more generic term for a data store or database.
We also define a cloud environment as a collection of cloud resources that are managed by a specific cloud provider. For example, an AWS account is a cloud environment.
Scanning a cloud environment with Dmap will provide you with a list of data repositories that exist in that environment. Scanning a data repository with Dmap will provide you with all the fields in the data repository that contain sensitive data.
We use Open Policy Agent's (OPA) Rego API to define rules that classify sensitive data in data repositories and assign it appropriate labels (such as CCN, SSN, etc.). Dmap provides a set of predefined data labels for this purpose. You can also define your own data labels if desired. Each data label has a name, description, and a set of tags that can be used to group labels, e.g. "PII", "PCI", "HIPAA", etc.
The Dmap CLI can be used to scan data repositories to perform data discovery and classification. It will produce JSON-formatted output that lists all the data labels used for classification, as well as the fields in the data repository that were classified as containing sensitive data. For example:
$ dmap repo-scan \
--type postgres \
--host ... \
--port ... \
--user ... \
--password ...
{
"labels": [
{
"name": "ADDRESS",
"description": "Address",
"tags": [
"PII"
]
},
...
],
"classifications": [
{
"attributePath": [
"postgres",
"public",
"patients",
"address"
],
"labels": [
"ADDRESS"
]
},
...
]
}
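If you want to post-process this output in your own tooling, the following is a minimal sketch of decoding it in Go and grouping the classified fields by tag. The struct and function names (ScanOutput, FieldsByTag, and so on) are hypothetical and are not part of the Dmap library; only the JSON field names are taken from the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// Label mirrors one entry of the "labels" array in the CLI output.
type Label struct {
	Name        string   `json:"name"`
	Description string   `json:"description"`
	Tags        []string `json:"tags"`
}

// Classification mirrors one entry of the "classifications" array.
type Classification struct {
	AttributePath []string `json:"attributePath"`
	Labels        []string `json:"labels"`
}

// ScanOutput is a hypothetical container for the whole JSON document.
type ScanOutput struct {
	Labels          []Label          `json:"labels"`
	Classifications []Classification `json:"classifications"`
}

// ParseScanOutput decodes the JSON printed by the CLI.
func ParseScanOutput(data []byte) (ScanOutput, error) {
	var out ScanOutput
	err := json.Unmarshal(data, &out)
	return out, err
}

// FieldsByTag maps each tag (e.g. "PII") to the sorted, dot-joined
// attribute paths whose labels carry that tag.
func FieldsByTag(out ScanOutput) map[string][]string {
	tagsByLabel := make(map[string][]string, len(out.Labels))
	for _, l := range out.Labels {
		tagsByLabel[l.Name] = l.Tags
	}
	// Use a set per tag so a field with several labels is listed once.
	seen := make(map[string]map[string]bool)
	for _, c := range out.Classifications {
		path := strings.Join(c.AttributePath, ".")
		for _, name := range c.Labels {
			for _, tag := range tagsByLabel[name] {
				if seen[tag] == nil {
					seen[tag] = make(map[string]bool)
				}
				seen[tag][path] = true
			}
		}
	}
	result := make(map[string][]string, len(seen))
	for tag, paths := range seen {
		for p := range paths {
			result[tag] = append(result[tag], p)
		}
		sort.Strings(result[tag])
	}
	return result
}

// sample contains only the entries shown in full in the output above.
const sample = `{
  "labels": [
    {"name": "ADDRESS", "description": "Address", "tags": ["PII"]}
  ],
  "classifications": [
    {"attributePath": ["postgres", "public", "patients", "address"], "labels": ["ADDRESS"]}
  ]
}`

func main() {
	out, err := ParseScanOutput([]byte(sample))
	if err != nil {
		fmt.Println("error decoding scan output:", err)
		return
	}
	for tag, fields := range FieldsByTag(out) {
		fmt.Printf("%s: %v\n", tag, fields)
	}
}
```

Grouping by tag rather than by label is just one possible choice; it is convenient when the goal is to answer questions such as "which fields hold PII?".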
Optionally, by providing the --repo-id, --client-id, and --client-secret flags, the results can be sent to the Dmap web service for further analysis and reporting.
Use the --help flag to see all available commands and options, e.g.:
$ dmap --help
$ dmap repo-scan --help
It is recommended to pass sensitive values via environment variables rather than typing them directly on the command line, e.g.:
# Read password from stdin
$ read -rs PASSWORD
$ dmap repo-scan --password "$PASSWORD" # ... other flags ...
The Dmap CLI can be installed as a native binary, a Docker image, or directly from source. Each approach is described below.
Binary executables of the CLI are available for Linux, macOS, and Windows platforms. The appropriate binary for your platform can be downloaded from the releases page, e.g.:
# Replace with the desired version, e.g. v0.1.0
VERSION="v0.1.0"
curl -OL "https://github.com/cyralinc/dmap/releases/download/${VERSION}/dmap_${VERSION}_darwin_amd64.zip"
unzip dmap_${VERSION}_darwin_amd64.zip
Optionally, put the binary in a location in your PATH for easy use.
The SHA256 checksums for each release are provided in a file named dmap_<version>_sha256sums.txt. You can verify the integrity of the downloaded binary by comparing its checksum to the one in the file. The checksums are also signed with Cyral's GPG key (fingerprint E8DBE6574C87BF0FED7FFC464D91812ADF732B74), and you can verify the checksums file, e.g.:
# Replace <version> with the desired version, e.g. v0.1.0.
# Assuming the binary/binaries and checksums are in the same directory.
sha256sum -c dmap_<version>_sha256sums.txt
gpg --verify dmap_<version>_sha256sums.txt.sig dmap_<version>_sha256sums.txt
Docker images for the Dmap CLI are available on the public Cyral ECR. Tags for each version of Dmap are released, as well as a latest tag. The image can be run as a container, e.g.:
# Optionally replace `latest` with the desired version, e.g. v0.1.0.
docker run --rm public.ecr.aws/cyral/dmap:latest repo-scan \
--type ... \
--database ... \
--host ... \
--port ... \
--user ... \
--password ...
Installing from source requires Go 1.21 or later.
# Replace <version> with the desired version, e.g. v0.1.0, or the branch, e.g. main.
go install github.com/cyralinc/dmap/cmd/dmap@<version>
The Dmap Go library provides APIs to scan cloud environments to discover data repositories in those environments, as well as scan individual data repositories for sensitive data.
To import the Dmap library into your project, use the go get command below:
go get github.com/cyralinc/dmap
The cloud environment scanning API currently supports scanning AWS environments. It discovers the following data repository types across AWS services:
- Amazon RDS (MySQL, PostgreSQL, SQL Server, etc)
- RDS Clusters (Aurora, Multi-AZ Clusters)
- Redshift
- DynamoDB
- DocumentDB
The Dmap library requires a set of read-only AWS service permissions to perform an environment scan, so that it's able to find existing data repositories from these services. IAM credentials with permissions for the following actions are required:
rds:DescribeDBClusters
rds:DescribeDBInstances
rds:ListTagsForResource
redshift:DescribeClusters
dynamodb:DescribeTable
dynamodb:ListTables
dynamodb:ListTagsOfResource
Make sure to use proper AWS credentials that contain the permissions above.
To use Dmap to find information about your existing data repositories, follow the steps below:
- Define the AWS credentials for the account to be scanned. This can be done through one of the following options:
  - Using credentials defined through AWS environment variables.
  - Using the default profile from the AWS credentials file.
  - Assuming an AWS IAM Role.

  If you want to use an IAM Role, follow the instructions below on how to configure the ScannerConfig to assume an IAM role. For more details, see the AWS official guide on Specifying Credentials.
- Define the ScannerConfig.
  - Define the AWS regions to be scanned.
  - (Optional) Define the AssumeRoleConfig parameters for the IAM Role to be assumed. The AWS default external configurations will be used instead if this is not defined.
- Instantiate a new AWSScanner using the NewAWSScanner function with the ScannerConfig defined.
- Use the AWSScanner to call the Scan method for scanning all the existing data repositories for the configuration provided.
Here's a code example of how to do that:
package main
import (
"context"
"fmt"
"github.com/cyralinc/dmap/aws"
)
func main() {
ctx := context.Background()
// Define the AWS regions to be scanned.
regions := []string{
"us-east-1",
"us-east-2",
"us-west-1",
"us-west-2",
}
// Define the scanner configuration.
scannerConfig := aws.ScannerConfig{
Regions: regions,
// Optionally define an AssumeRoleConfig if you want to use an IAM Role.
// Otherwise, the AWS default external configurations will be used instead.
// AssumeRole: &aws.AssumeRoleConfig{
// IAMRoleARN: "",
// ExternalID: "",
// },
}
// Create the AWS scanner.
scanner, err := aws.NewAWSScanner(ctx, scannerConfig)
if err != nil {
fmt.Printf("Error creating AWS scanner: %v\n", err)
// Stop here: the scanner cannot be used if its creation failed.
return
}
// Run data repositories scan.
results, err := scanner.Scan(ctx)
if err != nil {
fmt.Printf("Scan errors: %v\n", err)
}
// Print number of repositories scanned.
fmt.Printf(
"Scanned %d repositories:\n",
len(results.Repositories),
)
// Print each repository found.
for repoId, repo := range results.Repositories {
fmt.Printf(
"Id: %s | Repo: %v\n\n",
repoId,
repo,
)
}
}
The main API used for scanning repositories is sql.Scanner. It is an implementation of the RepoScanner interface.
The repository scanning API currently supports scanning the following SQL data repositories out of the box:
- MySQL
- PostgreSQL
- SQL Server
- Redshift
- Snowflake
- Oracle
- Denodo
Example usage:
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"github.com/cyralinc/dmap/sql"
)
func main() {
ctx := context.Background()
// Configure and instantiate the scanner.
cfg := sql.ScannerConfig{
RepoType: "postgres",
RepoConfig: sql.RepoConfig{
Host: "example.com",
Port: "5431",
User: "user",
Password: "password",
},
}
scanner, err := sql.NewScanner(ctx, cfg)
if err != nil {
log.Fatalf("error creating new scanner: %v", err)
}
// Scan the repository.
results, err := scanner.Scan(ctx)
if err != nil {
log.Fatalf("error scanning repository: %v", err)
}
// Print the results to stdout as JSON.
jsonResults, err := json.MarshalIndent(results, "", " ")
if err != nil {
log.Fatalf("error marshalling results: %v", err)
}
fmt.Println(string(jsonResults))
}
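The example above hardcodes the repository credentials for brevity. In practice you would probably load them from the environment instead. The following is a minimal sketch of that; the environment variable names DMAP_REPO_USER and DMAP_REPO_PASSWORD and the Credentials type are hypothetical and are not defined by Dmap. The returned values could then be assigned to the User and Password fields of sql.RepoConfig.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical environment variable names; pick whatever suits your setup.
const (
	userEnvVar     = "DMAP_REPO_USER"
	passwordEnvVar = "DMAP_REPO_PASSWORD"
)

// Credentials is a hypothetical holder for the repository login.
type Credentials struct {
	User     string
	Password string
}

// LoadCredentials reads both values through lookup, which has the same
// signature as os.LookupEnv so that it can be replaced in tests.
func LoadCredentials(lookup func(string) (string, bool)) (Credentials, error) {
	user, ok := lookup(userEnvVar)
	if !ok || user == "" {
		return Credentials{}, fmt.Errorf("environment variable %s is not set", userEnvVar)
	}
	password, ok := lookup(passwordEnvVar)
	if !ok || password == "" {
		return Credentials{}, fmt.Errorf("environment variable %s is not set", passwordEnvVar)
	}
	return Credentials{User: user, Password: password}, nil
}

func main() {
	creds, err := LoadCredentials(os.LookupEnv)
	if err != nil {
		fmt.Println("error loading credentials:", err)
		return
	}
	// Never print the password itself.
	fmt.Printf("loaded credentials for user %q\n", creds.User)
}
```

Failing early when a variable is missing avoids attempting a connection with empty credentials.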
Additional repository types can be added by implementing the sql.Repository interface and registering it in a sql.Registry. See the sql package for more details.
The Dmap library allows you to define custom data labels for classifying sensitive data in data repositories. Each data label has a name, description, and a set of tags that can be used to group labels, e.g. "PII", "PCI", "HIPAA", etc.
Labels are defined as OPA Rego policies and are loaded at runtime by the repository scanner. The metadata for the labels is defined in a labels.yaml file. This can be passed to the scanner via the LabelsYamlFilename field in the ScannerConfig struct, e.g.:
cfg := sql.ScannerConfig{
LabelsYamlFilename: "/path/to/labels.yaml",
// Other fields...
}
scanner, err := sql.NewScanner(context.Background(), cfg)
If using the Dmap CLI, the --label-yaml-file flag can be used to specify the path to the labels YAML file, e.g.:
$ dmap repo-scan \
--label-yaml-file "/path/to/labels.yaml" \
# Other flags...
See the labels package for more details on how to define and use data labels for classifying sensitive data. Additionally, see the labels.yaml file for an example of the file format and how to define custom data labels.
The database connection string is currently hardcoded for each repository type
(see #101 for discussion about possible
future improvements). For Postgres repositories, the connection string is
configurable using environment variables.
If you need to set additional connection parameters for other repository types,
you will need to modify the code or provide a new Repository
implementation.
Please open an issue and/or pull request if you have any suggestions or
contributions.
Learn more about Cyral by visiting Cyral.com.
We use GitHub issues for tracking requests and bugs; please feel free to use them to report any requests or issues.