Skip to content

Lambda function dump DynamoDB into CSV format and stored in a S3 bucket

License

Notifications You must be signed in to change notification settings

liangruibupt/ddb2csv-lambda

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

概述

本实验演示如何使用Lambda函数将DynamoDB转储为CSV格式并存储在S3存储桶中。 lambda可以按需手动触发,也可以通过CloudWatch事件定期触发。

官方方法是利用AWS Data Pipeline,但是AWS Data Pipeline在北京区域和宁夏区域均没有发布。 使用EMR群集将数据导出到Hive table并转换为CSV格式的方式涉及到EMR群集以及Hive external table的创建和维护。 本实验演示了如何使用 DynamoDB API和Lambda 扫描DynamoDB表并将其导出为CSV格式,该方法成本较低,并且不需要基础架构维护工作。

步骤一 准备测试数据

cd load-data
python MoviesCreateTable.py
python MoviesLoadData.py

## 检查数据大小
aws dynamodb scan --table-name Movies --select "COUNT" --region cn-north-1 --profile cn-north-1
{
    "Count": 4609,
    "ScannedCount": 4609,
    "ConsumedCapacity": null
}

aws dynamodb scan --table-name airport-codes --select "COUNT" --region cn-north-1 --profile cn-north-1
{
    "Count": 4609,
    "ScannedCount": 4609,
    "ConsumedCapacity": null
}

步骤一 创建 Lambda函数

本实验使用serverless framework进行 Lambda 函数的创建

  1. 运行下列命令
npm install -g serverless
serverless create --template aws-nodejs --path ddb2csv-lambda
cd ddb2csv-lambda
  1. 拷贝handler.js, serverless.yml, layer/, test-data.json 到 ddb2csv-lambda

步骤二 创建 Lambda函数Layer

cd layer/nodejs
npm install

步骤三 部署 Lambda函数

请根据您的环境修改serverless.yml 中的三个参数

      inTableName: Movies
      outS3Bucket: s3://solutions-aws-cn/results/Movies
      outFileName: Movies-export.csv
cd ddb2csv-lambda
# 部署函数
serverless deploy -v --aws-profile cn-north-1 --region cn-north-1
# 可选,如果函数有更新,请执行下面的命令
serverless deploy function -f Ddb2Csv -v --aws-profile cn-north-1 --region cn-north-1

测试

导出之前创建的Movies表

## 可根据您的环境修改 test-data.json
{
    "tablename": "Movies",
    "s3bucket": "serverless-hands-on",
    "filename": "Movies-export.csv",
    "action": "dump"
}

## 调用函数
serverless invoke -f Ddb2Csv -l --aws-profile cn-north-1 --region cn-north-1 --path test-data.json
{
    "statusCode": 200,
    "body": {
        "message": "Upload the DynamoDB dump to S3. Your function executed successfully!",
        "input": {
            "tablename": "Movies",
            "s3bucket": "serverless-hands-on",
            "filename": "Movies-export.csv",
            "action": "dump"
        }
    }
}
--------------------------------------------------------------------
2020-08-12 09:16:49.115 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	Do scanDynamoDB
2020-08-12 09:16:49.699 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	putting new key year into headers
2020-08-12 09:16:49.720 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	putting new key info into headers year
2020-08-12 09:16:49.722 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	putting new key title into headers year,info
2020-08-12 09:16:50.139 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	Do scanDynamoDB
2020-08-12 09:16:50.661 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	Do scanDynamoDB
2020-08-12 09:16:51.061 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	Do scanDynamoDB
2020-08-12 09:16:51.520 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	Do scanDynamoDB
2020-08-12 09:16:52.059 (+00:00)	22525c13-3fc2-4c5a-844e-e7138b4dd55a	INFO	File Written

## 查询导出的文件
aws s3 cp s3://serverless-hands-on/Movies-export.csv . --profile cn-north-1 --region cn-north-1

## 确认导出数据条数与DynamoDB Table的Item 数量一致
cat Movies-export.csv | wc -l
4609

## 查看样例
head -10 Movies-export.csv
year,info,title
1933,"{""actors"":[""Fay Wray"",""Robert Armstrong"",""Bruce Cabot""],""release_date"":""1933-03-07T00:00:00Z"",""plot"":""A film crew goes to a tropical island for an exotic location shoot and discovers a colossal giant gorilla who takes a shine to their female blonde star."",""genres"":[""Adventure"",""Fantasy"",""Horror""],""image_url"":""http://ia.media-imdb.com/images/M/MV5BMTkxOTIxMDU2OV5BMl5BanBnXkFtZTcwNjM5NjQyMg@@._V1_SX400_.jpg"",""directors"":[""Merian C. Cooper"",""Ernest B. Schoedsack""],""rating"":8,""rank"":3551,""running_time_secs"":6000}",King Kong
....

50k items table export testing

## Sample event, you can put below json in test-data.json
{
    "tablename": "airport-codes",
    "s3bucket": "serverless-hands-on",
    "filename": "airport-codes-export.csv",
    "action": "dump"
}
serverless invoke -f Ddb2Csv -l --aws-profile cn-north-1 --region cn-north-1 --path test-data.json
aws s3 cp s3://serverless-hands-on/airport-codes-export.csv . --profile cn-north-1 --region cn-north-1

清理

## DyanmoDB tables
aws dynamodb delete-table --table-name Movies --region cn-north-1 --profile cn-north-1
aws dynamodb delete-table --table-name airport-codes --region cn-north-1 --profile cn-north-1
aws dynamodb delete-table --table-name Orders --region cn-north-1 --profile cn-north-1

## S3 files
aws s3 rm s3://serverless-hands-on/Movies-export.csv --profile cn-north-1 --region cn-north-1
aws s3 rm s3://serverless-hands-on/airport-codes-export.csv --profile cn-north-1 --region cn-north-1
aws s3 rm s3://serverless-hands-on/airport-codes-export.csv --profile cn-north-1 --region cn-north-1

## Lambda function
cd ddb2csv-lambda
sls remove --aws-profile cn-north-1 --region cn-north-1

About

Lambda function dump DynamoDB into CSV format and stored in a S3 bucket

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published