Scan a DynamoDB table concurrently (up to 1,000,000 segments) and recursively read all items from every segment.
There is a blog post going into more detail about this library.
```sh
$ yarn add @shelf/dynamodb-parallel-scan
```
This library has two peer dependencies:

- `@aws-sdk/client-dynamodb`
- `@aws-sdk/lib-dynamodb`

Make sure to install them alongside this library.
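For example, they can be installed with yarn (npm works just as well):

```sh
$ yarn add @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
```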
Easily parallelize scan requests to fetch all items from a table at once. This is useful when you need to scan a large table to find a small number of items that will fit into Node.js memory.
Scan huge tables using an async generator or a stream. And yes, it supports stream backpressure! This is useful when you need to process a large number of items while you scan them: you receive a chunk of scanned items, process it, and the scan resumes once you're ready for more.
```js
const {parallelScan} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const items = await parallelScan(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000}
  );

  console.log(items);
})();
```
Note: `highWaterMark` determines the item count threshold, so a parallel scan can still fetch up to `concurrency` * 1 MB of additional data even after `highWaterMark` has been reached.
```js
const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const stream = await parallelScanAsStream(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000, chunkSize: 10000, highWaterMark: 10000}
  );

  for await (const items of stream) {
    console.log(items); // 10k items here
  }
})();
```
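Because the stream respects backpressure, a slow consumer naturally pauses the scan. Below is a minimal sketch of that pattern; `processBatch` is a hypothetical placeholder for your own per-chunk work, and the parameter values are illustrative only:

```js
const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

// Hypothetical async worker: replace with your own per-chunk processing,
// e.g. writing items to another store or pushing them onto a queue.
async function processBatch(items) {
  // ...
}

(async () => {
  const stream = await parallelScanAsStream(
    {TableName: 'files'},
    {concurrency: 250, chunkSize: 1000, highWaterMark: 1000}
  );

  for await (const items of stream) {
    // While this await is pending, backpressure keeps the scanner from
    // buffering much more than ~concurrency * 1 MB beyond highWaterMark.
    await processBatch(items);
  }
})();
```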
```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```
MIT © Shelf