Scan a DynamoDB table concurrently (up to 1,000,000 segments) and recursively read all items from every segment.
A blog post goes into more detail about this library.
$ pnpm add @shelf/dynamodb-parallel-scan
This library targets ESM and AWS SDK v3. Install it alongside its peer dependencies:
pnpm add @shelf/dynamodb-parallel-scan @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
Requires Node.js 22+.
Easily parallelize scan requests to fetch all items from a table at once. This is useful when you need to scan a large table to find a small number of items that will fit into Node.js memory.
Scan huge tables using an async generator or a stream. And yes, it supports stream backpressure! This is useful when you need to process a large number of items while you scan them: you can receive chunks of scanned items, wait until you have processed them, and then resume scanning when you're ready.
import {parallelScan} from '@shelf/dynamodb-parallel-scan';
const items = await parallelScan(
  {
    TableName: 'files',
    FilterExpression: 'attribute_exists(#fileSize)',
    ExpressionAttributeNames: {
      '#fileSize': 'fileSize',
    },
    ProjectionExpression: 'fileSize',
  },
  {concurrency: 1000}
);
console.log(items);
Note: highWaterMark determines the item count threshold, so a parallel scan can fetch up to concurrency * 1MB of additional data even after highWaterMark has been reached (see the streaming example below).
import {parallelScanAsStream} from '@shelf/dynamodb-parallel-scan';
const stream = await parallelScanAsStream(
  {
    TableName: 'files',
    FilterExpression: 'attribute_exists(#fileSize)',
    ExpressionAttributeNames: {
      '#fileSize': 'fileSize',
    },
    ProjectionExpression: 'fileSize',
  },
  {concurrency: 1000, chunkSize: 10000, highWaterMark: 10000}
);
for await (const items of stream) {
  console.log(items); // 10k items here
}
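Because the generator respects backpressure, the scan pauses while you work on each chunk and resumes once you ask for the next one. Below is a minimal sketch of that pattern; processBatch and the smaller concurrency/chunkSize values are illustrative assumptions, not part of the library.

import {parallelScanAsStream} from '@shelf/dynamodb-parallel-scan';

// hypothetical placeholder for your own async per-chunk work (e.g. writing to another store)
async function processBatch(items) {
  console.log(`processing ${items.length} items`);
}

const stream = await parallelScanAsStream(
  {TableName: 'files'},
  {concurrency: 250, chunkSize: 1000, highWaterMark: 1000}
);

for await (const items of stream) {
  // backpressure in action: scanning pauses here until processBatch resolves, then resumes
  await processBatch(items);
}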
This package is ESM-only. In CommonJS, use dynamic import:
const {parallelScan, parallelScanAsStream} = await import('@shelf/dynamodb-parallel-scan');
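Note that CommonJS modules don't support top-level await, so in practice the dynamic import goes inside an async function. A minimal sketch; the main wrapper and the concurrency value are illustrative assumptions:

// example.cjs
async function main() {
  // dynamic import is the only way to load an ESM-only package from CommonJS
  const {parallelScan} = await import('@shelf/dynamodb-parallel-scan');

  const items = await parallelScan({TableName: 'files'}, {concurrency: 250});

  console.log(items.length);
}

main().catch(console.error);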
Publish a new version with np:
$ pnpm dlx np
MIT © Shelf