DynamoDB intimidates backend engineers coming from relational databases. There are no joins, no ad-hoc queries, and no changing the key schema after the fact. It feels like programming with one hand tied behind your back.
But that’s the wrong framing. DynamoDB’s constraints are its strength. When you model correctly, you get single-digit millisecond reads at any scale — whether your table has 10 rows or 10 billion. The catch is that you have to know your access patterns upfront. You don’t normalize data and figure out queries later. You start with queries and design your data to serve them.
The Fundamentals
Primary Key
Every DynamoDB table has a primary key that uniquely identifies each item. Two types:
Partition key only (simple primary key): A single attribute. DynamoDB hashes this value to determine which physical partition stores the item.
Partition key + sort key (composite primary key): Two attributes. The partition key determines the partition, the sort key determines the order within that partition. This is what makes DynamoDB powerful.
Partition Key: USER#123
Sort Key: PROFILE
→ Returns the user's profile
Partition Key: USER#123
Sort Key: ORDER#2026-03-15#ORD-001
→ Returns a specific order
Partition Key: USER#123
Sort Key: begins_with("ORDER#")
→ Returns ALL of this user's orders, sorted by date

With a composite key, one Query operation can return all items sharing a partition key, optionally filtered by sort key conditions: begins_with, between, >, <, =.
Items and Attributes
Items are like rows. Attributes are like columns. But unlike SQL, each item can have different attributes. There’s no schema enforced at the table level. One item might have email and phone. Another might have shippingAddress and loyaltyTier. DynamoDB doesn’t care.
The only required attributes are the primary key components.
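For example, two items in the same table can carry completely different attribute sets (illustrative data, not from a real schema):

```javascript
// Two items in the same table with different attribute sets.
// Only the key attributes (PK, SK) are required; everything else is per-item.
const customer = {
  PK: 'USER#123',
  SK: 'PROFILE',
  email: 'ada@example.com',
  phone: '+1-555-0100'
};

const vipCustomer = {
  PK: 'USER#456',
  SK: 'PROFILE',
  shippingAddress: '1 Main St',
  loyaltyTier: 'gold'
};

// Both are valid items — DynamoDB enforces no shared schema beyond the key.
```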
Capacity Modes
On-demand: Pay per request. No capacity planning. DynamoDB scales automatically. Costs ~$1.25 per million write request units and ~$0.25 per million read request units. Perfect for unpredictable or spiky workloads.
Provisioned: You specify reads/writes per second. Cheaper for steady-state workloads (~70% less than on-demand at high volume). Auto-scaling adjusts capacity based on utilization, but it’s reactive — there’s a lag before scaling up.
Start with on-demand. Switch to provisioned when your workload is predictable and cost optimization matters.
Single-Table Design
This is the concept that blows relational minds. Instead of one table per entity (users, orders, products), you put everything in one table. Different entity types share the same table but use different key patterns.
Why? Partly because each provisioned table carries its own capacity cost, but mainly because related data can be co-located in the same partition and fetched with a single query.
The Access Patterns
Before writing a single line of code, list every access pattern your application needs:
- Get user by ID
- Get user’s orders (sorted by date)
- Get a specific order
- Get all orders by status (e.g., all “pending” orders)
- Get product by ID
- Get products by category
Now design the key structure to serve all of these:
| Entity | PK | SK | GSI1-PK | GSI1-SK |
|---|---|---|---|---|
| User | `USER#<userId>` | `PROFILE` | `USER#<email>` | `PROFILE` |
| Order | `USER#<userId>` | `ORDER#<date>#<orderId>` | `STATUS#<status>` | `ORDER#<date>#<orderId>` |
| Order Item | `ORDER#<orderId>` | `ITEM#<productId>` | | |
| Product | `PRODUCT#<productId>` | `METADATA` | `CAT#<category>` | `PRODUCT#<productId>` |
This serves every access pattern:
- Get user by ID: Query PK=USER#123, SK=PROFILE
- Get user’s orders: Query PK=USER#123, SK begins_with ORDER#
- Get specific order: Query PK=USER#123, SK=ORDER#2026-03-15#ORD-001
- Orders by status: Query GSI1 PK=STATUS#pending, SK begins_with ORDER#
- Product by ID: Query PK=PRODUCT#456, SK=METADATA
- Products by category: Query GSI1 PK=CAT#electronics, SK begins_with PRODUCT#
CRUD Operations with AWS SDK
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import {
DynamoDBDocumentClient,
PutCommand,
GetCommand,
QueryCommand,
UpdateCommand,
DeleteCommand
} from '@aws-sdk/lib-dynamodb';
const client = new DynamoDBClient({ region: 'us-east-1' });
const ddb = DynamoDBDocumentClient.from(client);
const TABLE = process.env.TABLE_NAME;
// CREATE — Put a new user
async function createUser(user) {
await ddb.send(new PutCommand({
TableName: TABLE,
Item: {
PK: `USER#${user.id}`,
SK: 'PROFILE',
GSI1PK: `USER#${user.email}`,
GSI1SK: 'PROFILE',
userId: user.id,
name: user.name,
email: user.email,
createdAt: new Date().toISOString(),
entityType: 'User'
},
ConditionExpression: 'attribute_not_exists(PK)' // prevent overwrites
}));
}
// READ — Get user by ID
async function getUser(userId) {
const { Item } = await ddb.send(new GetCommand({
TableName: TABLE,
Key: {
PK: `USER#${userId}`,
SK: 'PROFILE'
}
}));
return Item;
}
// READ — Get user's orders (sorted by date, newest first)
async function getUserOrders(userId, limit = 20) {
const { Items } = await ddb.send(new QueryCommand({
TableName: TABLE,
KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
ExpressionAttributeValues: {
':pk': `USER#${userId}`,
':sk': 'ORDER#'
},
ScanIndexForward: false, // newest first
Limit: limit
}));
return Items;
}
// CREATE — Place an order
async function createOrder(userId, order) {
const date = new Date().toISOString();
await ddb.send(new PutCommand({
TableName: TABLE,
Item: {
PK: `USER#${userId}`,
SK: `ORDER#${date}#${order.id}`,
GSI1PK: `STATUS#pending`,
GSI1SK: `ORDER#${date}#${order.id}`,
orderId: order.id,
userId: userId,
status: 'pending',
total: order.total,
items: order.items,
createdAt: date,
entityType: 'Order'
}
}));
}
// UPDATE — Change order status
async function updateOrderStatus(userId, orderSK, newStatus) {
await ddb.send(new UpdateCommand({
TableName: TABLE,
Key: {
PK: `USER#${userId}`,
SK: orderSK
},
UpdateExpression: 'SET #status = :status, GSI1PK = :gsi1pk, updatedAt = :now',
ExpressionAttributeNames: {
'#status': 'status'
},
ExpressionAttributeValues: {
':status': newStatus,
':gsi1pk': `STATUS#${newStatus}`,
':now': new Date().toISOString()
}
}));
}
// DELETE — Remove an item
async function deleteOrder(userId, orderSK) {
await ddb.send(new DeleteCommand({
TableName: TABLE,
Key: {
PK: `USER#${userId}`,
SK: orderSK
}
}));
}

GSIs and LSIs
Global Secondary Indexes (GSIs)
A GSI is a completely separate partition/sort key pair that gives you an alternate query dimension. Think of it as a materialized view that DynamoDB maintains automatically.
In the table above, GSI1 with PK=STATUS#pending lets us query all pending orders across all users. The base table can’t do this because the data is partitioned by user.
Key facts:
- You can have up to 20 GSIs per table
- Each GSI costs additional write capacity (every write to the base table is replicated to all GSIs)
- GSI reads are eventually consistent only
- You choose which attributes to project into the GSI (KEYS_ONLY, INCLUDE, or ALL)
// Query all pending orders using GSI1
async function getOrdersByStatus(status) {
const { Items } = await ddb.send(new QueryCommand({
TableName: TABLE,
IndexName: 'GSI1',
KeyConditionExpression: 'GSI1PK = :pk',
ExpressionAttributeValues: {
':pk': `STATUS#${status}`
},
ScanIndexForward: false
}));
return Items;
}

Local Secondary Indexes (LSIs)
An LSI shares the same partition key as your table but has a different sort key. It must be created at table creation time (you can’t add one later).
Use case: your table’s sort key is ORDER#<date>#<orderId> for chronological sorting. An LSI with sort key AMOUNT#<total> lets you query a user’s orders sorted by amount instead.
LSIs are less commonly used than GSIs because they share throughput with the base table and limit the partition size to 10 GB.
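Because an LSI can only be defined at table creation, it lives in the CreateTable request. Here is a sketch of the relevant parameters you would pass to CreateTableCommand (the table name, index name, and amount attribute are hypothetical; note the zero-padding comment — string sort keys sort lexicographically, so numbers need fixed width):

```javascript
// CreateTable parameters defining an LSI that re-sorts a user's items by amount.
// Pass this object to CreateTableCommand from @aws-sdk/client-dynamodb.
const createTableParams = {
  TableName: 'AppTable',
  AttributeDefinitions: [
    { AttributeName: 'PK', AttributeType: 'S' },
    { AttributeName: 'SK', AttributeType: 'S' },
    { AttributeName: 'amountSK', AttributeType: 'S' } // e.g. 'AMOUNT#0000012999' (zero-padded cents)
  ],
  KeySchema: [
    { AttributeName: 'PK', KeyType: 'HASH' },
    { AttributeName: 'SK', KeyType: 'RANGE' }
  ],
  LocalSecondaryIndexes: [
    {
      IndexName: 'ByAmount',
      KeySchema: [
        { AttributeName: 'PK', KeyType: 'HASH' },       // same partition key as the table
        { AttributeName: 'amountSK', KeyType: 'RANGE' } // different sort key
      ],
      Projection: { ProjectionType: 'ALL' }
    }
  ],
  BillingMode: 'PAY_PER_REQUEST'
};
```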
Composite Sort Keys
This is the pattern that makes single-table design work. By encoding multiple values in the sort key, you enable hierarchical queries:
SK: ORDER#2026-03-15#ORD-001
Query with begins_with("ORDER#") → all orders
Query with begins_with("ORDER#2026-03") → March 2026 orders
Query with begins_with("ORDER#2026-03-15") → orders from March 15
Query with = "ORDER#2026-03-15#ORD-001" → exact order

One key, four different query granularities. This is like having four indexes for free.
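A small helper pair for building these keys and deriving query prefixes (hypothetical helper names; this assumes ISO-formatted dates, so lexicographic order matches chronological order):

```javascript
// Build a composite sort key: ORDER#<ISO date>#<orderId>.
function orderSK(isoDate, orderId) {
  return `ORDER#${isoDate}#${orderId}`;
}

// Derive the begins_with prefix for a given granularity.
// orderPrefix() → all orders; pass year/month/day strings to narrow.
function orderPrefix({ year, month, day } = {}) {
  if (year === undefined) return 'ORDER#';
  let prefix = `ORDER#${year}`;
  if (month !== undefined) prefix += `-${month}`;
  if (day !== undefined) prefix += `-${day}`;
  return prefix;
}

orderSK('2026-03-15', 'ORD-001');           // 'ORDER#2026-03-15#ORD-001'
orderPrefix({ year: '2026', month: '03' }); // 'ORDER#2026-03'
```

Every narrower prefix is a prefix of the full key, which is exactly what makes the begins_with hierarchy work.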
Sparse Indexes
A GSI only contains items where the GSI’s key attributes exist. If most items don’t have a flaggedForReview attribute, and you create a GSI on flaggedForReview, only flagged items appear in the index. The GSI is small, queries are fast, and you pay minimal storage.
// Flag an order for review — it appears in the sparse GSI
await ddb.send(new UpdateCommand({
TableName: TABLE,
Key: { PK: `USER#${userId}`, SK: orderSK },
UpdateExpression: 'SET GSI2PK = :pk, GSI2SK = :sk, flagReason = :reason',
ExpressionAttributeValues: {
':pk': 'FLAGGED',
':sk': `${new Date().toISOString()}#${orderId}`,
':reason': 'Unusual amount'
}
}));
// Query all flagged orders — efficient because the GSI only contains flagged items
async function getFlaggedOrders() {
const { Items } = await ddb.send(new QueryCommand({
TableName: TABLE,
IndexName: 'GSI2',
KeyConditionExpression: 'GSI2PK = :pk',
ExpressionAttributeValues: { ':pk': 'FLAGGED' }
}));
return Items;
}

TTL — Automatic Expiration
Set a ttl attribute (Unix timestamp in seconds) on any item, and DynamoDB deletes it automatically after that time. Free of charge. No Lambda, no cron jobs.
// Create a session with 24-hour TTL
async function createSession(userId, sessionId) {
const ttl = Math.floor(Date.now() / 1000) + (24 * 60 * 60); // 24 hours from now
await ddb.send(new PutCommand({
TableName: TABLE,
Item: {
PK: `SESSION#${sessionId}`,
SK: 'METADATA',
userId,
createdAt: new Date().toISOString(),
ttl // DynamoDB deletes this item after 24 hours
}
}));
}

Perfect for sessions, temporary tokens, cache entries, OTP codes, and any time-bounded data. Note: deletion is eventual — items may persist up to 48 hours past the TTL. Don’t rely on TTL for precise timing.
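One step worth remembering: TTL is off by default and must be enabled once per table, naming the attribute DynamoDB should watch. A sketch of the request parameters (table name is hypothetical) you would pass to UpdateTimeToLiveCommand from @aws-sdk/client-dynamodb:

```javascript
// Parameters for UpdateTimeToLiveCommand — enables TTL on the 'ttl' attribute.
const ttlParams = {
  TableName: 'AppTable',
  TimeToLiveSpecification: {
    AttributeName: 'ttl', // must hold a Unix timestamp in seconds
    Enabled: true
  }
};
```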
DynamoDB Streams
Every write to your table can be captured as a stream event. Streams power event-driven architectures:
- Replicate data to Elasticsearch for full-text search
- Maintain materialized views across tables
- Trigger Lambda functions on data changes
- Aggregate analytics as data flows in
// Lambda function triggered by DynamoDB Stream
import { unmarshall } from '@aws-sdk/util-dynamodb';

export async function handler(event) {
for (const record of event.Records) {
const { eventName, dynamodb } = record;
if (eventName === 'INSERT' || eventName === 'MODIFY') {
const newItem = unmarshall(dynamodb.NewImage);
if (newItem.entityType === 'Order' && newItem.status === 'completed') {
// Sync completed order to analytics
await updateAnalytics(newItem);
// Index in Elasticsearch for search
await indexInElasticsearch(newItem);
}
}
if (eventName === 'REMOVE') {
const oldItem = unmarshall(dynamodb.OldImage);
await removeFromElasticsearch(oldItem);
}
}
}

Stream records include the old and new image of the item (configurable: keys only, new image, old image, or both). They’re retained for 24 hours.
Transactions
DynamoDB supports ACID transactions across up to 100 items in a single operation. This is essential for operations that must be all-or-nothing:
import { TransactWriteCommand } from '@aws-sdk/lib-dynamodb';
// Transfer loyalty points between users — must be atomic
async function transferPoints(fromUserId, toUserId, points) {
await ddb.send(new TransactWriteCommand({
TransactItems: [
{
Update: {
TableName: TABLE,
Key: { PK: `USER#${fromUserId}`, SK: 'PROFILE' },
UpdateExpression: 'SET points = points - :pts',
ConditionExpression: 'points >= :pts',
ExpressionAttributeValues: { ':pts': points }
}
},
{
Update: {
TableName: TABLE,
Key: { PK: `USER#${toUserId}`, SK: 'PROFILE' },
UpdateExpression: 'SET points = points + :pts',
ExpressionAttributeValues: { ':pts': points }
}
}
]
}));
}

Transactions cost 2x normal operations (they use a 2-phase commit protocol). Use them when correctness requires atomicity, not as the default for every write.
Batch Operations
For bulk reads and writes, use BatchGetItem and BatchWriteItem. BatchGetItem retrieves up to 100 items per call; BatchWriteItem handles up to 25 put or delete requests. Both are far more efficient than issuing individual operations.
import { BatchWriteCommand } from '@aws-sdk/lib-dynamodb';
async function batchCreateOrderItems(orderId, items) {
const requests = items.map(item => ({
PutRequest: {
Item: {
PK: `ORDER#${orderId}`,
SK: `ITEM#${item.productId}`,
productId: item.productId,
quantity: item.quantity,
price: item.price,
entityType: 'OrderItem'
}
}
}));
// Process in chunks of 25
for (let i = 0; i < requests.length; i += 25) {
const chunk = requests.slice(i, i + 25);
const result = await ddb.send(new BatchWriteCommand({
RequestItems: {
[TABLE]: chunk
}
}));
// Handle unprocessed items (throttled)
if (result.UnprocessedItems?.[TABLE]?.length > 0) {
// Retry with exponential backoff
await retryUnprocessed(result.UnprocessedItems);
}
}
}

// Sketch: retry unprocessed items with exponential backoff
async function retryUnprocessed(unprocessed, attempt = 1, maxAttempts = 5) {
  if (attempt > maxAttempts) throw new Error('Unprocessed items remain after retries');
  await new Promise(resolve => setTimeout(resolve, 100 * 2 ** attempt));
  const result = await ddb.send(new BatchWriteCommand({ RequestItems: unprocessed }));
  const remaining = result.UnprocessedItems ?? {};
  if (Object.keys(remaining).length > 0) {
    await retryUnprocessed(remaining, attempt + 1, maxAttempts);
  }
}

The Hot Partition Problem
DynamoDB distributes data across partitions by hashing the partition key. If one partition key gets disproportionate traffic, that partition becomes “hot” and requests get throttled — even if your table has spare capacity elsewhere.
Common causes:
- Using a status as partition key (STATUS#active gets 99% of traffic)
- Celebrity accounts in a social app (one user has millions of followers)
- Time-based keys where all current data shares the same key

Solutions:
- Write sharding: append a random suffix (e.g., STATUS#active#3) and scatter reads across shards
- Better key design: partition by something with higher cardinality
- On-demand mode: handles burst traffic more gracefully than provisioned
// Write sharding for a hot partition
const SHARD_COUNT = 10;
async function incrementViewCount(pageId) {
const shard = Math.floor(Math.random() * SHARD_COUNT);
await ddb.send(new UpdateCommand({
TableName: TABLE,
Key: {
PK: `PAGE#${pageId}`,
SK: `VIEWS#${shard}`
},
UpdateExpression: 'ADD viewCount :inc',
ExpressionAttributeValues: { ':inc': 1 }
}));
}
// Read: query all shards and sum
async function getViewCount(pageId) {
const { Items } = await ddb.send(new QueryCommand({
TableName: TABLE,
KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
ExpressionAttributeValues: {
':pk': `PAGE#${pageId}`,
':sk': 'VIEWS#'
}
}));
return Items.reduce((sum, item) => sum + (item.viewCount || 0), 0);
}

Cost Modeling
DynamoDB pricing is based on:
- Write Request Units (WRU): 1 WRU = one 1 KB write. On-demand: $1.25/million. Provisioned mode bills in write capacity units (WCUs) instead, at ~$0.00065 per WCU-hour.
- Read Request Units (RRU): 1 RRU = one 4 KB strongly consistent read (or two eventually consistent reads). On-demand: $0.25/million.
- Storage: $0.25/GB/month.
- GSIs: consume additional WRUs for every base table write.
A typical microservice with 100 writes/sec and 500 eventually consistent reads/sec:
- On-demand: ~$324/month for writes + ~$162/month for reads = ~$486/month
- Provisioned (with reserved capacity): ~$47/month for writes + ~$12/month for reads = ~$59/month
The savings from moving off on-demand are significant. Once your workload stabilizes, the switch to provisioned is worth it.
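The on-demand arithmetic can be sanity-checked with a back-of-the-envelope estimate (prices are the on-demand rates above; this assumes items fit in one WRU/RRU and a 30-day month, so real bills will vary with item size and consistency mode):

```javascript
// Back-of-the-envelope on-demand cost estimate.
// Assumes writes <= 1 KB, reads <= 4 KB, 30-day months.
const SECONDS_PER_MONTH = 30 * 24 * 60 * 60; // 2,592,000

function onDemandMonthlyCost({ writesPerSec, readsPerSec, eventuallyConsistent = true }) {
  const writeUnits = writesPerSec * SECONDS_PER_MONTH; // 1 WRU per 1 KB write
  // An eventually consistent read of up to 4 KB costs 0.5 RRU.
  const readUnits = readsPerSec * SECONDS_PER_MONTH * (eventuallyConsistent ? 0.5 : 1);
  const writeCost = (writeUnits / 1e6) * 1.25; // ~$1.25 per million WRU
  const readCost = (readUnits / 1e6) * 0.25;   // ~$0.25 per million RRU
  return { writeCost, readCost, total: writeCost + readCost };
}

const est = onDemandMonthlyCost({ writesPerSec: 100, readsPerSec: 500 });
// est.writeCost ≈ 324, est.readCost ≈ 162, est.total ≈ 486
```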
When DynamoDB Is the Wrong Choice
DynamoDB is not a universal database. Don’t use it when:
- You need ad-hoc queries: Analytics, reporting, exploratory queries — DynamoDB requires you to know your access patterns upfront. Use Athena, Redshift, or PostgreSQL.
- You need complex joins: If your data model requires joining 5 tables together, use a relational database.
- You need full-text search: DynamoDB has no text search capability. Use Elasticsearch/OpenSearch.
- Your data model is highly relational: Many-to-many relationships with complex traversals — consider a relational or graph database.
- You need strong consistency on GSIs: GSI reads are always eventually consistent.
- Your items are large: The 400 KB item limit is a real constraint. Store large blobs in S3 and keep a reference in DynamoDB.
DynamoDB is the right choice when you need predictable single-digit millisecond latency at any scale, your access patterns are well-defined, and your data model fits the key-value/document paradigm. For backend APIs serving users, session stores, gaming leaderboards, IoT data ingestion, and event-driven microservices — it’s hard to beat.
The key insight: don’t model your data and then figure out queries. Start with the queries, and model your data to serve them. Everything in DynamoDB flows from this principle.
