The Good, the Bad, and the Ugly of GSI and LSI in Amazon DynamoDB

5 min readNov 24, 2024

DynamoDB provides two indexing options for enhancing query flexibility: Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs). While these features significantly expand query capabilities, they come with specific advantages, drawbacks, and challenges.

The Good

Increased Query Flexibility

GSI: Allows querying attributes that are not part of the primary key in the base table. For example, a table with UserId as the partition key and Timestamp as the sort key can include a GSI to query by Email instead.
LSI: Enables querying a table using an alternate sort key on the same partition key. For example, if your table uses UserId as the partition key, you could add an LSI to sort queries by OrderDate or OrderAmount.

Performance Optimization

Both GSIs and LSIs allow optimized retrieval of data by indexing attributes suited for common query patterns, reducing the need for inefficient table scans.
GSIs, in particular, are helpful when querying across different partitions compared to LSIs, which are confined to a single partition.

Scalable

GSIs can scale independently of the main table, enabling high throughput for queries on indexed attributes without impacting the performance of the base table.

The Bad

Cost Implications

GSI: Additional read and write capacity units (RCUs and WCUs) are required for GSIs, potentially leading to significant costs if the indexed data is frequently written to or queried.
LSI: While LSIs do not incur additional WCU costs, they increase storage costs since every LSI duplicates a portion of the base table data.

Write Amplification

GSI: Every write operation to the main table that includes indexed attributes must propagate changes to the GSI, potentially leading to increased latency and higher write costs.
LSI: While it avoids write costs for throughput, the LSI structure can slow down write operations on the main table due to the additional sort key indexing.

Limited LSIs

DynamoDB imposes a limit of 5 LSIs per table, which can restrict flexibility when designing complex query patterns.

The Ugly

Eventual Consistency for GSIs

GSIs have eventual consistency, meaning there can be a lag between updates in the base table and their propagation to the GSI. This delay can cause queries on the GSI to return stale data, which is problematic for real-time use cases.
Irreversible Design Choices
LSI: LSIs must be defined at table creation and cannot be modified or added later. This inflexibility can lead to redesign efforts if initial data access patterns evolve.
GSI: Although GSIs can be added or removed after table creation, changes require careful planning to avoid service disruption or excessive costs during updates.

Hot Partitions

GSI: Poor partition key design for a GSI can lead to hot partitions, where one partition receives a disproportionate share of traffic, degrading performance.
LSI: Being tied to the base table’s partition key, LSIs are similarly vulnerable to hot partitioning if the base table’s partition key is poorly designed.

Debugging Complexity

Debugging issues like performance bottlenecks, write conflicts, or stale reads on GSIs and LSIs can be challenging, especially in large-scale, high-traffic environments. Misconfigured indexes may lead to inefficient queries and unexpected costs.

Now lets go through example JSON definition for GSI and LSI

{
    "TableName": "Orders",
    "KeySchema": [
        { "AttributeName": "CustomerId", "KeyType": "HASH" }, 
        { "AttributeName": "OrderId", "KeyType": "RANGE" }
    ],
    "AttributeDefinitions": [
        { "AttributeName": "CustomerId", "AttributeType": "S" },
        { "AttributeName": "OrderId", "AttributeType": "S" },
        { "AttributeName": "OrderStatus", "AttributeType": "S" } 
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "OrderStatusIndex",
            "KeySchema": [
                { "AttributeName": "OrderStatus", "KeyType": "HASH" }
            ],
            "Projection": {
                "ProjectionType": "ALL"
            },
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 5,
                "WriteCapacityUnits": 5
            }
        }
    ],
    "BillingMode": "PROVISIONED"
}

the JSON above represents Table with GSI, and the main things to recognize are:

The main table has CustomerId (partition key) and OrderId (sort key).
A GSI named OrderStatusIndex is created with OrderStatus as the partition key.
The projection type is ALL, meaning all attributes from the table are included in the index.

While the JSON below represents Table with LSI, and the main things to recognize in the Json above are:

The main table uses CustomerId as the partition key and OrderId as the sort key.
An LSI named OrderDateIndex is created with CustomerId as the partition key (same as the base table) and OrderDate as the alternate sort key.
The projection type is KEYS_ONLY, so only the keys and indexed attributes are available in the index.

{
    "TableName": "Orders",
    "KeySchema": [
        { "AttributeName": "CustomerId", "KeyType": "HASH" },
        { "AttributeName": "OrderId", "KeyType": "RANGE" }
    ],
    "AttributeDefinitions": [
        { "AttributeName": "CustomerId", "AttributeType": "S" },
        { "AttributeName": "OrderId", "AttributeType": "S" },
        { "AttributeName": "OrderDate", "AttributeType": "S" }
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "OrderDateIndex",
            "KeySchema": [
                { "AttributeName": "CustomerId", "KeyType": "HASH" },
                { "AttributeName": "OrderDate", "KeyType": "RANGE" }
            ],
            "Projection": {
                "ProjectionType": "KEYS_ONLY"
            }
        }
    ],
    "BillingMode": "PAY_PER_REQUEST"
}

So the key differences in the two JSONs above:

Attribute Definitions:

GSIs require attributes defined independently of the base table’s key schema.
LSIs share the same partition key as the base table, so only a sort key needs definition.

Key Schema:

GSIs define a new partition and (optional) sort key.
LSIs can only add a new sort key but must retain the base table’s partition key.

Provisioned Throughput:

GSIs require provisioning (or use on-demand billing).
LSIs use the base table’s throughput settings, which simplifies cost management.

A Query on GSI is

{
    "TableName": "Orders",
    "IndexName": "OrderStatusIndex",
    "KeyConditionExpression": "OrderStatus = :status",
    "ExpressionAttributeValues": {
        ":status": { "S": "Pending" }
    }
}

While Query on LSI is

{
    "TableName": "Orders",
    "IndexName": "OrderDateIndex",
    "KeyConditionExpression": "CustomerId = :customerId AND OrderDate > :startDate",
    "ExpressionAttributeValues": {
        ":customerId": { "S": "12345" },
        ":startDate": { "S": "2024-01-01" }
    }
}

As a summary, Choosing between GSIs and LSIs (or a combination of both) requires a careful analysis of query patterns, data access needs, and cost considerations. While they greatly enhance DynamoDB’s functionality, their trade-offs must be weighed against your specific use case requirements.

💡 Please let me know in the comments if you agree with me or if I missed something.

Thank you for your time 😊

The Good, the Bad, and the Ugly of GSI and LSI in Amazon DynamoDB

The Good

The Bad

The Ugly

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Useme Alehosaini

No responses yet