
The Good, the Bad, and the Ugly of GSI and LSI in Amazon DynamoDB
DynamoDB provides two indexing options for enhancing query flexibility: Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs). While these features significantly expand query capabilities, they come with specific advantages, drawbacks, and challenges.
The Good
Increased Query Flexibility
- GSI: Allows querying attributes that are not part of the primary key in the base table. For example, a table with
UserId
as the partition key andTimestamp
as the sort key can include a GSI to query byEmail
instead. - LSI: Enables querying a table using an alternate sort key on the same partition key. For example, if your table uses
UserId
as the partition key, you could add an LSI to sort queries byOrderDate
orOrderAmount
.
Performance Optimization
- Both GSIs and LSIs allow optimized retrieval of data by indexing attributes suited for common query patterns, reducing the need for inefficient table scans.
- GSIs, in particular, are helpful when querying across different partitions compared to LSIs, which are confined to a single partition.
Scalable
- GSIs can scale independently of the main table, enabling high throughput for queries on indexed attributes without impacting the performance of the base table.
The Bad
Cost Implications
- GSI: Additional read and write capacity units (RCUs and WCUs) are required for GSIs, potentially leading to significant costs if the indexed data is frequently written to or queried.
- LSI: While LSIs do not incur additional WCU costs, they increase storage costs since every LSI duplicates a portion of the base table data.
Write Amplification
- GSI: Every write operation to the main table that includes indexed attributes must propagate changes to the GSI, potentially leading to increased latency and higher write costs.
- LSI: While it avoids write costs for throughput, the LSI structure can slow down write operations on the main table due to the additional sort key indexing.
Limited LSIs
- DynamoDB imposes a limit of 5 LSIs per table, which can restrict flexibility when designing complex query patterns.
The Ugly
Eventual Consistency for GSIs
- GSIs have eventual consistency, meaning there can be a lag between updates in the base table and their propagation to the GSI. This delay can cause queries on the GSI to return stale data, which is problematic for real-time use cases.
- Irreversible Design Choices
- LSI: LSIs must be defined at table creation and cannot be modified or added later. This inflexibility can lead to redesign efforts if initial data access patterns evolve.
- GSI: Although GSIs can be added or removed after table creation, changes require careful planning to avoid service disruption or excessive costs during updates.
Hot Partitions
- GSI: Poor partition key design for a GSI can lead to hot partitions, where one partition receives a disproportionate share of traffic, degrading performance.
- LSI: Being tied to the base table’s partition key, LSIs are similarly vulnerable to hot partitioning if the base table’s partition key is poorly designed.
Debugging Complexity
- Debugging issues like performance bottlenecks, write conflicts, or stale reads on GSIs and LSIs can be challenging, especially in large-scale, high-traffic environments. Misconfigured indexes may lead to inefficient queries and unexpected costs.
Now lets go through example JSON definition for GSI and LSI
{
"TableName": "Orders",
"KeySchema": [
{ "AttributeName": "CustomerId", "KeyType": "HASH" },
{ "AttributeName": "OrderId", "KeyType": "RANGE" }
],
"AttributeDefinitions": [
{ "AttributeName": "CustomerId", "AttributeType": "S" },
{ "AttributeName": "OrderId", "AttributeType": "S" },
{ "AttributeName": "OrderStatus", "AttributeType": "S" }
],
"GlobalSecondaryIndexes": [
{
"IndexName": "OrderStatusIndex",
"KeySchema": [
{ "AttributeName": "OrderStatus", "KeyType": "HASH" }
],
"Projection": {
"ProjectionType": "ALL"
},
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}
}
],
"BillingMode": "PROVISIONED"
}
the JSON above represents Table with GSI, and the main things to recognize are:
- The main table has
CustomerId
(partition key) andOrderId
(sort key). - A GSI named
OrderStatusIndex
is created withOrderStatus
as the partition key. - The projection type is
ALL
, meaning all attributes from the table are included in the index.
While the JSON below represents Table with LSI, and the main things to recognize in the Json above are:
- The main table uses
CustomerId
as the partition key andOrderId
as the sort key. - An LSI named
OrderDateIndex
is created withCustomerId
as the partition key (same as the base table) andOrderDate
as the alternate sort key. - The projection type is
KEYS_ONLY
, so only the keys and indexed attributes are available in the index.
{
"TableName": "Orders",
"KeySchema": [
{ "AttributeName": "CustomerId", "KeyType": "HASH" },
{ "AttributeName": "OrderId", "KeyType": "RANGE" }
],
"AttributeDefinitions": [
{ "AttributeName": "CustomerId", "AttributeType": "S" },
{ "AttributeName": "OrderId", "AttributeType": "S" },
{ "AttributeName": "OrderDate", "AttributeType": "S" }
],
"LocalSecondaryIndexes": [
{
"IndexName": "OrderDateIndex",
"KeySchema": [
{ "AttributeName": "CustomerId", "KeyType": "HASH" },
{ "AttributeName": "OrderDate", "KeyType": "RANGE" }
],
"Projection": {
"ProjectionType": "KEYS_ONLY"
}
}
],
"BillingMode": "PAY_PER_REQUEST"
}
So the key differences in the two JSONs above:
Attribute Definitions:
- GSIs require attributes defined independently of the base table’s key schema.
- LSIs share the same partition key as the base table, so only a sort key needs definition.
Key Schema:
- GSIs define a new partition and (optional) sort key.
- LSIs can only add a new sort key but must retain the base table’s partition key.
Provisioned Throughput:
- GSIs require provisioning (or use on-demand billing).
- LSIs use the base table’s throughput settings, which simplifies cost management.
A Query on GSI is
{
"TableName": "Orders",
"IndexName": "OrderStatusIndex",
"KeyConditionExpression": "OrderStatus = :status",
"ExpressionAttributeValues": {
":status": { "S": "Pending" }
}
}
While Query on LSI is
{
"TableName": "Orders",
"IndexName": "OrderDateIndex",
"KeyConditionExpression": "CustomerId = :customerId AND OrderDate > :startDate",
"ExpressionAttributeValues": {
":customerId": { "S": "12345" },
":startDate": { "S": "2024-01-01" }
}
}
As a summary, Choosing between GSIs and LSIs (or a combination of both) requires a careful analysis of query patterns, data access needs, and cost considerations. While they greatly enhance DynamoDB’s functionality, their trade-offs must be weighed against your specific use case requirements.
💡 Please let me know in the comments if you agree with me or if I missed something.
Thank you for your time 😊