Azure Cosmos DB
- 1 Intro
- 2 Documentation
- 3 Tips and Tidbits
- 4 Throughput
- 5 Consistency
- 6 Access
- 7 Partitions
- 8 Change feed processor in Azure Cosmos DB
- 9 Recovery Time Objectives
- 10 Pricing and Minimizing Costs
- 11 Queries
- 12 Store Proc, Triggers and Functions
- 13 CosmosClient
- 14 Creating Cosmos DB with the Azure CLI in PowerShell
- 15 Indexing policies in Azure Cosmos DB
Intro
Azure Cosmos DB is a fully managed NoSQL database for modern app development. Single-digit millisecond response times and automatic, instant scalability guarantee speed at any scale. Business continuity is assured with SLA-backed availability and enterprise-grade security.
Documentation
Frequently asked questions about different APIs in Azure Cosmos DB
Deep-dive in Azure Cosmos DB: Advanced topics on partitioning, data distribution and indexing
Tips and Tidbits
Azure Cosmos DB transparently replicates your data across all the Azure regions associated with your Cosmos account.
Supports multiple consistency levels
Azure Cosmos DB provides the choice of five well-defined consistency levels to achieve optimal tradeoffs between consistency and performance.
These consistency levels are strong, bounded-staleness, session, consistent prefix and eventual.
With Cosmos DB, before a write operation is acknowledged to the client, the data is durably committed by a quorum of replicas within the region that accepts the write operations.
It is multi-model and supports document, key-value, graph, and columnar data models.
The only Azure DB service that allows for multi-master writes
You can access the service using any of the common APIs:
Core (SQL) is the default API for Azure Cosmos DB, which provides you with a view of your data that resembles a traditional NoSQL document store.
You can query the hierarchical JSON documents with a SQL-like language. Core (SQL) uses JavaScript's type system, expression evaluation, and function invocation.
Azure Cosmos DB's API for MongoDB supports the MongoDB wire protocol. This API allows existing MongoDB client SDKs, drivers, and tools to interact with the data transparently, as if they are running against an actual MongoDB database.
The data is stored in document format, which is the same as using Core (SQL).
Azure Cosmos DB's support for the Cassandra API makes it possible to query data by using the Cassandra Query Language (CQL), and your data will appear to be a partitioned row store. Just like the MongoDB API, any clients or tools should be able to connect transparently to Azure Cosmos DB.
Azure Table API provides support for applications that are written for Azure Table Storage that need premium capabilities like global distribution, high availability, scalable throughput.
The original Table API only allows for indexing on the Partition and Row keys; there are no secondary indexes.
Storing table data in Cosmos DB automatically indexes all the properties and requires no index management.
Applications written for Azure Table storage can migrate to Azure Cosmos DB by using the Table API with no code changes and take advantage of premium capabilities.
The Table API has client SDKs available for .NET, Java, Python, and Node.js.
If you have data stored in Azure Table Storage, you can use either the data migration tool or AzCopy to import your data to the Azure Cosmos DB Table API.
Gremlin as the API provides a graph-based view over the data. Remember that at the lowest level, all data in any Azure Cosmos DB is stored in an ARS format.
You can select only one API type for a Cosmos account. You cannot use the same Cosmos account for Core (SQL) API and MongoDB API databases, for example.
The API selection cannot be changed after account creation.
When you create a Cosmos DB account with a private endpoint, the public endpoint is disabled by default and your account receives traffic only from the private endpoint.
Key/value (table), columnar, document, and graph data models are all natively supported because of the ARS (atoms, records, and sequences) design that Azure Cosmos DB is built on.
Atoms, records, and sequences can be easily mapped and projected to various data models.
Multi-model means Azure Cosmos DB supports multiple APIs and multiple data models; different APIs use different data formats for storage and wire protocol.
For example, SQL uses JSON, MongoDB uses BSON, Table uses EDM, Cassandra uses CQL, Gremlin uses JSON format.
At the lowest level, all data in any Azure Cosmos DB is stored in an ARS format
Azure Functions provides the simplest way to connect to the change feed.
You can create small reactive Azure Functions that will be automatically triggered on each new event in your Azure Cosmos container's change feed.
Currently, the Azure Functions trigger for Cosmos DB is supported for use with the Core (SQL) API only.
Serverless event-based architectures with Azure Cosmos DB and Azure Functions
Azure Functions SendGrid bindings
Send email by using SendGrid bindings in Azure Functions. Azure Functions supports an output binding for SendGrid.
Other Azure function bindings: Azure Functions triggers and bindings concepts
Azure Cosmos DB Table API has -
✑ Single-digit millisecond latency for reads and writes, backed with <10-ms latency reads and <15-ms latency writes at the 99th percentile, at any scale, anywhere in the world.
✑ Automatic and complete indexing on all properties, no index management.
✑ Turnkey global distribution from one to 30+ regions. Support for automatic and manual failovers at any time, anywhere in the world.
Design scalable and performant tables
Every entity stored in a table must have a unique combination of PartitionKey and RowKey.
As with keys in a relational database table, the PartitionKey and RowKey values are indexed to create a clustered index to enable fast look-ups.
Throughput
How to choose between standard (manual) and autoscale provisioned throughput
You set the highest, or maximum, RU/s Tmax that you don't want the system to exceed. The system automatically scales the throughput T such that 0.1 * Tmax <= T <= Tmax. For example, if you set an autoscale maximum of 4000 RU/s, the system will scale between 400 and 4000 RU/s.
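The scaling rule above can be sketched as a tiny helper (illustrative only; `autoscale_range` is not an SDK call):

```python
def autoscale_range(t_max: int) -> tuple[int, int]:
    """Return the (min, max) RU/s band for an autoscale maximum t_max.

    Autoscale keeps the provisioned throughput T in 0.1 * Tmax <= T <= Tmax.
    """
    return (t_max // 10, t_max)

print(autoscale_range(4000))  # → (400, 4000)
```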
Consistency
Supported consistency levels are:
Strong - Reads always return the most recent committed version of an item.
Strong consistency offers a linearizability guarantee.
The reads are guaranteed to return the most recent committed version of an item.
A client never sees an uncommitted or partial write.
Users are always guaranteed to read the latest committed write.
Bounded staleness - Reads might lag behind writes based on configured update versions (K) or time (t).
Bounded staleness is frequently chosen by globally distributed applications that expect low write latencies but require total global order guarantee.
Bounded staleness is great for applications featuring group collaboration and sharing, stock ticker, publish-subscribe/queueing etc.
The reads are guaranteed to honor the consistent-prefix guarantee.
The reads might lag behind writes by at most "K" versions (that is "updates") of an item or by "t" time interval.
When you choose bounded staleness, the "staleness" can be configured in two ways:
The number of versions (K) of the item
The time interval (t) by which the reads might lag behind the writes
Session - Scoped to a client session, and reads honor consistency guarantees including the consistent-prefix, monotonic reads, monotonic writes, read-your-writes, and write-follows-reads guarantees.
Consistent prefix - Updates that are returned contain some prefix of all the updates, and reads never see out-of-order writes.
Eventual - There is no ordering guarantee for reads, and replicas eventually converge.
There's no ordering guarantee for reads. In the absence of any further writes, the replicas eventually converge.
Examples include count of Retweets, Likes, or non-threaded comments.
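As a toy model of the bounded-staleness window above (not SDK code; the function name is invented), a replica can serve reads only while its lag stays within both configured limits:

```python
def within_staleness_bound(lag_versions: int, lag_seconds: float,
                           k: int, t: float) -> bool:
    """True while the replica's lag is within K versions AND t seconds.

    Exceeding either limit means reads must wait for the replica to
    catch up before the bounded-staleness guarantee holds again.
    """
    return lag_versions <= k and lag_seconds <= t

print(within_staleness_bound(50, 3.0, 100, 5.0))   # → True
print(within_staleness_bound(150, 3.0, 100, 5.0))  # → False
```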
Consistency guarantees for a read operation correspond to the freshness and ordering of the database state that you request.
Read-consistency is tied to the ordering and propagation of the write/update operations.
If there are no write operations on the database, a read operation with eventual, session, or consistent prefix consistency levels is likely to yield the same results as a read operation with strong consistency level.
For strong and bounded staleness, reads are done against two replicas in a four replica set (minority quorum) to provide consistency guarantees.
Session, consistent prefix and eventual do single replica reads.
The result is that, for the same number of request units, read throughput for strong and bounded staleness is half of the other consistency levels.
You are designing an Azure Cosmos DB solution that will host multiple writable replicas in multiple Azure regions.
You need to recommend the strongest database consistency level for the design. The solution must meet the following requirements:
✑ Provide a latency-based Service Level Agreement (SLA) for writes.
✑ Support multiple regions.
Use Bounded staleness
Access
Azure Cosmos DB exposes a built-in role-based access control (RBAC) system that lets you:
Authenticate your data requests with an Azure Active Directory (AAD) identity.
Authorize your data requests with a fine-grained, role-based permission model.
Azure Cosmos DB RBAC is the ideal access control method in situations where:
You don't want to use a shared secret like the primary key, and prefer to rely on a token-based authentication mechanism,
You want to use Azure AD identities to authenticate your requests,
You need a fine-grained permission model to tightly restrict which database operations your identities are allowed to perform,
You wish to materialize your access control policies as "roles" that you can assign to multiple identities.
Azure Cosmos DB uses hash-based message authentication code (HMAC) for authorization.
Each request is hashed using the secret account key, and the subsequent base-64 encoded hash is sent with each call to Azure Cosmos DB.
All REST operations, whether you're using a master key token or resource token, must include the authorization header with the authorization string in order to interact with a resource
The authorization string has the following format:
type={typeoftoken}&ver={tokenversion}&sig={hashsignature}
{typeoftoken} denotes the type of token: master, resource, or aad (if you are using Azure Cosmos DB RBAC).
{tokenversion} denotes the version of the token, currently 1.0.
{hashsignature} denotes the hashed token signature, or the OAuth token if you are using Azure Cosmos DB RBAC.
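The master-key flavor of this authorization string can be reproduced in a few lines; this is a sketch of the documented HMAC-SHA256 scheme, and the key, resource link, and date below are made-up placeholders:

```python
import base64
import hashlib
import hmac
import urllib.parse

def build_auth_header(verb: str, resource_type: str, resource_link: str,
                      date_utc: str, master_key_b64: str) -> str:
    """Build the Cosmos DB REST authorization string for a master key token.

    The signature is an HMAC-SHA256 over a lowercase, newline-delimited
    payload, keyed with the base64-decoded account key; the result is
    base64-encoded and the whole string is URL-encoded.
    """
    payload = (f"{verb.lower()}\n{resource_type.lower()}\n"
               f"{resource_link}\n{date_utc.lower()}\n\n")
    key = base64.b64decode(master_key_b64)
    sig = base64.b64encode(
        hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    ).decode()
    # type={typeoftoken}&ver={tokenversion}&sig={hashsignature}
    return urllib.parse.quote(f"type=master&ver=1.0&sig={sig}", safe="")

demo_key = base64.b64encode(b"not-a-real-key").decode()  # placeholder only
header = build_auth_header("GET", "dbs", "dbs/ToDoList",
                           "thu, 27 apr 2017 00:51:12 gmt", demo_key)
print(header.startswith("type%3Dmaster%26ver%3D1.0%26sig%3D"))  # → True
```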
For data-plane access control, see: Configure role-based access control with Azure Active Directory for your Azure Cosmos DB account
For management-plane access control, see Azure role-based access control in Azure Cosmos DB
Partitions
In partitioning, the items in a container are divided into distinct subsets called logical partitions.
Logical partitions are formed based on the value of a partition key that is associated with each item in a container.
All the items in a logical partition have the same partition key value.
EnableCrossPartitionQuery indicates whether users are enabled to send more than one request to execute the query in the Azure Cosmos DB service. More than one request is necessary if the query is not scoped to a single partition key value.
You can’t change the partition key once the container has been created but you can migrate the data to a new container.
Create a synthetic partition key (SQL API)
It's the best practice to have a partition key with many distinct values, such as hundreds or thousands.
The goal is to distribute your data and workload evenly across the items associated with these partition key values.
If such a property doesn’t exist in your data, you can construct a synthetic partition key.
You can form a partition key by concatenating multiple property values into a single artificial partitionKey property.
Another possible strategy to distribute the workload more evenly is to append a random number at the end of the partition key value.
When you distribute items in this way, you can perform parallel write operations across partitions.
Use a partition key with pre-calculated suffixes
The random suffix strategy can greatly improve write throughput, but it's difficult to read a specific item.
You don't know the suffix value that was used when you wrote the item.
To make it easier to read individual items, use the pre-calculated suffixes strategy
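A sketch of the three strategies; the property names (DeviceId, a date) and the bucket count of 10 are arbitrary illustrative choices:

```python
import hashlib
import random

def concatenated_key(device_id: str, date: str) -> str:
    """Synthetic key from multiple properties, e.g. 'XMS-0001-2024-06-01'."""
    return f"{device_id}-{date}"

def random_suffix_key(base: str, buckets: int = 10) -> str:
    """Random suffix: spreads writes across partitions, but reading a
    specific item requires fanning out over every possible suffix."""
    return f"{base}-{random.randint(0, buckets - 1)}"

def precalculated_suffix_key(base: str, item_id: str, buckets: int = 10) -> str:
    """Pre-calculated suffix derived from the item id: writes still spread
    out, and a reader who knows the id can recompute the exact key."""
    suffix = int(hashlib.md5(item_id.encode()).hexdigest(), 16) % buckets
    return f"{base}-{suffix}"

print(concatenated_key("XMS-0001", "2024-06-01"))  # → XMS-0001-2024-06-01
```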
Change feed processor in Azure Cosmos DB
The change feed processor is part of the Azure Cosmos DB SDK V3.
It simplifies the process of reading the change feed and distributes the event processing across multiple consumers effectively.
There are four main components of implementing the change feed processor:
The monitored container: The monitored container has the data from which the change feed is generated.
Any inserts and updates to the monitored container are reflected in the change feed of the container.
The lease container: The lease container acts as a state storage and coordinates processing the change feed across multiple workers.
The lease container can be stored in the same account as the monitored container or in a separate account.
The host: A host is an application instance that uses the change feed processor to listen for changes.
Multiple instances with the same lease configuration can run in parallel, but each instance should have a different instance name.
The delegate: The delegate is the code that defines what you, the developer, want to do with each batch of changes that the change feed processor reads.
You can work with the Azure Cosmos DB change feed using either a push model or a pull model.
With a push model, the change feed processor pushes work to a client that has business logic for processing this work.
Using a push model is the easiest way to read from the change feed.
There are two ways you can read from the change feed with a push model: Azure Functions Cosmos DB triggers and the change feed processor library.
When you create an Azure Functions trigger for Azure Cosmos DB, you select the container to connect, and the Azure Function gets triggered whenever there is a change in the container.
Because Azure Functions uses the change feed processor behind the scenes, it automatically parallelizes change processing across your container's partitions.
The change feed processor library follows the observer pattern, where your processing function is called by the library.
The change feed processor library will automatically check for changes and, if changes are found, "push" these to the client.
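The four components can be illustrated with a toy in-memory simulation; nothing here calls the real SDK, and the container and lease shapes are invented for illustration:

```python
# Toy model of the change feed processor's four components.
monitored = {          # the monitored container: partition -> ordered changes
    "partition-1": [{"id": "1"}, {"id": "2"}],
    "partition-2": [{"id": "3"}],
}
leases = {}            # the lease container: per-partition ownership + progress

def run_host(host_name, delegate):
    """The host: acquires leases, then feeds each batch of new changes
    from the partitions it owns to the delegate."""
    for pk in monitored:
        leases.setdefault(pk, {"owner": host_name, "continuation": 0})
    for pk, lease in leases.items():
        if lease["owner"] != host_name:
            continue  # another host instance owns this partition's lease
        batch = monitored[pk][lease["continuation"]:]
        if batch:
            delegate(batch)                      # the delegate: your batch logic
            lease["continuation"] += len(batch)  # checkpoint in the lease

processed = []
run_host("host-A", processed.extend)
print(len(processed))  # → 3
run_host("host-A", processed.extend)  # no new changes: delegate is not called
print(len(processed))  # → 3
```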
Recovery Time Objectives
RTO (Recovery Time Objective) indicates the time between the beginning of an outage impacting Cosmos DB and the recovery to full availability.
RPO (Recovery Point Objective) indicates the time between the last write correctly restored and the beginning of the outage affecting Cosmos DB.
Good explanation of these terms and what the problems were: RPO vs. RTO: Key differences explained with examples, tips
It is possible to distribute your Cosmos DB cluster across availability zones, which results in increased SLAs.
When using this option, Cosmos DB provides RTO = 0 and RPO = 0 even in the case of an outage of a whole availability zone.
Single-master replication across two regions with bounded staleness consistency: RTO < 15 minutes.
Multi-master replication over multiple regions provides an RTO of 0 minutes for all consistencies except Strong.
When deployed in a single region, the RTO is < 1 week for any consistency.
Pricing and Minimizing Costs
Queries
SQL queries - You can query data by writing queries using the Structured Query Language (SQL) as a JSON query language.
Since SQL API works on JSON values, it deals with tree-shaped entities instead of rows and columns.
You can refer to the tree nodes at any arbitrary depth, like Node1.Node2.Node3…..Nodem, similar to the two-part reference of <table>.<column> in ANSI SQL.
In-partition query
When you query data from containers, if the query has a partition key filter specified, Azure Cosmos DB automatically optimizes the query.
It routes the query to the physical partitions corresponding to the partition key values specified in the filter.
In order to be an in-partition query, the query must have an equality filter that includes the partition key:
SELECT * FROM c WHERE c.DeviceId = 'XMS-0001'
Still in partition:
SELECT * FROM c WHERE c.DeviceId = 'XMS-0001' AND c.Location = 'Seattle'
No longer an in-partition query (for example, when the partition key only appears in a non-equality filter):
SELECT * FROM c WHERE c.DeviceId > 'XMS-0001'
Cross-partition query
The following query doesn't have a filter on the partition key (DeviceId). Therefore, it must fan out to all physical partitions, where it is run against each partition's index, for example:
SELECT * FROM c WHERE c.Location = 'Seattle'
Each physical partition has its own index.
Therefore, when you run a cross-partition query on a container, you are effectively running one query per physical partition.
Avoiding cross-partition queries typically only matters with large containers.
You are charged a minimum of about 2.5 RUs each time you check a physical partition's index for results, even if no items in the physical partition match the query's filter.
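That per-partition floor gives a quick back-of-the-envelope lower bound on query cost (simple arithmetic based on the ~2.5 RU figure above, not an official pricing formula):

```python
MIN_RU_PER_PARTITION_CHECK = 2.5  # approximate per-index-check floor

def min_query_charge(physical_partitions: int, is_cross_partition: bool) -> float:
    """An in-partition query checks one index; a cross-partition query
    checks every physical partition's index, even when nothing matches."""
    checked = physical_partitions if is_cross_partition else 1
    return checked * MIN_RU_PER_PARTITION_CHECK

print(min_query_charge(20, True))   # → 50.0
print(min_query_charge(20, False))  # → 2.5
```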
Store Proc, Triggers and Functions
How to write stored procedures, triggers, and user-defined functions in Azure Cosmos DB
Azure Cosmos DB supports pre-triggers and post-triggers.
Pre-triggers are executed before modifying a database item and post-triggers are executed after modifying a database item.
Triggers are not automatically executed, they must be specified for each database operation where you want them to execute.
After you define a trigger, you should register and call a pre-trigger by using the Azure Cosmos DB SDKs.
CosmosClient
Provides a client-side logical representation of the Azure Cosmos DB account. This client can be used to configure and execute requests in the Azure Cosmos DB database service.
Creating Cosmos DB with the Azure CLI in PowerShell
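A minimal provisioning sketch (all resource names and the region are placeholders; assumes the Azure CLI is installed and you have run `az login`; the backtick is PowerShell's line-continuation character):

```powershell
# Create a resource group, a Cosmos DB account (Core/SQL API), a database,
# and a container with a partition key and manual throughput.
az group create --name myResourceGroup --location westus

az cosmosdb create `
    --name mycosmosaccount `
    --resource-group myResourceGroup `
    --default-consistency-level Session

az cosmosdb sql database create `
    --account-name mycosmosaccount `
    --resource-group myResourceGroup `
    --name myDatabase

az cosmosdb sql container create `
    --account-name mycosmosaccount `
    --resource-group myResourceGroup `
    --database-name myDatabase `
    --name myContainer `
    --partition-key-path "/DeviceId" `
    --throughput 400
```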
Indexing policies in Azure Cosmos DB
In Azure Cosmos DB, every container has an indexing policy that dictates how the container's items should be indexed.
The default indexing policy for newly created containers indexes every property of every item and enforces range indexes for any string or number.
In some situations, you may want to override this automatic behavior to better suit your requirements.
You can customize a container's indexing policy by setting its indexing mode, and include or exclude property paths.
Azure Cosmos DB supports two indexing modes:
Consistent: The index is updated synchronously as you create, update or delete items.
This means that the consistency of your read queries will be the consistency configured for the account.
None: Indexing is disabled on the container. This is commonly used when a container is used as a pure key-value store without the need for secondary indexes.
It can also be used to improve the performance of bulk operations.
After the bulk operations are complete, the index mode can be set to Consistent and then monitored using the IndexTransformationProgress until complete.
By default, the indexing policy is set to automatic. It's achieved by setting the automatic property in the indexing policy to true. Setting this property to true allows Azure Cosmos DB to automatically index documents as they are written.
A custom indexing policy can specify property paths that are explicitly included or excluded from indexing.
By optimizing the number of paths that are indexed, you can substantially reduce the latency and RU charge of write operations.
Queries that have an ORDER BY clause with two or more properties require a composite index. You can also define a composite index to improve the performance of many equality and range queries.
By default, no composite indexes are defined so you should add composite indexes as needed.
Opt-out policy to selectively exclude some property paths
Composite index defined for (name asc, age desc):
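A sketch of a custom policy combining both ideas; the excluded path is an invented example, and the JSON shape follows the documented indexing-policy format:

```json
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/description/*" }
  ],
  "compositeIndexes": [
    [
      { "path": "/name", "order": "ascending" },
      { "path": "/age", "order": "descending" }
    ]
  ]
}
```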