Ingest data from Azure Cosmos DB into Azure Data Explorer (Preview)

Azure Data Explorer supports data ingestion from Azure Cosmos DB for NoSQL using a change feed. The Cosmos DB change feed data connection is an ingestion pipeline that listens to your Cosmos DB change feed and ingests the data into your cluster. The change feed listens for new and updated documents but doesn't log deletes. For general information about data ingestion in Azure Data Explorer, see Azure Data Explorer data ingestion overview.

Each data connection listens to a specific Cosmos DB container and ingests data into a specified table. The ingestion method supports streaming ingestion (when enabled) and batch ingestion.

In this article, you'll learn how to set up a Cosmos DB change feed data connection to ingest data into Azure Data Explorer with a system-assigned managed identity. Review the considerations before you start.

Use the following steps to set up a connector:

Step 1: Choose an Azure Data Explorer table and configure its table mapping

Step 2: Create a Cosmos DB data connection

Step 3: Test the data connection

Prerequisites

  • An Azure subscription. Create a free Azure account
  • A cluster and database
  • A container from a Cosmos DB account for NoSQL

Step 1: Choose an Azure Data Explorer table and configure its table mapping

Before you create a data connection, create a table where you'll store the ingested data and apply a mapping that matches the schema in the source Cosmos DB container. If your scenario requires more than a simple mapping of fields, you can use update policies to transform and map data ingested from your change feed.

The following shows a sample schema of an item in the Cosmos DB container:

    {
        "id": "17313a67-362b-494f-b948-e2a8e95e237e",
        "creationTime": "2022-04-28T07:36:49.5434296Z",
        "_rid": "pL0MAJ0Plo0CAAAAAAAAAA==",
        "_self": "dbs/pL0MAA==/colls/pL0MAJ0Plo0=/docs/pL0MAJ0Plo0CAAAAAAAAAA==/",
        "_etag": "\"000037fc-0000-0700-0000-626a44110000\"",
        "_attachments": "attachments/",
        "_ts": 1651131409
    }

Use the following steps to create a table and apply a table mapping:

  1. In the Azure Data Explorer web UI, from the left navigation menu select Query, and then select the database where you want to create the table.

  2. Run the following command to create a table called TestTable.

    .create table TestTable(Id:string, Name:string, _ts:long, _timestamp:datetime)
  3. Run the following command to create the table mapping.

    The command maps custom properties from a Cosmos DB JSON document to columns in the TestTable table, as follows:

    Cosmos DB property | Table column | Transformation
    id                 | Id           | None
    name               | Name         | None
    _ts                | _ts          | None
    _ts                | _timestamp   | Uses DateTimeFromUnixSeconds to transform _ts (UNIX seconds) to _timestamp (datetime)

    .create table TestTable ingestion json mapping "DocumentMapping"
    ```
    [
        {"column":"Id","path":"$.id"},
        {"column":"Name","path":"$.name"},
        {"column":"_ts","path":"$._ts"},
        {"column":"_timestamp","path":"$._ts","transform":"DateTimeFromUnixSeconds"}
    ]
    ```

    Note

    We recommend using the following timestamp columns:

    • _ts: Use this column to reconcile data with Cosmos DB.
    • _timestamp: Use this column to run efficient time filters in your Kusto queries. For more information, see Query best practice.
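
    To illustrate why the _timestamp column is recommended for time filters, the following query sketch counts recently ingested documents; the one-hour window is illustrative:

    ```kusto
    // Count documents ingested in the last hour, filtering on the
    // datetime column rather than the raw UNIX-seconds _ts value.
    TestTable
    | where _timestamp >= ago(1h)
    | summarize IngestedDocuments = count()
    ```

    Filtering on a datetime column lets the engine prune data by ingestion time, which is typically more efficient than converting _ts at query time.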

Transform and map data with update policies

If your scenario requires more than a simple mapping of fields, you can use update policies to transform and map data ingested from your change feed.

Update policies are a way to transform data as it's ingested into your table. They're written in Kusto Query Language and are run on the ingestion pipeline. They can be used to transform data from a Cosmos DB change feed ingestion, such as in the following scenarios:

  • Your documents contain arrays that would be easier to query if they're transformed in multiple rows using the mv-expand operator.
  • You want to filter out documents. For example, you can filter out documents by type using the where operator.
  • You have complex logic that can't be represented in a table mapping.

For information on how to create and manage update policies, see Update policy overview.
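As a sketch of the general shape, the following commands define an update policy that filters out soft-deleted documents during ingestion. The table name RawCosmosEvents, the function ExpandCosmosEvents, and the IsDeleted column are hypothetical placeholders; adapt them to your own schema:

```kusto
// Hypothetical: RawCosmosEvents receives the raw change feed data, and the
// function drops documents flagged as deleted before they land in TestTable.
.create-or-alter function ExpandCosmosEvents() {
    RawCosmosEvents
    | where not(IsDeleted)
    | project Id, Name, _ts, _timestamp
}

// Attach the function to TestTable as an update policy.
.alter table TestTable policy update
@'[{"IsEnabled": true, "Source": "RawCosmosEvents", "Query": "ExpandCosmosEvents()", "IsTransactional": false}]'
```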

Step 2: Create a Cosmos DB data connection

You can use the following methods to create the data connector:

  • Azure portal
  • ARM template
  1. In the Azure portal, go to your cluster overview page, and then select the Getting started tab.

  2. On the Data ingestion tile, select Create data connection > Cosmos DB.

  3. In the Cosmos DB Create data connection pane, fill out the form with the information in the table:

    Field                | Description
    Database name        | Choose the Azure Data Explorer database into which you want to ingest data.
    Data connection name | Specify a name for the data connection.
    Subscription         | Select the subscription that contains your Cosmos DB NoSQL account.
    Cosmos DB account    | Choose the Cosmos DB account from which you want to ingest data.
    SQL database         | Choose the Cosmos DB database from which you want to ingest data.
    SQL container        | Choose the Cosmos DB container from which you want to ingest data.
    Table name           | Specify the Azure Data Explorer table name to which you want to ingest data.
    Mapping name         | Optionally, specify the mapping name to use for the data connection.
  4. Optionally, under the Advanced settings section, do the following:

    1. Specify the Event retrieval start date. This is the time from which the connector will start ingesting data. If you don't specify a time, the connector will start ingesting data from the time you create the data connection. The recommended date format is the ISO 8601 UTC standard, specified as follows: yyyy-MM-ddTHH:mm:ss.fffffffZ.

    2. By default, the connection uses the system-assigned managed identity. If necessary, select User-assigned, and then select the identity to use.

  5. Select Create to create the data connection.
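
For the ARM template option, the data connection can be declared as a resource of kind CosmosDb. The following is a minimal sketch; the cluster, database, account, and connection names are placeholders, and the exact property set may vary by API version:

```json
{
  "type": "Microsoft.Kusto/clusters/databases/dataConnections",
  "apiVersion": "2023-05-02",
  "name": "kustocluster/kustodb/cosmosdbconnection",
  "location": "[resourceGroup().location]",
  "kind": "CosmosDb",
  "properties": {
    "tableName": "TestTable",
    "mappingRuleName": "DocumentMapping",
    "managedIdentityResourceId": "[resourceId('Microsoft.Kusto/clusters', 'kustocluster')]",
    "cosmosDbAccountResourceId": "[resourceId('Microsoft.DocumentDb/databaseAccounts', 'cosmosdbaccount')]",
    "cosmosDbDatabase": "cosmosdbdatabase",
    "cosmosDbContainer": "cosmosdbcontainer"
  }
}
```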

Step 3: Test the data connection

  1. In the Cosmos DB container, insert the following document:

    {"name": "Cousteau"}
  2. In the Azure Data Explorer web UI, run the following query:

    TestTable

    The result set should include a row for the document you inserted.

Note

Azure Data Explorer has an aggregation (batching) policy for data ingestion designed to optimize the ingestion process. The default batching policy is configured to seal a batch once one of the following conditions is true for the batch: a maximum delay time of 5 minutes, a total size of 1 GB, or 1,000 blobs. Therefore, you may experience latency. For more information, see batching policy. To reduce latency, configure your table to support streaming. See streaming policy.
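
For example, assuming streaming ingestion is already enabled on the cluster, the streaming ingestion policy can be enabled on the target table with a single command:

```kusto
// Enable the streaming ingestion policy on the target table
// so change feed data arrives with lower latency.
.alter table TestTable policy streamingingestion enable
```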

Considerations

The following considerations apply to the Cosmos DB change feed:

  • The change feed doesn't expose deletion events.

    The Cosmos DB change feed only includes new and updated documents. If you need to know about deleted documents, you can use a soft-delete marker to mark a Cosmos DB document as deleted: a property added to update events indicates whether a document has been deleted. You can then use the where operator in your queries to filter them out.

    For example, if you map the deleted property to a table column called IsDeleted, you can filter out deleted documents with the following query:

    TestTable
    | where not(IsDeleted)
  • The change feed only exposes the latest update of a document.

    To understand the ramifications of this consideration, examine the following scenario:

    A Cosmos DB container contains documents A and B. The changes to a property called foo are shown in the following table:

    Document ID | Property foo | Event    | Document timestamp (_ts)
    A           | Red          | Creation | 10
    B           | Blue         | Creation | 20
    A           | Orange       | Update   | 30
    A           | Pink         | Update   | 40
    B           | Violet       | Update   | 50
    A           | Carmine      | Update   | 60
    B           | NeonBlue     | Update   | 70

    The change feed API is polled by the data connector at regular intervals, typically every few seconds. Each poll contains changes that occurred in the container between calls, but only the latest version of each changed document.

    To illustrate the issue, consider a sequence of API calls with timestamps 15, 35, 55, and 75 as shown in the following table:

    API call timestamp | Document ID | Property foo | Document timestamp (_ts)
    15                 | A           | Red          | 10
    35                 | B           | Blue         | 20
    35                 | A           | Orange       | 30
    55                 | B           | Violet       | 50
    55                 | A           | Carmine      | 60
    75                 | B           | NeonBlue     | 70

    Comparing the API results to the list of changes made in the Cosmos DB document, you'll notice that they don't match. The update event to document A, highlighted in the change table at timestamp 40, doesn't appear in the results of the API call.

    To understand why the event doesn't appear, we'll examine the changes to document A between the API calls at timestamps 35 and 55. Between these two calls, document A changed twice, as follows:

    Document ID | Property foo | Event  | Document timestamp (_ts)
    A           | Pink         | Update | 40
    A           | Carmine      | Update | 60

    When the API call at timestamp 55 is made, the change feed API returns the latest version of the document. In this case, the latest version of document A is the update at timestamp 60, which is the update to property foo from Pink to Carmine.

    Because of this scenario, the data connector may miss some intermediate document changes. For example, some events may be missed if the data connection service is down for a few minutes, or if the frequency of document changes is higher than the API polling frequency. However, the latest state of each document is captured.
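
    This "latest state wins" behavior matches what you'd get by keeping only the row with the highest _ts per document. As an illustrative Kusto query over an inline copy of the change history:

    ```kusto
    // Inline copy of the change history; arg_max keeps, per document,
    // only the row with the highest _ts — matching what the change feed delivers.
    datatable(DocumentId:string, foo:string, _ts:long)
    [
        "A", "Red",      10,
        "B", "Blue",     20,
        "A", "Orange",   30,
        "A", "Pink",     40,
        "B", "Violet",   50,
        "A", "Carmine",  60,
        "B", "NeonBlue", 70
    ]
    | summarize arg_max(_ts, foo) by DocumentId
    // Returns one row per document: A/Carmine at 60 and B/NeonBlue at 70.
    ```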

Next steps

  • Get latest versions of Azure Cosmos DB documents (Preview)
  • Kusto Query Language (KQL) overview