How to read data from Azure Cosmos DB in Python

Solution 1

Judging by the error message, authentication with your key failed, which matches the official explanation of this error.

So please check your key. I think the main problem, though, is that pydocumentdb is being used incorrectly. The ids of a database, collection, and document are different from their links. APIs such as ReadCollection and QueryDocuments must be passed the relevant link. You need to retrieve resources in Azure Cosmos DB via their resource links, not their resource ids.
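To make the id/link distinction concrete: each resource the service returns is a JSON object that carries both a user-chosen id and a system-generated _self link. Below is a minimal sketch that uses a hand-written dictionary in place of a real response. The _rid and _self values are invented for illustration, and resource_link is a hypothetical helper, not part of pydocumentdb.

```python
# Hypothetical database resource, shaped like the dictionaries the SDK returns.
# The _rid and _self values are made up for illustration only.
db = {
    "id": "test1",             # name chosen by the user
    "_rid": "Sl8fAA==",        # system-generated resource id (made up)
    "_self": "dbs/Sl8fAA==/",  # self link derived from _rid (made up)
}


def resource_link(resource):
    """Return the self link of a resource dict, failing loudly if it is absent."""
    try:
        return resource["_self"]
    except KeyError:
        raise ValueError("resource has no '_self' field: %r" % (resource,))


db_link = resource_link(db)
print(db["id"])  # test1
print(db_link)   # dbs/Sl8fAA==/
```

The point is only that db["id"] and db["_self"] are different strings, and that the link is the value to hand to the link-based calls.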

From your description, I think you want to list all documents in the collection at the id path /dbs/test1/colls/test1. For reference, here is my sample code.

from pydocumentdb import document_client

uri = 'https://ronyazrak.documents.azure.com:443/'
key = '<your-primary-key>'

client = document_client.DocumentClient(uri, {'masterKey': key})

db_id = 'test1'
db_query = "select * from r where r.id = '{0}'".format(db_id)
db = list(client.QueryDatabases(db_query))[0]
db_link = db['_self']

coll_id = 'test1'
coll_query = "select * from r where r.id = '{0}'".format(coll_id)
coll = list(client.QueryCollections(db_link, coll_query))[0]
coll_link = coll['_self']

docs = client.ReadDocuments(coll_link)
print(list(docs))

For details, see the DocumentDB Python SDK documentation.
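The question asks for the documents as "a big dictionary". One possible way to get there, once list(docs) has been materialized, is to index the documents by their id field. This is a sketch, not part of the SDK: index_by_id is a hypothetical helper and sample_docs stands in for real query results. The duplicate check is there because, in a partitioned collection, an id is only guaranteed unique within a partition.

```python
def index_by_id(documents):
    """Build a dict mapping each document's 'id' to the document itself."""
    indexed = {}
    for doc in documents:
        doc_id = doc["id"]
        if doc_id in indexed:
            # ids may repeat across partitions, so refuse to overwrite silently.
            raise ValueError("duplicate document id: %r" % (doc_id,))
        indexed[doc_id] = doc
    return indexed


# Stand-in for list(docs); real documents also carry system fields such as _self.
sample_docs = [
    {"id": "a", "value": 1},
    {"id": "b", "value": 2},
]

by_id = index_by_id(sample_docs)
print(by_id["b"]["value"])  # 2
```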

Solution 2

For those using azure-cosmos, the current library as of 2019: I opened a doc bug and provided a sample on GitHub.

Sample

from azure.cosmos import cosmos_client
import json

CONFIG = {
    "ENDPOINT": "ENDPOINT_FROM_YOUR_COSMOS_ACCOUNT",
    "PRIMARYKEY": "KEY_FROM_YOUR_COSMOS_ACCOUNT",
    "DATABASE": "DATABASE_ID",  # Probably looks more like a name to you
    "CONTAINER": "YOUR_CONTAINER_ID"  # Probably looks more like a name to you
}

CONTAINER_LINK = f"dbs/{CONFIG['DATABASE']}/colls/{CONFIG['CONTAINER']}"
FEEDOPTIONS = {}
FEEDOPTIONS["enableCrossPartitionQuery"] = True
# There is also a partitionKey feed option, but I was unable to figure out how to use it.

QUERY = {
    "query": "SELECT * FROM c"
}

# Initialize the Cosmos client
client = cosmos_client.CosmosClient(
    url_connection=CONFIG["ENDPOINT"], auth={"masterKey": CONFIG["PRIMARYKEY"]}
)

# Query for some data
results = client.QueryItems(CONTAINER_LINK, QUERY, FEEDOPTIONS)

# Look at your data (materialize the results once and reuse the list)
items = list(results)
print(items)

# You can also dump the list as JSON
print(json.dumps(items, indent=4))
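For readers on the 4.x releases of azure-cosmos, where the client is built from CosmosClient and container objects rather than link strings, a rough equivalent of the sample above might look like the sketch below. The helper names read_all_items and connect are hypothetical, and the placeholder arguments in the usage comment must be replaced with real account values. The import is deferred so the query helper can be tried against any object that exposes query_items.

```python
def read_all_items(container, query="SELECT * FROM c"):
    """Run a cross-partition query on a container and return the items as a list."""
    return list(
        container.query_items(query=query, enable_cross_partition_query=True)
    )


def connect(endpoint, key, database_id, container_id):
    """Return a container client; assumes azure-cosmos 4.x is installed."""
    # Imported lazily so read_all_items can be exercised without the package.
    from azure.cosmos import CosmosClient

    client = CosmosClient(endpoint, credential=key)
    database = client.get_database_client(database_id)
    return database.get_container_client(container_id)


# Usage (needs a real account):
# container = connect("ENDPOINT_FROM_YOUR_COSMOS_ACCOUNT",
#                     "KEY_FROM_YOUR_COSMOS_ACCOUNT",
#                     "DATABASE_ID", "YOUR_CONTAINER_ID")
# print(read_all_items(container))
```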
Author by

Rony Azrak

Updated on June 05, 2022

Comments

  • Rony Azrak
    Rony Azrak about 2 years

    I have a trial account with Azure and have uploaded some JSON files into CosmosDB. I am creating a python program to review the data but I am having trouble doing so. This is the code I have so far:

    import pydocumentdb.documents as documents
    import pydocumentdb.document_client as document_client
    import pydocumentdb.errors as errors
    
    url = 'https://ronyazrak.documents.azure.com:443/'
    key = '' # primary key
    
    # Initialize the Python DocumentDB client
    client = document_client.DocumentClient(url, {'masterKey': key})
    
    collection_link = '/dbs/test1/colls/test1'
    
    collection = client.ReadCollection(collection_link)
    
    result_iterable = client.QueryDocuments(collection)
    
    query = { 'query': 'SELECT * FROM server s' }
    

    I read somewhere that the key would be my primary key that I can find in my Azure account Keys. I have filled the key string with my primary key shown in the image but key here is empty just for privacy purposes.

    I also read somewhere that the collection_link should be '/dbs/test1/colls/test1' if my data is in the collection 'test1'.

    My code gets an error at the function client.ReadCollection().

    That's the error I have "pydocumentdb.errors.HTTPFailure: Status code: 401 {"code":"Unauthorized","message":"The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'get\ncolls\ndbs/test1/colls/test1\nmon, 29 may 2017 19:47:28 gmt\n\n'\r\nActivityId: 03e13e74-8db4-4661-837a-f8d81a2804cc"}"

    Once this error is fixed, what is there left to do? I want to get the JSON files as a big dictionary so that I can review the data.

    Am I on the right path? Am I approaching this the wrong way? How can I read documents that are in my database? Thanks.

  • Rony Azrak
    Rony Azrak about 7 years
    Thank you! Worked perfectly.
  • Patrick
    Patrick over 4 years
    FYI ... this syntax won't work soon... 🚨🚨🚨 The pydocumentdb package for versions 1.x and 2.x of the Azure Cosmos DB Python SDK for SQL API will be retired August 20, 2020. See the release and retirement documentation for more information. Please use the latest version of the Python SDK with new package name, azure-cosmos.🚨🚨🚨