CosmosDB (NoSQL)
Connecting CosmosDB (NoSQL) to LightBeam
Overview
LightBeam Spectra users can connect various data sources to the LightBeam application. Once connected, these data sources are continuously monitored for PII and PHI data.
Example: NoSql, MongoDB, DynamoDB, etc.
About Azure Cosmos DB NoSQL
Azure Cosmos DB for NoSQL is a scalable and flexible database service by Microsoft Azure that supports both structured and semi-structured data. Data is organized into databases; within a database, each container functions like a table, and the items in a container represent its rows. LightBeam integrates Cosmos DB as a structured data source.
Features
Datasource Registration
Administrators can create a service principal with restricted read-only permissions and use its client ID, client secret, tenant ID, and the Cosmos DB endpoint for registration. During registration, users can select the databases they wish to scan, and LightBeam will scan all tables (containers) within those databases.
Metadata Scanning
We scan the tables (containers) in the databases specified in the scan conditions. Each container's items are treated as rows, with first-level fields in the items considered as columns. Any first-level field that is a nested object or an array is classified as a Blob.
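The column rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not LightBeam's implementation: the function names, the type labels other than Blob, and the sample items are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable


def column_type(value: Any) -> str:
    """Map a first-level field value to a coarse column type (labels are illustrative)."""
    if isinstance(value, (dict, list)):
        return "blob"            # nested object or array
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def infer_columns(items: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Derive a column -> type mapping from the first-level fields of the items."""
    columns: Dict[str, str] = {}
    for item in items:
        for name, value in item.items():
            # Once any item holds a nested value, keep the column as a blob.
            if columns.get(name) != "blob":
                columns[name] = column_type(value)
    return columns


# Hypothetical items from one container.
sample_items = [
    {"id": "1", "email": "a@example.com", "address": {"city": "Pune"}},
    {"id": "2", "email": "b@example.com", "tags": ["vip", "beta"]},
]
print(infer_columns(sample_items))
```

Here id and email become ordinary columns, while address (a nested object) and tags (an array) are reported as blobs.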
PII Detection
We fetch sample data from each container and classify the first-level fields in the items. A field (column) is usually classified into a single attribute. A nested field that contains several types of PII data may be classified into multiple attributes.
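The following Python sketch shows how a nested field can end up with more than one attribute. It is a toy illustration only: the two regular expressions, the attribute names, and the sample item are assumptions, and LightBeam's actual classifiers are not described in this document.

```python
import re
from typing import Any, Dict, Iterator, List, Set

# Toy detectors for illustration only (assumed, not LightBeam's classifiers).
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def leaf_strings(value: Any) -> Iterator[str]:
    """Yield every scalar inside a value as text, descending into objects and arrays."""
    if isinstance(value, dict):
        for child in value.values():
            yield from leaf_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from leaf_strings(child)
    elif value is not None:
        yield str(value)


def classify_fields(sample: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each first-level field to the set of attributes found in the sample."""
    found: Dict[str, Set[str]] = {}
    for item in sample:
        for field, value in item.items():
            attributes = found.setdefault(field, set())
            for text in leaf_strings(value):
                for name, pattern in DETECTORS.items():
                    if pattern.search(text):
                        attributes.add(name)
    return found


# Hypothetical sampled item: "profile" is a nested field holding two kinds of PII.
sample = [
    {"id": "1", "email": "a@example.com",
     "profile": {"contact": "b@example.com", "ssn": "123-45-6789"}},
]
result = classify_fields(sample)
print(result)
```

In this example the flat email field maps to one attribute, while the nested profile field maps to two.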
Full Blob Scan
Given that PII detection scans only a sample of documents, we offer an option for a full scan of blob columns to identify all possible attribute types. Users can mark blob columns for full scans, which are performed periodically (every 15 days). Since a full scan is resource-intensive, involving the examination of all documents in a table, it is not done by default and is only performed when configured by the user.
Limitations
Creating table clusters is not supported for Cosmos DB for NoSQL because it does not support joins between tables (containers). In traditional relational databases, joins combine rows from two or more tables based on a related column. Cosmos DB for NoSQL has no equivalent across containers, so queries cannot span multiple tables and table clusters cannot be formed. This design prioritizes performance, scalability, and flexibility, as is typical of NoSQL databases, at the cost of some relational features.
Onboarding Azure Cosmos DB NoSQL Data Source
Log in to your LightBeam instance.
Click on Datasources on the Top Navigation Bar.
Click on “Add Data Source”.
Search for NoSql.
Click on NoSql under Azure CosmosDB section.
Configure Basic Details
In the Basic Details section, enter the following information:
Instance Name: Provide a unique name for the Cosmos DB NoSQL data source (e.g., cosmos-nosql-datasource).
Primary Owner: Enter the email address of the individual responsible for this data source (e.g., demo@lightbeam.ai).
Source of Truth (Optional): Toggle this option on if this database serves as a single source of truth for entity validation.
Description (Optional): Add a brief description of the database (e.g., "Cosmos NoSQL Datasource Instance").
Enter Connection Details
Provide the following details in the Connection section:
Endpoint: The Cosmos DB NoSQL account endpoint URL (e.g., https://your-cosmos-account.documents.azure.com:443/).
Tenant ID: The Azure Active Directory (AAD) Tenant ID associated with the Cosmos DB instance.
Client ID: The Application (Client) ID for the Azure AD app with access to Cosmos DB.
Client Secret: The authentication key (client secret) generated in Azure AD for this application.
Click Test Connection to validate the credentials.
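Before clicking Test Connection, it can help to sanity-check the four values locally. The Python sketch below is an optional pre-check and is not part of LightBeam. The function names and messages are hypothetical, and it only verifies format (an https endpoint, GUID-shaped IDs, a non-empty secret), not whether the credentials actually work.

```python
import uuid
from typing import List
from urllib.parse import urlparse


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in canonical 8-4-4-4-12 form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def check_connection_fields(endpoint: str, tenant_id: str,
                            client_id: str, client_secret: str) -> List[str]:
    """Return a list of formatting problems; an empty list means the values look plausible."""
    problems: List[str] = []
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.hostname:
        problems.append("Endpoint should be an https URL")
    if not is_guid(tenant_id):
        problems.append("Tenant ID should be a GUID")
    if not is_guid(client_id):
        problems.append("Client ID should be a GUID")
    if not client_secret.strip():
        problems.append("Client Secret should not be empty")
    return problems


# Placeholder values; replace them with your own.
print(check_connection_fields(
    "https://your-cosmos-account.documents.azure.com:443/",
    "00000000-0000-0000-0000-000000000000",
    "11111111-1111-1111-1111-111111111111",
    "example-secret",
))
```

An empty list only means the values are well formed. The Test Connection button remains the actual validation.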
Additional Details (Optional)
In this section, you can specify metadata attributes related to the data source:
Location: The location of the data source.
Purpose: The purpose of the data being collected/processed.
Stage: The stage of the data source. Example: Source, Processing, Archival, etc.
Verify that you get the message Connection Success! on the screen. Click on Next.
In the next step, you'll see a list of databases presented from your CosmosDB (NoSQL) datasource.
Displayed Databases: By default, all databases to which you have access permissions will be shown.
Custom Selection: If you do not want to scan certain databases, deselect them from the list.
Ensure you've made your desired selections before connecting the datasource.
Click on Start Sampling.
APPENDIX
Use the following steps to create a Cosmos DB NoSQL service principal with the minimal set of permissions needed for LightBeam to scan the data source. An admin user can create the service principal with the specified permissions and use its credentials to connect Cosmos DB NoSQL to LightBeam.
Create a new service principal.
Note: If you already have a service principal that you want to use, you can skip this step.
Go to Enterprise Applications (search for it in the global search bar). Click on New Application.
Select Create your own application.
Enter the name of your application.
Next, generate a secret for this service principal.
Go to App registrations (search for it in the global search bar) and generate a new client secret for this service principal.
You now have a service principal with a client ID (Application ID) and a client secret. Copy the tenant ID as well.
Now we need to create a role for accessing Cosmos NoSQL databases and assign it to the above service principal.
Create a New Role in CosmosDB (NoSQL)
1. Log in to Azure with the Azure CLI (typically az login).
2. Obtain the account name and resource group for the Cosmos DB account. In the screenshot below, kaif-cosmos-nosql is the account name and kaif-group is the resource group.
Creating a new role with permissions to read all databases
Create a new role using the following command. Replace the account_name and resource_group placeholders with the values from step 2.
Note: AssignableScopes is set to “/”, which grants permission on all databases in the Cosmos DB account.
The command returns the following response.
The name field returned in the response above is the role ID.
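As a rough sketch of the role-creation step above, the Python snippet below builds a role definition body and the matching Azure CLI argument list. The data actions are the ones used by Azure's built-in Cosmos DB data reader role. Whether LightBeam requires exactly this set is an assumption, and the role name and helper function names are hypothetical.

```python
import json
from typing import Dict, List

# Read-only data actions (same set as Azure's built-in Cosmos DB data reader role).
# Assumption: this document does not list the exact actions LightBeam needs.
READ_ONLY_DATA_ACTIONS = [
    "Microsoft.DocumentDB/databaseAccounts/readMetadata",
    "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/executeQuery",
    "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/readChangeFeed",
    "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/read",
]


def role_definition(role_name: str, scopes: List[str]) -> Dict:
    """Build the JSON body for a custom Cosmos DB data-plane role."""
    return {
        "RoleName": role_name,
        "Type": "CustomRole",
        "AssignableScopes": scopes,   # "/" covers every database in the account
        "Permissions": [{"DataActions": READ_ONLY_DATA_ACTIONS}],
    }


def create_role_command(account_name: str, resource_group: str, body: Dict) -> List[str]:
    """Build the Azure CLI call that creates the role definition."""
    return [
        "az", "cosmosdb", "sql", "role", "definition", "create",
        "--account-name", account_name,
        "--resource-group", resource_group,
        "--body", json.dumps(body),
    ]


# "lightbeam-read-all" is a hypothetical role name; replace the placeholders from step 2.
body = role_definition("lightbeam-read-all", ["/"])
command = create_role_command("account_name", "resource_group", body)
print(json.dumps(body, indent=2))
```

The list in command can be passed to subprocess.run(command, check=True) once you are logged in to the Azure CLI. The snippet itself only builds and prints the values.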
3. Assign the created role to the service principal.
Note: principal-id is the Object ID of the service principal created earlier. roleID is the ID of the role created above (the name field in the response).
After this, the data source is ready to be onboarded with the client ID, client secret, tenant ID, and Cosmos DB NoSQL endpoint.
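The assignment step can be sketched the same way. The helper below only assembles the Azure CLI arguments and does not run anything. The function name and the placeholder values are hypothetical. A scope of “/” is assumed, matching the role that covers all databases.

```python
from typing import List


def assign_role_command(account_name: str, resource_group: str,
                        principal_id: str, role_id: str,
                        scope: str = "/") -> List[str]:
    """Build the Azure CLI call that assigns a Cosmos DB data-plane role."""
    return [
        "az", "cosmosdb", "sql", "role", "assignment", "create",
        "--account-name", account_name,
        "--resource-group", resource_group,
        "--scope", scope,
        "--principal-id", principal_id,    # Object ID of the service principal
        "--role-definition-id", role_id,   # name field from the role creation response
    ]


# Placeholder values; replace them with your own.
assignment = assign_role_command(
    "account_name", "resource_group", "<principal-object-id>", "<role-id>")
print(" ".join(assignment))
```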
Creating a role with read permissions to specific databases
First, create a new role with the readMetadata permission for all databases, that is, with an assignable scope of “/”. This role allows the databases to be listed.
Next, create another role to access the metadata inside the databases. The example below shows how to create a role to scan two databases named db-name-1 and db-name-2.
The two commands above return role IDs. Assign both roles to the service principal.
Assign the second role to the service principal.
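The two role bodies described in this section might look like the following Python sketch. The role names and helper function are hypothetical. The data actions placed in the second role are an assumption, since this document does not list them. The /dbs/<database-name> form is the database-level scope format used by Cosmos DB role definitions.

```python
import json
from typing import Dict, List


def custom_role(role_name: str, scopes: List[str], data_actions: List[str]) -> Dict:
    """Build the JSON body for a custom Cosmos DB data-plane role."""
    return {
        "RoleName": role_name,
        "Type": "CustomRole",
        "AssignableScopes": scopes,
        "Permissions": [{"DataActions": data_actions}],
    }


# Role 1: account-wide readMetadata, so that databases can be listed.
list_databases_role = custom_role(
    "lightbeam-list-databases",   # hypothetical role name
    ["/"],
    ["Microsoft.DocumentDB/databaseAccounts/readMetadata"],
)

# Role 2: read access limited to the selected databases (actions are assumed).
databases = ["db-name-1", "db-name-2"]
scan_databases_role = custom_role(
    "lightbeam-scan-selected-databases",   # hypothetical role name
    [f"/dbs/{name}" for name in databases],
    [
        "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/executeQuery",
        "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/readChangeFeed",
        "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/read",
    ],
)

print(json.dumps([list_databases_role, scan_databases_role], indent=2))
```

Each body can be supplied to the role creation command, and the two returned role IDs are then assigned to the service principal as described above.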
Validate permissions to the database
Next, validate the permissions on the database. This confirms that the credentials you provide have authorized access to the database. After validating the permissions, you can configure LightBeam Spectra on the system.
Steps
Go into the sql_user_check_cosmos_nosql directory.
Please refer to the README.md file in the directory for detailed instructions.
About LightBeam
LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications, providing 360-visibility, redaction, self-service DSRs, and automated ROPA reporting. This ensures ultimate protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure privacy-first world, helping customers automate compliance against a patchwork of existing and emerging regulations.