CosmosDB (MongoDB)
Connecting CosmosDB (MongoDB) to LightBeam
Overview
LightBeam Spectra users can connect various data sources to the LightBeam application, and these data sources are then continuously monitored for PII and PHI data.
Example: MongoDB, DynamoDB, Redshift, etc.
About CosmosDB (MongoDB)
Cosmos for MongoDB is a NoSQL database that can host data in structured as well as semi-structured form. Users can define tables with or without a schema. LightBeam supports it as a structured data source.
Features
Datasource Registration
A MongoDB admin can create a user with restricted permissions and use that restricted user's username and password for registration. They can then use the hostname, username, password, database name to connect to, and an optional port number for datasource registration. Additionally, during registration, users can specify which databases they wish to scan. LightBeam will subsequently scan all collections (referred to as tables in traditional databases) within these selected databases.
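As a rough illustration of how the registration fields fit together, the sketch below assembles a standard MongoDB connection URI from a hostname, username, password, optional database name, and optional port. The function name `build_mongo_uri` and the sample password are hypothetical, and this is not LightBeam's actual implementation. A real Cosmos DB connection string usually carries additional options that are omitted here.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_mongo_uri(host: str, username: str, password: str,
                    database: Optional[str] = None, port: int = 27017) -> str:
    """Assemble a MongoDB connection URI from the registration fields."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    # The database segment is optional; fall back to a bare '/'.
    path = f"/{database}" if database else "/"
    return f"mongodb://{credentials}@{host}:{port}{path}"


uri = build_mongo_uri(
    host="your-cosmosdb-name.mongo.cosmos.azure.com",
    username="test1",
    password="p@ss:word",  # hypothetical password with special characters
    database="db1",
)
print(uri)
```

Encoding the credentials matters because an unescaped `@` or `:` in a password would otherwise be read as a URI delimiter.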
Metadata Scanning
We scan the tables (collections) present in the databases configured in the scan conditions. For each table, we treat each document as a row. All first-level fields in documents are treated as columns. Furthermore, any first-level field that is a nested object or an array is categorized as a Blob.
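The mapping from documents to rows and columns can be sketched as follows. This is a minimal illustration under stated assumptions, not LightBeam's scanner: the function name `infer_columns` and the label `Scalar` for non-blob fields are made up for the example.

```python
from typing import Any, Dict, Iterable


def infer_columns(documents: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Map each first-level field name to 'Blob' or 'Scalar'."""
    columns: Dict[str, str] = {}
    for doc in documents:                 # each document is treated as a row
        for field, value in doc.items():  # only first-level fields are columns
            kind = "Blob" if isinstance(value, (dict, list)) else "Scalar"
            # Once a field has been seen as a Blob in any row, keep it a Blob.
            if columns.get(field) != "Blob":
                columns[field] = kind
    return columns


docs = [
    {"_id": 1, "name": "Ada", "address": {"city": "Pune"}},
    {"_id": 2, "name": "Bob", "phones": ["555-0100"]},
]
columns = infer_columns(docs)
print(columns)
```

Because documents in a collection need not share a schema, the column set is the union of the first-level fields seen across all rows.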
PII Detection
We fetch sample data for each table and classify the first-level fields in the documents. A field (column) may be classified as single-attribute, or as multi-attribute if it is a nested field containing several different types of PII.
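To make the single-attribute versus multi-attribute distinction concrete, here is a small sketch that classifies first-level fields of sampled documents. The two regex detectors (`EMAIL`, `US_SSN`) and all function names are hypothetical stand-ins; the actual detection logic is not described in this document.

```python
import re
from typing import Any, Dict, Iterator, List, Set

# Hypothetical detectors, for illustration only.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def leaf_strings(value: Any) -> Iterator[str]:
    """Yield every string found in a scalar, nested object, or array."""
    if isinstance(value, dict):
        for child in value.values():
            yield from leaf_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from leaf_strings(child)
    elif isinstance(value, str):
        yield value


def classify_fields(sample: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return the set of attribute names detected per first-level field."""
    found: Dict[str, Set[str]] = {}
    for doc in sample:
        for field, value in doc.items():
            attrs = found.setdefault(field, set())
            for text in leaf_strings(value):
                for name, pattern in DETECTORS.items():
                    if pattern.search(text):
                        attrs.add(name)
    return found


sample = [
    {"email": "ada@example.com",
     "profile": {"contact": "bob@example.com", "ssn": "123-45-6789"}},
]
result = classify_fields(sample)
for field, attrs in result.items():
    label = "multi-attribute" if len(attrs) > 1 else "single-attribute"
    print(field, sorted(attrs), label)
```

In this sample, `email` carries one attribute type, while the nested `profile` field carries two and would therefore count as multi-attribute.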
Full Blob Scan
Given that our PII detection process initially involves scanning only a select sample of documents, we offer an additional comprehensive scanning option specifically for blob columns. This full scan is designed to identify all possible attribute types within these columns. Users can designate specific blob columns for this thorough scan. Once marked, we conduct these full scans periodically in the background, typically every 15 days. Since the full scan is a resource-intensive process, as it involves scanning all documents for a table, this service is not activated by default but is available upon user configuration.
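The periodic, opt-in nature of the full scan can be sketched as a simple due-date check. This assumes a fixed 15-day interval and a hypothetical mapping from user-marked blob columns to their last full-scan time; the real scheduler is not described here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Assumed interval, taken from "typically every 15 days" above.
FULL_SCAN_INTERVAL = timedelta(days=15)


def columns_due(marked: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Return marked blob columns never scanned or last scanned 15+ days ago."""
    return sorted(
        column for column, last_scan in marked.items()
        if last_scan is None or now - last_scan >= FULL_SCAN_INTERVAL
    )


# Only columns the user has explicitly marked appear here (hypothetical names).
marked_columns = {
    "orders.payload": None,                  # marked, never fully scanned
    "users.profile": datetime(2024, 1, 10),  # scanned 21 days before `now`
    "users.notes": datetime(2024, 1, 25),    # scanned 6 days before `now`
}
due = columns_due(marked_columns, now=datetime(2024, 1, 31))
print(due)
```

Columns that were never marked are simply absent from the mapping, which mirrors the full scan being off by default.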
Onboarding CosmosDB (MongoDB) Data Source
Log in to your LightBeam instance.
Click on DATASOURCES on the Top Navigation Bar.
Click on “Add a data source”.
Search for MongoDB.
Click on MongoDB under the Azure CosmosDB section.
Configure Basic Details
In the Basic Details section, enter the following information:
Instance Name: Provide a unique name for the Cosmos DB MongoDB data source (e.g., lb-cosmos-mongo-datasource).
Primary Owner: Enter the email address of the individual responsible for this data source (e.g., demo@lightbeam.ai).
Source of Truth (Optional): Toggle this option on if this database serves as a single source of truth for entity validation.
Description (Optional): Add a brief description of the database (e.g., "Cosmos MongoDB Datasource Instance").
Enter Connection Details
Provide the following details in the Connection section:
Username: The username for database authentication.
Password: The corresponding password for the username.
Host: The Cosmos DB MongoDB server hostname (e.g., your-cosmosdb-name.mongo.cosmos.azure.com).
Port (Optional): The MongoDB connection port (default 27017).
Database (Optional): The specific MongoDB database name within the Cosmos DB instance.
Click Test Connection to validate the credentials.
Additional Details (Optional)
In this section, you can specify metadata attributes related to the data source:
Location: The location of the data source.
Purpose: The purpose of the data being collected/processed.
Stage: The stage of the data source. Example: Source, Processing, Archival, etc.
Verify that you get the message Connection Success! on the screen. Click on Next.
In the next step, you'll see a list of databases presented from your CosmosDB (MongoDB) datasource.
Displayed Databases: By default, all databases to which you have access permissions will be shown.
Custom Selection: If you do not wish to scan certain databases, simply deselect them from the list.
Ensure you've made your desired selections before connecting the datasource.
Click on Start Sampling.
APPENDIX
Use the following details to create a CosmosDB (MongoDB) user with a minimal set of permissions, suitable for scanning with LightBeam. Admin users can create a user with the specified permissions and use its credentials to connect CosmosDB (MongoDB) to LightBeam, as detailed in the following instructions.
Minimal permissions setup
From the Azure CosmosDB (MongoDB) console, go to Settings → Features and make sure Role Based Access Control (RBAC) is turned on.
Make sure the Azure CLI (az) is installed, and connect to your subscription.
We need a user with read-only access to the databases that are to be scanned.
The built-in read role can be used; its privileges are sufficient for us to scan the database. Suppose a user with username test1 needs to be created with password Admin123 for a database db1.
Execute the following command:
If there are other databases, a similar command needs to be repeated for them too.
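As an illustration of the user-creation step, the sketch below assembles a request body and argument list for the Azure CLI's `az cosmosdb mongodb user definition create` command, once per database. This is an assumption about how such a command could be built, not necessarily the exact command intended here: the account name `my-cosmos-account`, the resource group `my-resource-group`, the `CustomData` text, and the body field names are placeholders or assumptions and should be checked against the current Azure CLI documentation. The script only prints the commands and does not run them.

```python
import json
import shlex
from typing import Dict, List


def user_definition_body(username: str, password: str, database: str,
                         role: str = "read") -> Dict[str, object]:
    """Build the (assumed) JSON body granting one built-in role on one database."""
    return {
        "Id": f"{database}.{username}",
        "UserName": username,
        "Password": password,
        "DatabaseName": database,
        "CustomData": "LightBeam read-only scan user",  # free-text placeholder
        "Mechanisms": "SCRAM-SHA-256",
        "Roles": [{"Role": role, "Db": database}],
    }


def az_command(account: str, resource_group: str, body: Dict[str, object]) -> str:
    """Render a shell-quoted az command line for the given body."""
    args: List[str] = [
        "az", "cosmosdb", "mongodb", "user", "definition", "create",
        "--account-name", account,
        "--resource-group", resource_group,
        "--body", json.dumps(body),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# One command per database to be scanned; "db2" is a hypothetical second database.
commands = [
    az_command("my-cosmos-account", "my-resource-group",
               user_definition_body("test1", "Admin123", database))
    for database in ["db1", "db2"]
]
for command in commands:
    print(command)
```

Generating the body with `json.dumps` and quoting it with `shlex.quote` avoids hand-escaping the JSON on the command line.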
Validate permissions to the database
Next, the user needs to validate these permissions to the database. This ensures authorized access to the database by the credentials provided by the user. After validating the permissions to the database, the user can configure LightBeam Spectra on the system.
Steps
Go into the sql_user_check_mongodb directory.
Please refer to the README.md file in the directory for detailed instructions.
About LightBeam
LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications, providing 360-visibility, redaction, self-service DSRs, and automated ROPA reporting, and ensuring ultimate protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure, privacy-first world, helping customers automate compliance against a patchwork of existing and emerging regulations.