PostgreSQL
Connecting PostgreSQL to LightBeam
LightBeam Spectra users can connect various data sources to the LightBeam application, and these data sources are continuously monitored for PII and PHI data.
Example: PostgreSQL, Snowflake, SMB, etc.
1. Log in to your LightBeam instance.
2. Click DATASOURCES on the top navigation bar.
3. Click “Add a data source”.
4. Click PostgreSQL.
A pop-up window appears, prompting you to select one of the following (See Figure 3):
A. Live Database Scanning – Use this option to connect to a real-time PostgreSQL instance.
B. Snapshot Scanning – Choose this option to onboard a static snapshot of your database, such as an offline backup or a CSV file stored in cloud storage (see the note below).
CSV Scanning Note: If you are onboarding a CSV file stored in cloud storage (often referred to as “Postgres Offline”), LightBeam treats the CSV file as an offline snapshot. For detailed CSV onboarding steps, please refer to the CSV Files as a Datasource document.
Click Proceed to continue.
If you selected Live Database Scanning, the next screen will ask for:
Basic Details
Instance Name – A unique identifier for this PostgreSQL connection.
Primary Owner – The email of the user responsible for this data source.
Description (Optional) – A brief description of the data source.
Mark as Source of Truth (Optional) – Toggle this setting if the database serves as a definitive reference for entity resolution.
Connection Details:
Username & Password – Credentials for database authentication.
Host – The FQDN or IP address of your PostgreSQL server.
Port – Default is 5432, or specify a custom port.
Authentication Mechanisms
You can choose one of the following:
Basic Authentication (Username/Password)
AWS Access Keys
AWS IAM Role (newly added)
Option 1: Basic Authentication
Best for: Standard PostgreSQL instances that use direct credential-based authentication.
Required Inputs:
Username
Password
Host (FQDN or IP Address)
Port (Default: 5432 or Custom Port)
Optional Security Features: Upload SSL Certificate, SSL Key, and SSL CA Certificate for encrypted communication.
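LightBeam establishes this connection for you once the form is submitted, but if you want to sanity-check the credentials and SSL files from your own machine first, the following is a minimal sketch using the Python psycopg2 driver; the host, database, user, and certificate paths are placeholders to replace with your own values.

import psycopg2

# Sketch: verify basic-auth credentials (and optional SSL material) before
# entering them in LightBeam. All values below are placeholders.
conn = psycopg2.connect(
    host="postgres.example.com",    # FQDN or IP address
    port=5432,                      # default 5432, or your custom port
    dbname="lightbeam",             # any database the user may connect to
    user="test1",
    password="lbadmin12345",
    sslmode="verify-full",          # omit the ssl* arguments if SSL is not configured
    sslrootcert="/path/to/ca.pem",  # SSL CA Certificate
    sslcert="/path/to/client.crt",  # SSL Certificate
    sslkey="/path/to/client.key",   # SSL Key
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()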
Option 2: AWS Access Keys
Best for: Connecting to PostgreSQL instances hosted on AWS RDS using programmatic credentials.
Required Inputs:
AWS Access Key ID
AWS Secret Key
Host (FQDN or IP Address)
Port (Default: 5432 or Custom Port)
Region Selection (e.g., US East - us-east-1)
Optional Security Features: Upload SSL Certificate, SSL Key, and SSL CA Certificate for enhanced security.
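When an AWS access key pair is supplied for an RDS or Aurora PostgreSQL instance, authentication typically works by signing a short-lived IAM authentication token that is used in place of a password. The sketch below illustrates the idea with boto3 and psycopg2; the host, region, database user, and key values are placeholders, and it assumes IAM database authentication is enabled on the instance.

import boto3
import psycopg2

# Sketch: generate an RDS IAM auth token with AWS access keys and use it as
# the password. Host, region, user, and keys are placeholders.
HOST = "mydb.abc123xyz.us-east-1.rds.amazonaws.com"
PORT = 5432
DB_USER = "lightbeam_scanner"   # a database user granted the rds_iam role

rds = boto3.client(
    "rds",
    region_name="us-east-1",               # Region Selection
    aws_access_key_id="AKIA...",           # AWS Access Key ID
    aws_secret_access_key="...",           # AWS Secret Key
)
token = rds.generate_db_auth_token(DBHostname=HOST, Port=PORT, DBUsername=DB_USER)

conn = psycopg2.connect(
    host=HOST, port=PORT, dbname="lightbeam",
    user=DB_USER, password=token,
    sslmode="require",                     # IAM authentication requires SSL
)
with conn.cursor() as cur:
    cur.execute("SELECT current_user;")
    print(cur.fetchone()[0])
conn.close()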
Option 3: AWS IAM Role
Best for: Secure authentication within AWS environments where IAM roles are preferred over static credentials.
Required Inputs:
Host (FQDN or IP Address)
Port (Default: 5432 or Custom Port)
Region Selection (e.g., US East - us-east-1)
Optional Security Features: Upload SSL Certificate, SSL Key, and SSL CA Certificate to enable encrypted connections.
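With an IAM role, no static keys are entered in LightBeam; credentials are resolved from the role attached to the environment (for example, an instance profile), and the token flow is otherwise the same as above. A minimal sketch with placeholder host, region, and user:

import boto3
import psycopg2

# Sketch: same token flow as the access-key example, but boto3 picks up
# credentials from the attached IAM role. All values are placeholders.
HOST = "mydb.abc123xyz.us-east-1.rds.amazonaws.com"
PORT = 5432
DB_USER = "lightbeam_scanner"

rds = boto3.client("rds", region_name="us-east-1")   # role credentials resolved automatically
token = rds.generate_db_auth_token(DBHostname=HOST, Port=PORT, DBUsername=DB_USER)

conn = psycopg2.connect(host=HOST, port=PORT, dbname="lightbeam",
                        user=DB_USER, password=token, sslmode="require")
conn.close()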
Additional Details (Optional)
In this section, you can specify metadata attributes related to the PostgreSQL data source:
Location: Select the geographic region where the database is hosted.
Purpose: Define the purpose of data collection (e.g., Analytics, Compliance, Security).
Stage: Indicate the stage of data processing (e.g., Collection, Processing, Storage).
When Snapshot Scanning is chosen, you can onboard:
Offline Database Backups or
CSV Files stored in a connected cloud storage service.
Basic Details:
Instance Name – A unique identifier for the database connection.
Primary Owner – The email address of the responsible user.
Description (Optional) – A short description of the data source.
Mark as Source of Truth (Optional) – Toggle this setting if the database serves as a definitive reference for entity resolution.
Connection Details:
Select Data Source Where the File is Present – Choose from connected Google Drive, OneDrive, or SharePoint.
Select a Drive – Drive owner’s email (for example, owner@mycompany.com).
Folder Link or Folder Name
For Google Drive: paste the folder link (e.g., https://drive.google.com/drive/folders/<some_id>).
For OneDrive or SharePoint: enter the folder path (e.g., folder1, or folder1/nested for subfolders).
Click Test Connection.
If the connection is successful, a Connection Success! message appears.
Click Next and select the list of databases (or CSV snapshots) to scan.
Click Start Sampling.
The PostgreSQL data source is now ready for scanning.
If no data is being scanned and no error is shown, it may be a permission issue. Run a SELECT * query on one of the tables and check whether you can see the data. If you get a permission denied message, grant the required permissions to the user.
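For example, a quick check along the following lines (a sketch with placeholder connection values and table name, using the Python psycopg2 driver) makes a missing SELECT privilege visible immediately:

import psycopg2
from psycopg2 import errors

# Sketch: run a small SELECT with the same credentials LightBeam uses and
# report a missing SELECT privilege. All values are placeholders.
conn = psycopg2.connect(host="postgres.example.com", port=5432,
                        dbname="lightbeam", user="test1", password="lbadmin12345")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM public.customers LIMIT 5;")  # any table LightBeam should scan
        print(cur.fetchall())
except errors.InsufficientPrivilege as exc:
    print("Permission denied - grant SELECT on this table to the scan user:", exc)
finally:
    conn.close()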
We require the following permissions to scan only a subset of the databases for the instance:
CONNECT permissions
For each database – CONNECT and SELECT permissions
Use the following script to create a user with these permissions. In this example, we create a user with permission to connect to the lightbeam database and read its tables.
CREATE USER test1 WITH PASSWORD 'lbadmin12345';
GRANT CONNECT ON DATABASE lightbeam TO test1;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO test1;
GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO test1;
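Before registering the data source, you can optionally confirm that the grants took effect, for example with a check like this sketch (assumes the Python psycopg2 driver and the example test1 user from the script above; the host and password are placeholders):

import psycopg2

# Sketch: check the scan user's privileges with PostgreSQL's built-in
# privilege functions. Host and password are placeholders.
conn = psycopg2.connect(host="postgres.example.com", port=5432,
                        dbname="lightbeam", user="test1", password="lbadmin12345")
with conn.cursor() as cur:
    cur.execute("SELECT has_database_privilege('test1', 'lightbeam', 'CONNECT');")
    print("CONNECT on lightbeam:", cur.fetchone()[0])
    cur.execute("""
        SELECT table_name,
               has_table_privilege('test1',
                                   quote_ident(table_schema) || '.' || quote_ident(table_name),
                                   'SELECT')
        FROM information_schema.tables
        WHERE table_schema = 'public';
    """)
    for table_name, can_select in cur.fetchall():
        print("SELECT on public." + table_name + ":", can_select)
conn.close()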
Use the user you just created to register the PostgreSQL data source.
If you prefer, you can scan all of the databases by granting wider permissions. LightBeam recommends a full read-only user that can list the databases, connect to every database, and read the data.
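What those wider grants could look like is sketched below: an administrator connects to each non-template database and grants the scan user read access. The admin credentials, host, schema name, and the test1 user are placeholders; adjust the schemas to match your layout.

import psycopg2

# Sketch: grant the scan user read access in every non-template database.
# Run as an administrator; all values below are placeholders. GRANTs on
# schemas and tables must be issued while connected to each database.
ADMIN = dict(host="postgres.example.com", port=5432,
             user="postgres", password="admin_password")
SCAN_USER = "test1"

# Collect the databases to cover.
conn = psycopg2.connect(dbname="postgres", **ADMIN)
with conn.cursor() as cur:
    cur.execute("SELECT datname FROM pg_database WHERE datistemplate = false;")
    databases = [row[0] for row in cur.fetchall()]
conn.close()

for db in databases:
    conn = psycopg2.connect(dbname=db, **ADMIN)
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute('GRANT CONNECT ON DATABASE "{}" TO {};'.format(db, SCAN_USER))
        cur.execute("GRANT USAGE ON SCHEMA public TO {};".format(SCAN_USER))
        cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO {};".format(SCAN_USER))
    conn.close()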
Next, validate these permissions against the database. This confirms that the credentials you provide give authorized access to the database. After validating the permissions, you can configure LightBeam Spectra on the system.
The following tools need to be installed on the system in order to verify database permissions:
Git
psql (the PostgreSQL command-line tool)
First, clone the repository and go into the sql_user_check_postgres directory.
Please refer to the README.md file in the directory for detailed instructions.
LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications, providing 360-degree visibility, redaction, self-service DSRs, and automated ROPA reporting, ensuring ultimate protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure, privacy-first world, helping customers automate compliance against a patchwork of existing and emerging regulations.
For any questions or suggestions, please get in touch with us at: .