PostgreSQL

Connecting PostgreSQL to LightBeam


Overview

LightBeam Spectra users can connect various data sources to the LightBeam application and these data sources will be continuously monitored for PII, PHI data.

Example: PostgreSQL, Snowflake, SMB, etc.


Onboarding PostgreSQL Data Source

  1. Login to your LightBeam Instance.

  2. Click on DATASOURCES on the Top Navigation Bar.

  3. Click on “Add a data source”.

Figure 1. Add Data Source

4. Click on PostgreSQL.

Figure 2. Selection of PostgreSQL Data Source

Figure 2.1 PostgreSQL Data Source

5. Choose Database Scanning Type

  • A pop-up window appears, prompting you to select one of the following (See Figure 3):

    A. Live Database Scanning – Use this option to connect to a real-time PostgreSQL instance.

    B. Snapshot Scanning – Choose this option to onboard a static snapshot of your database. Snapshot Scanning is a broad method that can be used for offline backups.

    CSV Scanning Note: If you are onboarding a CSV file stored in cloud storage (often referred to as “Postgres Offline”), LightBeam treats the CSV file as an offline snapshot. For detailed CSV onboarding steps, please refer to the CSV Files as a Datasource document.

  1. Click Proceed to continue.

Figure 3: PostgreSQL Datasource - Selecting Database Scanning Type

A. Configure Live Database Scanning

If you If you selected Live Database Scanning, the next screen will ask for:

  1. Basic Details

    • Instance Name – A unique identifier for this PostgreSQL connection.

    • Primary Owner – The email of the user responsible for this data source.

    • Description (Optional) – A brief description of the data source.

    • Mark as Source of Truth (Optional) – Toggle this setting if the database serves as a definitive reference for entity resolution.

  2. Connection Details:

    • Username & Password – Credentials for database authentication.

    • Host – The FQDN or IP address of your PostgreSQL server.

    • Port – Default is 5432, or specify a custom port.

  3. Authentication Mechanisms

    You can choose one of the following:

    1. Basic Authentication (Username/Password)

    2. AWS Access Keys

    3. AWS IAM Role (newly added)

    Option 1: Basic Authentication

    • Best for: Standard PostgreSQL instances that use direct credential-based authentication.

    • Required Inputs:

      • Username

      • Password

      • Host (FQDN or IP Address)

      • Port (Default: 5432 or Custom Port)

    • Optional Security Features: Upload SSL Certificate, SSL Key, and SSL CA Certificate for encrypted communication.

    Figure 4: Basic Details for Live Database Scanning - Basic Authentication

Option 2: AWS Access Keys

  • Best for: Connecting to PostgreSQL instances hosted on AWS RDS using programmatic credentials.

  • Required Inputs:

    • AWS Access Key ID

    • AWS Secret Key

    • Host (FQDN or IP Address)

    • Port (Default: 5432 or Custom Port)

    • Region Selection (e.g., US East - us-east-1)

  • Optional Security Features: Upload SSL Certificate, SSL Key, and SSL CA Certificate for enhanced security.

Figure 4.1 :Basic Details for Live Database Scanning - AWS Access Keys

Option 3: AWS IAM Role

  • Best for: Secure authentication within AWS environments where IAM roles are preferred over static credentials.

  • Required Inputs:

    • Host (FQDN or IP Address)

    • Port (Default: 5432 or Custom Port)

    • Region Selection (e.g., US East - us-east-1)

  • Optional Security Features: Upload SSL Certificate, SSL Key, and SSL CA Certificate to enable encrypted connections.

Figure 4.2: Basic Details for Live Database Scanning - AWS IAM Role

4. Additional Details (Optional)

In this section, you can specify metadata attributes related to the PostgreSQL data source:

  • Location: Select the geographic region where the database is hosted.

  • Purpose: Define the purpose of data collection (e.g., Analytics, Compliance, Security).

  • Stage: Indicate the stage of data processing (e.g.,Collection, Processing, Storage).

B. Configure Snapshot (Offline) Scanning

When Snapshot Scanning is chosen, you can onboard:

  1. Offline Database Backups or

  2. CSV Files stored in a connected cloud storage service.

  1. Basic Details:

  • Instance Name – A unique identifier for the database connection.

  • Primary Owner – The email address of the responsible user.

  • Description (Optional) – A short description of the data source.

  • Mark as Source of Truth (Optional) – Toggle this setting if the database serves as a definitive reference for entity resolution.

  1. Connection Details:

  • Select Data Source Where the File is Present – Choose from connected Google Drive, OneDrive, or SharePoint.

  • Select a Drive – Drive owner’s email (for example, [email protected]).

  • Folder Link or Folder Name

    • For Google Drive: paste the folder link (e.g., https://drive.google.com/drive/folders/<some_id>).

    • For OneDrive or SharePoint: enter the folder path (e.g., folder1, or folder1/nested for subfolders).

  1. Click Test Connection.

  2. If the connection is successful, a Connection Success! message appears.

  3. Click Next and select the list of databases (or CSV snapshots) to scan.

  4. Click Start Sampling.

The PostgreSQL data source is now ready for scanning.

Figure 6. List of Databases


APPENDIX

Troubleshooting

If you don’t see any data being scanned without error, it might be a permission issue. Consider running a SELECT * query on a table and see if you are able to see the data. If you see a message of permission denied, consider granting permission to the user.

Minimal permissions setup

We require the following permissions to scan only a subset of the databases for the instance:

  • CONNECT permissions

  • For each database - CONNECT and SELECT permissions

Use the following script to create a user with such permissions. In this example, we are creating a user with the permissions to connect to the LightBeam database.

User with restricted permissions for a single database

-- CREATE USER test1 WITH PASSWORD 'lbadmin12345';

-- GRANT SELECT ON ALL TABLES IN SCHEMA public TO test1;

-- GRANT CONNECT ON DATABASE lightbeam to test1;

-- GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO test1;

Use the user you just created to register PostgreSQL datasource.

Full permissions setup

If you want to, you can scan all the databases and allow wider scope permissions. LightBeam recommends a full read-only user that can access a list of databases, connect to every database, and read data.

Validate permissions to the database

Next, the user needs to validate these permissions to the database. This ensures authorized access to the database by the credentials provided by the user. After validating the permissions to the database, the user can configure LightBeam Spectra on the system.

Prerequisite

The following tools need to be installed on the system in order to verify database permissions:

  • Git

  • PSQL tool

Steps

  1. Go into sql_user_check_postgres directory

  2. Please refer to the README.md file in the directory for detailed instructions.


About LightBeam

LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications providing 360-visibility, redaction, self-service DSRs, and automated ROPA reporting ensuring ultimate protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure privacy-first world helping customers automate compliance against a patchwork of existing and emerging regulations.

For any questions or suggestions, please get in touch with us at: [email protected].

Last updated