
Snowflake

Connecting Snowflake to LightBeam



Overview

LightBeam Spectra users can connect various data sources to the LightBeam application, and these data sources are continuously monitored for PII and PHI data.

Examples: Snowflake, SMB, MySQL, PostgreSQL, etc.


Onboarding Snowflake Data Source

  1. Log in to your LightBeam instance.

  2. Click on DATASOURCES on the Top Navigation Bar.

  3. Click on “Add a data source”.

Figure 1. Add Data Source
  4. Search for “Snowflake” and select it.

5. Fill in the details as shown below:

Basic Information

  1. Instance Name: A unique name for the data source.

  2. Description: An optional field describing the use of this data source.

  3. Primary Owner: Email address of the person responsible for this data source; this person receives alerts by default.

  4. Source of Truth: Mark a data source as a Source of Truth if it contains authoritative data. LightBeam creates entities from the attributes found in a Source of Truth data source and uses them to verify whether attributes/entities found in other data sources are accurate.

Connection Details

  1. Provide the following details in the Connection section:

    • Username: The Snowflake account username (e.g., admin).

    • Password: The password associated with the username.

    • Account Name: Enter the account name in the format <account_locator>.<region>.<cloud>. For example: rs31112.europe-west4.gcp.

      • For AWS us-west-2, use only the <account_locator>.

      • Alternatively, <org_name>-<account_name> can also be used.

    • Role: The role assigned to the user for accessing the Snowflake instance (e.g., lightbeam_users).

    • Warehouse: Specify the warehouse (e.g., compute_wh).

Note: These parameters can be obtained from the Admin -> Accounts page in the Snowflake UI.
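As a quick sanity check before filling in the form, you can try connecting with the same parameters from a terminal using snowsql (the values below are illustrative placeholders):

snowsql -a rs31112.europe-west4.gcp -u <USERNAME> -r <ROLE> -w <WAREHOUSE> -q "SELECT CURRENT_ACCOUNT(), CURRENT_REGION();"

If this command connects and returns a row, the same values should work in the LightBeam connection form.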

6. Click Test Connection to validate the credentials. If successful, a Test Connection Success message will appear.

Click Next to continue.

7. In this step, choose one of two scan setting options:

i) Show all databases to select

ii) Select specific database(s) that you have permission for

i) To show all databases, select the first scan setting. This displays a list of all the Snowflake databases.

ii) To select specific databases you have permission for, select the second scan setting.

Click on Add database name.

Type the name of the database you would like to scan in the Search box and choose the correct option from the drop-down list.

8. After completing step 7, check the tickboxes next to the databases you would like to add.

You are now ready to connect to the selected databases and proceed.

Click on Start Sampling. A confirmation message will appear.

Click on Proceed with Sampling.

You can now browse the updated data source.


Viewing Details of Skipped Databases

During the scanning process, some databases may be skipped if required permissions or configurations are missing. LightBeam provides a clear way to identify and address these skipped databases.

In the Datasources section, skipped databases are highlighted in the Overview panel:

  • A yellow notification banner indicates the number of skipped databases (e.g., "12 Skipped") and states the reason, for example: "necessary permissions haven’t been configured."

Resolving Scan Issues for Skipped Databases

  1. Click View Details on the yellow banner.

  2. A modal window will appear, listing the names of the skipped databases (e.g., sandbox, automation_stuff, test2).

  3. The modal provides a description explaining that these databases were skipped due to missing permissions or errors.

  4. Once permissions are configured, LightBeam will automatically include these databases in the next scan cycle.
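For example, if a database named sandbox was skipped for missing permissions, grants along the following lines (a sketch using the LightBeam role from the appendix; adjust names to your environment) would make it eligible for the next scan cycle:

-- Illustrative: grant the LightBeam role access to a previously skipped database
GRANT USAGE ON DATABASE sandbox TO ROLE <ROLE>;
GRANT USAGE ON ALL SCHEMAS IN DATABASE sandbox TO ROLE <ROLE>;
GRANT SELECT ON ALL TABLES IN DATABASE sandbox TO ROLE <ROLE>;
GRANT SELECT ON FUTURE TABLES IN DATABASE sandbox TO ROLE <ROLE>;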


APPENDIX

Troubleshooting

If no data is being scanned and no error is shown, it might be a permission issue. Try running a SELECT * query on a table to check whether you can see the data. If you get a permission-denied message, grant the required permission to the user.
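For example, a minimal read check run as the LightBeam user (placeholders as in the appendix snippets):

-- Confirm the LightBeam user can read a table it is expected to scan
USE ROLE <ROLE>;
USE WAREHOUSE <WAREHOUSE>;
SELECT * FROM <DATABASE>.<SCHEMA>.<TABLE> LIMIT 10;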

Whitelisting IP address

By default, Snowflake allows users to connect to the service from any computer or device. If there is an active network policy that allows access only from certain networks, add the public IP addresses of all the nodes where the LightBeam cluster is running:

  1. Go to Admin -> Security in the Snowflake UI and create a new network rule containing the public IP addresses of all LightBeam nodes.

  2. Attach this network rule to the active network policy as Allowed.
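If you prefer SQL to the UI, the same setup can be sketched as follows (the rule name, policy name, and IP addresses are illustrative):

-- Illustrative: allow LightBeam node IPs through an existing network policy
CREATE NETWORK RULE lightbeam_allow_rule
  TYPE = IPV4
  MODE = INGRESS
  VALUE_LIST = ('203.0.113.10/32', '203.0.113.11/32');

ALTER NETWORK POLICY my_active_policy
  ADD ALLOWED_NETWORK_RULE_LIST = (lightbeam_allow_rule);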

Connecting through Private Links

A private link is a feature for securing connectivity between your clients and Snowflake without traversing the public Internet.

Follow these links if you want to set that up:

  • AWS: https://docs.snowflake.com/en/user-guide/admin-security-privatelink

  • Azure: https://docs.snowflake.com/en/user-guide/privatelink-azure

  • GCP: https://docs.snowflake.com/en/user-guide/private-service-connect-google

Setting up a new user in Snowflake

Create a new user with all the permissions required by LightBeam to scan the data source.

User: A user in Snowflake is an account in the system, generally associated with an individual person. Users can log into Snowflake, issue SQL commands, manage data, and perform other operations. A user is associated with specific properties, such as login name, password, and default role.

Role: A role in Snowflake, on the other hand, is a named set of access privileges that can be granted to users or other roles. These privileges determine what actions a user can perform and on which database objects.

A user can be assigned multiple roles and can switch between them during a session to access different sets of privileges as needed.

In essence, a user is who logs into the system, and a role determines what that user can do once they are logged in. This distinction allows Snowflake to provide flexible and granular control over access to its resources.

The following SQL snippet can be used for creating a role, a user, and assigning permissions to a single database.

-- Create a new role containing all permissions. Replace ROLE placeholder with a role name of your choice
CREATE ROLE <ROLE>;

-- Grant access to a warehouse. Replace WAREHOUSE placeholder with an existing warehouse in your instance.
GRANT usage on warehouse <WAREHOUSE> to role <ROLE>;
GRANT operate on warehouse <WAREHOUSE> to role <ROLE>;

-- Grant access to connect to the specific database and schema. Replace DATABASE placeholder with the name of database you want to scan.
GRANT USAGE on DATABASE <DATABASE> to ROLE <ROLE>;
-- Replace SCHEMA placeholder with the name of schema you want to scan. If you have multiple schemas in the database repeat the below SQL statement for every schema.
GRANT USAGE on SCHEMA <SCHEMA> to ROLE <ROLE>;

-- Grant access to scan existing and future tables in a database.
-- Repeat this for all databases that you want to scan.
GRANT SELECT ON ALL TABLES IN DATABASE <DATABASE> to ROLE <ROLE>;
GRANT SELECT ON FUTURE TABLES IN DATABASE <DATABASE> to ROLE <ROLE>;

-- Create a user with a strong password and default warehouse. Replace USERNAME placeholder with the username of your choice.
-- Replace the value of DEFAULT_WAREHOUSE with an existing warehouse in your instance.
CREATE USER <USERNAME> PASSWORD='<PASSWORD>' DEFAULT_WAREHOUSE=<WAREHOUSE>;

-- Assign the role created above to the new user.
GRANT ROLE <ROLE> TO USER <USERNAME>;

If you want to scan more than one database, it is recommended to create a user and assign read permissions to all databases.

Step 1: First, create a user and assign permissions to use a warehouse.

-- Create a new role. Replace ROLE placeholder with a role name of your choice
CREATE ROLE <ROLE>;

-- Grant access to a warehouse. Replace WAREHOUSE placeholder with an existing warehouse in your instance.
GRANT usage on warehouse <WAREHOUSE> to role <ROLE>;
GRANT operate on warehouse <WAREHOUSE> to role <ROLE>;

-- Create a user with a strong password and default warehouse. Replace USERNAME placeholder with the username of your choice.
-- Replace the value of DEFAULT_WAREHOUSE with an existing warehouse in your instance.
CREATE USER <USERNAME> PASSWORD='<PASSWORD>' DEFAULT_WAREHOUSE=<WAREHOUSE>;

-- Assign the role created above to the new user.
GRANT ROLE <ROLE> TO USER <USERNAME>;

Step 2: Now assign permissions to the role to access all databases in the account.

In the SQL snippet below, replace the ROLE placeholder with the name of the role created in step 1. Run the SQL snippet; it prints a set of SQL statements for granting permissions. Copy the output and run those statements.

SELECT
	'grant usage on database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT IN ('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA')
UNION ALL
SELECT
	'grant usage on all schemas in database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT IN ('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA')
UNION ALL
SELECT
	'grant select on all tables in database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT IN ('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA')
UNION ALL
SELECT
	'grant select on future tables in database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT IN ('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA');

Note: If you want to exclude some databases, modify the SQL snippet in step 2 to include their names alongside SNOWFLAKE and SNOWFLAKE_SAMPLE_DATA.
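For example, to also exclude a database named SANDBOX (name illustrative), each WHERE clause in the snippet would become:

WHERE
	database_name NOT IN ('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA', 'SANDBOX')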

Provide the created Username, Password, Role Name, Warehouse Name, and Account Name to register the Snowflake data source.

Validate permissions to the database

Next, validate these permissions against the database. This confirms that the credentials provided grant authorized access. After validating the permissions, you can configure LightBeam Spectra on the system.

Prerequisite

Install snowsql on the machine.

Steps

  1. Clone the repository: https://github.com/lightbeamai/lb-installer

  2. Go into the sql_user_check_snowflake/ directory.

  3. Run the script:

WAREHOUSE_NAME=<WAREHOUSE NAME> ACCOUNT_NAME=<SNOWFLAKE ACCOUNT NAME> ROLE_NAME=<ROLE ASSIGNED TO USER> SF_USERNAME=<USERNAME> SF_DATABASE=<DATABASE TO CONNECT> bash run.sh

User Credentials

  • WAREHOUSE_NAME: Name of the warehouse in your Snowflake instance.

  • ACCOUNT_NAME: Name of your Snowflake account.

  • ROLE_NAME: Name of the role assigned to the user in Snowflake.

  • SF_USERNAME: Username for the Snowflake instance.

  • SF_DATABASE: Name of the database in Snowflake to which you wish to establish a connection and validate the permissions.
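For example, using the sample values from earlier in this guide (the database name sales is illustrative):

WAREHOUSE_NAME=compute_wh ACCOUNT_NAME=rs31112.europe-west4.gcp ROLE_NAME=lightbeam_users SF_USERNAME=admin SF_DATABASE=sales bash run.sh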

To verify that the commands succeeded, check the output file generated by the script.


About LightBeam

LightBeam automates Privacy, Security, and AI Governance so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric, automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications, providing 360-degree visibility, redaction, self-service DSRs, and automated RoPA reporting. This ensures protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure, privacy-first world, helping customers automate compliance against a patchwork of existing and emerging regulations.



For any questions or suggestions, please get in touch with us at support@lightbeam.ai.
