Snowflake
Connecting Snowflake to LightBeam
Overview
LightBeam Spectra users can connect various data sources to the LightBeam application and these data sources will be continuously monitored for PII, PHI data.
Examples: Snowflake, SMB, MySQL, PostgreSQL, etc.
Onboarding Snowflake Data Source
- Login to your LightBeam Instance. 
- Click on DATASOURCES on the Top Navigation Bar. 
- Click on “Add a data source”. 
- Search for “Snowflake”. 
3. Fill in the details as shown below and click Next:
Basic Information
- Instance Name: This is the unique name given to the data source. 
- Description: This is an optional field needed to describe the use of this data source. 
- Primary owner: Email address of the person responsible for this data source which will get alerts by default. 
- Source of Truth: LightBeam Spectra would have monitored data sources that contain data acting as a single point of truth and that can be used for looking up entities/attributes that help to identify if the other attributes/entities found in any other data source are accurate or not. A Source of Truth data set would create entities based on the attributes found in the data. 

Connection Details
- Provide the following details in the Connection section: - Username: The Snowflake account username (e.g., - admin).
- Password: The password associated with the username. 
- Account Name: Enter the account name in the format - <account_locator>.<region>.<cloud>. For example:- rs31112.europe-west4.gcp.- For AWS - us-west-2, use only the- <account_locator>.
- Alternatively, - <org_name>-<account_name>can also be used.
 
- Role: The role assigned to the user for accessing the Snowflake instance (e.g., - lightbeam_users).
- Warehouse: Specify the warehouse (e.g., - compute_wh).
 


5. Click Test Connection to validate the credentials. If successful, a Test Connection Success message will appear.
Click Next to continue.
6. In this step, you can choose either of two scan setting options –
i) Show all databases to select
ii) Select specific database(s) that you have permission for
i) To show all databases, select the first scan setting. This will show a list of all the Snowflake databases.
ii) To select specific databases you have permission for, select the second scan setting.
Click on Add database name.
Type the name of the database you would like to scan in the Search box and choose the correct option from the drop-down list.
7. After completing step 6, check the tickboxes next to the databases you would like to add.
Now we are ready to connect to the test database and proceed.
Click on Start Sampling.
This will show you the following message:
Click on Proceed with Sampling.
Now you can browse the updated datasource.
Tables & Columns
This section provides a comprehensive guide to viewing and understanding the data discovered from your Snowflake tables and columns.
Overview and Scan Method
The Tables & Columns pages are the primary way to view the foundational data within your Snowflake instance.
- Tables Page: Lists all discovered tables from the connected databases. 
- Columns Page: Provides a master list of all columns across every discovered table. 
A key feature of this section is its scan method: all tables and columns within the databases you selected during setup are scanned automatically and continuously by default. No manual action is needed to initiate scanning.
How-To Guide: Viewing Table and Column Data
- Navigate to the Tables Page: - Go to the DATASOURCES section from the top navigation bar. 
- Select your configured Snowflake data source. 
- Click on the Tables & Columns tab. By default, it will display the "Tables" view. 
 
- Browse and Filter Tables: - On the Tables page, you will see a list of all discovered tables. 
- Use the filter options at the top to narrow the list by Database, Schema, or Table Name. 
 
- Inspect a Specific Table's Details: - Click on any table name in the list to open its detailed view. 
- In this view, you will find: - Complete Schema: A list of all columns within that table. 
- Sample Data: A preview of the data for each column. 
- Scan Summary: A count of the total columns and the number of columns that require review based on data classification. 
- Classification Details: Clicking on the scan summary reveals the specific sensitive data classification (e.g., PII, PHI) applied to each column. 
 
 
- View All Columns: - From the Tables & Columns tab, use the dropdown menu to switch from "Tables" to the "Columns" view. 
- This page lists every single column from all tables, which can be filtered and searched to find specific data points across your data source. 
 
Views
This section provides a comprehensive guide to the discovery, scanning, and analysis of SQL Views in Snowflake.
Overview and Scan Method
In addition to standard tables, LightBeam supports the scanning of SQL Views, allowing users to gain visibility into virtualized data sources that often aggregate or represent sensitive information.
What Are SQL Views? SQL Views in Snowflake are virtual tables defined by SQL queries. They do not store data themselves but provide a way to encapsulate business logic, access data across multiple tables, and present it in a customized format. These views often contain sensitive data including Personally Identifiable Information (PII) or Protected Health Information (PHI), and hence require stringent security monitoring and data classification.
Unlike tables, views are not scanned by default. Scanning a view is a manual, on-demand action that gives you control over when these potentially complex queries are run.
How-To Guide: Scanning and Analyzing Views
- Navigate to the Views Page: - Go to the DATASOURCES section and select your configured Snowflake data source. 
- Click on the Tables & Columns tab. 
- Use the dropdown menu to switch from the "Tables" view to the "Views" view. This will list all discoverable views. 
 
- Manually Trigger a Scan: - Select the checkbox next to one or more views you wish to scan. 
- Click the Scan button to initiate the process. 
 
- Review Scan Results: - After the scan completes, click on a view name to inspect its details. 
- The results will show: - Highlighted Classifications: Any columns containing sensitive data will be highlighted. You can review or edit these classifications as needed. 
- Sample Data: A preview of the data for each column in the view. 
- Rescan Capability: Views can be rescanned at any time to reflect changes in the underlying data or permissions. 
 
 
Creating Entities from Views
Using Table Cluster to Generate Entities from Views
After scanning, users can generate entities from a View by creating a table cluster.
- Navigate to the Table Structure section of your data source. 
- Click Create Cluster and select the Views tab. 
- Choose a pre-scanned view. 
- Define a primary key or unique identifier to track entities across data sources. 
- Save the cluster. LightBeam will identify and count instances of each attribute, assisting in building a unified data identity. 
Viewing Details of Skipped Databases
During the scanning process, some databases may be skipped if required permissions or configurations are missing. LightBeam provides a clear way to identify and address these skipped databases.
In the Datasources section, skipped databases are highlighted in the Overview panel:
- A yellow notification banner indicates the number of skipped databases (e.g., "12 Skipped") and states the reason, for example: "necessary permissions haven’t been configured." 

Resolving Scan Issues for Skipped Databases
- Click View Details on the yellow banner. 
- A modal window will appear, listing the names of the skipped databases (e.g., - sandbox,- automation_stuff,- test2).
- The modal provides a description explaining that these databases were skipped due to missing permissions or errors. 
- Once permissions are configured, LightBeam will automatically include these databases in the next scan cycle. 

APPENDIX
Troubleshooting
If you don’t see any data being scanned without error, it might be a permission issue. Consider running a SELECT * query on a table and see if you are able to see the data. If you see a message of permission denied, consider granting permission to the user.
Whitelisting IP address
By default, Snowflake allows users to connect to the service from any computer or device. If there is an active policy that allows access only from certain networks, add the public IP of all the nodes where the LightBeam cluster is running. 1. Goto admin -> security from snowflake UI. Create a new network rule containing the Public IP address of all LightBeam nodes.

2. Attach this network rule to the active network policy as Allowed.

Connecting through Private Links
A private link is a feature for securing connectivity between your clients and the Snowflake without traversing the public Internet.
Following these links if you want to setup that:-
Setting up a new user in Snowflake
We need to create a new user containing all permissions required by LightBeam to scan the datasource.
User: A user in Snowflake is an account in the system, generally associated with an individual person. Users can log into Snowflake, issue SQL commands, manage data, and perform other operations. A user is associated with specific properties, such as login name, password, and default role.
Role: A role in Snowflake, on the other hand, is a named set of access privileges that can be granted to users or other roles. These privileges determine what actions a user can perform and on which database objects.
A user can be assigned multiple roles and can switch between them during a session to access different sets of privileges as needed.
In essence, a user is who logs into the system, and a role determines what that user can do once they are logged in. This distinction allows Snowflake to provide flexible and granular control over access to its resources.
The following SQL snippet can be used for creating a role, a user, and assigning permissions to a single database.
-- Create a new role containing all permissions. Replace ROLE placeholder with a role name of your choice
CREATE ROLE <ROLE>
-- Grant access to a warehouse. Replace WAREHOUSE placeholder with an existing warehouse in your instance.
GRANT usage on warehouse <WAREHOUSE> to role <ROLE>;
GRANT operate on warehouse <WAREHOUSE> to role <ROLE>;
-- Grant access to connect to the specific database and schema. Replace DATABASE placeholder with the name of database you want to scan.
GRANT USAGE on DATABASE <DATABASE> to ROLE <ROLE>;
-- Replace SCHEMA placeholder with the name of schema you want to scan. If you have multiple schemas in the database repeat the below SQL statement for every schema.
GRANT USAGE on SCHEMA <SCHEMA> to ROLE <ROLE>;
-- Grant access to scan existing and future tables in a database.
-- Repeat this for all databases that you want to scan.
GRANT SELECT ON ALL TABLES IN DATABASE <DATABASE> to ROLE <ROLE>;
GRANT SELECT ON FUTURE TABLES IN DATABASE <DATABASE> to ROLE <ROLE>;
-- Create a user with a strong password and default warehouse. Replace USERNAME placeholder with the username of your choice.
-- Replace the value of DEFAULT_WAREHOUSE with an existing warehouse in your instance.
CREATE USER <USERNAME> PASSWORD='<PASSWORD>' DEFAULT_WAREHOUSE=<WAREHOUSE>;
-- Assign the role created above to the new user.
GRANT ROLE <ROLE> TO USER <USERNAME>;
If you want to scan more than one database it is recommended to create a user and assign read permissions to all databases.
Step 1: First, create a user and assign permissions to use a warehouse.
-- Create a new role. Replace ROLE placeholder with a role name of your choice
CREATE ROLE <ROLE>
-- Grant access to a warehouse. Replace WAREHOUSE placeholder with an existing warehouse in your instance.
GRANT usage on warehouse <WAREHOUSE> to role <ROLE>;
GRANT operate on warehouse <WAREHOUSE> to role <ROLE>;
-- Create a user with a strong password and default warehouse. Replace USERNAME placeholder with the username of your choice.
-- Replace the value of DEFAULT_WAREHOUSE with an existing warehouse in your instance.
CREATE USER <USERNAME> PASSWORD='<PASSWORD>' DEFAULT_WAREHOUSE=<WAREHOUSE>;
-- Assign the role created above to the new user.
GRANT ROLE <ROLE> TO USER <USERNAME>;Step 2: Now assign permissions to the role to access all databases in the account.
In the SQL snippet below replace the ROLE placeholder with the name of the role created in step 1. Run the SQL snippet, it will print a bunch of SQL statements for granting permissions. Copy the output and run those statements again.
SELECT
	'grant usage on database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT in('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA')
UNION ALL
SELECT
	'grant usage on all schemas in database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT in('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA')
UNION ALL
SELECT
	'grant select on all tables in database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT in('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA')
UNION ALL
SELECT
	'grant select on future tables in database ' || DATABASE_NAME || ' to role <ROLE>;'
FROM
	information_schema.databases
WHERE
	database_name NOT in('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA')Note: If you want to exclude some databases modify the SQL snippet in step 2 to include the name of the database alongside SNOWFLAKE_SAMPLE_DATA.
Provide the createdUsername, Password, Role Name, Warehouse Name and Account Name to register the Snowflake datasource.
Setting up Resource Monitor
A resource monitor in snowflake can help control costs and avoid unexpected credit usage caused by running warehouses. Resource monitor can suspend a warehouse if credit limit for warehouse exceeds than specified limit.
Note: resource monitor can be created from user with ACCOUNTADMIN role only
You can create Resource Monitor by executing these queries from snowflake
 CREATE OR REPLACE RESOURCE MONITOR <name of resource limiter>
  WITH CREDIT_QUOTA = <credit limit>
       FREQUENCY = <DAILY/WEEKLY/MONTHLY/YEARLY>
       START_TIMESTAMP = IMMEDIATELY
       NOTIFY_USERS = (<list of users to send notification>)
  TRIGGERS ON 50 PERCENT DO NOTIFY
           ON 60 PERCENT DO SUSPEND
           ON 100 PERCENT DO SUSPEND_IMMEDIATE;
To assign the created resource monitor to a warehouse use this query
ALTER WAREHOUSE <name fo warehouse>
SET RESOURCE_MONITOR = <name of resource limiter>;Creating Resource Monitor from UI
Follow these steps to create resource monitor from snowflake UI
- Click on Admin in snowflake UI  - Figure 9: select Admin 
- Click on resource monitors tab  - Figure 10: List resource monitors 
- Create a resource limiter from Resource monitor button in top right corner. If you want to use exisitng resource monitor skip to step  - Figure 11: resource monitor button 
- After clicking add resource monitor button, you should see a pop up this, Fill name, credit quota, monitor type , select warehouse. By default schedule is monthly, you can customize it according to your needs, follow step 5 for schedule  - Figure 12: Fill resource monitor information 
- Click on customize and select frequency type and start date from UI . You can provide custom range of time for resource monitor as well.  - Figure 13: cutomize frequcny cyle of resource monitor 
- Click on apply and click on create resource monitor button to save the entered resource monitor information. Resource monitor is created successfully, you can see information of resource monitor like this when you click on a resource limiter from list. 

Validate permissions to the database
Next, the user needs to validate these permissions to the database. This ensures authorized access to the database by the credentials provided by the user. After validating the permissions to the database, the user can configure LightBeam Spectra on the system.
Prerequisite
Install snowsql on the machine.
Steps
- First, clone the repository https://github.com/lightbeamai/lb-installer 
- Go into - sql_user_check_snowflake/directory
Run the script
WAREHOUSE_NAME=<WAREHOUSE NAME> ACCOUNT_NAME=<SNOWFLAKE ACCOUNT NAME> ROLE_NAME=<ROLE ASSIGNED TO USER> SF_USERNAME=<USERNAME> SF_DATABASE=<DATABASE TO CONNECT> bash run.shUser Credentials
* WAREHOUSE_NAME: Name of the warehouse in your Snowflake instance.
* ACCOUNT_NAME: Name of your Snowflake account. 
* ROLE_NAME: Name of the role assigned to the user in Snowflake. 
* SF_USERNAME: Username for the Snowflake instance. 
* SF_DATABASE: Name of the database in Snowflake to which you wish to establish a connection and validate the permissions.
To validate whether the commands were successful, check the output of the file generated from the commands.
About LightBeam
LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications providing 360-visibility, redaction, self-service DSRs, and automated ROPA reporting ensuring ultimate protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure privacy-first world helping customers automate compliance against a patchwork of existing and emerging regulations.
For any questions or suggestions, please get in touch with us at: [email protected].
Last updated
