AWS S3
Connecting AWS S3 to LightBeam
LightBeam Spectra users can connect various data sources to the LightBeam application, and these data sources will be continuously monitored for PII and PHI data.
Example: AWS S3, Google Drive, OneDrive, SharePoint, etc.
Login to your LightBeam Instance.
Click on DATASOURCES on the Top Navigation Bar.
Click on “Add a data source”.
Search for “AWS S3”.
Click on AWS S3.
Fill in the requested information and click on Next.
Data Source Name: This is the unique name given to the data source.
Description: This is an optional field needed to describe the use of this data source.
Primary Owner: Email address of the person responsible for this data source, who will receive alerts by default.
Entity Creation: LightBeam Spectra detects and associates attributes based on the context and identifies whose data it is; these are called entities. Example: Jane Doe is an entity for whom LightBeam Spectra might have detected Name and SSN in a monitored data source.
Source of Truth: A Source of Truth is a monitored data source whose data acts as a single point of truth. LightBeam Spectra uses it to look up entities/attributes and to verify whether the attributes/entities found in other data sources are accurate. A Source of Truth data set creates entities based on the attributes found in the data.
Location: The location of the data source.
Purpose: The purpose of the data being collected/processed.
Stage: The stage of the data source. Example: Source, Processing, Archival, etc.
Data Source Configuration
LightBeam uses the "Live Scan" approach, which tracks changes made to objects in buckets and makes use of AWS EventBridge to provide real-time updates of these changes.
Each bucket must have the EventBridge service enabled for this to work. If it isn't already enabled, LightBeam will do so automatically.
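Enabling EventBridge on a bucket amounts to adding an empty EventBridgeConfiguration object to the bucket's notification configuration (the payload accepted by the S3 PutBucketNotificationConfiguration API). The exact calls LightBeam makes are internal; as an illustrative sketch of that merge, with a hypothetical existing configuration:

```python
import json

def enable_eventbridge(notification_config: dict) -> dict:
    """Return a copy of an S3 bucket notification configuration with
    EventBridge delivery enabled. An empty EventBridgeConfiguration object
    is all S3 requires to start emitting events to the default event bus."""
    updated = dict(notification_config)
    updated["EventBridgeConfiguration"] = {}
    return updated

# Example: a bucket with an existing SNS topic notification keeps it.
existing = {
    "TopicConfigurations": [
        {"TopicArn": "arn:aws:sns:us-east-1:123456789012:my-topic",
         "Events": ["s3:ObjectCreated:*"]}
    ]
}
print(json.dumps(enable_eventbridge(existing), indent=2))
```

Note that the existing notification rules are preserved; only the EventBridge toggle is added.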
1. Under Authentication Method, choose:
a. Access Key/Secret Key (default)
b. IAM Role (only for AWS EKS deployments)
2. In the "Scan Data" section, specify how frequently LightBeam should scan your S3 buckets:
Numerical Input: Enter a value (e.g., 10) or use the ▲/▼ arrows to adjust.
Unit Selector: Choose Seconds, Minutes, Hours, or Days from the dropdown.
Examples:
Every 30 Minutes
Every 2 Hours
Every 1 Day
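How LightBeam stores the frequency setting internally is not documented, but the value and unit resolve to a single interval by simple arithmetic (value × seconds-per-unit), as this sketch shows:

```python
# Seconds per unit, matching the dropdown options above.
UNIT_SECONDS = {"Seconds": 1, "Minutes": 60, "Hours": 3600, "Days": 86400}

def scan_interval_seconds(value: int, unit: str) -> int:
    """Convert a scan-frequency setting (numerical input + unit) to seconds."""
    if value < 1:
        raise ValueError("scan interval must be a positive number")
    return value * UNIT_SECONDS[unit]

print(scan_interval_seconds(30, "Minutes"))  # 1800
print(scan_interval_seconds(2, "Hours"))     # 7200
```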
a. For Access Key/Secret Key:
Ensure that these credentials are configured with the appropriate permissions (the required IAM permissions are listed below).
Access keys: Access keys are long-term credentials for an IAM user or the AWS account root user. You can use access keys to sign programmatic requests to the AWS CLI or AWS API (directly or using the AWS SDK).
Secret access keys: Secret access keys are like your password. AWS does not allow retrieval of a secret access key after its initial creation. This applies to both root secret access keys and AWS Identity and Access Management (IAM) user secret access keys.
b. For IAM Role:
Ensure LightBeam is deployed on an AWS EKS cluster.
Verify the EKS node group’s IAM role has the required policy (see Appendix: IAM Role Setup).
No credentials needed – authentication is role-based.
Click on Test Connection.
Verify that you get the message Connection Success! on the screen. Click on Next.
In this step, you can choose one of three scan setting options –
i) Scan all buckets
ii) Scan selected buckets
iii) Scan selected folders
To choose option (i), select Scan all Buckets, and click on Validate And Save.
This will allow for the registration of the AWS S3 buckets.
Registration of S3 buckets is a two-step process:
Validating the bucket: LightBeam Spectra makes certain modifications to the client's S3 buckets. After these adjustments take effect, LightBeam Spectra starts scanning the buckets in real time.
Downloading JSON file: Following the validation of the buckets, an automated JSON file download will reveal a history of any modifications made to the user's buckets, including the configuration both before and after the modifications. This will help the user to track the changes made by LightBeam.
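The downloaded JSON records the bucket configuration both before and after LightBeam's changes. The file's exact schema is not documented here, but comparing the two snapshots is a straightforward dictionary diff; a sketch with hypothetical field names:

```python
def changed_keys(before: dict, after: dict) -> dict:
    """Top-level keys whose values differ between two bucket configurations,
    mapped to their (before, after) pair."""
    keys = set(before) | set(after)
    return {k: (before.get(k), after.get(k)) for k in keys
            if before.get(k) != after.get(k)}

# Hypothetical before/after snapshots: LightBeam enabled EventBridge.
before = {"TopicConfigurations": []}
after = {"TopicConfigurations": [], "EventBridgeConfiguration": {}}
print(changed_keys(before, after))
# {'EventBridgeConfiguration': (None, {})}
```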
To choose option (ii), select Scan selected Buckets. Now enter the names of the buckets that you would like to scan in the Search box individually. Select the buckets by ticking the checkboxes next to them.
To choose option (iii), select Scan selected folders. Enter the name of the folder within the bucket that you want to scan, in the format s3://<bucket-name>/<folder>.
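The folder string splits into a bucket name and a key prefix at the first slash after the s3:// scheme. A minimal sketch of that parsing (the bucket and folder names are hypothetical):

```python
def parse_s3_folder(uri: str) -> tuple[str, str]:
    """Split an s3://<bucket-name>/<folder> string into (bucket, folder prefix)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, folder = uri[len("s3://"):].partition("/")
    if not bucket or not folder:
        raise ValueError(f"expected s3://<bucket-name>/<folder>, got {uri!r}")
    return bucket, folder

print(parse_s3_folder("s3://my-bucket/invoices/2024"))  # ('my-bucket', 'invoices/2024')
```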
Once the required buckets are selected, click on Save.
Now that the AWS S3 datasource is connected to LightBeam, we can begin viewing the dashboard and other pages of the onboarded datasource.
A validation failure for the AWS S3 buckets can occur in the following cases:
Failure in enabling AWS EventBridge:
If the EventBridge service is not functioning properly, it may impact the ability to verify the scan parameters, which could result in incorrect or incomplete scans.
There can be various reasons why AWS EventBridge may fail to launch, including:
Network connectivity issues
Insufficient permissions or access to the service
Incorrect configuration of Eventbridge settings
Resource constraints, such as low memory or disk space
Service outages or maintenance by AWS
Software bugs or compatibility issues.
To onboard an AWS S3 data source, we need the AWS Access Key and AWS Secret Key of an IAM user with the following permissions:
EventBridge:
We use EventBridge EventBus to consume real-time change events from S3 and route them to SNS topics.
S3:
We require read permissions for all files and write permissions to modify the bucket notification configuration for powering real-time sync of data to LightBeam.
SQS:
We create new SQS queues to consume the real-time change events from S3.
SNS:
We use a fan-out approach to subscribe SNS topics to SQS queues so multiple actors in the system can consume these events.
These SNS topics, SQS queues, etc. are created as part of data source registration by the LightBeam backend. These resources may incur additional costs on AWS.
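The exact resources LightBeam provisions are internal to its backend, but the SNS-to-SQS fan-out described above typically requires each queue to carry a resource policy permitting the topic to deliver to it. A sketch of that per-queue policy (the ARNs are hypothetical):

```python
import json

def sqs_allow_sns_policy(queue_arn: str, topic_arn: str) -> dict:
    """IAM resource policy that lets one SNS topic deliver messages to an
    SQS queue -- the per-queue half of an SNS->SQS fan-out."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }

# Hypothetical ARNs for illustration only.
policy = sqs_allow_sns_policy(
    "arn:aws:sqs:us-east-1:123456789012:lightbeam-events",
    "arn:aws:sns:us-east-1:123456789012:lightbeam-s3-changes",
)
print(json.dumps(policy, indent=2))
```

The ArnEquals condition restricts delivery to the one subscribed topic, so other principals cannot write to the queue.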
Follow these steps to attach the required policy to your EKS node group’s IAM role:
1. Navigate to the EKS Cluster.
2. Click on Compute -> Node group. Select the node group on which LightBeam is running.
3. Open the IAM Role attached to the node group.
4. Click on Add permissions -> Create inline policy.
5. Copy the JSON payload.
6. Scroll down and click on "Next".
7. Enter the name as "lightbeam-s3-policy" and click on "Create policy".
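Use the JSON payload supplied by LightBeam as-is. Purely as an illustration of its general shape, a policy covering the four permission groups described earlier (EventBridge, S3, SQS, SNS) might look like the following sketch. This is NOT the official payload, and the wildcard resources should be scoped down in production:

```python
import json

# Illustrative only -- not the official LightBeam payload.
lightbeam_s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",  # read objects and manage bucket notifications
         "Action": ["s3:GetObject", "s3:ListBucket",
                    "s3:GetBucketNotification", "s3:PutBucketNotification"],
         "Resource": "*"},
        {"Effect": "Allow",  # route S3 change events through EventBridge
         "Action": ["events:PutRule", "events:PutTargets"],
         "Resource": "*"},
        {"Effect": "Allow",  # create and consume the change-event queues
         "Action": ["sqs:CreateQueue", "sqs:ReceiveMessage", "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes", "sqs:SetQueueAttributes"],
         "Resource": "*"},
        {"Effect": "Allow",  # fan out events via SNS topics
         "Action": ["sns:CreateTopic", "sns:Subscribe", "sns:Publish"],
         "Resource": "*"},
    ],
}
print(json.dumps(lightbeam_s3_policy, indent=2))
```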
LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications, providing 360-visibility, redaction, self-service DSRs, and automated ROPA reporting, ensuring ultimate protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure privacy-first world, helping customers automate compliance against a patchwork of existing and emerging regulations.
Clicking Save starts the identical registration procedure outlined above. If the bucket modifications fail for any of the reasons listed above, validation of the buckets will fail.
The IAM permissions are limited to read and write access to EventBridge, S3, SQS, and SNS, for the reasons described above.
For any questions or suggestions, please get in touch with us.