CSV Files as a Datasource
Overview
LightBeam Spectra supports scanning CSV files stored in cloud storage services such as Google Drive and OneDrive. The CSV file must contain structured data for proper scanning and processing.
How the Scanning Works
The data source that stores the CSV file must already be onboarded in LightBeam. For example, if the CSV file is stored in OneDrive, OneDrive must be onboarded before the CSV file can be scanned.
During the scan, LightBeam downloads the CSV file, converts it to an SQL dump, and loads the dump into a temporary PostgreSQL database that it creates within the LightBeam cluster. The database is destroyed once the scan is complete.
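The conversion step can be pictured with the minimal Python sketch below. It is not LightBeam's implementation, which this page does not describe. The helper names csv_to_sql_dump, quote_ident, and quote_literal are hypothetical; every column is assumed to be stored as TEXT, and the dump is assumed to be plain CREATE TABLE and INSERT statements. The default table name nanolog matches the table named in the Notes.

```python
import csv
import io
from typing import Optional


def quote_ident(name: str) -> str:
    # Double any embedded double quotes, then wrap the identifier in double quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    # Double any embedded single quotes, then wrap the value in single quotes.
    return "'" + value.replace("'", "''") + "'"


def csv_to_sql_dump(csv_text: str, table: str = "nanolog",
                    primary_key: Optional[str] = None) -> str:
    """Turn CSV text into CREATE TABLE + INSERT statements (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    # Assumption: every column is stored as TEXT; no type inference is attempted.
    col_defs = [
        quote_ident(col) + " TEXT" + (" PRIMARY KEY" if col == primary_key else "")
        for col in header
    ]
    statements = [f"CREATE TABLE {quote_ident(table)} ({', '.join(col_defs)});"]
    col_list = ", ".join(quote_ident(col) for col in header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = ", ".join(quote_literal(v) for v in row)
        statements.append(
            f"INSERT INTO {quote_ident(table)} ({col_list}) VALUES ({values});"
        )
    return "\n".join(statements)


dump = csv_to_sql_dump("id,email\n1,a@example.com\n2,o'brien@example.com\n",
                       primary_key="id")
print(dump)
```

Quoting identifiers and literals keeps column names and values such as o'brien from breaking the generated SQL. Loading the resulting dump into the temporary PostgreSQL database (for example with psql) is not shown.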
LightBeam considers only the CSV files in the folder configured during datasource registration. If the folder contains multiple CSV files, LightBeam scans the one with the latest timestamp.
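The selection rule can be expressed as a short Python sketch. It assumes, hypothetically, that a file counts as CSV when its name ends in .csv and that "latest timestamp" means the last-modified time; the DriveFile class and pick_latest_csv function are illustrative names, not part of any LightBeam or cloud-storage API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class DriveFile:
    """Hypothetical stand-in for a file listed in the configured folder."""
    name: str
    modified: datetime


def pick_latest_csv(files: Iterable[DriveFile]) -> Optional[DriveFile]:
    # Keep only CSV files; the .csv extension check is an assumption.
    csv_files = [f for f in files if f.name.lower().endswith(".csv")]
    if not csv_files:
        return None  # nothing to scan in this folder
    # The newest timestamp wins when several CSV files are present.
    return max(csv_files, key=lambda f: f.modified)


folder = [
    DriveFile("customers_2024-01.csv", datetime(2024, 1, 31)),
    DriveFile("customers_2024-02.csv", datetime(2024, 2, 29)),
    DriveFile("notes.xlsx", datetime(2024, 3, 15)),
]
latest = pick_latest_csv(folder)
print(latest.name if latest else "no CSV file found")
```

Here notes.xlsx is newer than either CSV file but is ignored because it is not a CSV file, so customers_2024-02.csv is selected.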
Notes
Since only one CSV file is scanned per datasource, its data is loaded into a single table named nanolog inside a database named lightbeam.
Sample data for columns is not available, because the temporary database instance is destroyed after the scan is complete.
The CSV file must contain a column with unique values that can serve as the primary key. A primary key column is required for building entities.
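Before onboarding, it can help to check which columns of a CSV file could serve as the primary key. The Python sketch below is a hypothetical pre-check, not a LightBeam feature: the function name candidate_primary_keys is illustrative, and treating an empty cell as a missing (NULL) value, which a primary key cannot hold, is an assumption.

```python
import csv
import io
from typing import List


def candidate_primary_keys(csv_text: str) -> List[str]:
    """Return the columns whose values are all present and distinct."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    seen = [set() for _ in header]   # values observed so far, per column
    usable = [True] * len(header)    # still a primary-key candidate?
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Pad short rows so a missing trailing value counts as empty.
        padded = row + [""] * (len(header) - len(row))
        for i, value in enumerate(padded[:len(header)]):
            if not usable[i]:
                continue
            # Assumption: an empty cell is treated as NULL, which a key cannot hold.
            if value == "" or value in seen[i]:
                usable[i] = False
            else:
                seen[i].add(value)
    return [col for col, ok in zip(header, usable) if ok]


sample = "id,email,country\n1,a@example.com,US\n2,b@example.com,US\n"
print(candidate_primary_keys(sample))
```

In this sample, country repeats the value US, so only id and email qualify.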
Onboarding Steps
1. Select the PostgreSQL datasource with the Snapshot scanning option.
2. Enter details such as the datasource name and the datasource owner’s email.
3. Provide the following details:
   - Select the name of the datasource where the CSV file is present. The names of onboarded Google Drive and OneDrive datasources appear in the dropdown.
   - Search for the name of the owner of the drive where the CSV file is located.
   - Enter the name of the folder that contains the CSV file.
4. Configure the scan frequency, which determines how often the configured folder is scanned.
5. Click "Register" to complete the onboarding process.
By following these steps, you can onboard a CSV file as a datasource in LightBeam Spectra. LightBeam periodically scans the specified folder, processes the latest CSV file, and provides insights into the structured data it contains. This lets you monitor and analyze data stored in cloud storage services such as Google Drive and OneDrive from within the LightBeam platform.
About LightBeam
LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications. It provides 360-degree visibility, redaction, self-service DSRs, and automated ROPA reporting, protecting against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure, privacy-first world by helping customers automate compliance against a patchwork of existing and emerging regulations.