CSV Files as a Datasource


Overview

LightBeam Spectra supports scanning CSV files stored in cloud storage services such as Google Drive and OneDrive. The CSV file must contain structured data for proper scanning and processing.


How the Scanning Works

  1. The data source where the CSV file is stored must already be onboarded with LightBeam. For example, if the CSV file is stored in OneDrive, OneDrive must be onboarded before the CSV file can be scanned.

  2. LightBeam looks in the folder configured during datasource registration and considers only CSV files. If multiple CSV files are present, it selects the one with the latest timestamp.

  3. During the scan, LightBeam downloads the selected CSV file, converts it to an SQL dump, and loads the dump into a temporary PostgreSQL database within the LightBeam cluster. The database is destroyed after the scan is complete.
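The file-selection rule described above can be sketched in Python as follows. This is a minimal illustration, not LightBeam's implementation: the RemoteFile type and the pick_latest_csv function are hypothetical, and the sketch assumes that a ".csv" file name extension is what identifies a CSV file.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RemoteFile:
    # Hypothetical representation of a file listed in the configured folder.
    name: str
    modified: datetime


def pick_latest_csv(files: list[RemoteFile]) -> Optional[RemoteFile]:
    """Keep only .csv files and return the one with the newest timestamp."""
    csv_files = [f for f in files if f.name.lower().endswith(".csv")]
    if not csv_files:
        return None  # nothing to scan in this folder
    return max(csv_files, key=lambda f: f.modified)


# Usage: the .xlsx file is ignored and the newer of the two CSV files wins.
folder = [
    RemoteFile("report.xlsx", datetime(2024, 5, 3)),
    RemoteFile("customers_old.csv", datetime(2024, 4, 1)),
    RemoteFile("customers.csv", datetime(2024, 5, 1)),
]
print(pick_latest_csv(folder).name)  # customers.csv
```

Because only one file is returned, any older CSV files in the same folder are simply not scanned.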


Notes

  • Since only one CSV file is scanned per datasource, its data is loaded into a single table named nanolog inside a database named lightbeam.

  • Sample data for columns is not available, because the temporary database is destroyed after the scan is complete.

  • The CSV file must contain a column with unique values that can serve as the primary key. A primary key column is required for building entities.
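LightBeam does not document its CSV-to-SQL conversion, so the following Python sketch only illustrates the idea behind the notes above: the CSV is turned into a CREATE TABLE statement for a table named nanolog plus one INSERT per row, and a column whose values are all non-empty and unique is chosen as the primary key. The function names are hypothetical, every column is assumed to be TEXT (no type inference), the first unique column is assumed to become the primary key, and creating the lightbeam database itself is left out.

```python
import csv
import io


def quote_ident(name: str) -> str:
    # Double-quote an identifier for PostgreSQL, escaping embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    # Single-quote a string literal, escaping embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def primary_key_candidates(header: list[str], rows: list[list[str]]) -> list[str]:
    """Return the columns whose values are all non-empty and unique."""
    candidates = []
    for i, col in enumerate(header):
        values = [row[i] for row in rows]
        if all(values) and len(set(values)) == len(values):
            candidates.append(col)
    return candidates


def csv_to_sql_dump(csv_text: str, table: str = "nanolog") -> str:
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    for row in rows:
        if len(row) != len(header):
            raise ValueError("every row must have the same number of columns as the header")
    keys = primary_key_candidates(header, rows)
    if not keys:
        raise ValueError("no column with unique values to use as a primary key")
    pk = keys[0]  # assumption: the first candidate column becomes the primary key
    # Assumption: every column is stored as TEXT.
    columns = ",\n  ".join(f"{quote_ident(c)} TEXT" for c in header)
    statements = [
        f"CREATE TABLE {quote_ident(table)} (\n  {columns},\n"
        f"  PRIMARY KEY ({quote_ident(pk)})\n);"
    ]
    col_list = ", ".join(quote_ident(c) for c in header)
    for row in rows:
        vals = ", ".join(quote_literal(v) for v in row)
        statements.append(f"INSERT INTO {quote_ident(table)} ({col_list}) VALUES ({vals});")
    return "\n".join(statements)


print(csv_to_sql_dump("id,email\n1,a@example.com\n2,b@example.com\n"))
```

For the sample input, the output is a CREATE TABLE statement ending in PRIMARY KEY ("id") followed by two INSERT statements. A CSV file with no unique column (for example, a single dept column containing "sales" twice) raises an error, which mirrors the requirement that a primary key column must exist.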


Onboarding Steps

  1. Select the PostgreSQL datasource with the Snapshot scanning option.

Fig 1. CSV Files as a Datasource - Snapshot Scanning

  2. Enter details such as the datasource name and the datasource owner’s email.

Fig 2. CSV Files as a Datasource - Configuration Basic Info
  3. Provide the following details:

  • Select the name of the datasource where the CSV file is present. Names of onboarded Google Drive and OneDrive datasources will appear in the dropdown.

  • Search for and select the owner of the drive where the CSV file is located.

  • Enter the name of the folder where the CSV file is present.

Note:

  1. For Google Drive, enter the folder link. For example: https://drive.google.com/drive/folders/<some_id>

  2. For OneDrive, enter the folder path relative to the drive root. Example 1: If a folder named folder1 is at the root, enter folder1. Example 2: If a folder named nested is inside folder1 at the root, enter folder1/nested.

  • Configure the scan frequency. This determines how often the configured folder is scanned.

Fig 3. CSV Files as a Datasource - Config Connection Details
  4. Click "Register" to complete the onboarding process.
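The two folder formats from the note in step 3 can be checked with a small Python sketch like the one below. It is purely illustrative: normalize_folder_input and the source names "google_drive" and "onedrive" are hypothetical, and the sketch assumes the Google Drive link has exactly the documented form https://drive.google.com/drive/folders/<some_id> (other link variants are rejected).

```python
from urllib.parse import urlparse


def normalize_folder_input(source: str, value: str) -> str:
    """Hypothetical check of the folder value entered during onboarding."""
    value = value.strip()
    if source == "google_drive":
        parsed = urlparse(value)
        parts = [p for p in parsed.path.split("/") if p]
        # Expected form: https://drive.google.com/drive/folders/<some_id>
        if parsed.netloc != "drive.google.com" or parts[:2] != ["drive", "folders"] or len(parts) < 3:
            raise ValueError("expected a Google Drive folder link")
        return parts[2]  # the folder ID; any query string is ignored
    if source == "onedrive":
        # Expected form: a path relative to the drive root, e.g. folder1/nested
        if value.startswith("http"):
            raise ValueError("expected a folder path, not a link")
        path = value.strip("/")
        if not path:
            raise ValueError("expected a folder path such as folder1/nested")
        return path
    raise ValueError(f"unsupported source: {source}")


print(normalize_folder_input("google_drive", "https://drive.google.com/drive/folders/abc123?usp=sharing"))  # abc123
print(normalize_folder_input("onedrive", "/folder1/nested/"))  # folder1/nested
```

Stripping leading and trailing slashes lets "folder1/nested" and "/folder1/nested/" refer to the same OneDrive folder, while a pasted link is rejected because OneDrive expects a folder name rather than a URL.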

By following these steps, you can onboard a CSV file as a datasource in LightBeam Spectra. LightBeam will periodically scan the specified folder, process the latest CSV file, and provide insights into the structured data it contains. This lets you monitor and analyze CSV data stored in Google Drive and OneDrive within the LightBeam platform.


About LightBeam

LightBeam automates Privacy, Security, and AI Governance, so businesses can accelerate their growth in new markets. Leveraging generative AI, LightBeam has rapidly gained customers’ trust by pioneering a unique privacy-centric and automation-first approach to security. Unlike siloed solutions, LightBeam ties together sensitive data cataloging, control, and compliance across structured and unstructured data applications, providing 360-degree visibility, redaction, self-service DSRs, and automated ROPA reporting, ensuring ultimate protection against ransomware and accidental exposures while meeting data privacy obligations efficiently. LightBeam is on a mission to create a secure privacy-first world, helping customers automate compliance against a patchwork of existing and emerging regulations.

For any questions or suggestions, please get in touch with us at: [email protected].
