S3 Based Ingestion

Replication modes: Full, DataBrew (Dataprep), Incremental

  1. Click the pipeline icon on the left-hand side of the Data Pipes portal.

  2. You land on the ingestion pipelines page, which lists all the pipelines you have initiated.

  3. Click on the plus icon to create a new pipeline.

  4. Once you click on the plus icon, click on the configure source block to create the source.

  5. Select File -> S3 source

  6. On clicking S3 source, a form appears in which you specify the S3 path where the source files reside. Data Pipes currently supports only CSV and JSON files.

  7. Once the source is configured, click on the configure destination option and select the default destination of the data lake.

  8. Select the replication mode:

    1. Full - In full replication mode, the file gets ingested as-is without any change.

    2. DataBrew - In DataBrew replication mode, you can prepare or transform the data before it is ingested. Data Pipes uses the AWS Glue DataBrew service for this.

    3. Incremental - In incremental replication mode, a cron job polls for new files at the frequency you set when starting the pipeline.

  9. Once the replication mode is selected, click on configure pipeline to provide details such as the pipeline name, the domain name (where the data is to be loaded), and the table name.

  10. Click on the play button to start the pipeline and select the schedule frequency.

  11. Once started, each pipeline goes through the following steps:

    1. Fetching Schema - This shows the actual schema of the data.

    2. PII scan running - Each pipeline undergoes PII scanning to identify any critical PII data. If PII data is detected, you can decide whether to mask/tokenize the affected columns.

    3. Started - Once you have selected the columns to mask (optional), you can start the actual ingestion.
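
As a rough illustration of the source-path check in step 6 and the incremental polling in step 8, the sketch below validates an S3 path, keeps only CSV and JSON files, and picks out files that arrived since the previous poll. It is not Data Pipes' actual implementation: `S3Object`, `parse_s3_path`, `is_supported`, and `new_files` are hypothetical names, the listing is an in-memory stand-in for a real S3 listing, and matching on file extension and last-modified time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlparse

# Assumption: files are recognized by extension, mirroring the CSV/JSON limit.
SUPPORTED_EXTENSIONS = (".csv", ".json")


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry of an S3 listing."""
    key: str
    last_modified: datetime


def parse_s3_path(path):
    """Split 's3://bucket/prefix' into (bucket, prefix); raise ValueError otherwise."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 path: {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def is_supported(key):
    """True if the object key ends in a supported extension (case-insensitive)."""
    return key.lower().endswith(SUPPORTED_EXTENSIONS)


def new_files(listing, last_poll):
    """Return supported files modified after the previous poll, oldest first."""
    fresh = [o for o in listing if is_supported(o.key) and o.last_modified > last_poll]
    return sorted(fresh, key=lambda o: o.last_modified)


# Example: one poll over a small, made-up listing.
listing = [
    S3Object("raw/orders/2024-01-01.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    S3Object("raw/orders/2024-01-02.json", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    S3Object("raw/orders/notes.txt", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
last_poll = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
print([o.key for o in new_files(listing, last_poll)])
# prints ['raw/orders/2024-01-02.json']
```

In this sketch the CSV file is skipped because it predates the last poll, and the text file is skipped because its extension is unsupported. A scheduled job would call `new_files` once per run and then record the new poll time.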
