Load Data into the Data Lake from CSV
Data ingestion is the process of collecting data from various sources and importing it into a system or database. In this tutorial, we will upload a CSV (comma-separated values) file into the data lake, which uses Amazon Athena for querying and Amazon S3 for storage.
If you need a sample CSV dataset, you can download one from https://www.kaggle.com/datasets.
Create a New CSV Ingestion Pipeline
Open the Ingestion Pipelines interface by clicking the Pipeline icon on the left.
Click the “+” icon to create a new pipeline.
Configure a CSV Data Source
Click “Configure Source”.
Select a CSV file from your computer as the source.
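Before selecting the file, it can help to confirm that it parses cleanly: a row with an extra or missing field can lead to misaligned or rejected rows after ingestion. The following is a minimal local check using Python's standard csv module. It is not part of the pipeline, the function name check_csv is hypothetical, and it assumes the first row of the file is a header.

```python
import csv
import io
from typing import Iterable, List


def check_csv(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in CSV input (empty list if none).

    Assumes the first row is a header. Checks that header names are
    non-empty and unique, and that every data row has as many fields
    as the header.
    """
    problems: List[str] = []
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]

    names = [h.strip() for h in header]
    if any(not n for n in names):
        problems.append("header has an empty column name")
    if len(set(names)) != len(names):
        problems.append("header has duplicate column names")

    for row in reader:
        if not row:  # csv.reader yields [] for blank lines; ignore them
            continue
        if len(row) != len(header):
            problems.append(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
            )
    return problems


# Usage with an in-memory sample; for a real file use:
#   with open("data.csv", newline="", encoding="utf-8") as f: check_csv(f)
sample = io.StringIO("id,name\n1,Ada\n2,Grace,extra\n")
print(check_csv(sample))  # ['line 3: expected 2 fields, got 3']
```

An empty result means the file is structurally consistent; it says nothing about data types or encoding, which the pipeline may handle differently.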
Configure a Data Lake Destination
Click “Configure Destination”.
Select an existing Amazon Athena destination.
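Athena does not store the data itself; it queries files in S3 through table definitions. As a rough illustration only, the sketch below builds the kind of CREATE EXTERNAL TABLE statement Athena uses for a CSV file in S3. We do not know what DDL the pipeline actually issues, so treat the function name athena_csv_ddl, the example S3 path, and the choice of declaring every column as string as assumptions.

```python
from typing import List


def athena_csv_ddl(table: str, columns: List[str], s3_location: str) -> str:
    """Build an Athena CREATE EXTERNAL TABLE statement for CSV files in S3.

    Hypothetical helper: every column is declared as string for simplicity,
    and OpenCSVSerde is used so that quoted fields are parsed correctly.
    Column names are assumed to be already clean (no backticks).
    """
    if not s3_location.startswith("s3://") or not s3_location.endswith("/"):
        raise ValueError("s3_location should be an S3 folder such as s3://bucket/prefix/")
    cols = ",\n".join(f"  `{c}` string" for c in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n{cols}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"  # skip the CSV header row
    )


# Usage with a made-up bucket and prefix:
print(athena_csv_ddl("sales", ["id", "amount"], "s3://example-bucket/ingest/sales/"))
```

The LOCATION points at a folder rather than a single file, because Athena reads every object under that prefix as part of the table.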
Configure an Ingestion Mode
Full ingestion mode creates the dataset if it does not exist and overwrites it if it does.
If you are working on a production use case, review the list of available ingestion modes before choosing one.
Click the “Select Mode” icon.
Select the “Full” ingestion mode.
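The overwrite behavior matters most when the dataset already exists. The sketch below models full-mode semantics as described above, using a plain dictionary as a hypothetical stand-in for the data lake (full_ingest and the table name are illustrative, not the product's API). It shows two consequences: re-running the same file does not duplicate rows, and rows missing from a newer file disappear from the table.

```python
from typing import Dict, List

Row = Dict[str, str]
Catalog = Dict[str, List[Row]]  # table name -> rows; in-memory stand-in for the data lake


def full_ingest(catalog: Catalog, table: str, rows: List[Row]) -> bool:
    """Load rows into a table with full-mode semantics.

    Creates the table if it is missing; otherwise replaces its entire
    contents (no merging). Returns True if an existing table was overwritten.
    """
    existed = table in catalog
    catalog[table] = [dict(r) for r in rows]  # copy rows, replace the whole table
    return existed


if __name__ == "__main__":
    lake: Catalog = {}
    first = [{"id": "1"}, {"id": "2"}]
    print(full_ingest(lake, "sales", first))          # False: table created
    print(full_ingest(lake, "sales", first))          # True: overwritten, not appended
    print(len(lake["sales"]))                         # 2
    print(full_ingest(lake, "sales", [{"id": "3"}]))  # True
    print(lake["sales"])                              # [{'id': '3'}]
```

This is why full mode is convenient for one-off uploads but worth reconsidering for production tables that accumulate data over time.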
Configure Pipeline Output Details
Click “Configure Pipeline”.
Create a new dataset.
[Optional] Change the default table name to one that is easier to find when searching.
Save the pipeline details.
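If you rename the table, a lowercase name made only of letters, digits and underscores is a safe choice, since Athena stores table names in lowercase and its naming guidance discourages other special characters. The helper below is a hypothetical sketch for turning a file name into such a name; suggest_table_name is illustrative, and the t_ prefix for names starting with a digit is an arbitrary precaution, not a product rule.

```python
import re


def suggest_table_name(file_name: str) -> str:
    """Turn a CSV file name into a lowercase, underscore-separated table name.

    Drops a trailing .csv extension (any case), replaces every run of
    characters other than a-z and 0-9 with a single underscore, and trims
    leading/trailing underscores.
    """
    stem = re.sub(r"\.csv$", "", file_name.strip(), flags=re.IGNORECASE)
    name = re.sub(r"[^a-z0-9]+", "_", stem.lower()).strip("_")
    if not name:
        raise ValueError(f"cannot derive a table name from {file_name!r}")
    if name[0].isdigit():
        name = "t_" + name  # precaution: avoid a leading digit
    return name


print(suggest_table_name("Sales Report 2023.csv"))  # sales_report_2023
print(suggest_table_name("2023-Q1 Orders.CSV"))     # t_2023_q1_orders
```

A descriptive name such as sales_report_2023 is also easier to search for than a default derived from an arbitrary upload file name.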