Pipeline Pioneers Data Project
Pipeline Pioneers Demo Video
Overcoming errors with the power of friendship
The **Totesys ETL Pipeline** is a data engineering solution that extracts, transforms, and loads data into an OLAP data warehouse for analytical purposes. The project incorporates AWS services to build a robust, automated pipeline and provides insights through **Tableau** dashboards.
- **Data Ingestion**: Extracts raw data from the Totesys database and ingests it into an AWS S3 ingestion bucket.
- **Data Transformation**: Processes raw data into a structured schema suitable for the data warehouse.
- **Data Loading**: Loads transformed data into fact and dimension tables in the data warehouse.
- **Automation**: Event-driven architecture that triggers processes using AWS Lambda and S3 events.
- **Monitoring and Logging**: AWS CloudWatch monitors the pipeline for operational visibility.
- **Visualization**: Tableau provides interactive dashboards to analyze the data.
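The event-driven trigger in the Automation feature can be sketched as a Lambda handler that reads the bucket name and object key from an S3 event notification. This is a minimal sketch, not the project's actual handler: the helper `extract_s3_objects`, the return shape of `lambda_handler`, and the sample bucket and key are assumptions made for illustration. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the standard S3 notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical entry point: report which newly ingested objects to process."""
    objects = extract_s3_objects(event)
    # A real transform step would read each object and write Parquet output here.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}


if __name__ == "__main__":
    # Hypothetical event with made-up bucket and key names.
    sample_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "example-ingestion-bucket"},
                    "object": {"key": "sales_order/2024-01-01+12%3A00%3A00.json"},
                }
            }
        ]
    }
    print(lambda_handler(sample_event, None))
```

Keeping the event parsing in its own function means it can be unit tested with a plain dictionary, without any AWS resources.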
AWS Services
- **S3**: Ingestion and processed buckets.
- **Lambda**: Python-based ETL scripts for data processing.
- **CloudWatch**: Monitoring and logging.
- **QuickSight**: BI tool for creating dashboards.
1. **Ingestion**:
- Data is extracted from the Totesys database and placed in the S3 ingestion bucket.
- The extraction code lives in the src/extract_lambda directory.
- **Trigger**: Manual or scheduled job.
2. **Transformation**:
- AWS Lambda processes data upon ingestion and transforms it into the defined schema.
- Processed data is stored in Parquet format in the S3 processed bucket.
- The transformation code lives in the src/transform_lambda directory.
3. **Loading**:
- Transformed data is loaded into a prepared data warehouse at defined intervals.
- **Trigger**: Event-driven or scheduled Lambda.
4. **Visualization**:
- Tableau generates dashboards from the warehouse data.
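To illustrate the transformation step, here is a minimal sketch of how raw rows might be reshaped into a dimension table for a star schema. It is an illustration under stated assumptions, not the project's code: the functions `build_dim_date_row` and `build_dim_date`, the column names, and the `created_at` field in the sample rows are all hypothetical. The project itself uses Pandas and writes Parquet; this sketch uses only the standard library so it stays self-contained.

```python
from datetime import date


def build_dim_date_row(day):
    """Build one date-dimension row from a datetime.date (column names are assumed)."""
    return {
        "date_id": day.isoformat(),
        "year": day.year,
        "month": day.month,
        "day": day.day,
        "day_of_week": day.isoweekday(),  # Monday = 1 ... Sunday = 7
        "day_name": day.strftime("%A"),
        "month_name": day.strftime("%B"),
        "quarter": (day.month - 1) // 3 + 1,
    }


def build_dim_date(raw_rows, date_column):
    """Collect the distinct dates in the raw rows and return sorted dimension rows."""
    # Keep only the YYYY-MM-DD part so timestamps on the same day collapse to one row.
    distinct_days = {date.fromisoformat(row[date_column][:10]) for row in raw_rows}
    return [build_dim_date_row(day) for day in sorted(distinct_days)]


if __name__ == "__main__":
    # Hypothetical raw rows, shaped like records read from the ingestion bucket.
    raw_rows = [
        {"created_at": "2024-05-15 09:30:00"},
        {"created_at": "2024-01-15 10:00:00"},
        {"created_at": "2024-01-15 23:59:59"},
    ]
    for row in build_dim_date(raw_rows, "created_at"):
        print(row)
```

Deduplicating dates before building rows keeps the dimension table to one row per calendar day, which fact rows can then reference by `date_id`.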
The Team
Simon Kinder
Macshellah Zisengwe
Louise Concepcion
Jeremy Lam
Abbey Ola
Technologies
We used: Terraform, Python, Pandas, Pytest, AWS, Boto3, Moto, Tableau, pg8000, PostgreSQL
We chose these tools because we had used them during the bootcamp and were confident working with them.
Challenges Faced
Yes, many
FAQs
What was our AWS bill during the project phase?
Our bill was $8.89.