Pipeline Pioneers Data Project
present

Pipeline Pioneers Demo Video


Overcoming errors with the power of friendship

The **Totesys ETL Pipeline** is a data engineering solution that extracts, transforms, and loads data into an OLAP data warehouse for analytical purposes. The project incorporates AWS services to build a robust, automated pipeline and provides insights through **Tableau** dashboards.
- **Data Ingestion**: Extracts raw data from the Totesys database and ingests it into an AWS S3 ingestion bucket.
- **Data Transformation**: Processes raw data into a structured schema suitable for the data warehouse.
- **Data Loading**: Loads transformed data into fact and dimension tables in the data warehouse.
- **Automation**: Event-driven architecture that triggers processes using AWS Lambda and S3 events.
- **Monitoring and Logging**: AWS CloudWatch monitors the pipeline for operational visibility.
- **Visualization**: Tableau provides interactive dashboards to analyze the data.
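The event-driven automation described above depends on a Lambda function reading the bucket and object key from the S3 event that triggered it. The sketch below shows one way to do that. The payload shape follows the standard S3 event notification format. The handler name, the bucket name and the returned dictionary are hypothetical and are not taken from the project's code.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        # Only react to object-creation events such as "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (a space becomes "+"), so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical entry point for the transform Lambda."""
    objects = extract_s3_objects(event)
    # In a real pipeline each object would be read with boto3 and transformed here.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}
```

S3 URL-encodes object keys in event notifications. That is why the key is passed through `unquote_plus` before being handed to boto3.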
Services and tools:
- **S3**: Ingestion and processed buckets.
- **Lambda**: Python-based ETL scripts for data processing.
- **CloudWatch**: Monitoring and logging.
- **Tableau**: BI tool for creating dashboards.
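Lambda's Python runtime forwards anything written through the standard `logging` module to CloudWatch Logs. One common pattern is to log one JSON object per pipeline step, which lets CloudWatch Logs Insights filter on individual fields such as the stage or the row count. The sketch below assumes that approach. The logger name, the field names and the `run_step` wrapper are all hypothetical.

```python
import json
import logging

# Hypothetical logger name. Inside Lambda, records propagate to the runtime's
# root handler, which writes them to the function's CloudWatch log group.
logger = logging.getLogger("totesys_pipeline")
logger.setLevel(logging.INFO)


def log_stage(stage: str, table: str, rows: int, level: int = logging.INFO) -> str:
    """Log one JSON-formatted line describing a pipeline step and return it."""
    message = json.dumps({"stage": stage, "table": table, "rows": rows}, sort_keys=True)
    logger.log(level, message)
    return message


def run_step(stage: str, table: str, rows: list) -> int:
    """Hypothetical wrapper: log success, or log the traceback and re-raise."""
    try:
        if not rows:
            raise ValueError(f"no rows to process for {table}")
        log_stage(stage, table, len(rows))
        return len(rows)
    except ValueError:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("%s failed for %s", stage, table)
        raise
```

Re-raising the exception makes the invocation fail. Lambda counts failed invocations in its `Errors` metric in CloudWatch, so an alarm on that metric is one way to get notified when a step breaks.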
Pipeline workflow:
1. **Ingestion**:
   - Data is extracted from the Totesys database and placed in the S3 ingestion bucket.
   - The code is in the `src/extract_lambda` directory.
   - **Trigger**: Manual or scheduled job.
2. **Transformation**:
   - An AWS Lambda function processes newly ingested data and transforms it into the defined schema.
   - Processed data is stored in Parquet format in the S3 processed bucket.
   - The code is in the `src/transform_lambda` directory.
3. **Loading**:
   - Transformed data is loaded into a prepared data warehouse at defined intervals.
   - **Trigger**: Event-driven or scheduled Lambda.
4. **Visualization**:
   - Tableau is used to build dashboards from the warehouse data.
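For the ingestion step, one way to avoid re-extracting the whole database on every scheduled run is an incremental query keyed on a timestamp column. The sketch below makes several assumptions:

- Each Totesys table has a `last_updated` column.
- The `:since` placeholder is written for pg8000's native named-parameter style.
- The table allow-list and the S3 key layout are hypothetical and are not the project's actual conventions.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical subset of Totesys table names. Identifiers cannot be bound as
# query parameters, so they are checked against an allow-list instead.
TOTESYS_TABLES = {
    "address", "counterparty", "currency", "department",
    "design", "sales_order", "staff",
}


def build_incremental_query(table: str, since: Optional[datetime]) -> tuple:
    """Return (sql, params) for rows changed since the previous extraction.

    Assumes every table has a `last_updated` column; `:since` follows
    pg8000.native's named-placeholder style.
    """
    if table not in TOTESYS_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    if since is None:
        # First run: there is no previous watermark, so take the whole table.
        return f"SELECT * FROM {table};", {}
    return f"SELECT * FROM {table} WHERE last_updated > :since;", {"since": since}


def ingestion_key(table: str, run_time: datetime) -> str:
    """Build a hypothetical S3 key such as 'staff/2024/11/20/093005.json'.

    `run_time` should be timezone-aware; it is normalised to UTC.
    """
    utc_time = run_time.astimezone(timezone.utc)
    return f"{table}/{utc_time:%Y/%m/%d/%H%M%S}.json"
```

You could then execute the query on a `pg8000.native.Connection` with something like `conn.run(sql, **params)` and upload the serialised rows under the generated key.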
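The transformation step reshapes raw rows into the warehouse's fact and dimension tables. As a hedged illustration, the function below builds rows for a date dimension of the kind star schemas commonly use. The column names (`date_id`, `day_of_week`, `quarter` and so on) are assumptions rather than the project's confirmed schema. The function uses only the standard library, so it runs without Pandas.

```python
from datetime import date, timedelta


def build_dim_date(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_id": day,
            "year": day.year,
            "month": day.month,
            "day": day.day,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "day_name": day.strftime("%A"),   # English names in the C/en locales
            "month_name": day.strftime("%B"),
            "quarter": (day.month - 1) // 3 + 1,
        })
        day += timedelta(days=1)
    return rows
```

In a Pandas-based Lambda, the rows could be turned into a DataFrame with `pd.DataFrame(build_dim_date(...))` and written to Parquet with `DataFrame.to_parquet`. That method needs either the `pyarrow` or the `fastparquet` engine installed.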

The Team

  • Simon Kinder

  • Macshellah Zisengwe

  • Louise Concepcion

  • Jeremy Lam

  • Abbey Ola

Technologies


We used: Terraform, Python, Pandas, Pytest, AWS, Boto3, Moto, Tableau, pg8000, PostgreSQL

We chose these tools because we had used them during the bootcamp and were confident working with them.

Challenges Faced

Yes, many.

FAQs

  • What was our AWS bill during the project phase?

    Our AWS bill was $8.89.