Lambda Legends Data Project

Lambda Legends Demo Video

Work of Legends

This is a data engineering project that implements an end-to-end ETL (extract, transform, load) pipeline. It extracts data from the totesys operational database, transforms it into a star schema, and loads it into an AWS-hosted data warehouse.
Current features:

Data Extraction: Uses a Python application to automatically ingest data from the totesys operational database into an S3 bucket in AWS.
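
The extraction code itself isn't shown here, but a minimal sketch of this step, assuming pg8000's native interface and hypothetical connection details, table names, and bucket names, might look like:

```python
import json


def rows_to_jsonl(columns, rows):
    """Serialise query results to JSON Lines, one object per row."""
    return "\n".join(
        json.dumps(dict(zip(columns, row)), default=str) for row in rows
    )


def extract_table(table_name, bucket):
    """Query one totesys table and upload it straight to S3.

    Credentials, host, and bucket are illustrative placeholders; a real
    pipeline would read them from configuration or a secrets store.
    """
    import pg8000.native  # imported lazily so the pure helper above is testable offline
    import boto3

    conn = pg8000.native.Connection(
        user="etl_user", password="...", host="totesys-host", database="totesys"
    )
    rows = conn.run(f"SELECT * FROM {table_name};")
    columns = [col["name"] for col in conn.columns]
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=f"raw/{table_name}.jsonl",
        Body=rows_to_jsonl(columns, rows).encode(),
    )
```

Serialising straight from query results to an S3 object body keeps the data in memory the whole way, which also matters for the local-storage constraint mentioned under Challenges Faced.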

Data Transformation: Uses a Python application to process raw data to conform to a star schema for the data warehouse. The transformed data is stored in parquet format in a second S3 bucket.
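
The star-schema split can be sketched with pandas as below; the column names are illustrative, not the project's actual schema:

```python
import pandas as pd

# Raw "sales_order" rows as extracted (columns are illustrative).
raw = pd.DataFrame({
    "sales_order_id": [1, 2],
    "staff_id": [10, 11],
    "staff_name": ["Ana", "Ben"],
    "units_sold": [5, 3],
    "unit_price": [2.50, 4.00],
})

# Dimension table: one row per staff member, descriptive attributes only.
dim_staff = raw[["staff_id", "staff_name"]].drop_duplicates().reset_index(drop=True)

# Fact table: foreign keys plus measures; descriptions live in the dimensions.
fact_sales_order = raw[["sales_order_id", "staff_id", "units_sold", "unit_price"]]

# In the pipeline these frames would then be written to the processed bucket
# in parquet format, e.g. (bucket path illustrative):
# import awswrangler as wr
# wr.s3.to_parquet(df=fact_sales_order, path="s3://processed-bucket/fact_sales_order/")
```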

Data Loading: Loads transformed data into an AWS-hosted data warehouse, populating dimensions and fact tables.
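
One small piece of the load step can be sketched as a helper that builds a parameterised INSERT for a warehouse table; table and column names here are hypothetical, and the `%s` placeholder style assumes a DB-API driver such as pg8000's DB-API interface:

```python
def build_insert(table, columns):
    """Build a parameterised INSERT statement for one warehouse table.

    Using placeholders rather than interpolating values keeps the load
    step safe from SQL injection and quoting bugs.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders});"
```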

Automation: End-to-end pipeline triggered by completion of a data job.

Monitoring and Alerts: Logs to CloudWatch and sends SNS email alerts in case of failures.
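
A minimal sketch of this pattern, assuming the standard Lambda behaviour of forwarding `logging` output to CloudWatch, and a hypothetical SNS topic ARN:

```python
import logging


def get_logger(name="etl_lambda"):
    """Logger whose output AWS Lambda forwards to CloudWatch automatically."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    return logger


def alert_on_failure(error, topic_arn):
    """Publish a failure notice to an SNS topic (ARN is illustrative)."""
    import boto3  # lazy import: only needed when an alert actually fires

    boto3.client("sns").publish(
        TopicArn=topic_arn,
        Subject="ETL pipeline failure",
        Message=str(error),
    )
```

Subscribing an email address to the topic is what turns the `publish` call into the email alerts described above.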

The Team

  • Pratik Shrestha

  • Rrezon Mripa

  • Joshua Man

  • Mirriam Karimi

  • Eloise Holland

Technologies

We used: pg8000, pandas, boto3, AWS Wrangler, pytest, moto, Terraform, Git, and GitHub Actions.

pg8000: connecting to and querying the PostgreSQL database.
pandas: manipulating and transforming data into tables.
boto3: interacting with AWS services.
AWS Wrangler: simplifying the process of writing transformed dataframes back to S3 in parquet format during the transform phase.
pytest: testing.
moto: mocking AWS services during testing.
Terraform: defining and provisioning the AWS infrastructure.
Git: version control for tracking changes in the project code.
GitHub Actions: automated testing and deployment workflows to ensure code quality and streamline the CI/CD pipeline.

Challenges Faced

We faced challenges during data extraction, as we wanted to avoid saving the data on our local machines. We also faced challenges with Terraform changes not being automatically reflected in our Lambda functions.
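
The first challenge comes down to keeping extracted data in memory rather than on disk. A minimal stdlib-only sketch of that idea:

```python
import csv
import io


def rows_to_csv_bytes(columns, rows):
    """Serialise rows to CSV entirely in memory -- nothing touches local disk."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue().encode()


# The resulting bytes can be handed straight to S3 as an object body,
# e.g. (bucket and key illustrative):
# boto3.client("s3").put_object(Bucket="ingest-bucket", Key="raw/staff.csv",
#                               Body=rows_to_csv_bytes(columns, rows))
```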