Data Dynamo Squad's Data Project

The more you know, the more you don't know.

The project aimed to establish a robust data ingestion and transformation pipeline utilising various AWS services to ensure efficient, timely, and error-resilient processing of data. The pipeline consists of multiple components orchestrated to ingest data from a database into an S3 “landing zone,” transform it, and then load it into a data warehouse. Key components include AWS EventBridge for job scheduling, AWS Lambda for compute, S3 buckets for storing ingested and processed data, and CloudWatch for logging and monitoring.
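As a rough sketch of what the ingestion step could look like, the Python below shows a scheduled Lambda handler that pulls rows from the source database and writes them to the S3 landing zone as timestamped JSON objects. The bucket name, table list, and fetch_table_rows helper are hypothetical placeholders for illustration, not the project's actual code.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

LANDING_BUCKET = "data-dynamo-landing-zone"   # hypothetical bucket name
TABLES = ["sales_order", "staff", "currency"]  # hypothetical table list


def fetch_table_rows(table_name):
    """Placeholder for the real database query (e.g. via pg8000 or psycopg2).

    Returns a dummy row so the sketch is self-contained; the project's
    actual query logic is not shown here.
    """
    return [{"table": table_name, "example_column": "example_value"}]


def lambda_handler(event, context):
    """Scheduled by EventBridge: ingest each table into the landing zone."""
    run_timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    for table in TABLES:
        rows = fetch_table_rows(table)
        key = f"{table}/{run_timestamp}.json"
        # One JSON object per table per run, keyed by table name and timestamp.
        s3.put_object(
            Bucket=LANDING_BUCKET,
            Key=key,
            Body=json.dumps(rows, default=str),
        )
    return {"ingested_tables": TABLES, "run": run_timestamp}
```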

The Team

  • Jovan Ellis
  • Andreea Mitel
  • Valerie Parfeliuk
  • Nathan Stoneley
  • Ben Ward
  • Sumaya Abdirahman

Technologies

We used: Terraform, GitHub Actions, Python, and AWS.

Leveraging Terraform, GitHub Actions, Python, and AWS streamlined our project workflow. Terraform simplifies infrastructure management, enabling us to deploy and maintain resources with little effort. GitHub Actions automates tasks, giving us seamless integration and continuous deployment. Python, with its versatility, handles our data manipulation. AWS S3 stores our data, and new objects arriving in S3 trigger our functions in AWS Lambda.
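To illustrate the S3-to-Lambda trigger pattern described above, here is a minimal sketch of a transform handler that reacts to a new object in the landing bucket, loads it with pandas, and writes the result to a processed bucket as Parquet. The bucket name and the transform function are assumptions for illustration, and the Parquet step assumes a Lambda layer that provides pandas and pyarrow.

```python
import io
import urllib.parse

import boto3
import pandas as pd  # assumed to be supplied by a Lambda layer

s3 = boto3.client("s3")

PROCESSED_BUCKET = "data-dynamo-processed-zone"  # hypothetical bucket name


def transform(df):
    """Placeholder transformation: drop duplicates and normalise column names."""
    df = df.drop_duplicates()
    df.columns = [str(c).strip().lower() for c in df.columns]
    return df


def lambda_handler(event, context):
    """Triggered by an S3 put event on the landing bucket."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Read the raw JSON file written by the ingestion Lambda.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_json(io.BytesIO(body))

    df = transform(df)

    # Write the transformed data as Parquet (needs pyarrow or fastparquet).
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)
    out_key = key.rsplit(".", 1)[0] + ".parquet"
    s3.put_object(Bucket=PROCESSED_BUCKET, Key=out_key, Body=buffer.getvalue())
    return {"processed_key": out_key}
```

Writing Parquet keeps the transformed data compact and column-oriented, which suits the later load into the data warehouse.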

Challenges Faced

We faced three main challenges:

  • Runtime issues in the transform Lambda, which we solved by doing the data manipulation with pandas.

  • Lambda layers that were too large, which we solved by using a built-in Lambda layer.

  • Running SQL tests in the workflow against a locally created database, which we solved by importing some additional pre-built GitHub Actions (a minimal sketch of such a test follows below).
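As an illustration of the SQL-testing setup, the sketch below shows the shape of a pytest test run against a locally created PostgreSQL database, such as one started by the CI workflow. The connection details, table, and seed data are assumptions, and pg8000 is just one possible driver; this is not the project's actual test suite.

```python
import pg8000.native  # one possible driver; psycopg2 would work equally well
import pytest

# Connection details for a locally created test database, e.g. a PostgreSQL
# instance started by the CI workflow. These values are assumptions.
DB_PARAMS = {
    "user": "postgres",
    "password": "postgres",
    "host": "localhost",
    "port": 5432,
    "database": "test_warehouse",
}


@pytest.fixture
def db():
    """Create a throwaway table, yield a connection, then clean up."""
    conn = pg8000.native.Connection(**DB_PARAMS)
    conn.run(
        "CREATE TABLE IF NOT EXISTS dim_staff (staff_id INT PRIMARY KEY, name TEXT)"
    )
    conn.run("INSERT INTO dim_staff VALUES (1, 'Alice'), (2, 'Bob')")
    yield conn
    conn.run("DROP TABLE dim_staff")
    conn.close()


def test_dim_staff_has_expected_rows(db):
    # Simple check that the seeded rows come back in the expected shape.
    rows = db.run("SELECT staff_id, name FROM dim_staff ORDER BY staff_id")
    assert rows == [[1, "Alice"], [2, "Bob"]]
```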