Data Dynamo Squad's Data Project
Data Dynamo Squad
The more you know, the more you don't know.
The project aimed to establish a robust data ingestion and transformation pipeline utilising various AWS services to ensure efficient, timely, and error-resilient processing of data. The pipeline consists of multiple components orchestrated to ingest data from a database into an S3 “landing zone,” transform it, and then load it into a data warehouse. Key components include AWS EventBridge for job scheduling, AWS Lambda for computing tasks, S3 buckets for storing ingested and processed data, and CloudWatch for logging and monitoring.
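To illustrate the ingestion step, the sketch below shows the general shape of a scheduled Lambda handler that pulls rows from a source table and writes them to the landing-zone bucket as JSON. This is a minimal sketch, not the project's actual code: the bucket name, table name, and connection details are placeholders, and in practice credentials would come from Secrets Manager or environment variables.

```python
import json
from datetime import datetime, timezone

import boto3
import pg8000.native  # assumes the driver is packaged in a Lambda layer

LANDING_BUCKET = "example-landing-zone"  # placeholder bucket name
s3 = boto3.client("s3")


def lambda_handler(event, context):
    """Triggered on a schedule by EventBridge; ingests one table into S3."""
    conn = pg8000.native.Connection(
        user="example_user",          # placeholder credentials; real values
        password="example_password",  # would come from Secrets Manager
        host="example-host",
        database="example_db",
    )
    try:
        rows = conn.run("SELECT * FROM sales_order;")  # placeholder table
        columns = [col["name"] for col in conn.columns]
        records = [dict(zip(columns, row)) for row in rows]
    finally:
        conn.close()

    # Timestamped key so each scheduled run lands in its own object
    key = f"sales_order/{datetime.now(timezone.utc).isoformat()}.json"
    s3.put_object(
        Bucket=LANDING_BUCKET,
        Key=key,
        Body=json.dumps(records, default=str),
    )
    return {"rows_ingested": len(records), "key": key}
```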
The Team
Jovan Ellis
Andreea Mitel
Valerie Parfeliuk
Nathan Stoneley
Ben Ward
Sumaya Abdirahman
Technologies
We used: Terraform, GitHub Actions, Python, and AWS.
Leveraging Terraform, GitHub Actions, Python, and AWS streamlined our project workflow. Terraform simplified infrastructure management, letting us deploy and maintain resources as code. GitHub Actions automated testing and deployment, giving us continuous integration and delivery. Python handled the data manipulation. AWS S3 stored our data, and new objects landing in S3 triggered our functions in AWS Lambda.
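For context on how the S3 trigger works, a Lambda invoked by an S3 object-created notification receives an event describing the new object. The minimal sketch below (with a placeholder print in place of the real transform call) shows how the bucket and key are read from that event.

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    """Invoked by an S3 object-created notification on the landing bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        response = s3.get_object(Bucket=bucket, Key=key)
        rows = json.loads(response["Body"].read())

        # Hand the newly landed data to the transform step (placeholder)
        print(f"Received {len(rows)} rows from s3://{bucket}/{key}")
```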
Challenges Faced
We faced runtime issues in the transform Lambda, which we solved by doing the data manipulation with pandas (see the sketch below). We also ran into Lambda layers that were too large, which we solved by using a built-in Lambda layer. Finally, running SQL tests in the workflow against a locally created database required importing additional pre-built GitHub Actions.
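As a rough illustration of the pandas approach (not the project's exact code), the sketch below joins ingested staff and department records into a single dimension table; the table and column names are assumptions chosen for the example.

```python
import pandas as pd


def build_dim_staff(staff_records, department_records):
    """Join staff and department records into one dimension table,
    roughly as a star-schema transform might do it with pandas."""
    staff = pd.DataFrame(staff_records)
    departments = pd.DataFrame(department_records)

    dim_staff = staff.merge(departments, on="department_id", how="left")
    return dim_staff[
        ["staff_id", "first_name", "last_name", "department_name", "email_address"]
    ]


# Example usage with toy data
staff = [{"staff_id": 1, "first_name": "Ada", "last_name": "Lovelace",
          "department_id": 10, "email_address": "ada@example.com"}]
departments = [{"department_id": 10, "department_name": "Engineering"}]
print(build_dim_staff(staff, departments))
```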