TotesOps' Data Project
TotesOps demo video
Streamline, Automate, Innovate: Revolutionising TotesOps
Our ETL Data Engineering Project at TotesOps showcases our proficiency with AWS services and Python. The data-processing pipeline relies on three AWS Lambda functions, which seamlessly handle the extraction, transformation, and loading tasks.
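As an illustration of the Lambda-per-stage pattern, a transform handler might look like the minimal sketch below. The event shape, handler name, and reshaping step are hypothetical, not the project's actual code:

```python
import json

def transform_handler(event, context=None):
    # Hypothetical transform Lambda: receives a batch of ingested records,
    # normalises their keys, and reports how many rows were processed.
    records = event.get("records", [])
    # Example reshaping step: normalise column names to snake_case
    cleaned = [
        {key.lower().replace(" ", "_"): value for key, value in row.items()}
        for row in records
    ]
    return {"statusCode": 200, "body": json.dumps({"count": len(cleaned)})}
```

In practice each stage's handler would read from and write to S3 via boto3; keeping the reshaping logic in a plain function like this makes it easy to unit-test without AWS.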
Infrastructure as Code (IaC) with Terraform ensures automated deployment, scalability, and consistent resource management. Continuous Integration/Continuous Deployment (CI/CD) using GitHub Actions enhances workflow efficiency.
In terms of data storage, AWS S3 buckets provide scalability and durability for ingested and processed data. The inclusion of CloudWatch for monitoring and alerting adds a layer of proactive oversight, ensuring optimal performance.
Throughout the project, we have diligently embraced Agile methodologies, employing an iterative and adaptive approach to development. This commitment underscores our dedication to innovation and excellence in the field of data engineering. The project has challenged us in unexpected ways, and we are proud of the outcome of the final product.
The Team
Tom Roberts
Minnie Taylor Manson
Kirsten Brindle
Leah Morden-Tew
Cinthya Sánchez
Elliott Mullins
Technologies
We used AWS S3, Lambda, CloudWatch, IAM, Secrets Manager, Python, SQL (Postgres), Terraform, GitHub Actions, Pandas, pytest, moto, unittest and Trello.
The combination of AWS services, GitHub Actions, Terraform and boto3 allowed for seamless deployment and automation of our project.
Pandas and PostgreSQL assisted in the data-wrangling aspect of our project.
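As a small illustration of that wrangling step (the column names and schema here are hypothetical, not the project's actual tables), pandas makes it straightforward to split a denormalised extract into the fact/dimension shape a warehouse expects:

```python
import pandas as pd

# Hypothetical denormalised extract from the ingestion bucket
raw = pd.DataFrame({
    "sale_id": [1, 2],
    "staff_name": ["Ana", "Ben"],
    "amount": [10.0, 25.5],
})

# Build a staff dimension table with a surrogate key
dim_staff = raw[["staff_name"]].drop_duplicates().reset_index(drop=True)
dim_staff["staff_id"] = dim_staff.index + 1

# Replace the name in the fact table with the foreign key
fact_sales = raw.merge(dim_staff, on="staff_name")[["sale_id", "staff_id", "amount"]]
```

The resulting frames can then be written out (e.g. as Parquet to S3) for the load Lambda to insert into PostgreSQL.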
For high-quality testing, we utilised pytest, moto and unittest (MagicMock, patch, Mock), ensuring code integrity.
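For instance, a loader that talks to S3 can be unit-tested without touching AWS by injecting a MagicMock in place of the boto3 client. The function below is a hypothetical sketch, not the project's actual loader:

```python
from unittest.mock import MagicMock

def load_rows(rows, s3_client, bucket):
    # Hypothetical loader: writes each row to the processed bucket as an object
    for i, row in enumerate(rows):
        s3_client.put_object(Bucket=bucket, Key=f"row-{i}.json", Body=str(row))
    return len(rows)

def test_load_rows_calls_put_object_per_row():
    mock_client = MagicMock()
    count = load_rows([{"a": 1}, {"b": 2}], mock_client, "processed-bucket")
    assert count == 2
    assert mock_client.put_object.call_count == 2

test_load_rows_calls_put_object_per_row()
```

moto takes the same idea further by emulating the S3 API itself, so real boto3 calls can be exercised end to end in tests.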
Challenges Faced
In the project's concluding phases, code refactoring was necessary because of unanticipated structural issues in the data warehouse schema. To address this challenge, we had to reassess our code's organisation and make necessary adjustments to align with the final schema.