Orchid Data Engineering Project
Persistent and resilient in the face of technical challenges.
This was an ETL project that extracted data from an OLTP database into an S3 bucket using Python code deployed to an AWS Lambda function. A second Lambda function, also written in Python, then processed the ingested data and wrote it to another S3 bucket, ready to be loaded into an OLAP data warehouse.
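To make the ingestion step concrete, here is a minimal sketch of how extracted rows might be written to the ingestion bucket. It is not the project's actual code: the function names, the bucket name, the `<table>.csv` key and the CSV format are all assumptions made for illustration. The `s3_client` argument is expected to behave like `boto3.client("s3")`, and the columns and rows stand in for whatever the database query returns.

```python
import csv
import io


def rows_to_csv(columns, rows):
    """Serialise query results (column names plus row tuples) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def ingest_table(s3_client, bucket, table, columns, rows):
    """Upload one table's extracted rows to the ingestion bucket as CSV.

    s3_client is assumed to expose put_object like boto3.client("s3").
    The "<table>.csv" key is a placeholder, not the project's real layout.
    """
    key = f"{table}.csv"
    body = rows_to_csv(columns, rows).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

Passing the client in as an argument, rather than creating it inside the function, keeps the function easy to test with a stub or with moto.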
The Team
Karl Robinson
Ellen Morris
Souad Alkhaledi
Anita
Gillian Piper
Technologies
We used: AWS services including Lambda, S3, SNS, CloudWatch, EventBridge and Secrets Manager. Terraform, PostgreSQL and pgAdmin. Python packages including boto3, pandas, moto, pytest and pg8000. GitHub and GitHub Actions.
AWS was chosen because working in the cloud allows infrastructure to be scaled as needed, making it a more cost-effective solution. Hosting the project on AWS also makes it easier to use services such as Lambda to automate processes. This could then all be orchestrated through GitHub Actions to create a CI/CD pipeline. Since the chosen language was Python, we made use of the packages listed above.
Challenges Faced
Figuring out the best structure for storing our data in the S3 buckets.
Figuring out the best way for the Lambda code to access the necessary Python packages.
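As an illustration of the first challenge, one common option is a date-partitioned key layout, so that each extraction run lands under its own table and date prefix. The sketch below shows that idea only. The layout, the function name `build_s3_key` and the example table name are assumptions, not necessarily the structure the team settled on.

```python
from datetime import datetime, timezone


def build_s3_key(table, extracted_at, extension="csv"):
    """Build a key of the form <table>/<YYYY>/<MM>/<DD>/<table>-<HHMMSS>.<ext>.

    extracted_at should be a timezone-aware datetime; it is converted to UTC
    so that keys sort consistently wherever the code runs.
    """
    ts = extracted_at.astimezone(timezone.utc)
    return f"{table}/{ts:%Y/%m/%d}/{table}-{ts:%H%M%S}.{extension}"


# Hypothetical table name and timestamp, for illustration only.
example_key = build_s3_key(
    "sales_order", datetime(2024, 1, 31, 9, 30, 0, tzinfo=timezone.utc)
)
```

With a layout like this, the processing Lambda can list a single table or date prefix instead of scanning the whole bucket.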