Orchid Data Engineering Project
present

Persistent and resilient in the face of technical challenges.

This was an ETL project that extracted data from an OLTP database into an S3 bucket using Python code deployed on an AWS Lambda function. The ingested data was then transformed by Python code on a second Lambda and written to another S3 bucket, ready to be loaded into an OLAP data warehouse.
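A minimal sketch of what such an ingestion Lambda might look like, assuming illustrative table, bucket and credential names rather than the project's actual code:

```python
# Hypothetical ingestion Lambda: pulls rows from the OLTP database with
# pg8000 and lands them in S3 as JSON. All names here are assumptions.
import json
from datetime import datetime, timezone

import boto3
import pg8000.native


def lambda_handler(event, context):
    """Extract rows from the OLTP database and write them to S3."""
    conn = pg8000.native.Connection(
        user="etl_user",      # assumed credentials; in practice these
        password="...",       # would come from AWS Secrets Manager
        host="oltp-host",
        database="oltp_db",
    )
    rows = conn.run("SELECT * FROM sales_order;")  # assumed table name
    columns = [c["name"] for c in conn.columns]
    conn.close()

    records = [dict(zip(columns, row)) for row in rows]
    key = f"sales_order/{datetime.now(timezone.utc).isoformat()}.json"

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="ingestion-bucket",  # assumed bucket name
        Key=key,
        Body=json.dumps(records, default=str),
    )
    return {"rows_ingested": len(records)}
```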

The Team

  • Karl Robinson

  • Ellen Morris

  • Souad Alkhaledi

  • Anita

  • Gillian Piper

Technologies

Logos for AWS, Pandas, Terraform, pytest, Lambda, S3, Python, CloudWatch, EventBridge and PostgreSQL

We used: AWS services including Lambda, S3, SNS, CloudWatch, EventBridge and Secrets Manager; Terraform; PostgreSQL and pgAdmin; Python packages including boto3, pandas, moto, pytest and pg8000; and GitHub with GitHub Actions.
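As one example of how these fit together, boto3 can fetch database credentials from Secrets Manager so they never live in the code. This is a sketch under assumed secret name and JSON shape, not the project's actual helper:

```python
# Hedged sketch: retrieve and parse a JSON secret holding DB connection
# details. The secret name and its fields are illustrative assumptions.
import json

import boto3


def get_db_credentials(secret_name="oltp-db-credentials"):
    """Return the secret's JSON payload as a dict (e.g. host, user, password)."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])
```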

AWS was chosen because running on the cloud lets infrastructure scale as needed, making it a more cost-effective solution. Hosting the project on AWS also made it easier to use services such as Lambda to automate processing, all orchestrated through GitHub Actions as a CI/CD pipeline. Since the chosen language was Python, we made use of the packages listed above.
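Because the pipeline's tests run in CI rather than against live AWS resources, moto can stand in for S3. A hedged example of the pattern, assuming moto >= 5 (which exposes the mock_aws decorator) and run under pytest:

```python
# Unit test against moto's in-memory S3 mock; no real AWS calls are made.
import boto3
from moto import mock_aws


@mock_aws
def test_put_object_lands_in_bucket():
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="ingestion-bucket")

    s3.put_object(Bucket="ingestion-bucket", Key="test.json", Body=b"{}")

    listing = s3.list_objects_v2(Bucket="ingestion-bucket")
    assert listing["KeyCount"] == 1
```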

Challenges Faced

  • Figuring out the best structure for storing our data in the S3 buckets (see the key-naming sketch below).

  • Figuring out the best way for the code in the Lambdas to access the necessary Python packages.
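For the first challenge, one common convention is date-partitioned key prefixes per table; the helper below is a hypothetical illustration of that idea, not the scheme the team settled on. (The second challenge is typically addressed with Lambda layers that bundle the dependencies alongside the function code.)

```python
# Hypothetical key-naming helper: one table per prefix, partitioned by
# extraction date, e.g. 'sales_order/2024/01/15/120000.json'.
from datetime import datetime, timezone
from typing import Optional


def build_key(table_name: str, extracted_at: Optional[datetime] = None) -> str:
    """Build a date-partitioned S3 key for one table's extracted data."""
    ts = extracted_at or datetime.now(timezone.utc)
    return f"{table_name}/{ts:%Y/%m/%d/%H%M%S}.json"
```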