Hedgehog
present

"Data Engineering: Turning '404 not found' into '200 OK'." - ChatGPT

This Northcoders Data Engineering final project is a pipeline that transforms a SQL database of a shop's sales records (in Online Transaction Processing, "OLTP", format) into a structured data warehouse (in Online Analytical Processing, "OLAP", format), all hosted on Amazon Web Services (AWS). See the end of the README for images of the entity relationship diagrams (ERDs) for the initial and transformed databases.

The Pipeline:
1. The "extraction" Lambda function collects both archive and new data entries by scanning the database periodically for updates. It converts new, unique data to CSV files, which are stored in an S3 bucket, and logs its activity in CloudWatch. The database credentials are stored in Secrets Manager, and Systems Manager is used to store timestamps.
2. Any upload to that bucket triggers a second "processing" Lambda function, which transforms and normalises the data using Pandas DataFrames and stores the results in Parquet format in a second S3 bucket.
3. Finally, a third "storage" Lambda function scans the second bucket periodically for updates, converts them back to SQL and loads them into a data warehouse in star schema format.
The entire pipeline infrastructure is managed using Terraform.
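
The first two steps can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the `last_updated` column and the sample keys are assumptions made for the example. The only external fact it relies on is the standard shape of an S3 event notification (`Records[].s3.bucket.name` and a URL-encoded `Records[].s3.object.key`).

```python
import csv
import io
from urllib.parse import unquote_plus


def extract_new_rows(rows, last_run):
    """Step 1 (sketch): keep only rows changed since the stored timestamp.

    `last_updated` is a hypothetical column name; `last_run` stands in for
    the timestamp that the pipeline keeps in Systems Manager.
    """
    return [row for row in rows if row["last_updated"] > last_run]


def rows_to_csv(rows, fieldnames):
    """Step 1 (sketch): serialise dict rows to CSV text, header included."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def keys_from_s3_event(event):
    """Step 2 (sketch): list the (bucket, key) pairs named in an S3 event.

    Object keys arrive URL-encoded in the notification, so they are decoded
    before being used to fetch the uploaded file.
    """
    return [
        (
            record["s3"]["bucket"]["name"],
            unquote_plus(record["s3"]["object"]["key"]),
        )
        for record in event.get("Records", [])
    ]
```

In a real handler, the CSV text would be uploaded to the first bucket, and the decoded keys would tell the processing function which files to read into DataFrames.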

The Team

  • Philippa Clarkson

  • Dylan Hickman Singh

  • Victoria Messam

  • Tom Avery

  • Phillip Taylor

Technologies

We used Python, PSQL, AWS (Lambda, S3, CloudWatch, Systems Manager (SSM) and Secrets Manager), Terraform, GitHub and Trello.

We found these to be the most efficient tools for the project's requirements.

Challenges Faced

Terraform layers, GitHub merges (which were tricky initially) and CSV delimiters.
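
The README does not say what the CSV delimiter problem was. One common cause, assumed here purely for illustration, is a field that contains the delimiter itself, such as a comma inside an address. The sketch below uses Python's standard `csv` module, which quotes such fields when writing and restores them when reading. The column names and sample row are hypothetical.

```python
import csv
import io


def write_rows(rows):
    """Write rows as CSV text, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of rows, honouring quoted fields."""
    return list(csv.reader(io.StringIO(text)))


# Hypothetical sample: the address contains a comma.
rows = [["address_id", "address_line_1"], ["1", "Flat 2, 10 High Street"]]
csv_text = write_rows(rows)

# A naive split on "," breaks the address into two pieces...
naive_fields = csv_text.splitlines()[1].split(",")

# ...whereas the csv reader recovers the original two fields.
parsed_rows = read_rows(csv_text)
```

Here `naive_fields` has three elements while `parsed_rows` matches the original `rows`, which is why reading and writing through a CSV-aware library is usually safer than splitting strings by hand.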

FAQs

  • Why did we choose Hedgehog as a team name?

    Philippa was late to the first team meeting because a hedgehog had crawled into the foundations of her house and she was busy pulling up floorboards to find it.