Banshee

Project phase: present

Banshee Demo Video

Debug hard. Deploy fast.

This repository implements an Extract-Transform-Load (ETL) pipeline, transporting and restructuring raw, unformatted data into dimension and fact tables in accordance with several predetermined star schemas. This is achieved primarily through cloud engineering, data engineering and Infrastructure as Code principles: Amazon Web Services (AWS) resources are provisioned with Terraform, alongside fully tested and reviewed Python scripts. Throughout the project we practised Continuous Integration and Continuous Delivery (CI/CD) alongside Test-Driven Development (TDD) to maximise both the effectiveness and validity of the code as written and deployed. The structure of this infrastructure is represented in the diagram above: a scheduler triggers an AWS state machine every 20 minutes, which in turn invokes the three Lambda functions encapsulated within it, in sequence.
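As an illustrative sketch of the transform step (the function, field names and schema below are assumptions for illustration, not the project's actual code), raw records might be reshaped into a star-schema dimension table like so:

```python
# Hypothetical sketch: reshaping raw staff records into a star-schema
# dimension table (dim_staff). Field names are illustrative assumptions,
# not taken from the actual project code.

def transform_dim_staff(raw_rows):
    """Build dim_staff rows from raw staff records, keeping only the
    columns the dimension table needs."""
    dim_rows = []
    for row in raw_rows:
        dim_rows.append({
            "staff_id": row["staff_id"],
            "first_name": row["first_name"],
            "last_name": row["last_name"],
            # default when the joined department record is missing
            "department_name": row.get("department_name", "Unknown"),
        })
    return dim_rows
```

In the real pipeline a step like this would sit in the transform Lambda, reading raw data written by the extract Lambda and writing the dimension rows onward for loading into the warehouse.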

The Team

  • Meral Hewitt

  • Shea Macfarlane

  • Ahmad Fadhli

  • Mihai Misai

  • Anna Fedyna

  • Carlo Danieli

Technologies

We used: Python (TDD with Pytest), AWS (Lambda, RDS, CloudWatch, IAM, Step Functions, EventBridge), Terraform, CI/CD (GitHub Actions, GitHub Secrets), Postgres, SQL

Using these technologies allowed us to gain relevant project experience as they are commonly used in industry. They are also the tools that we were collectively most familiar with, enabling us to work efficiently from day 1, which was important given our limited time frame.
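A minimal illustration of the TDD-with-Pytest workflow mentioned above (the function and test are hypothetical examples, not taken from the project): the test is written first, fails, and then drives the implementation.

```python
# Hypothetical unit under test: a small helper that normalises a
# monetary amount to two decimal places.
def format_currency(amount):
    """Round a monetary amount (str or number) to 2 decimal places."""
    return round(float(amount), 2)


# The Pytest test that would have been written first.
def test_format_currency_rounds_to_two_dp():
    assert format_currency("3.14159") == 3.14
    assert format_currency(10) == 10.0
```

Running `pytest` discovers any `test_*` function automatically, which is what makes the red-green-refactor loop cheap enough to apply to every utility in the pipeline.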

Challenges Faced

Throughout the development of our ETL pipeline, we encountered several technical challenges that required strategic problem-solving and strong teamwork to overcome. These included configuring the state machine, handling failed executions without losing data, and dividing tasks across the team.
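One common pattern for handling failed executions without losing data is to quarantine the failing batch before re-raising the error. The sketch below is a hedged illustration of that idea, not the project's actual solution; the in-memory `FAILED_BATCHES` list stands in for a durable dead-letter store such as an S3 prefix.

```python
# Hypothetical error-handling sketch. In a real Lambda, FAILED_BATCHES
# would be a durable store (e.g. an S3 "quarantine" prefix), not a list.
FAILED_BATCHES = []


def process_batch(batch):
    """Hypothetical transform step that may raise on malformed records."""
    return [{"id": r["id"], "value": r["value"] * 2} for r in batch]


def handle_batch(batch):
    """Process a batch; on failure, quarantine it so no data is lost,
    then re-raise so the state machine marks the execution as failed."""
    try:
        return process_batch(batch)
    except (KeyError, TypeError) as err:
        FAILED_BATCHES.append({"batch": batch, "error": repr(err)})
        raise
```

Because the raw batch is persisted before the exception propagates, a failed execution can later be replayed from the quarantine store once the underlying bug is fixed.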

Check out the finished project at https://github.com/mihaimisai/de-project