Banshee

Banshee Demo Video
Debug hard. Deploy fast.
This repository implements an Extract-Transform-Load (ETL) pipeline that transports and restructures raw, unformatted data into Dimension and Fact tables, in accordance with several predetermined Star Schemas. It applies Cloud Engineering, Data Engineering and Infrastructure as Code principles, provisioning Amazon Web Services (AWS) resources with Terraform alongside fully tested and reviewed Python scripts. Throughout the project we practised Continuous Integration and Continuous Delivery (CI/CD) and Test-Driven Development (TDD) to maximise the effectiveness and validity of the code we wrote and deployed.

The structure of this infrastructure is represented in the diagram above. A scheduler triggers an AWS State Machine every 20 minutes, which in turn invokes the three Lambdas encapsulated within it, one after another.
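As a rough illustration of that wiring, the Terraform sketch below declares a Step Functions state machine chaining three Lambdas and an EventBridge rule that starts it every 20 minutes. Resource names, state names and IAM roles here are hypothetical, not the project's actual configuration:

```hcl
# Sketch only: names are illustrative, and the IAM roles referenced
# here are assumed to be defined elsewhere in the module.
resource "aws_sfn_state_machine" "etl" {
  name     = "etl-state-machine"
  role_arn = aws_iam_role.sfn_role.arn

  # Three Lambdas chained sequentially: extract -> transform -> load.
  definition = jsonencode({
    StartAt = "Extract"
    States = {
      Extract = {
        Type     = "Task"
        Resource = aws_lambda_function.extract.arn
        Next     = "Transform"
      }
      Transform = {
        Type     = "Task"
        Resource = aws_lambda_function.transform.arn
        Next     = "Load"
      }
      Load = {
        Type     = "Task"
        Resource = aws_lambda_function.load.arn
        End      = true
      }
    }
  })
}

# EventBridge rule firing every 20 minutes, targeting the state machine.
resource "aws_cloudwatch_event_rule" "every_20_minutes" {
  name                = "etl-schedule"
  schedule_expression = "rate(20 minutes)"
}

resource "aws_cloudwatch_event_target" "start_etl" {
  rule     = aws_cloudwatch_event_rule.every_20_minutes.name
  arn      = aws_sfn_state_machine.etl.arn
  role_arn = aws_iam_role.events_role.arn # must allow states:StartExecution
}
```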
The Team
Meral Hewitt
Shea Macfarlane
Ahmad Fadhli
Mihai Misai
Anna Fedyna
Carlo Danieli
Technologies

We used: Python (TDD with pytest), AWS (Lambda, RDS, CloudWatch, IAM, Step Functions, EventBridge), Terraform, CI/CD (GitHub Actions, GitHub Secrets), PostgreSQL, SQL.
Using these technologies allowed us to gain relevant project experience as they are commonly used in industry. They are also the tools that we were collectively most familiar with, enabling us to work efficiently from day 1, which was important given our limited time frame.
Challenges Faced
Throughout the development of our ETL pipeline, we encountered several technical challenges that required strategic problem-solving and strong teamwork to overcome. These included configuring the state machine, handling failed executions without losing data, and splitting tasks across the team.
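For the failed-executions problem, one common pattern (sketched below with hypothetical names; not necessarily the project's final configuration) is to give each Task state in the state machine definition a Retry policy with backoff, plus a Catch that routes unrecoverable errors to an explicit Fail state, so a bad batch surfaces in CloudWatch and can be replayed rather than silently dropped:

```hcl
# Illustrative excerpt of a single Task state with retry/catch handling;
# the transform Lambda and state names are hypothetical.
locals {
  transform_with_retry = jsonencode({
    StartAt = "Transform"
    States = {
      Transform = {
        Type     = "Task"
        Resource = aws_lambda_function.transform.arn
        # Retry transient failures with exponential backoff.
        Retry = [{
          ErrorEquals     = ["States.TaskFailed"]
          IntervalSeconds = 10
          MaxAttempts     = 2
          BackoffRate     = 2.0
        }]
        # Route anything unrecoverable to an explicit Fail state.
        Catch = [{
          ErrorEquals = ["States.ALL"]
          Next        = "TransformFailed"
        }]
        End = true
      }
      TransformFailed = {
        Type  = "Fail"
        Error = "TransformError"
        Cause = "Transform Lambda failed after retries"
      }
    }
  })
}
```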
Check out the finished project at https://github.com/mihaimisai/de-project