Team Ness' Project

Team Ness Demo Video

E.T.L: Extract. Transform. Loch Ness Monster.

A project to create a data platform that extracts data from an operational database, archives it in a data lake, and makes it available in a remodelled OLAP data warehouse.
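At a high level, the pipeline follows the classic extract/transform/load shape. A minimal sketch in pure Python (the table and column names below are invented for illustration, not the project's actual code):

```python
# Minimal ETL sketch. All names and the data shape are illustrative only.

def extract(rows):
    """Stand-in for reading from the operational database (here: a list of dicts)."""
    return [dict(row) for row in rows]

def transform(rows):
    """Remodel OLTP rows into a star-schema-style fact record."""
    return [
        {
            "sales_record_id": row["id"],
            "amount": round(row["unit_price"] * row["quantity"], 2),
        }
        for row in rows
    ]

def load(rows, warehouse):
    """Stand-in for inserting into the OLAP warehouse (here: a list)."""
    warehouse.extend(rows)
    return len(rows)

source = [{"id": 1, "unit_price": 2.5, "quantity": 4}]
warehouse = []
load(transform(extract(source)), warehouse)
print(warehouse)  # [{'sales_record_id': 1, 'amount': 10.0}]
```

In the real platform each stage ran as a separate AWS Lambda, with S3 acting as the data lake between stages.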

The Team

  • Paul Sandford

  • Liam Dearlove

  • Inna Teterina

  • Rahul Aneesh

  • Muhammed Irfan

  • Muhammad Raza


Technologies

We used: Python - including pytest, bandit, safety, coverage, pandas, pg8000, SQLAlchemy and autopep8. Terraform. GitHub Actions. AWS - including S3, Lambda, CloudWatch, Systems Manager and EventBridge. PostgreSQL.

Python - because it is a powerful and flexible programming language well suited to the tasks and challenges involved in Data Engineering.

Bandit and Safety - to check for security issues in our code and in our installed dependencies.

Coverage - to ensure that our tests provided sufficient coverage, i.e. above 90%.

Terraform - to allow us to build and alter AWS infrastructure with speed and flexibility.

GitHub Actions - to build an efficient and robust CI/CD pipeline.

AWS - because Amazon Web Services is an accessible and widely used cloud computing platform.

Trello - we broke down the project into granular tickets. This allowed us to focus on achievable tasks and to manage the workflow of the team as a whole.

Pair programming - we worked in pairs frequently in order to foster a collaborative and supportive working environment, and to make sure we were producing high quality code.

Daily stand-up meetings - we held regular stand-ups to surface and resolve any blockers that pairs were facing.

Challenges Faced

Differences between column names on the provided ERD and the actual data warehouse - this led to us having to refactor parts of our transform app, but was a good experience as it enabled us to practise being flexible and responding to changing conditions.
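A rename like this can be isolated in the transform step as a single mapping from ERD names to the warehouse's actual names, so future mismatches only require editing the mapping. A hedged sketch (both sets of column names below are hypothetical, not the real schema):

```python
# Map ERD column names to the names actually present in the warehouse.
# Both sets of names here are invented examples.
COLUMN_RENAMES = {
    "created_date": "created_at",
    "staff_id": "sales_staff_id",
}

def remodel_columns(row):
    """Return a copy of `row` with ERD column names replaced; other keys pass through."""
    return {COLUMN_RENAMES.get(key, key): value for key, value in row.items()}

print(remodel_columns({"created_date": "2024-01-01", "units_sold": 3}))
# {'created_at': '2024-01-01', 'units_sold': 3}
```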

Deploying the load app using SQLAlchemy, psycopg2 and pandas - together these made our deployment package too large for AWS Lambda's size limit, so we had to learn how to use SQLAlchemy with the lighter, pure-Python pg8000 driver rather than psycopg2.
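Swapping drivers in SQLAlchemy is largely a matter of the connection URL's dialect prefix: a URL starting with `postgresql+pg8000://` selects the pg8000 driver. A sketch with placeholder credentials (not the project's real configuration):

```python
# Build a SQLAlchemy connection URL for the pure-Python pg8000 driver.
# User, password, host and database name below are placeholders.

def pg8000_url(user, password, host, database, port=5432):
    """Return a postgresql+pg8000 URL for SQLAlchemy's create_engine()."""
    return f"postgresql+pg8000://{user}:{password}@{host}:{port}/{database}"

url = pg8000_url("etl_user", "secret", "warehouse.example.com", "sales")
print(url)  # postgresql+pg8000://etl_user:secret@warehouse.example.com:5432/sales

# With SQLAlchemy installed, the engine would then be created with:
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
```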

Mock testing - in order to test our functions sufficiently we had to learn and practise testing techniques we weren't particularly familiar with before the start of the project, which proved to be a very useful challenge.
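A typical pattern for testing AWS-facing code like this is substituting the client with a `unittest.mock.Mock`, so no real network call is made. A hedged sketch (the function, bucket and key names are invented for illustration, not the project's code):

```python
from unittest.mock import Mock

def archive_object(s3_client, bucket, key, body):
    """Write `body` to the data lake bucket and report the key written."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"archived {key}"

# In a pytest test, a Mock stands in for the real boto3 S3 client:
def test_archive_object_calls_put_object():
    fake_s3 = Mock()
    result = archive_object(fake_s3, "ingest-bucket", "sales/1.csv", b"a,b\n1,2\n")
    fake_s3.put_object.assert_called_once_with(
        Bucket="ingest-bucket", Key="sales/1.csv", Body=b"a,b\n1,2\n"
    )
    assert result == "archived sales/1.csv"

test_archive_object_calls_put_object()
```

The Mock records every call made to it, so the test can assert both the return value and exactly how the client was invoked.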

GitHub - making sure we worked on separate branches and regularly merged into / pulled from main.