Baltica 2 Data Project

Baltica 2 Demo Video

Collaborative and Creative

This project creates and deploys an AWS pipeline for transferring data from a database into a data warehouse. The system extracts data from the source database, converts it to JSON in a Lambda function, and stores it in an S3 bucket. A second Lambda function then reformats the data into a schema matching that of our target data warehouse and stores it in a second S3 bucket. Finally, a third Lambda function uploads the data from that bucket into the data warehouse.
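As a rough sketch of the first stage, an ingestion Lambda along these lines extracts rows, converts them to JSON, and writes them to the first bucket. The bucket, table, and credential names here are placeholders for illustration, not the project's real configuration:

    import json
    from datetime import datetime, timezone

    import boto3
    import pg8000.native

    INGESTION_BUCKET = "baltica2-ingestion"  # hypothetical bucket name

    def lambda_handler(event, context):
        """Extract a source table, convert it to JSON, and store it in S3."""
        conn = pg8000.native.Connection(
            user="db_user", password="db_password",
            host="db_host", database="source_db",
        )
        try:
            rows = conn.run("SELECT * FROM staff;")
            columns = [col["name"] for col in conn.columns]
            records = [dict(zip(columns, row)) for row in rows]
        finally:
            conn.close()

        # A timestamped key keeps each scheduled run's output distinct.
        key = f"staff/{datetime.now(timezone.utc).isoformat()}.json"
        boto3.client("s3").put_object(
            Bucket=INGESTION_BUCKET,
            Key=key,
            Body=json.dumps(records, default=str),
        )
        return {"rows_ingested": len(records), "key": key}

The transform and load stages follow the same shape: each Lambda reads from the previous bucket, does its one job, and writes onward, which keeps every stage small enough to test in isolation.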

The Team

  • Ana Terra Camilo Silveira

  • Laura Messenger

  • Oscar Ogilvie

  • Wesley Shaw

  • Zishaan Asif

Technologies

We used: AWS, Terraform, Python, PostgreSQL, GitHub Actions, Tableau

We used Terraform because its configuration can be updated, redeployed, and reused. The AWS platform allowed us to work flexibly in the cloud. We used Python as it is the language we know best and offers useful packages such as pg8000 to integrate with PostgreSQL. GitHub Actions allowed us to work collaboratively and build a CI/CD pipeline to test and deploy changes. We used Tableau to visualise the data in our final warehouse efficiently.
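As a small illustration of the pg8000 integration mentioned above, the final load step can insert transformed records into the warehouse with parameterised SQL. The table, columns, and connection details below are hypothetical:

    import pg8000.native

    # Placeholder connection details for illustration only.
    con = pg8000.native.Connection(
        user="warehouse_user", password="warehouse_password",
        host="warehouse_host", database="warehouse_db",
    )
    # pg8000's named-parameter style guards against SQL injection
    # and handles Python-to-PostgreSQL type conversion for us.
    con.run(
        "INSERT INTO dim_staff (staff_id, first_name, last_name) "
        "VALUES (:id, :first, :last)",
        id=1, first="Jane", last="Doe",
    )
    con.close()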

Challenges Faced

One challenge was ensuring that all data would be captured, including rows that arrived while the ingestion code was running, without duplicating any data. To solve this we ran the ingestion on a schedule at set times, allowing us to program capture windows with no gaps or overlaps.
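One way to realise those capture windows, sketched below, is to align each run's half-open [start, end) window to fixed schedule boundaries and filter on a timestamp column. The last_updated column and staff table here are assumptions for illustration:

    from datetime import datetime, timedelta, timezone

    import pg8000.native

    WINDOW = timedelta(minutes=15)  # matches the scheduled run interval

    def capture_window(now: datetime) -> tuple[datetime, datetime]:
        """Return the half-open [start, end) window ending at the last boundary.

        Aligning windows to fixed boundaries means consecutive runs tile
        the timeline with no gaps and no overlap.
        """
        seconds = WINDOW.total_seconds()
        end = datetime.fromtimestamp(
            (now.timestamp() // seconds) * seconds, tz=timezone.utc
        )
        return end - WINDOW, end

    def fetch_new_rows(conn: pg8000.native.Connection, now: datetime):
        start, end = capture_window(now)
        # Half-open comparison: a row stamped exactly on a boundary is
        # captured by exactly one run, never two.
        return conn.run(
            "SELECT * FROM staff "
            "WHERE last_updated >= :start AND last_updated < :end",
            start=start, end=end,
        )

Because each window ends at the boundary before the current run rather than at the moment the code executes, rows written during the run itself simply fall into the next window instead of being missed or double-counted.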