Baltica 2 Data Project
Baltica 2 Demo Video
Collaborative and Creative
This project creates and deploys an AWS pipeline for transferring data from a database into a data warehouse. The system extracts data from the initial database, converts it to JSON in a Lambda function, and stores it in an S3 bucket. A second Lambda function then reformats the data into a schema matching that of our target data warehouse and stores it in a second S3 bucket. Finally, a third Lambda function uploads the data from that bucket into the data warehouse.
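The first stage (rows out of the database, JSON into S3) can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the S3 upload itself (a boto3 `put_object` call in the Lambda handler) is only indicated in a comment.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def serialise(value):
    """JSON-encode types commonly returned by PostgreSQL drivers
    (timestamps and numerics) that json.dumps cannot handle natively."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError(f"Unserialisable type: {type(value)}")


def rows_to_json(columns, rows):
    """Convert a query result (column names plus row tuples, as
    returned by a driver such as pg8000) into one JSON string."""
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records, default=serialise)


# Inside the ingestion Lambda handler, the resulting string would then
# be written to the first bucket, e.g.:
#   boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
```

Serialising timestamps as ISO-8601 strings keeps the JSON in S3 readable by the downstream transform Lambda without any driver-specific types.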
The Team
Ana Terra Camilo Silveira
Laura Messenger
Oscar Ogilvie
Wesley Shaw
Zishaan Asif
Technologies
We used: AWS, Terraform, Python, PostgreSQL, GitHub Actions, Tableau
We used Terraform because our infrastructure definitions can be updated, redeployed, and reused. The AWS platform allowed us to work flexibly in the cloud. We used Python as the team has advanced knowledge of it and could draw on useful packages such as pg8000 to integrate with PostgreSQL. GitHub Actions allowed us to work collaboratively and build a CI/CD pipeline to test and deploy changes. We used Tableau for efficient data visualisation of our final product.
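The remodelling step, where the second Lambda reshapes ingested records into the warehouse schema, might look something like the sketch below. The table and column names here are purely illustrative, not the project's real star schema.

```python
def split_record(record):
    """Split one flat ingested record into a fact row and a dimension
    row for a hypothetical star schema: repeated descriptive fields go
    into the dimension table, measures and keys into the fact table."""
    dim_staff = {
        "staff_id": record["staff_id"],
        "staff_name": record["staff_name"],
    }
    fact_sale = {
        "sale_id": record["sale_id"],
        "staff_id": record["staff_id"],  # foreign key into dim_staff
        "amount": record["amount"],
    }
    return fact_sale, dim_staff
```

Keeping the transform as a pure function of the input record makes it straightforward to unit-test in the CI/CD pipeline, independently of any AWS resources.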
Challenges Faced
One challenge was ensuring that all data would be captured, including data arriving during the runtime of the ingestion code, without duplicating anything. To achieve this, we ran the ingestion on a schedule at set times, allowing us to program capture windows with no gaps or overlaps.
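The windowing idea above can be sketched as half-open intervals: each run captures rows updated at or after the previous run's timestamp and strictly before the current one, so consecutive windows share a boundary with no gap and no overlap. This is an illustrative sketch with hypothetical function names, not the project's actual implementation.

```python
from datetime import datetime, timedelta


def capture_window(last_run, now):
    """Half-open interval [last_run, now): a row whose last_updated
    timestamp satisfies last_run <= last_updated < now belongs to this
    window. Because the next window starts exactly at `now`, a row on
    the boundary is captured once and only once."""
    return (last_run, now)


def in_window(ts, window):
    """True if the timestamp falls inside the half-open window."""
    start, end = window
    return start <= ts < end
```

In practice `last_run` would be persisted between invocations (for example in S3 or SSM Parameter Store) and `now` taken at the start of each scheduled run, so that data arriving mid-run is simply picked up by the next window.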