Bootcamp Life
What I have learnt in the first 9 weeks of the Data Engineering bootcamp
In 2022, an estimated 2.5 quintillion bytes of data are created every day. Some of that data will be captured, cleaned, and stored for analysis, and that is the job of a data engineer.
While looking into expanding my skillset, I came across a course in Data Engineering (DE) offered by Northcoders, which prides itself on being not only a coding school but also “a diverse digital community with a mission to change lives for the better.”
The DE bootcamp has been in the making for the past year, with planning under way since 2021. I am part of the first cohort and, if you are a data addict like me or just looking to read about something new, this is a brief rundown of what you can expect from the bootcamp.
The bootcamp builds on the success of Northcoders’ full-stack developer pathway. That course introduces the fundamentals of programming by way of JavaScript and zooms in on more recent technologies (Node.js and Express), along with some that are not so new but still widely used, like SQL. It also seeks to equip graduates with the tools and best practices of the trade, such as GitHub, Test-Driven Development (TDD) and Continuous Integration and Continuous Deployment (CI/CD). This is an established learning path which has been continuously improved since the launch of the company in 2015.
Now to Data Engineering: given my background in research in the Humanities, I was excited when I heard that Northcoders is offering training in data.
“Data engineering is the development, implementation, and maintenance of systems and processes that take in raw data and produce high-quality, consistent information that supports downstream use cases, such as analysis and machine learning.”
Yes, Data Engineering is a developer-type job. We use code to gather data from multiple sources: the Internet of Things (IoT), social media, client input, etc. To be analysed, these data need to be captured, processed and delivered to downstream users. These users are often data analysts and machine learning practitioners, but the downstream beneficiaries may also include the simple weather dashboards that we all use in our daily lives.
As Python is the language of choice for DE, week 7 of the course at Northcoders focuses on getting to grips with the basics of the language. Set-up, control flow, functions, comprehensions, and modules are covered in the lectures (two per day this week), backed up by many katas of increasing difficulty. Object-Oriented Programming (OOP), errors, and testing are not forgotten, and we also get a fair idea of what Pythonic code looks like.
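To give a flavour of the week 7 material, here is a minimal sketch of the kind of kata we might be set. The function names and the exercises themselves are my own illustrations, not taken from the course. It shows a list comprehension, basic error handling, and a couple of test-style assertions.

```python
def squares_of_evens(numbers):
    """Return the squares of the even numbers, keeping their original order."""
    # A list comprehension replaces an explicit loop with append() calls.
    return [n * n for n in numbers if n % 2 == 0]


def safe_divide(a, b):
    """Divide a by b, returning None instead of raising on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


# Small checks written in the spirit of TDD: state the expected result first.
assert squares_of_evens([1, 2, 3, 4]) == [4, 16]
assert safe_divide(10, 4) == 2.5
assert safe_divide(1, 0) is None
```

The comprehension is what people usually mean by "Pythonic" here: it says what the result is in one line instead of spelling out how to build it.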
Week 8 kicks off with two days focused on advanced SQL queries. The rest of the week covers data architecture patterns. We begin to understand what a Data Warehouse is (tables of data optimised for analysis) and how it differs from a “Data Lake” (many files) and a “Data Lakehouse” (a combination of a Data Warehouse and a Data Lake). We apply this theory by looking into how to “normalise” data and how to ingest, clean and prepare data to be stored in a data warehouse. This is part of the so-called ETL pattern: extract, transform, load (capture the data, prepare it, and make it available for analysis). Week 8 also sets the scene for one of the most intense learning experiences, which happens in week 9: how to deploy a Lambda function on Amazon Web Services (AWS).
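The ETL pattern can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the sample rows, the `readings` table and the function names are all made up, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw input, as it might arrive from a messy source.
RAW_ROWS = [
    {"city": " Leeds ", "temp_c": "12.5"},
    {"city": "manchester", "temp_c": "11"},
    {"city": "Leeds", "temp_c": ""},  # missing reading, dropped in transform()
]


def extract():
    """Extract: capture the raw records from the source."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: drop incomplete rows, tidy city names, cast temperatures."""
    cleaned = []
    for row in rows:
        if not row["temp_c"].strip():
            continue
        cleaned.append((row["city"].strip().title(), float(row["temp_c"])))
    return cleaned


def load(conn, rows):
    """Load: store the cleaned rows in a table ready for analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (city TEXT NOT NULL, temp_c REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO readings (city, temp_c) VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract()))


conn = sqlite3.connect(":memory:")
run_etl(conn)
print(conn.execute("SELECT city, temp_c FROM readings ORDER BY city").fetchall())
```

Keeping the three steps as separate functions makes each one easy to test on its own, which fits well with the TDD habits the course encourages.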
Monday in Week 9 is focused on using the graphical user interface of AWS to create a virtual machine and load a Python function on the cloud; we attach a ‘trigger’, something that activates the function at a certain time. Over the next two days, we learn how to achieve similar outcomes using the command line interface. This means we interact with the AWS cloud through code and can see how the system responds to these commands. Finally, Thursday and Friday are focused on putting these commands into a script. In DE, writing a script is like writing a letter with detailed instructions to your lawyer: it is a program which contains all the commands we would like the cloud service to execute without our having to intervene in any way. On Friday, we learn how to deploy this script with GitHub Actions and the advantages of Continuous Integration and Continuous Delivery: small, frequent changes to the code base to debug, maintain and refine the existing code.
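For readers who have never seen one, a Python function for AWS Lambda can be as small as the sketch below. Lambda calls a handler with an `event` and a `context` argument; the name `lambda_handler` is the conventional default, while the message, the return value and the fake event are my own assumptions for illustration. A scheduled trigger passes an event that includes a `time` field, which is what this sketch reads.

```python
def lambda_handler(event, context):
    """Entry point that AWS Lambda invokes each time the trigger fires."""
    # Scheduled events carry the trigger time; fall back if it is absent.
    triggered_at = event.get("time", "unknown")
    message = f"Ingestion run triggered at {triggered_at}"
    print(message)  # output from print() is written to the function's logs
    return {"status": "ok", "triggered_at": triggered_at}


if __name__ == "__main__":
    # Local dry run with a hand-written, hypothetical scheduled event.
    fake_event = {
        "source": "aws.events",
        "detail-type": "Scheduled Event",
        "time": "2022-12-05T09:00:00Z",
    }
    print(lambda_handler(fake_event, None))
```

Being able to call the handler locally with a fake event, as in the last few lines, makes it possible to test the function before any deployment script or CI/CD pipeline touches the cloud.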
The teaching team is extremely engaging and knowledgeable in Python, Data Engineering and AWS. The skills gained with AWS are transferable to other platforms, and you are likely to use AWS itself at some point in your professional career.
I hope to write about the final 4 weeks of the bootcamp in due course but, before that, I would like to comment on what I would do differently if I were to do the course again:
1. Measure progress by comparing yourself with earlier versions of yourself.
I would not compare myself with others as some of the students will have more coding experience when coming into the course.
2. Read the documentation carefully.
I would try to read twice as much documentation as I did. Training your mind to find the relevant information in new, technical, and sometimes obscure terminology is a lifesaver when it comes to working independently or driving development with a coding partner.
3. Ask questions.
At every opportunity, I would ask for advice from mentors and tutors; in those 10 or 20 minutes of one-to-one, the training will be focused on your gaps in knowledge or technical ability.
Vasi
Data Engineering Student