Big Data Engineer
Prerequisite
You are tasked with creating a data pipeline using pub/sub. One microservice will publish data to a Kafka topic, and another microservice will subscribe to read the contents of that topic and write the results to an S3 file. The S3 file will then be loaded into PostgreSQL for analysis.
Requirements
- Get Kafka and PostgreSQL running locally using Docker – you can use any Docker file from the internet (a minimal compose sketch follows this list)
- Write a microservice in Java (or another language you prefer) that publishes randomly generated temperature readings in Celsius to a Kafka topic called `celcius-readings` every 1 second. Each message in the topic should carry a `double` Celsius reading and a `long` epoch timestamp (a producer sketch follows this list).
- Write a microservice in Java (or another language you prefer) that subscribes to the `celcius-readings` Kafka topic and outputs (appends) each Celsius reading and epoch timestamp to an S3 file (a consumer sketch follows this list).
- In the subscriber microservice, write a task that loads the S3 file into PostgreSQL every 3 minutes.
- Write a SQL query that can be run on the PostgreSQL database to find the top 10 hottest temperatures (an example query follows this list).
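
A minimal `docker-compose.yml` sketch for the two infrastructure services, assuming the Bitnami Kafka image running in KRaft mode and the official `postgres` image. The image tags, credentials, ports, and listener settings are illustrative placeholders, not a prescribed setup:

```yaml
# Sketch only: image tags, credentials, and listener config are assumptions.
version: "3.8"
services:
  kafka:
    image: bitnami/kafka:3.6
    ports:
      - "9092:9092"
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
  postgres:
    image: postgres:16
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=readings
```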
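A sketch of the producer loop, assuming the standard `org.apache.kafka:kafka-clients` library. Carrying the epoch timestamp as the record key and the Celsius value as the record value is one encoding choice among several (a JSON payload would satisfy the requirement equally well); the class name, broker address, and temperature range are illustrative:

```java
import java.util.Properties;
import java.util.Random;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.DoubleSerializer;
import org.apache.kafka.common.serialization.LongSerializer;

public class CelsiusProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, DoubleSerializer.class);

        Random random = new Random();
        try (KafkaProducer<Long, Double> producer = new KafkaProducer<>(props)) {
            while (true) {
                long epoch = System.currentTimeMillis();         // long epoch timestamp (record key)
                double celsius = -10 + random.nextDouble() * 50; // random double reading (record value)
                producer.send(new ProducerRecord<>("celcius-readings", epoch, celsius));
                Thread.sleep(1000); // publish every 1 second
            }
        }
    }
}
```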
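A sketch of the subscriber side under the same `kafka-clients` assumption, plus the AWS SDK v2 `S3Client`. S3 objects cannot be appended to in place, so this sketch buffers all records and rewrites the object each poll cycle; the bucket and key names are placeholders. The 3-minute PostgreSQL load is not shown, but could be a `ScheduledExecutorService` task that downloads the object and issues a `COPY` over JDBC:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.DoubleDeserializer;
import org.apache.kafka.common.serialization.LongDeserializer;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class CelsiusConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "celsius-consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, DoubleDeserializer.class);

        StringBuilder buffer = new StringBuilder();
        try (KafkaConsumer<Long, Double> consumer = new KafkaConsumer<>(props);
             S3Client s3 = S3Client.create()) {
            consumer.subscribe(List.of("celcius-readings"));
            while (true) {
                // Collect one line per record: "celsius,epoch_timestamp".
                for (ConsumerRecord<Long, Double> rec : consumer.poll(Duration.ofSeconds(1))) {
                    buffer.append(rec.value()).append(',').append(rec.key()).append('\n');
                }
                // "Append" by rewriting the whole object with the grown buffer;
                // a real implementation would batch uploads instead of writing every cycle.
                s3.putObject(PutObjectRequest.builder()
                                .bucket("readings-bucket").key("celsius-readings.csv").build(),
                        RequestBody.fromString(buffer.toString()));
            }
        }
    }
}
```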
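The "top 10 hottest" query itself is an `ORDER BY`/`LIMIT`; the table and column names below are assumptions that must match whatever schema the load task creates:

```sql
-- Assumes a table populated by the 3-minute load task; adjust names to your schema.
SELECT celsius, epoch_ts
FROM readings
ORDER BY celsius DESC
LIMIT 10;
```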
Deliverable
- The repo should have a good README that explains how to get the project running locally and how to review the tasks
- We will run the project using `docker-compose up`. Everything should come up with this one command (i.e. Kafka, PostgreSQL, producer, and consumer) or otherwise be explained in the README.
- If we need to run anything in addition to `docker-compose up`, please list what we should run from the command line.
- Instructions for how to `exec` into the PostgreSQL container and run the SQL query that finds the top 10 hottest temperatures (an example session is sketched below).
- A GitHub repo with read permissions given to the GitHub users `rafty8s`, `bsneider`, `omnipresent07`, and `barakstout` (how to invite collaborators).
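
For reference, `exec`-ing into the PostgreSQL container typically looks like the following, assuming the compose service is named `postgres` and the credentials and database name match the compose sketch above:

```sh
# Open a psql shell inside the running postgres container (service name assumed).
docker-compose exec postgres psql -U postgres -d readings
```

From the `psql` prompt, the top-10 query shown earlier can be pasted directly.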