
Big Data Engineer


Prerequisite

You are tasked with creating a data pipeline using a pub/sub pattern. One microservice will publish data to a Kafka topic, and another microservice will subscribe to that topic, read its contents, and write the results to an S3 file. The S3 file will then be loaded into PostgreSQL for analysis.

Requirements

  • Get Kafka and PostgreSQL running locally using Docker; any Docker file from the internet will do (a minimal Compose sketch follows this list).
  • Write a microservice in Java (or another language you prefer) that publishes a randomly generated temperature reading in Celsius to a Kafka topic called (celcius-readings) every 1 second. Each message in the topic should carry a double Celsius reading and a long epoch timestamp (see the producer sketch below).
  • Write a microservice in Java (or another language you prefer) that subscribes to the celcius-readings Kafka topic and outputs (appends) each Celsius reading and epoch timestamp to an S3 file (see the consumer sketch below).
  • In the subscriber microservice, write a task that loads the S3 file into PostgreSQL every 3 minutes (see the loader sketch below).
  • Write a SQL query that can be run on the PostgreSQL database to find the top 10 hottest temperatures (a sample query follows below).
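A minimal docker-compose sketch for the infrastructure pieces, assuming a single-node KRaft Kafka (here Bitnami's image) and the official Postgres image. The image tags, environment variables, and credentials are illustrative assumptions; whatever Docker file you pull from the internet will differ:

```yaml
version: "3.8"
services:
  kafka:
    image: bitnami/kafka:3.6
    ports:
      - "9092:9092"
    environment:
      # Single-node KRaft mode (no ZooKeeper); all values here are assumptions.
      KAFKA_CFG_NODE_ID: "0"
      KAFKA_CFG_PROCESS_ROLES: "controller,broker"
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: "0@kafka:9093"
      KAFKA_CFG_LISTENERS: "PLAINTEXT://:9092,CONTROLLER://:9093"
      KAFKA_CFG_ADVERTISED_LISTENERS: "PLAINTEXT://localhost:9092"
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: "CONTROLLER"
  postgres:
    image: postgres:16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: readings
```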
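One way to sketch the producer in Java, using the plain Kafka clients library with the epoch timestamp as the record key and the Celsius reading as the value. The broker address and the temperature range are assumptions:

```java
import java.util.Properties;
import java.util.Random;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.DoubleSerializer;
import org.apache.kafka.common.serialization.LongSerializer;

public class CelsiusProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, DoubleSerializer.class.getName());

        Random random = new Random();
        try (KafkaProducer<Long, Double> producer = new KafkaProducer<>(props)) {
            while (true) {
                long epoch = System.currentTimeMillis();         // long epoch timestamp (millis)
                double celsius = -10 + random.nextDouble() * 50; // assumed range: [-10, 40) degrees C
                producer.send(new ProducerRecord<>("celcius-readings", epoch, celsius));
                Thread.sleep(1_000); // one reading per second
            }
        }
    }
}
```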
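A sketch of the consumer, using the AWS SDK v2. Note that S3 objects are immutable, so "appending" is simulated here by buffering all lines in memory and rewriting the whole object; the bucket name and object key are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.DoubleDeserializer;
import org.apache.kafka.common.serialization.LongDeserializer;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class CelsiusConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "celsius-consumers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, DoubleDeserializer.class.getName());

        StringBuilder buffer = new StringBuilder(); // everything written so far, "celsius,epoch" per line
        try (KafkaConsumer<Long, Double> consumer = new KafkaConsumer<>(props);
             S3Client s3 = S3Client.create()) {
            consumer.subscribe(List.of("celcius-readings"));
            while (true) {
                ConsumerRecords<Long, Double> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) {
                    continue;
                }
                for (ConsumerRecord<Long, Double> record : records) {
                    buffer.append(record.value()).append(',').append(record.key()).append('\n');
                }
                // S3 objects cannot be appended to in place, so rewrite the
                // whole object with the old contents plus the new lines.
                s3.putObject(PutObjectRequest.builder()
                                .bucket("readings-bucket")    // hypothetical bucket name
                                .key("celsius-readings.csv")  // hypothetical object key
                                .build(),
                        RequestBody.fromString(buffer.toString(), StandardCharsets.UTF_8));
            }
        }
    }
}
```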
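The 3-minute load could be a ScheduledExecutorService task inside the subscriber that reads the S3 object back and batch-inserts it over JDBC. The table name readings(celsius, epoch_ts), the JDBC URL, and the credentials are assumptions; since the S3 file is a full snapshot, this sketch truncates before reloading to avoid duplicate rows:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class S3ToPostgresLoader {
    public static void start(S3Client s3) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            GetObjectRequest get = GetObjectRequest.builder()
                    .bucket("readings-bucket")   // must match the consumer's bucket/key
                    .key("celsius-readings.csv")
                    .build();
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost:5432/readings", "postgres", "postgres"); // assumed credentials
                 BufferedReader reader = new BufferedReader(new InputStreamReader(s3.getObject(get)));
                 PreparedStatement insert = conn.prepareStatement(
                         "INSERT INTO readings (celsius, epoch_ts) VALUES (?, ?)")) {
                try (Statement truncate = conn.createStatement()) {
                    // Full reload of a complete snapshot, so clear the table first.
                    truncate.execute("TRUNCATE TABLE readings");
                }
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",");
                    insert.setDouble(1, Double.parseDouble(parts[0])); // celsius reading
                    insert.setLong(2, Long.parseLong(parts[1]));       // epoch timestamp
                    insert.addBatch();
                }
                insert.executeBatch();
            } catch (Exception e) {
                e.printStackTrace(); // log and let the next scheduled run retry
            }
        }, 0, 3, TimeUnit.MINUTES); // initial delay 0, then every 3 minutes
    }
}
```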
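A sample top-10 query, assuming the readings(celsius, epoch_ts) table from the loader sketch:

```sql
-- Assumes: CREATE TABLE readings (celsius DOUBLE PRECISION, epoch_ts BIGINT);
SELECT celsius, epoch_ts
FROM readings
ORDER BY celsius DESC
LIMIT 10;
```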

Deliverable

  • The repo should have a good README explaining how to get the project running locally and how to review the tasks.
  • We will run the project using docker-compose up. Everything (Kafka, PostgreSQL, producer, and consumer) should come up with this one command, or the exceptions should be explained in the README.
  • If we need to run anything in addition to docker-compose up, please list exactly what we should run from the command line.
  • Instructions for exec-ing into the PostgreSQL container to run the SQL query that finds the top 10 hottest temperatures (see the example after this list).
  • A GitHub repo with read permissions given to the GitHub users rafty8s, bsneider, omnipresent07, and barakstout (how to invite collaborators).
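
For the exec step, something along these lines should work, assuming the service name, user, and database from the Compose sketch above; adjust to whatever your setup uses:

```sh
docker-compose exec postgres psql -U postgres -d readings
```

You can then paste the top-10 query at the psql prompt.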