Kafka Visualization


Prerequisites

  • Experience working in a data platform engineering capacity with data streams
  • Experience integrating data pipelines

Challenge

You have been hired by ACME Corp to visualize streams of data coming from devices in the ocean. Your point of contact has asked you to ingest the raw streaming data from these devices, visualize it on a map, and show how the devices move in the ocean over time.

The Task

Architecture Diagram (image not reproduced here)

  • The data is streamed daily and you are provided data for Day 1 and Day 2.
  • The data can be picked up from here for Day 1 and Day 2
  • It is a fat JSON file that contains the location information from 90 devices in the ocean. Each device has a unique spotterId.
  • The structure of the JSON is as below (reconstructed; field order may differ):

    {
      "all_data": [
        {
          "data": {
            "spotterId": "...",
            "limit": ...,
            "waves": [
              {
                "significantWaveHeight": ...,
                "peakPeriod": ...,
                "meanPeriod": ...,
                "peakDirection": ...,
                "peakDirectionalSpread": ...,
                "meanDirection": ...,
                "meanDirectionalSpread": ...,
                "timestamp": "...",
                "latitude": ...,
                "longitude": ...
              }
            ]
          }
        }
      ]
    }
Requirements

  • Prepare a Docker Compose file that contains the following: Confluent Kafka, Apache Druid, and Apache Superset (a sketch follows this list)
  • Publish Day 1 data into the Kafka topic ocean-data (see the producer sketch below)
  • Connect this Kafka topic to Apache Druid (see the ingestion-spec sketch below)
  • Connect Apache Superset to the Druid database
  • Pick a map dashboard in Superset and plot the latitude / longitude locations of the ocean devices
  • Publish Day 2 data into the same topic
  • Refresh the map dashboard in Superset to see the Day 2 data on the map
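
One possible starting shape for the Compose file. Image tags, listener config, and especially the Druid topology are assumptions to adapt: Druid's own example compose file splits it into coordinator, broker, historical, middle-manager, and router containers plus a metadata store, which is trimmed here for brevity.

    # Sketch only: tags and topology are assumptions, not a vetted stack.
    version: "3.8"
    services:
      zookeeper:
        image: confluentinc/cp-zookeeper:7.4.0
        environment:
          ZOOKEEPER_CLIENT_PORT: 2181

      kafka:
        image: confluentinc/cp-kafka:7.4.0
        depends_on: [zookeeper]
        ports: ["9092:9092"]
        environment:
          KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
          # Reachable as kafka:29092 inside the network and localhost:9092
          # from the host (needed when the producer runs outside Docker).
          KAFKA_LISTENERS: INTERNAL://0.0.0.0:29092,EXTERNAL://0.0.0.0:9092
          KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:29092,EXTERNAL://localhost:9092
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

      druid:
        # Placeholder for the full multi-service Druid setup; see the
        # docker-compose.yml shipped with the Apache Druid distribution.
        image: apache/druid:27.0.0
        command: ["router"]
        ports: ["8888:8888"]

      superset:
        # First-run admin setup and SECRET_KEY configuration omitted.
        image: apache/superset:latest
        ports: ["8088:8088"]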
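
Publishing Day 1 into the ocean-data topic could then look like this, assuming the kafka-python client and the flatten helper sketched earlier (imported here from a hypothetical module name):

    import json

    from kafka import KafkaProducer  # pip install kafka-python

    from flatten_day import flatten  # the helper sketched earlier (hypothetical module)

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # matches the EXTERNAL listener above
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # One message per wave reading, keyed by spotterId so all readings
    # from a device land on the same partition, preserving their order.
    for record in flatten("day1.json"):
        producer.send("ocean-data",
                      key=record["spotterId"].encode("utf-8"),
                      value=record)
    producer.flush()

Re-running the same script against the Day 2 file covers the "publish Day 2 data into the same topic" step.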
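
One common way to connect the topic to Druid is a Kafka ingestion supervisor spec POSTed to the Overlord at /druid/indexer/v1/supervisor. A trimmed sketch, with dimensions taken from the JSON structure above and everything else (datasource name, tuning) left as assumed defaults:

    {
      "type": "kafka",
      "spec": {
        "ioConfig": {
          "type": "kafka",
          "topic": "ocean-data",
          "consumerProperties": { "bootstrap.servers": "kafka:29092" },
          "inputFormat": { "type": "json" }
        },
        "dataSchema": {
          "dataSource": "ocean-data",
          "timestampSpec": { "column": "timestamp", "format": "iso" },
          "dimensionsSpec": {
            "dimensions": [
              "spotterId",
              { "type": "double", "name": "latitude" },
              { "type": "double", "name": "longitude" },
              { "type": "double", "name": "significantWaveHeight" }
            ]
          },
          "granularitySpec": { "rollup": false }
        }
      }
    }

For the Superset connection, the pydruid SQLAlchemy dialect uses a URI of the form druid://<router-or-broker-host>:8888/druid/v2/sql (the driver must be installed in the Superset image). A deck.gl Scatterplot chart over latitude / longitude is one way to satisfy the map requirement; note that Superset's deck.gl charts need a MAPBOX_API_KEY configured. After producing Day 2 into the same topic, refreshing the chart should show the devices' new positions.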

Deliverable

  • A GitHub repo with read permissions given to GitHub users rafty8s, bsneider, omnipresent07, and barakstout (how to invite collaborators)
  • The repo should include a README.md with instructions detailed enough to reproduce the setup
  • A docker-compose.yml with which everything can be spun up
  • Any additional commands that need to be executed, documented in the README