In this project, we build a streaming application backed by Apache Kafka using a Python client: a simple real-time fraud detection system. We generate a stream of synthetic transactions and use a Python script to process that stream and detect which transactions are potentially fraudulent.
Below is the folder structure of the project:
.
├── docker-compose.yml
├── detector
│ ├── Dockerfile
│ ├── app.py
│ └── requirements.txt
├── generator
│ ├── Dockerfile
│ ├── app.py
│ ├── transactions.py
│ └── requirements.txt
├── start.sh
├── start_main_docker_compose.sh
├── read_whole_topic.sh
├── restart.sh
└── stop.sh
We produce fake transactions on one end and filter and log the ones that look suspicious on the other end. The project consists of two applications:
- a transaction generator (which produces the synthetic data for the process).
- a fraud detector.
Both applications run in Docker containers and interact with the Kafka cluster.
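As a rough illustration of what the generator might produce, here is a minimal sketch of a synthetic-transaction factory. The field names (`source`, `target`, `amount`, `currency`), the amount range (up to $1,000.00, so that some transactions exceed the $900.00 limit), and the JSON encoding are all assumptions for illustration, not necessarily what `generator/transactions.py` does. Kafka message values are bytes, so the sketch also serializes each transaction to UTF-8 JSON before it would be handed to a producer.

```python
import json
import random
import string

# Hypothetical schema: these field names and values are illustrative assumptions.
ACCOUNT_CHARS = string.ascii_letters + string.digits


def _random_account(length: int = 12) -> str:
    """Return a random alphanumeric account identifier."""
    return "".join(random.choices(ACCOUNT_CHARS, k=length))


def create_random_transaction() -> dict:
    """Build one synthetic transaction with an amount between 0.01 and 1000.00."""
    return {
        "source": _random_account(),
        "target": _random_account(),
        # Draw a whole number of cents, then convert to a dollar amount.
        "amount": random.randint(1, 100_000) / 100,
        "currency": "USD",
    }


def serialize(transaction: dict) -> bytes:
    """Encode a transaction as UTF-8 JSON, the byte form a Kafka producer sends."""
    return json.dumps(transaction).encode("utf-8")


if __name__ == "__main__":
    for _ in range(3):
        print(serialize(create_random_transaction()))
```

Drawing whole cents keeps every amount at two decimal places, which matches how the $900.00 limit is expressed.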
The fraud detector is a typical example of a stream processing application. It takes a stream of transactions as input, filters it, and writes the result to two separate streams, one for legitimate transactions and one for suspicious ones. This operation is also known as branching.
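The branching step can be sketched as a function that decodes each incoming message, applies a predicate, and picks one of two output topics. This is a minimal sketch: the topic names `streaming.transactions.legit` and `streaming.transactions.fraud`, the `amount` field, and the JSON encoding are hypothetical, and the predicate is passed in so the routing logic stays independent of the fraud rule.

```python
import json
from typing import Callable, Iterable, Tuple

# Hypothetical output topic names; the project's real names may differ.
LEGIT_TOPIC = "streaming.transactions.legit"
FRAUD_TOPIC = "streaming.transactions.fraud"


def route(raw_value: bytes, is_suspicious: Callable[[dict], bool]) -> Tuple[str, bytes]:
    """Decode one message and choose the output topic it should be written to."""
    transaction = json.loads(raw_value)
    topic = FRAUD_TOPIC if is_suspicious(transaction) else LEGIT_TOPIC
    # Forward the original bytes unchanged; only the destination differs.
    return topic, raw_value


def branch(raw_values: Iterable[bytes], is_suspicious: Callable[[dict], bool]) -> dict:
    """Split a batch of messages into per-topic lists (useful for local testing)."""
    out: dict = {LEGIT_TOPIC: [], FRAUD_TOPIC: []}
    for raw in raw_values:
        topic, value = route(raw, is_suspicious)
        out[topic].append(value)
    return out


if __name__ == "__main__":
    messages = [json.dumps({"amount": a}).encode() for a in (120.0, 950.5, 899.99)]
    result = branch(messages, lambda t: t["amount"] > 900)
    print({topic: len(values) for topic, values in result.items()})
```

In a running detector, each `(topic, value)` pair from `route` would be handed to a Kafka producer; with the kafka-python client, for example, that is a `producer.send(topic, value=value)` call.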
Assumption: In the real world, detecting fraud is a complex problem that depends on many different metrics. In this project, we keep the rule simple: it is illegal to send more than $900.00 in a single transaction. As a result, any transaction whose amount is greater than 900 is considered fraudulent.
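The rule itself is a single comparison. The sketch below assumes amounts arrive as numbers or numeric strings and converts them through `str` to `Decimal`, so that a value like 900.01 is compared exactly rather than through binary floating point. The strict `>` means a transaction of exactly $900.00 counts as legitimate. The `FRAUD_THRESHOLD` constant and `is_suspicious` function are hypothetical names.

```python
from decimal import Decimal
from typing import Union

# Hypothetical constant holding the $900.00 limit from the project's assumption.
FRAUD_THRESHOLD = Decimal("900.00")


def is_suspicious(amount: Union[int, float, str, Decimal]) -> bool:
    """Return True when a single transaction exceeds the allowed amount."""
    # Converting via str() turns 900.01 into Decimal("900.01") instead of
    # the long binary expansion that Decimal(900.01) would produce.
    return Decimal(str(amount)) > FRAUD_THRESHOLD


if __name__ == "__main__":
    for amount in (899.99, 900, "900.00", 900.01, 1500):
        print(amount, is_suspicious(amount))
```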
- From a Bash shell, run:
$ chmod +x ./start.sh
$ ./start.sh
- In another Bash tab, run:
$ chmod +x ./start_main_docker_compose.sh
$ ./start_main_docker_compose.sh
We should then see the detector's output, which lists the legitimate transactions, that is, those that do not exceed our threshold of $900.00.
- To read the whole topic, run this command:
$ docker-compose -f docker-compose.kafka.yml exec broker kafka-console-consumer --bootstrap-server localhost:9092 --topic queueing.transactions --from-beginning
or run:
$ chmod +x ./read_whole_topic.sh
$ ./read_whole_topic.sh
- Press Ctrl + C to stop the kafka-console-consumer. On exit, it prints the total number of messages it read.
- To stop the generator and delete all the containers, networks, and volumes, run:
$ chmod +x ./stop.sh
$ ./stop.sh