I am trying to figure out an appropriate production deployment strategy for an Apache Kafka cluster with High Availability.
I was unable to find any specific documentation that describes such a strategy, so based on the articles I found, I have come up with the following:
- 3 ZooKeeper nodes
- 3 Kafka brokers (each holding a replica of every topic partition that I'm planning to use)
- Replication factor of 3 for each topic
- 3 physical machines (each hosting one ZooKeeper node and one broker)
The reason I have decided to put a ZooKeeper node and a broker on each machine is to avoid a 'split-brain' in the event of a network partition, as described in this question and the accepted answer.
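To make my reasoning concrete, here is a minimal sketch of the majority-quorum arithmetic I am assuming applies to the ZooKeeper ensemble: a side of a network partition can only keep operating if it can reach a strict majority of the nodes. The function names are my own and purely illustrative; they are not part of any Kafka or ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(reachable_nodes: int, ensemble_size: int) -> bool:
    """True if this side of a partition can still form a majority."""
    return reachable_nodes >= quorum_size(ensemble_size)


if __name__ == "__main__":
    n = 3  # the 3 ZooKeeper nodes in my planned deployment
    print("quorum size:", quorum_size(n))
    print("tolerated failures:", tolerated_failures(n))
    # A partition that splits the 3 machines into groups of 2 and 1:
    print("side with 2 nodes has quorum:", has_quorum(2, n))
    print("side with 1 node has quorum:", has_quorum(1, n))
```

If this is right, with 3 nodes only the side of the partition that still sees 2 of them can have a majority, so at most one side stays active, and the ensemble survives the loss of a single machine but not two.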
I would like to know:
- Whether there is an adverse performance impact in running both a ZooKeeper node and a broker on a single machine? (And whether it would make more sense to go with 6 physical machines, so that each machine hosts either a Kafka broker or a ZooKeeper node, but not both?)
- Whether the deployment strategy I have come up with is suitable for production? (Or how can it be improved?)
Also, if you have come across a guide that recommends a suitable deployment configuration, kindly include a link to it.
Appreciate any help on this matter. Thanks in advance.