I am trying to figure out an appropriate production deployment strategy for an Apache Kafka cluster with High Availability.

I was unable to find any specific documentation that describes such a strategy, so based on the articles I found, I have come up with the following:

  • 3 ZooKeeper nodes
  • 3 Kafka brokers (each holding a replica of every topic partition I plan to use)
  • A replication factor of 3 for each topic
  • 3 physical machines (each hosting one ZooKeeper node and one broker)

The reason I decided to put a ZooKeeper node and a broker on each machine is to avoid a 'split brain' in the event of a network partition, as described in this question and its accepted answer.
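To make the partition reasoning concrete, here is a minimal sketch of the majority-quorum arithmetic a ZooKeeper ensemble relies on. The function names are my own for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, reachable_nodes: int) -> bool:
    """True if one side of a network partition can still form a majority."""
    return reachable_nodes >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

With 3 nodes, only a side of the partition that can reach at least 2 of them keeps a quorum, so two sides can never both be active at once. A fourth node would not increase the number of failures tolerated.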

I want to know,

  1. Whether there is an adverse performance impact in running both a ZooKeeper node and a broker on a single machine? (And whether it would make more sense to use 6 physical machines, so that each machine hosts either a Kafka broker or a ZooKeeper node?)
  2. Whether the deployment strategy I have come up with is suitable for production? (Or how can it be improved?)

Also, if you have come across a guide which recommends a suitable deployment configuration, kindly include its link.

Appreciate any help on this matter. Thanks in advance.

1 Answer

  • With a replication factor of 3, three is the minimum number of brokers you'll need, but you might want more for additional redundancy and/or capacity.
  • Usually, people deploy their Kafka brokers and ZooKeeper nodes on separate hardware.
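As a rough illustration of the redundancy point, the sketch below models whether a partition can still accept writes from a producer using `acks=all` after some brokers fail. It assumes every broker holds a replica of the partition (as in the question), that failed brokers drop out of the in-sync replica set, and a `min.insync.replicas` of 2, which is my assumption and not something stated in the question. The function name is hypothetical.

```python
def can_accept_writes(replication_factor: int,
                      min_insync_replicas: int,
                      failed_brokers: int) -> bool:
    """Model of an acks=all write: it succeeds only while the number of
    in-sync replicas is at least min.insync.replicas.

    Assumes each failed broker held one replica of the partition.
    """
    in_sync = max(replication_factor - failed_brokers, 0)
    return in_sync >= min_insync_replicas


if __name__ == "__main__":
    # Assumed setting: replication factor 3, min.insync.replicas 2.
    for failed in range(4):
        ok = can_accept_writes(3, 2, failed)
        print(f"failed_brokers={failed} writes_accepted={ok}")
```

Under these assumptions a 3-broker cluster keeps accepting writes with one broker down but not with two, which is one reason to consider more than the minimum number of brokers.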

This Reference Architecture should help you further.

Robin Moffatt