The answer you linked to uses NumPy to mix together two streams by averaging the frames from each stream.
You're worried that this might be too slow. I doubt it, because NumPy is just looping over a C array in C, the same way a dedicated software mixer would (whether in your sound server, your soundcard driver, or some OS-level mixer). But rather than guessing, let's find out.
First, let's assume that we're dealing with 20 ms frames and that our callback gets called for every single frame, since that's just about the worst-case scenario. And let's assume for concreteness that we've got 44.1 kHz 16-bit stereo streams, so each frame is 1764 samples (882 per channel). So, let's write this the most inefficient way I can think of and then test it:
In [4]: frame = np.zeros(1764, dtype=np.int16)
In [5]: %timeit np.mean([frame]*6, axis=0, dtype=np.int16)
1000 loops, best of 3: 1.01 ms per loop
At roughly 1 ms for 6 streams, I'd have to mix well over a hundred streams (assuming the cost scales roughly linearly) before mixing alone ate the whole 20 ms budget. 6 isn't going to be a problem.
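One caveat worth noting: `np.mean(..., dtype=np.int16)` accumulates in int16, so loud signals can wrap around before the average is taken. A minimal sketch of a wraparound-safe version (my own helper, not from the answer above) accumulates in int32 and clips back down:

```python
import numpy as np

def mix(frames):
    """Average equal-length int16 frames into one int16 frame.

    Summing in int32 avoids the int16 wraparound you'd get from
    np.mean(frames, axis=0, dtype=np.int16) on loud input.
    """
    acc = np.sum(np.stack(frames).astype(np.int32), axis=0)  # overflow-safe sum
    avg = acc // len(frames)                                 # integer average
    return np.clip(avg, -32768, 32767).astype(np.int16)
```

The extra cast and clip are also vectorized C loops, so the timing story above shouldn't change much.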
And if it ever is a problem, you'll need to do something trickier: prebuffer the mix so you're working on much larger chunks than single frames (more looping in C, less in Python), or hand mixing off to hardware, which you probably can't do through PyAudio anyway.
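If you did hit that wall, the prebuffering idea might look something like this sketch. The chunk size is my own assumption (50 frames, i.e. one second of audio), and the helpers are hypothetical, not PyAudio API:

```python
import numpy as np

SAMPLES_PER_FRAME = 1764   # 20 ms of 44.1 kHz 16-bit stereo
FRAMES_PER_CHUNK = 50      # prebuffer 1 second at a time (assumption)

def premix(streams):
    """Mix several chunk-sized int16 streams in a single NumPy call."""
    acc = np.sum(np.stack(streams).astype(np.int32), axis=0)
    return np.clip(acc // len(streams), -32768, 32767).astype(np.int16)

def frames_from(chunk):
    """Slice a premixed chunk back into 20 ms frames for the callback."""
    for i in range(0, len(chunk), SAMPLES_PER_FRAME):
        yield chunk[i:i + SAMPLES_PER_FRAME]
```

One `premix` call per second means the per-call Python overhead is paid 50× less often, with all the real work still happening in C.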