
I was going through the Spark SQL plan for a join optimised by Adaptive Query Execution.

On the right side, Spark learns at runtime that the table is small enough to broadcast and therefore switches to a broadcast hash join.

Since broadcast hash join is a narrow operation, why do we still have an exchange on the left (large) table?

(screenshot: Spark SQL plan)

Even in the final physical plan, the exchange is still there:

(screenshot: final physical plan)

https://www.databricks.training/spark-ui-simulator/experiment-3799A/v002-S/index.html?artifact=sql-21
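A minimal repro sketch of the same pattern (Spark 3.x in local mode; table and column names here are illustrative, not from the experiment above). The static optimizer estimates the filtered side as large, so it plans a sort-merge join with an exchange on both sides; at runtime AQE sees the small side's actual size and rewrites the join to a broadcast hash join:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("aqe-bhj-demo")
  .master("local[*]")
  .config("spark.sql.adaptive.enabled", "true")
  .getOrCreate()
import spark.implicits._

// Both sides look large to the static optimizer: without CBO, the size
// estimate for a Filter defaults to its child's size, so the initial
// plan is a sort-merge join with an Exchange on each side.
val large = spark.range(0L, 10000000L).toDF("k")
val small = spark.range(0L, 10000000L).toDF("k").where($"k" < 100)

val joined = large.join(small, "k")
joined.count()    // execute so AQE can re-plan with runtime statistics
joined.explain()  // AdaptiveSparkPlan now shows BroadcastHashJoin, yet
                  // the Exchange on the large side is still in the plan
```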

Rahul Kumar

1 Answer


The exchange on the large side was already planned before AQE switched the join strategy (the static plan chose a sort-merge join), and once that shuffle stage is materialized it cannot simply be removed. Instead, AQE replaces the shuffle read with a local shuffle reader: each task reads the output of its own mappers locally rather than pulling blocks across the network per reducer partition. I.e., you get a localized shuffle based on mappers, not reducers, instead of a regular shuffle.
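As a sketch of the knobs involved (configuration names as documented for Spark 3.x; the node name in the plan varies by version), toggling the local shuffle reader off and re-running explain() makes the difference visible:

```scala
// The relevant AQE settings (values shown are the Spark 3.x defaults).
// With the local shuffle reader on, the large side's shuffle output is
// read back mapper-by-mapper on the nodes that wrote it, so the
// Exchange node survives in the plan but no network-wide
// redistribution takes place.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.localShuffleReader.enabled", "true")

// In the explain output / SQL UI this appears as a
// "CustomShuffleReader (local)" node in Spark 3.0/3.1 (renamed
// AQEShuffleRead in later versions) sitting on top of the Exchange.
```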

See https://www.waitingforcode.com/apache-spark-sql/what-new-apache-spark-3-local-shuffle-reader/read and https://dev.to/yaooqinn/how-to-use-spark-adaptive-query-execution-aqe-in-kyuubi-2ek2

For background you can also peruse "Spark: disk I/O on stage boundaries explanation".

thebluephantom