I am curious to understand how the Spark framework behaves when the file size is greater than the cluster's memory. Let us hypothetically assume a cluster with 2 nodes and 64 GB of memory in total (32 GB + 32 GB), and a 100 GB file to process. I have read that roughly 50% of the memory on each node is allocated as storage memory for RDD persistence, and the remaining 50% is allocated as working (execution) memory. I have also read that working memory can grow and shrink, and may borrow from storage memory when storage memory is not in use.
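For reference, this is my understanding of the settings that control that split under the unified memory manager; the values shown are just the defaults as I understand them, and the executor size simply mirrors my 32 GB-per-node example:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: the unified-memory settings my "50% / 50%" reading is based on.
// spark.memory.fraction        -> share of (heap - 300 MB) usable by Spark (default 0.6)
// spark.memory.storageFraction -> share of that pool reserved for storage (default 0.5)
val spark = SparkSession.builder()
  .appName("memory-model-question")
  .config("spark.executor.memory", "32g")          // one 32 GB executor per node in my example
  .config("spark.memory.fraction", "0.6")
  .config("spark.memory.storageFraction", "0.5")
  .getOrCreate()
```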
So let us suppose that there are no persisted blocks and the whole 64 GB of memory is available for use. In that case, will Spark process the 100 GB file, or will it fail because it cannot fit the 100 GB file into memory?
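For concreteness, here is a minimal sketch of the kind of job I have in mind (the path is hypothetical): just reading the 100 GB file as an RDD, with no explicit cache() or persist(), and running a simple aggregation over it.

```scala
// Hypothetical job: read the 100 GB file and count its non-empty lines,
// without ever asking Spark to persist the whole dataset in memory.
val lineCount = spark.sparkContext
  .textFile("hdfs:///data/big_100gb_file.txt")   // made-up path for illustration
  .filter(_.nonEmpty)
  .count()

println(s"Non-empty lines: $lineCount")
```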