I have a file on HDFS that is 11 GB in size. I want to split it into multiple files of 1 GB each. How can I do that? My Hadoop version is 2.7.3.
- Why do you want to split the file? – Sandeep Singh Jul 26 '17 at 17:01
- Possible duplicate: https://stackoverflow.com/questions/29567139/how-to-divide-a-big-dataset-into-multiple-small-files-in-hadoop-in-an-efficient – Rahul Sharma Jul 26 '17 at 18:34
- `hdfs dfs -Ddfs.block.size=1G -put file` – philantrovert Jul 27 '17 at 09:15
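For reference, the approach in that last comment sets the HDFS block size as the file is written; the result is still a single HDFS file, just stored in 1 GB blocks, not separate 1 GB files. Since the file here is already on HDFS, the same idea can be applied with a copy. A minimal sketch, assuming hypothetical paths and that your Hadoop version accepts the size suffix (dfs.blocksize is the non-deprecated name of dfs.block.size in Hadoop 2.x):

hdfs dfs -D dfs.blocksize=1G -cp /user/hduser/bigfile /user/hduser/bigfile-1g-blocks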
2 Answers
If you have Spark, try the following. The example below splits the input file into 2 files:
spark-shell
scala> sc.textFile("/xyz-path/input-file",2).saveAsTextFile("/xyz-path/output-file")
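Note that the second argument to textFile is minPartitions, which Spark treats only as a hint, and saveAsTextFile writes a directory of part-NNNNN files rather than a single file. If you need an exact file count, repartition before saving. For the 11 GB file in the question, a sketch using the same hypothetical paths, which should yield roughly 1 GB per part file:

scala> sc.textFile("/xyz-path/input-file").repartition(11).saveAsTextFile("/xyz-path/output-file")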

Rahul Sharma