
My map is currently inefficient when parsing one particular set of files (2 TB in total). I'd like to change the block size of files in the Hadoop DFS from 64 MB to 128 MB. I can't find anything in the documentation on how to do this for only one set of files rather than the entire cluster.

Which command changes the block size when I upload (such as when copying from local to the DFS)?

    Not sure if/when the parameter changed, but it is now called "dfs.block.size". –  Jan 26 '11 at 00:27
  • Why don't you change the split size of your MapReduce job? – ozw1z5rd Oct 05 '16 at 19:28
  • @ozw1z5rd AFAIK you can't change split size, or the number of splits. For MR2, it is dependent on your block size, and the number of splits is automatically computed on job submission. – ᐅdevrimbaris Dec 21 '16 at 18:55

5 Answers


In case anyone else finds this question later: I had to slightly change Bkkbrad's answer to get it to work with my setup, Hadoop 0.20 running on Ubuntu 10.10:

hadoop fs -D dfs.block.size=134217728 -put local_name remote_location

The setting for me is not fs.local.block.size but rather dfs.block.size.
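
If you want to double-check the block size that the uploaded file actually received, one option besides fsck is to read it back through the HDFS Java API. This is only a minimal sketch; "remote_location" is the same placeholder path as in the command above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckBlockSize {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // "remote_location" is the placeholder path from the -put command above
        FileStatus status = fs.getFileStatus(new Path("remote_location"));
        System.out.println("Block size in bytes: " + status.getBlockSize());
    }
}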

KWottrich
    note the new change in hadoop 2.0.4: dfs.blocksize (http://hadoop.apache.org/docs/r2.0.4-alpha/hadoop-project-dist/hadoop-common/DeprecatedProperties.html) – Kiran Jun 07 '13 at 06:13
    dfs.blocksize: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml – Shehaaz Jul 16 '13 at 20:42

I've changed my answer! You just need to set the fs.local.block.size configuration setting appropriately when you use the command line.

hadoop fs -D fs.local.block.size=134217728 -put local_name remote_location

Original Answer

You can programmatically specify the block size when you create a file with the Hadoop API. Unfortunately, you can't do this on the command line with the hadoop fs -put command. To do what you want, you'll have to write your own code to copy the local file to a remote location; it's not hard: just open a FileInputStream for the local file, create the remote OutputStream with FileSystem.create, and then use something like IOUtils.copy from Apache Commons IO to copy between the two streams.
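
For anyone who does want the programmatic route, here is a minimal sketch of the approach described above, not a definitive implementation; the class name, file names, buffer size, and replication factor are placeholders/assumptions:

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithBlockSize {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Local source file; "local_name" is a placeholder
        InputStream in = new FileInputStream("local_name");

        // create(path, overwrite, bufferSize, replication, blockSize)
        // the 4 KB buffer and replication factor of 3 are assumptions, not requirements
        FSDataOutputStream out = fs.create(new Path("remote_location"),
                true, 4096, (short) 3, 128L * 1024 * 1024);

        IOUtils.copy(in, out); // Apache Commons IO
        in.close();
        out.close();
    }
}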

Bkkbrad
  • After using the command mentioned above, I tried to check the blocks in the file with hdfs fsck hvr.out1 -files -blocks -locations (hvr.out1 is my file). It looks like it was not split using the block size I specified; instead it used the default block size. – ForeverLearner Aug 21 '17 at 08:44

In the conf/ folder we can change the value of dfs.block.size in the configuration file hdfs-site.xml. In Hadoop version 1.0 the default size is 64 MB, and in version 2.0 the default size is 128 MB.

<property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <description>Block size</description>
</property>
RamenChef
madhur

You can also modify the block size in your programs, like this:

Configuration conf = new Configuration();

// Configuration.set() expects a String value, so use setLong for a numeric size
conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 134217728 bytes = 128 MB
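
A hedged follow-up sketch of how that Configuration might then be used: a FileSystem obtained with it should apply the configured block size to files it creates (the path and contents here are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteWithConfiguredBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB

        // Files created through this FileSystem pick up the configured block size
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/tmp/example.dat")); // placeholder path
        out.writeUTF("hello");
        out.close();
    }
}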
Mubashar
  • How will this configuration setting affect data that is already stored at a particular default block size (Hadoop 2.5.2, 128 MB)? – Jeremy Hajek Nov 29 '16 at 05:29

We can change the block size using the property named dfs.block.size in the hdfs-site.xml file. Note: the size should be specified in bytes. For example, 134217728 bytes = 128 MB.

Rengasamy