
I have been researching multi-tenant solutions for Hadoop and found that the space quota is a key piece. After some testing, I still can't understand how HDFS calculates space quota usage.

I have an HDFS directory "/cb/kj_1" with a space quota of 1 GB, and some fixed-size files generated with the dd command.

First, I put four 10 MB files into "/cb/kj_1" and counted the space with `hdfs dfs -count -q -v -h /cb/kj_1`. The result is as follows:

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             1 G           880 M            1           12               48 M /cb/kj_1

These four files cost 4 × 10 × 3 = 120 MB of raw space with replication factor 3. (The output shows 12 files and 48 MB of content, so the directory evidently already held a few other small files; 48 MB × 3 = 144 MB consumed leaves exactly the 880 MB shown as REM_SPACE_QUOTA.)
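The remaining quota in the count output is consistent with charging content size × replication factor. A quick check of the figures (a sketch; replication factor 3 is assumed from the arithmetic, not stated in the output):

```python
MB = 1024 * 1024
replication = 3
space_quota = 1024 * MB            # 1 GB quota set on /cb/kj_1

content_size = 48 * MB             # CONTENT_SIZE from the count output
consumed = content_size * replication
remaining = space_quota - consumed

print(remaining // MB)             # 880 -> matches REM_SPACE_QUOTA
```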

Then I put two 100 MB files into "/cb/kj_1/". The count result is as follows:

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             1 G           280 M            1           14              248 M /cb/kj_1

These files cost 2 × 100 × 3 = 600 MB of raw space, and 280 MB remain (880 − 600 = 280), as expected.
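The second count output is again consistent with content size × replication factor 3:

```python
MB = 1024 * 1024
quota = 1024 * MB
content = 248 * MB                  # CONTENT_SIZE after the two 100 MB puts
print((quota - content * 3) // MB)  # 280 -> matches REM_SPACE_QUOTA
```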

Finally, I put one 1 KB file into "/cb/kj_1". This time HDFS rejected the request and threw "org.apache.hadoop.hdfs.protocol.DSQuotaExceededException":

    Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.DSQuotaExceededException): The DiskSpace quota of /cb/kj_1 is exceeded: quota = 1073741824 B = 1 GB but diskspace consumed = 1182793728 B = 1.10 GB
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:214)
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:241)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1074)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:903)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:862)

The error message shows how much space the write needed. Working backwards: (diskspace_consumed − SPACE_QUOTA + REM_SPACE_QUOTA) / 3 = (1182793728 − 1073741824 + 293601280) / 3 bytes = 128 MB, i.e. exactly one full block per replica.
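So the failing write appears to have tried to reserve one full block per replica. Checking that arithmetic against the numbers in the exception message:

```python
MB = 1024 * 1024
consumed_at_failure = 1182793728       # "diskspace consumed" from the exception
quota = 1073741824                     # 1 GB space quota
rem_before = 280 * MB                  # REM_SPACE_QUOTA before the failing put

reserved_per_replica = (consumed_at_failure - quota + rem_before) // 3
print(reserved_per_replica)            # 134217728 bytes = one full 128 MB block
```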

So the last 1 KB file apparently occupied one full block, yet the 10 MB files I put earlier were charged only 10 MB each, and the 100 MB files only 100 MB each.

It seems that a small file does not normally cost 128 MB (the HDFS block size). If it did, a directory with a 1 GB space quota could hold only 1024 / 3 / 128 = 2 files, yet I put 6 files successfully.
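If every file were charged a full block regardless of its size, the arithmetic would indeed cap the directory at 2 files:

```python
MB = 1024 * 1024
quota = 1024 * MB
block = 128 * MB
replication = 3
# Files that would fit if each one cost a full replicated block:
print(quota // (block * replication))   # 2
```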

The stat command shows the file size and block size:

    hadoop fs -stat "%b %o" /cb/kj_1/10mb_1.dd
    10485760 134217728

    hadoop fs -stat "%b %o" /cb/kj_1/100mb_1.dd
    104857600 134217728

I want to know how HDFS calculates space (quota) usage when putting files into a directory, and how it calculates the space of files that already exist in the directory.

moyiguke
  • try the `hadoop fs -stat` command on /cb/kj_1 and on every file you put in that directory; you may get more information on their blocks and space usage. BTW: I cannot see the picture you attached, maybe copy & paste it instead? – James Li Sep 06 '19 at 08:38
  • Thanks for your advice, I have edited my question. The command `hadoop fs -stat "%b %o" /cb/kj_1/10mb_1.dd` shows "10485760 134217728": the first number is 10 MB (the file size), the next is 128 MB (the block size). What is strange here is that at first a 10 MB file cost 30 MB of space, but the 1 KB file cost 128 MB per replica once the quota was nearly reached – moyiguke Sep 06 '19 at 09:23
  • 1
    related question https://stackoverflow.com/questions/15678235/how-hdfs-calculate-the-available-blocks – James Li Sep 06 '19 at 09:55
  • 1
    From Hadoop docs (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html#Space_Quotas): "The space quota is a hard limit on the number of bytes used by files in the tree rooted at that directory. **Block allocations fail if the quota would not allow a full block to be written.** Each replica of a block counts against the quota." – mazaneicha Sep 06 '19 at 14:16
  • Thanks for the helpful guidance. So the way to put a small file into a space-quota'd directory that is nearly exhausted is to decrease the block size when putting the file, with a command like `hdfs dfs -D dfs.block.size=67108864 -put local_name remote_name`. I tested it with Hadoop 2.7.3. Thank you again for your help – moyiguke Sep 09 '19 at 06:00
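Putting the docs quote from the comments together with the observations in the question, the namenode's behavior can be modeled roughly like this (a sketch, not the actual HDFS code): at block-allocation time the quota check charges one *full* block per replica, but once the data is written, only the actual bytes count against the quota.

```python
MB = 1024 * 1024

class SpaceQuotaError(Exception):
    """Stands in for DSQuotaExceededException in this model."""

def check_allocation(consumed, quota, block_size, replication):
    # At allocation time, a full block per replica must fit under the quota.
    if consumed + block_size * replication > quota:
        raise SpaceQuotaError("quota would not allow a full block")

def put_file(consumed, quota, file_size, block_size=128 * MB, replication=3):
    # Allocate blocks one at a time; after writing, charge only actual bytes.
    remaining = file_size
    while remaining > 0:
        check_allocation(consumed, quota, block_size, replication)
        written = min(remaining, block_size)
        consumed += written * replication
        remaining -= written
    return consumed

used = 0
quota = 1024 * MB
for size in [10 * MB] * 4 + [100 * MB] * 2:   # the six successful puts
    used = put_file(used, quota, size)
print(used // MB)                             # 720 MB charged for the six files

try:
    put_file(used, quota, 1024)               # the 1 KB file
except SpaceQuotaError:
    print("rejected")                         # 720 + 3 * 128 = 1104 MB > 1024 MB
```

This reproduces the observed pattern: completed files are charged their real size × replication, while the final 1 KB put fails because a full 128 MB block × 3 replicas no longer fits under the quota. It also explains why a smaller `dfs.block.size` on the put, as in the last comment, lets the small file through.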
