I have been researching multi-tenant solutions for Hadoop and found that the HDFS space quota is the key mechanism. After some testing, I still can't work out how HDFS calculates space against the quota.
I have an HDFS directory "/cb/kj_1" with a space quota of 1 G, and some fixed-size files generated with the dd command.
First, I put four 10 MB files into "/cb/kj_1" and checked the usage with "hdfs dfs -count -q -v -h /cb/kj_1". The result is as follows:
QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
none inf 1 G 880 M 1 12 48 M /cb/kj_1
These files cost 4*10*3 = 120 MB of space (the directory already held 8 small files totaling 8 MB before this test, which is why FILE_COUNT is 12 and CONTENT_SIZE is 48 MB), and 880 MB remains. That's all right.
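The remaining-quota figure can be reproduced with replication-aware arithmetic. A minimal sketch, assuming a replication factor of 3 (implied by the ×3 above) and 1 G = 1024 MB:

```python
# Quota accounting sketch: HDFS charges logical bytes x replication factor.
REPLICATION = 3
QUOTA_MB = 1024                     # 1 G space quota on /cb/kj_1

content_mb = 48                     # CONTENT_SIZE reported by -count
consumed_mb = content_mb * REPLICATION
remaining_mb = QUOTA_MB - consumed_mb
print(consumed_mb, remaining_mb)    # 144 880
```

This matches the REM_SPACE_QUOTA of 880 M shown in the output.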
Then I put two 100 MB files into "/cb/kj_1/"; the count result is as follows:
QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
none inf 1 G 280 M 1 14 248 M /cb/kj_1
These files cost 2*100*3 = 600 MB of space, and 280 MB remains. No surprises so far.
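The cumulative numbers also check out under the same sketch (again assuming replication factor 3 and 1 G = 1024 MB):

```python
REPLICATION = 3
QUOTA_MB = 1024
content_mb = 48 + 2 * 100           # previous 48 MB of content plus two 100 MB files
remaining_mb = QUOTA_MB - content_mb * REPLICATION
print(remaining_mb)                 # 280
```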
Finally, I put one 1 KB file into "/cb/kj_1". This time HDFS rejected my request and threw an "org.apache.hadoop.hdfs.protocol.DSQuotaExceededException":
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.DSQuotaExceededException): The DiskSpace quota of /cb/kj_1 is exceeded: quota = 1073741824 B = 1 GB but diskspace consumed = 1182793728 B = 1.10 GB
at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:214)
at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:241)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1074)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:903)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:862)
The error message shows how much space the write needed. After some calculation: (diskspace consumed − SPACE_QUOTA + REM_SPACE_QUOTA) / 3 = (1128 M − 1024 M + 280 M) / 3 = 128 MB per replica.
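The 128 MB figure can be backed out directly from the raw byte counts in the exception (a sketch; the 280 MB remaining quota is taken from the -count output above):

```python
# Back out how much space the rejected 1 KB write tried to reserve,
# using the byte counts from the DSQuotaExceededException message.
MB = 1024 * 1024
quota = 1073741824              # "quota = 1073741824 B" from the message
attempted = 1182793728          # "diskspace consumed = 1182793728 B"
rem_quota = 280 * MB            # REM_SPACE_QUOTA before the failed put

already_used = quota - rem_quota        # space charged before this write
reserved = attempted - already_used     # space the new write tried to add
print(reserved // MB, reserved // MB // 3)   # 384 128
```

So the 1 KB write attempted to reserve 384 MB, i.e. one full 128 MB block times 3 replicas.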
The last 1 KB file I tried to put would have occupied one full block, yet the 10 MB files I put earlier cost only 10 MB of space each, and the 100 MB files cost 100 MB each.
So a small file does not seem to permanently cost 128 MB (the HDFS block size); if it did, a directory with a 1 G space quota could hold only 1024/3/128 = 2 files, but I put 6 files in successfully.
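The contrast can be made concrete. A sketch under the same assumptions (replication 3, 128 MB blocks, 1 G = 1024 MB): charging a full block per file would cap the directory at 2 files, while charging actual file length × replication easily accommodates all 6 dd files.

```python
REPLICATION, QUOTA_MB, BLOCK_MB = 3, 1024, 128

# If every file permanently cost a full block, only 2 files would fit:
assert QUOTA_MB // (REPLICATION * BLOCK_MB) == 2

# Charging actual file length x replication fits all 6 files with room to spare:
files_mb = [10, 10, 10, 10, 100, 100]
consumed = sum(files_mb) * REPLICATION
print(consumed, consumed <= QUOTA_MB)   # 720 True
```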
The stat command shows each file's length (%b) and block size (%o):
hadoop fs -stat "%b %o" /cb/kj_1/10mb_1.dd
10485760 134217728
hadoop fs -stat "%b %o" /cb/kj_1/100mb_1.dd
104857600 134217728
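Converting those byte counts confirms the 10 MB file length and the 128 MB block size:

```python
# Interpret the %b (file length) and %o (block size) fields from fs -stat.
MB = 1024 * 1024
print(10485760 // MB, 134217728 // MB)   # 10 128
```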
I want to know how HDFS calculates space against the quota when putting files into a directory, and how it calculates the space of files that already exist in the directory.