I would like to find the latest files in an HDFS directory, keep them as they are, and delete the older files.
I have 4 files in the HDFS directory /user/hive/warehouse/test:
-rwxrwx--x+ 3 hive hive 9 2018-11-13 04:13 /user/hive/warehouse/test/bc4151c16c98d191-72314e2e00000000_640731000_data.0.
-rwxrwx--x+ 3 hive hive 9 2018-11-13 04:35 /user/hive/warehouse/test/bc4151c16c98d191-72314e2e00000000_640731001_data.0.
-rwxrwx--x+ 3 hive hive 12 2018-11-13 08:31 /user/hive/warehouse/test/944adb43a3a5f955-659ed0e100000000_916442110_data.0.
-rwxrwx--x+ 3 hive hive 12 2018-11-13 08:31 /user/hive/warehouse/test/944adb43a3a5f955-659ed0e100000000_916442111_data.0.
I want to delete all files that are not the latest.
That means my directory should contain only the files with timestamp 2018-11-13 08:31.
I can sort those files using hdfs dfs -ls /user/hive/warehouse/test | sort -k6,7
How can I delete the older files? The hdfs commands do not include anything like find that would extract only the latest files.
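One possible approach, as a rough sketch rather than a tested solution: parse the output of hdfs dfs -ls with awk, treat the greatest date and time (columns 6 and 7) as the latest timestamp, and print the path (column 8) of every file whose timestamp differs from it. The function name older_paths is made up for illustration. The sketch assumes the usual 8-column listing format, paths without whitespace, and that "latest" means the newest minute shown by -ls.

```shell
#!/bin/sh
# older_paths (hypothetical helper): reads `hdfs dfs -ls` output on stdin
# and prints the paths of all files whose timestamp is not the newest one.
older_paths() {
  awk '
    # Only regular-file lines: permissions start with "-" and there are
    # at least 8 columns. This skips the "Found N items" header and
    # directory entries.
    $1 ~ /^-/ && NF >= 8 {
      ts = $6 " " $7                 # e.g. "2018-11-13 08:31"
      if (ts > latest) latest = ts   # string comparison works for this format
      stamp[$8] = ts                 # assumes the path has no whitespace
    }
    END {
      for (p in stamp)
        if (stamp[p] != latest) print p
    }
  ' | sort
}

# Usage (not executed here). Review the list first, then delete:
#   hdfs dfs -ls /user/hive/warehouse/test | older_paths
#   hdfs dfs -ls /user/hive/warehouse/test | older_paths | xargs -r hdfs dfs -rm
# Note: -r (skip the command when input is empty) is a GNU xargs option.
```

With the four files listed above, this would print the two paths stamped 04:13 and 04:35 and leave the two 08:31 files alone. Running the first usage line before the second is a cheap way to check the list before anything is removed.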