I am using pyhdfs to delete a directory:

from pyhdfs import HdfsClient

fs = HdfsClient(hosts=..., user_name='hdfs', ..)
fs.delete(path_table, recursive=True)

However, after I deleted the directory, I could not find it in the trash directory located in /user/hdfs/.Trash/Current/.

Does pyhdfs delete files as if '-skipTrash' were set? How can I make it soft-delete the target instead? I also could not find a method equivalent to '-mv' that would let me implement this manually.

This is the description from the official documentation: (image not available)

user2894829
  • What is `-skipTrash`? [All pyhdfs seems to do is issue a WebHDFS `DELETE` request against the resource given.](https://github.com/jingw/pyhdfs/blob/master/pyhdfs/__init__.py#L626-L637) – AKX Mar 10 '22 at 07:58
  • Hi @AKX, thanks for your reply. `-skipTrash` is an option of the HDFS delete command: if you run `hdfs dfs -rm xxx`, the target is moved to a 'trash' directory and only cleaned up after a configured retention time, so you can still recover it. If you run `hdfs dfs -rm -skipTrash xxx`, the target is destroyed immediately, which is dangerous. – user2894829 Mar 10 '22 at 08:12

1 Answer


No `-skipTrash`-like option (or, conversely, an option to use the trash) is documented for the WebHDFS delete operation.

The `hdfs dfs -rm` command first attempts to use the Trash API (unless told not to), and if trashing fails, it deletes the file using what I'd assume is the same underlying operation the WebHDFS API uses.

Trashing seems to be implemented ((1), (2)) as a create-directories-and-rename command.

It's likely you would need to implement that sequence by hand when using WebHDFS (and by extension pyhdfs).
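As a rough illustration of that sequence, here is a minimal sketch. It assumes the default trash layout `/user/<user>/.Trash/Current/<original absolute path>` and uses only the `exists`, `mkdirs` and `rename` methods of `HdfsClient`. The helper names `trash_destination` and `move_to_trash` are made up for this example and are not part of pyhdfs, and the timestamp suffix for name collisions only approximates what the Hadoop client does.

```python
import posixpath
import time


def trash_destination(path, user_name, trash_root=None):
    """Build the trash location for an absolute HDFS path.

    Assumption: the trash lives at /user/<user_name>/.Trash/Current and
    mirrors the original absolute path underneath it.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute: %r" % path)
    normalized = posixpath.normpath(path)
    if normalized.strip("/") == "":
        raise ValueError("refusing to trash the root directory")
    if trash_root is None:
        trash_root = posixpath.join("/user", user_name, ".Trash", "Current")
    return posixpath.join(trash_root, normalized.lstrip("/"))


def move_to_trash(client, path, user_name):
    """Soft-delete `path` by renaming it into the user's trash directory.

    `client` is expected to behave like pyhdfs.HdfsClient; only exists(),
    mkdirs() and rename() are called. Returns the new path inside the trash.
    """
    destination = trash_destination(path, user_name)
    # Create the parent directories inside the trash first.
    client.mkdirs(posixpath.dirname(destination))
    # Avoid colliding with an earlier trashed copy of the same path.
    if client.exists(destination):
        destination = "%s.%d" % (destination, int(time.time() * 1000))
    if not client.rename(path, destination):
        raise RuntimeError("rename to trash failed for %s" % path)
    return destination
```

With the client from the question, this would be called as `move_to_trash(fs, path_table, 'hdfs')` in place of `fs.delete(path_table, recursive=True)`. Whether the renamed files are later expunged automatically depends on the cluster's trash settings (for example `fs.trash.interval`), so that is worth checking before relying on it.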

AKX