
I want to migrate an Apache Spark use case to Apache Flink. In this use case, I distribute some files and directories to the working directory of the task nodes using the sc.addFile() API. However, I could not find an equivalent feature in Apache Flink. I did see the env.registerCachedFile() API, but it does not move the files or directories to the worker nodes. Can someone shed some light on this issue? Thanks.

Bala
  • The registerCachedFile() method only accepts files that are already present in a distributed file system supported by Flink. They are then downloaded on demand into the workers' local file systems and cached there to avoid unnecessary trips to the DFS. – Chesnay Schepler Mar 24 '17 at 08:36
  • Hi Chesnay, thanks for your response. I have now put the file in HDFS and specified its location in the registerCachedFile() method. In my TaskManager log, I do see a line saying "obtaining local cache for file ". However, I do not see this file in the local working directory of the task nodes. Is there a trick to accessing the file inside the mapper? Please share a link or resource that explains how to access the cached file inside the mapper. I ran "ls -lh" inside the mapper to look for the cached file, but it was missing. Thanks again. – Bala Mar 27 '17 at 06:39
  • As per the documentation https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html#distributed-cache you can get access to the file through the RuntimeContext. – Chesnay Schepler Mar 28 '17 at 09:04
  • I was able to access the cached files through the RuntimeContext. However, I need the cached files to be present in the current working directory, and I need to execute one of them. This is where I am failing. I could locate the cached files in /tmp/ – Bala Mar 28 '17 at 09:15
  • You cannot define where the Distributed Cache will put these files. Why do you need them in the current working directory, and more importantly, what do you expect that to be? Since these files are downloaded on each TM I'm not sure if there even is a reasonable CWD. Anyway, you can of course copy these files to any location you wish; you will however have to take care of concurrency issues and such yourself. – Chesnay Schepler Mar 29 '17 at 10:12
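
The last comment suggests copying the cached file to a location of your choice and handling concurrency yourself. A minimal sketch of that idea, using only the Java standard library, might look like the following. The class name CachedFileStager and the method stage are hypothetical, not part of Flink. In a Flink job, the source path would presumably come from getRuntimeContext().getDistributedCache().getFile(name) inside a rich function's open() method, as described in the documentation linked above. The sketch handles a single file, not a directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not a Flink API): stages a locally cached file
// into a chosen directory so that it can be executed from there.
final class CachedFileStager {
    private CachedFileStager() {}

    /**
     * Copies cachedFile into targetDir under its original file name,
     * marks it executable for the owner, and returns the staged path.
     */
    static Path stage(Path cachedFile, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(cachedFile.getFileName());

        // Copy to a uniquely named temp file in the same directory first,
        // so that concurrent tasks never observe a half-written target.
        Path tmp = Files.createTempFile(targetDir, "stage-", ".tmp");
        try {
            Files.copy(cachedFile, tmp, StandardCopyOption.REPLACE_EXISTING);
            tmp.toFile().setExecutable(true);
            // Assumption: the file system supports atomic renames. Whether an
            // existing target is replaced is implementation specific; on
            // POSIX systems a rename replaces it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
        return target;
    }
}
```

The staged path could then be passed to a ProcessBuilder whose directory() is set to the same target directory, which gives the child process a well-defined working directory regardless of where the TaskManager was started.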

0 Answers