
I have developed a small library of TensorFlow custom ops. These ops are programmed in C++ and make use of Eigen.

A strange thing happens when I compile my custom ops module with the -march=native compiler option. The resulting module will consistently abort with a "pointer being freed was not allocated" error originating from free() in libsystem_malloc.dylib. If I compile without the -march=native compiler option, then there is no crash; that is, compiling with this Bazel command:

bazel build -c opt --copt=-march=native --config=cuda //tensorflow/core/user_ops:custom_ops.so

... produces a crashing module, whereas this command produces a working one:

bazel build -c opt --config=cuda //tensorflow/core/user_ops:custom_ops.so

Running a test script within lldb and disassembling, I am seeing the exact same assembly, modulo ASLR.

So, what might be the reason why compiling with -march=native causes a "pointer being freed was not allocated" error?

My compiler is Apple LLVM version 8.0.0 (clang-800.0.42.1) and I am running macOS 'Sierra' 10.12.4 (16E195).

UPDATE: Address Sanitizer reports:

==4445==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x62a000234220 in thread T29
    #0 0x1000d9db9 in wrap_free (libclang_rt.asan_osx_dynamic.dylib+0x4adb9)
...

0x62a000234220 is located 32 bytes inside of 24032-byte region [0x62a000234200,0x62a000239fe0)
allocated by thread T29 here:
    #0 0x1000d9bf0 in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib+0x4abf0)
    #1 0x12f4dec62 in Eigen::BDCSVD<Eigen::Matrix<double, -1, -1, 1, -1, -1> >::allocate(long, long, unsigned int) (custom_ops.so+0x26fc62)
    #2 0x12f4d20a1 in Eigen::BDCSVD<Eigen::Matrix<double, -1, -1, 1, -1, -1> >::compute(Eigen::Matrix<double, -1, -1, 1, -1, -1> const&, unsigned int) (custom_ops.so+0x2630a1)
    #3 0x12f6822d6 in Eigen::BDCSVD<Eigen::Matrix<double, -1, -1, 1, -1, -1> >::BDCSVD(Eigen::Matrix<double, -1, -1, 1, -1, -1> const&, unsigned int) (custom_ops.so+0x4132d6)
...
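
One possible reading of this report, which is an assumption and not something confirmed here: the freed address lies exactly 32 bytes inside the malloc()-ed region, which is the pattern a hand-rolled aligned allocator produces when it over-allocates and returns a pointer rounded up to a 32-byte boundary. If -march=native enables AVX in my module only, the allocating code might use such a scheme while the code that releases the buffer calls plain free() on the aligned pointer. The sketch below illustrates the mechanism in isolation; the helper names are hypothetical and this is not Eigen's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (illustration only): over-allocate with malloc(),
// round up to the next `alignment` boundary, and stash the original pointer
// in the slot just before the aligned address.
void* handmade_aligned_malloc(std::size_t size, std::size_t alignment = 32) {
    // The alignment must be a power of two and leave room for one pointer.
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* original = std::malloc(size + alignment);
    if (original == nullptr) return nullptr;
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
    std::uintptr_t aligned_addr =
        (raw & ~(static_cast<std::uintptr_t>(alignment) - 1)) + alignment;
    void* aligned = reinterpret_cast<void*>(aligned_addr);
    static_cast<void**>(aligned)[-1] = original;  // remember what malloc() returned
    return aligned;
}

// Matching release: recover the original pointer and hand that to free().
void handmade_aligned_free(void* aligned) {
    if (aligned != nullptr) std::free(static_cast<void**>(aligned)[-1]);
}

// Distance in bytes between the aligned pointer and the start of the block.
std::ptrdiff_t offset_into_block(void* aligned) {
    return static_cast<char*>(aligned) -
           static_cast<char*>(static_cast<void**>(aligned)[-1]);
}
```

A pointer returned by handmade_aligned_malloc() always has a non-zero offset_into_block(), so it is never the address malloc() returned. Passing it to free() directly, rather than to handmade_aligned_free(), would be the kind of "attempting free on address which was not malloc()-ed" error shown above.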
Daniel Trebbien
  • Have you tried [address sanitizer](https://clang.llvm.org/docs/AddressSanitizer.html)? Most likely you have a bug that just happens to only manifest with certain compiler optimizations. – Jesper Juhl May 08 '17 at 19:14
  • Hi @JesperJuhl: That's a good suggestion. Trying it out now. Currently I am seeing "Symbol not found: `___asan_option_detect_stack_use_after_return`", so I need to figure this out... – Daniel Trebbien May 08 '17 at 19:22
  • I figured out that I need to link with `-fsanitize=address` as well. Now I am seeing "Interceptors are not working", and when I set the suggested `DYLD_INSERT_LIBRARIES` env var, I get "Library not loaded: @rpath/libcusolver.8.0.dylib", so it appears that `DYLD_LIBRARY_PATH` [is no longer working](http://stackoverflow.com/questions/34114587/dyld-library-path-dyld-insert-libraries-not-working). Still working on this... – Daniel Trebbien May 08 '17 at 19:43
  • @JesperJuhl: Figured it out. I have updated my question with the output from Address Sanitizer. – Daniel Trebbien May 08 '17 at 20:09
  • Seems you may have found a bug in Eigen. – Jesper Juhl May 08 '17 at 20:14

0 Answers