
I'm having what appears to be a caching problem when using /dev/mem with mmap on a dual ARM processor system (Xilinx Zynq, to be exact). My configuration is asymmetric, with one processor running Linux and the other processor running a bare metal application. They communicate through a block of RAM that isn't in the Linux virtual memory space (it was excluded by the devicetree file). When my userspace Linux application writes to memory using the pointer returned from mmap(), it can take anywhere from 100 ms to well over a second for the second processor to detect the changed memory content.

On the open() call to /dev/mem, I tried to specify O_RDWR, O_SYNC, and O_DIRECT, but O_DIRECT caused the open to fail, so I removed it. I thought O_SYNC should have guaranteed that data was written to memory before the write() call returned, but I'm writing through a memory pointer instead of calling write(). I don't see any parameters on the mmap() call that would seem to address caching issues.
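A minimal sketch of the open()/mmap() sequence described above, assuming a Linux target where the process is allowed to open /dev/mem. `PhysMapping`, `map_phys()` and `unmap_phys()` are hypothetical helpers for illustration; the flags mirror the question (O_RDWR | O_SYNC, PROT_READ | PROT_WRITE, MAP_SHARED). Because mmap() only accepts a page-aligned offset, the helper rounds the physical address down and hands back a pointer adjusted to the requested byte.

```cpp
#include <fcntl.h>      // open, O_RDWR, O_SYNC
#include <sys/mman.h>   // mmap, munmap, MAP_SHARED, MAP_FAILED
#include <sys/types.h>  // off_t
#include <unistd.h>     // close, sysconf
#include <cassert>
#include <cstddef>      // size_t

// Hypothetical helper type: the page-aligned mapping plus a pointer to the
// byte that corresponds to the requested physical address.
struct PhysMapping {
    void*  base   = MAP_FAILED;  // value returned by mmap (page aligned)
    size_t length = 0;           // length passed to mmap
    char*  ptr    = nullptr;     // base + (phys_addr - page-aligned offset)
    int    fd     = -1;
};

// Map `len` bytes at physical address `phys_addr` through `path`
// (normally "/dev/mem"). mmap() needs a page-aligned offset, so round the
// address down and extend the mapping length by the difference.
bool map_phys(const char* path, off_t phys_addr, size_t len, PhysMapping& out) {
    const long page = ::sysconf(_SC_PAGESIZE);
    assert(page > 0 && (page & (page - 1)) == 0);  // page size is a power of two
    const off_t aligned = phys_addr & ~static_cast<off_t>(page - 1);
    const size_t delta = static_cast<size_t>(phys_addr - aligned);

    // Same flags as in the question; the question reports that adding
    // O_DIRECT makes open() on /dev/mem fail.
    const int fd = ::open(path, O_RDWR | O_SYNC);
    if (fd < 0) return false;

    void* base = ::mmap(nullptr, len + delta, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, aligned);
    if (base == MAP_FAILED) {
        ::close(fd);
        return false;
    }
    out.base = base;
    out.length = len + delta;
    out.ptr = static_cast<char*>(base) + delta;
    out.fd = fd;
    return true;
}

// Undo map_phys(); safe to call on a default-constructed PhysMapping.
void unmap_phys(PhysMapping& m) {
    if (m.base != MAP_FAILED) ::munmap(m.base, m.length);
    if (m.fd >= 0) ::close(m.fd);
    m = PhysMapping{};
}
```

On the target, the single page from the comments would correspond to something like `map_phys("/dev/mem", 0x1FFFF000, 4096, m)`, with writes going through `m.ptr`. This only sets up the mapping; it does nothing about when the data leaves the CPU caches.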

I've tried calling fsync(fd) and fdatasync() after writing to memory, but that didn't change the behavior.

What DID seem to work was spawning this command immediately after the memory write: `sync; echo 3 > /proc/sys/vm/drop_caches`

What is the simplest way to get writes via a mapped memory pointer to flush immediately?

edj
  • Have you tried [msync](http://pubs.opengroup.org/onlinepubs/009696699/functions/msync.html)? – Duck Dec 23 '13 at 20:19
  • Yes, I should have mentioned that I tried. It didn't seem to work, either, but I could have misused it. I wasn't sure which address to specify. The particular data I want to flush is not at the beginning of the mapped memory, so I wasn't sure whether I should use the original pointer address returned from mmap, or any page aligned address in mapped space. – edj Dec 23 '13 at 21:08
  • In theory, you pass the page-aligned address containing the beginning of the data you want flushed, plus the length, but in practice I wonder whether it matters if you just use the base address and the entire length. My guess is the kernel just skips past pages it knows aren't dirty anyway. Maybe a kernel hacker could confirm or correct that. I am surprised that didn't work, but then I don't really understand what you are doing on the other side. – Duck Dec 23 '13 at 21:32
  • Digging into why msync isn't working, I realize that it is tossing an error. Depending on how I call it, I either get EINVAL or ENOMEM. The physical address I am mapping is 0x1FFFF000 with a size of 4096 bytes (the system page size). That is currently the only page I use. When I call msync(), should I be specifying the physical address (0x1FFFF000), or the virtual address that was returned from mmap()? – edj Dec 23 '13 at 22:13
  • Based on web examples, I now believe that the virtual address is what I need to pass in. I only mapped one page with mmap(), so I am passing to msync() that returned virtual address, a size of 4096 (mapping size equals system page size), and a flag value of MS_SYNC (0x0004). Getting EINVAL returned in errno. What's wrong with my parameters? – edj Dec 23 '13 at 22:42
  • Searched for "mmap invalid argument" and found posts indicating that some kernels have memory drivers that don't support fsync(), and that fsync() support is required for msync() to work. Posters reported seeing EINVAL even though their arguments were correct. I have verified that I can call msync() with MS_ASYNC, but get EINVAL (invalid argument) with MS_SYNC. So, am I out of luck? – edj Dec 23 '13 at 23:15
  • Taking a step back: you pass the physical address as the last parameter (offset) to mmap and NULL as the first, so the kernel decides where to map it, right? So the address returned from mmap is what you should be using in msync. What other flags do you set for mmap (read/write, private, shared, fixed, etc.), and are you 100% certain that call succeeds and that you are seeing the changes (if more slowly than you would like) on the other side? My main question is what effect MAP_PRIVATE/MAP_SHARED has (if any) on memory shared with an app Linux does know about. I also wonder if msync with MS_ASYNC actually silently fails – Duck Dec 24 '13 at 01:24
  • even if the call returns immediately because it is just scheduling the sync. I hope you can get an answer (and workaround) to the driver question. – Duck Dec 24 '13 at 01:26
  • I am as certain as I can be that mmap succeeded. My exact call to mmap is this, after which I test the returned pointer for MAP_FAILED to determine pass/fail: `m_pMem = (char *) ::mmap( NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, m_fd, m_req_addr );` – edj Dec 26 '13 at 16:36
  • That looks right. Out of curiosity, did you try using MAP_FIXED? How did you make out on the memory driver angle? – Duck Dec 26 '13 at 23:56
  • I did not previously try MAP_FIXED, but I just did. No difference. – edj Dec 27 '13 at 15:42
  • Unfortunately, I am out of ideas. If you do resolve it, please drop a line. I would be interested to know if and how it works out. – Duck Dec 27 '13 at 19:28
  • A possible problem here is that the CPUs don't share a cache coherency protocol. You probably need to flush the cache on the CPU after writing to make it visible to the other CPU. – Zan Lynx Feb 24 '14 at 18:51
  • Did you manage to solve it? – JohnTortugo Sep 22 '16 at 18:11
  • No, I never solved it, but the "flush the cache" link in the comment below gives me some options to try. Unfortunately, I'm working on other things these days and don't know when or if I will get back to it. – edj Nov 17 '16 at 00:23
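A sketch of the msync() call discussed in the comments, assuming Linux. `msync_range()` is a hypothetical helper. It takes an address inside the region returned by mmap() (a virtual address, never the physical address used as the mmap offset), rounds it down to a page boundary as msync() requires, and reports errno on failure. On Linux, MS_SYNC on a shared file mapping is implemented by calling the underlying file's fsync handler, and the call fails with EINVAL when the driver provides none; that appears consistent with the EINVAL seen on /dev/mem, while MS_ASYNC, which Linux treats as essentially a no-op, never reaches that path.

```cpp
#include <sys/mman.h>   // msync, MS_SYNC, MS_ASYNC
#include <unistd.h>     // sysconf
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>      // uintptr_t
#include <cstdio>       // fprintf
#include <cstring>      // strerror

// Hypothetical helper: msync() a sub-range of an existing mapping.
// `addr` must point inside a region returned by mmap(). msync() rejects a
// start address that is not page aligned with EINVAL, so round it down and
// grow the length so that [addr, addr + len) is still covered.
int msync_range(void* addr, size_t len, int flags) {
    const long page = ::sysconf(_SC_PAGESIZE);
    assert(page > 0 && (page & (page - 1)) == 0);  // page size is a power of two
    const uintptr_t start = reinterpret_cast<uintptr_t>(addr);
    const uintptr_t aligned = start & ~static_cast<uintptr_t>(page - 1);
    const size_t total = len + static_cast<size_t>(start - aligned);

    if (::msync(reinterpret_cast<void*>(aligned), total, flags) != 0) {
        const int err = errno;
        // With MS_SYNC on a shared file mapping, Linux calls the file's fsync
        // handler and fails with EINVAL if the driver does not implement one.
        std::fprintf(stderr, "msync(%p, %zu) failed: %s\n",
                     reinterpret_cast<void*>(aligned), total, std::strerror(err));
        return err;
    }
    return 0;
}
```

On an ordinary shared mapping this returns 0; on the /dev/mem mapping the EINVAL would likely persist however the arguments are aligned, and in any case msync() deals with the backing store rather than the CPU data cache, which is the point the answer makes.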

1 Answer


fsync(), msync(), etc. all synchronize the memory-mapped region with its backing store (e.g., a file on a block device).

They do not affect the CPU data cache. You will either need to use explicit cache clean operations to flush the CPU cache to DRAM, or you will have to use the ACP (Accelerator Coherency Port).

The ACP is supposed to be cache coherent, but I've never gotten it to work.

Here's an answer for how to flush the cache. I believe that code needs to go in your device driver. We have that code packaged in a generic "portalmem" driver. It enables your application to allocate memory that you can share with your hardware, and it provides an ioctl for flushing the cache after your application writes to it.
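A user-space sketch of the ioctl-based approach, assuming a driver that exposes a "clean this range of the shared buffer" command. Everything driver-specific here (`CacheCleanRange`, the `HYPO_IOC_CLEAN_CACHE` request code and `request_cache_clean()`) is hypothetical and is not portalmem's actual interface; only ioctl() and the `_IOW` macro are standard Linux.

```cpp
#include <sys/ioctl.h>  // ioctl, _IOW
#include <unistd.h>     // close(), for releasing the driver fd
#include <cassert>
#include <cstdint>

// HYPOTHETICAL driver interface for illustration only: this struct, the
// magic character 'x', and command number 0x42 are NOT portalmem's real ABI.
struct CacheCleanRange {
    uint64_t offset;  // byte offset of the dirty data within the shared buffer
    uint64_t length;  // number of bytes the driver should clean out to DRAM
};

#define HYPO_IOC_CLEAN_CACHE _IOW('x', 0x42, struct CacheCleanRange)

// Hypothetical helper: after writing through the mapped pointer, ask the
// driver to clean the written range so the other CPU sees it in memory.
// Returns false (with errno set by ioctl) if the driver rejects the request.
bool request_cache_clean(int driver_fd, uint64_t offset, uint64_t length) {
    assert(driver_fd >= 0);
    CacheCleanRange range{offset, length};
    return ::ioctl(driver_fd, HYPO_IOC_CLEAN_CACHE, &range) == 0;
}
```

The application would write its data through the mapped pointer, call `request_cache_clean()` for the range it touched, and only then signal the other CPU. Inside the driver, 32-bit ARM kernels provide helpers such as `__cpuc_flush_dcache_area()` for the L1 data cache and `outer_clean_range()` for the outer (L2) cache; which combination is appropriate depends on the kernel version and cache configuration.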

Jamey Hicks