
I have read multiple articles on this topic, including the ones below, but things are still hazy to me: http://elinux.org/Tims_Notes_on_ARM_memory_allocation

ARM Linux kernel page table

Linux kernel ARM Translation table base (TTB0 and TTB1)

ARM hardware has 4096 entries of 4 bytes each in the L1 translation table. Each entry translates a 1MB region of memory. At the second level it has 256 entries of 4 bytes each, and each second-level entry translates a 4KB page. So, according to this, any virtual address has to be divided 12-8-12 to map onto the above scheme.
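As a rough illustration of the 12-8-12 split described above, here is a minimal Python sketch. The function name `split_hw` and the constants are made up for this example; the numbers come straight from the paragraph above.

```python
# Hardware (ARM short-descriptor) split as described above:
# 12-bit L1 index, 8-bit L2 index, 12-bit page offset.
L1_BITS, L2_BITS, OFFSET_BITS = 12, 8, 12

def split_hw(vaddr):
    """Return (l1_index, l2_index, page_offset) for a 32-bit virtual address."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    l2 = (vaddr >> OFFSET_BITS) & ((1 << L2_BITS) - 1)
    l1 = (vaddr >> (OFFSET_BITS + L2_BITS)) & ((1 << L1_BITS) - 1)
    return l1, l2, offset

# Table sizes implied by the index widths (4-byte entries).
L1_TABLE_BYTES = (1 << L1_BITS) * 4   # 4096 entries -> 16 KB
L2_TABLE_BYTES = (1 << L2_BITS) * 4   # 256 entries  -> 1 KB

l1, l2, off = split_hw(0xC0123456)
print(hex(l1), hex(l2), hex(off))
```

Each L1 index selects a 1MB region (2^20 bytes), and each L2 index selects one 4KB page within it.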

But on the 32-bit ARM Linux side this division is 11-9-12: the L1 translation table consists of 2048 entries of 8 bytes each. Here two 4-byte entries are clubbed together, and the second-level translation tables they point to are laid out one after the other in memory, so that at the second level there are 512 entries instead of 256. Additionally, since Linux memory management expects various flags that are not native to ARM, we define 512 more entries for the Linux page table (one for each second-level HW page table entry).
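To make the relationship between the two views concrete, here is a small Python sketch of the 11-9-12 split and how it folds back onto the hardware indices. The constants are named after the kernel macros `PGDIR_SHIFT`, `PAGE_SHIFT` and `PTRS_PER_PTE`; the functions `split_linux` and `to_hw_indices` are hypothetical helpers for illustration only, not kernel code.

```python
PGDIR_SHIFT = 21      # each Linux first-level entry covers 2 MB
PAGE_SHIFT = 12       # 4 KB pages
PTRS_PER_PTE = 512    # Linux sees 512 second-level entries

def split_linux(vaddr):
    """Return (pgd_index, pte_index, page_offset) as Linux views the address."""
    pgd = (vaddr & 0xFFFFFFFF) >> PGDIR_SHIFT
    pte = (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return pgd, pte, offset

def to_hw_indices(pgd, pte):
    """Map a Linux (pgd, pte) pair back to hardware L1/L2 indices."""
    hw_l1 = pgd * 2 + (pte >> 8)   # top bit of pte picks one of the paired entries
    hw_l2 = pte & 0xFF             # low 8 bits index the 256-entry hardware table
    return hw_l1, hw_l2

pgd, pte, off = split_linux(0xC0123456)
print(hex(pgd), hex(pte), hex(off), [hex(i) for i in to_hw_indices(pgd, pte)])
```

The ninth bit of the Linux PTE index is simply the lowest bit of the hardware L1 index, which is why pairing L1 entries is enough to present a 2MB "section" to the generic code.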

Now the question is: Linux does not enforce the PGD/PMD/PTE sizes (it does enforce a 4K page size, so PAGE_SHIFT is set to 12), so why do we select the 11-9-12 layout (i.e. 11 bits for the PGD and 9 bits for the HW PTE)? Is it just to make sure that the 512 HW + 512 Linux PTEs are aligned to a page boundary?

It would be great if someone could explain the logic behind this division in detail.

nagla

2 Answers


As you say, in the ARM short-descriptor format each second-level page table is 1KB in size. Even with the associated shadow page table that only makes 2KB, meaning 50% of every page allocated for second-level tables would be entirely wasted.

Linux just pretends that the section size is 2MB, rather than the actual 1MB of the hardware, by allocating first-level entries in pairs. That way the corresponding pair of second-level tables can be kept together in a single page, which avoids that wastage and keeps the management of page table memory really simple.
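The arithmetic behind that can be sketched in a few lines of Python. The helper `page_usage` is made up for this example, and it assumes the shadow table is the same 1KB size as the hardware table, as the 2KB total above implies.

```python
PAGE_SIZE = 4096
HW_TABLE = 256 * 4       # one hardware second-level table: 1 KB
SHADOW_TABLE = 256 * 4   # matching Linux shadow table (assumed same size)

def page_usage(hw_tables_per_group):
    """Return (bytes_needed, fraction_of_one_page) for a group of tables."""
    used = hw_tables_per_group * (HW_TABLE + SHADOW_TABLE)
    return used, used / PAGE_SIZE

for n in (1, 2, 4):
    used, frac = page_usage(n)
    print(f"{n} hw table(s) + shadow: {used} bytes = {frac:.0%} of a page")
```

One table plus its shadow fills half a page, a pair fills exactly one page, and four would spill into a second page.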

Notlikethat
  • Why can't Linux pretend that the section size is 4MB and keep four second-level page tables in one page? – nagla Jul 15 '16 at 05:40
  • It could, but then it would also need to keep track of a second page somewhere for 4 shadow page tables. That's probably not worth the bother. – Notlikethat Jul 15 '16 at 08:33
  • *pretend* is a bit misleading. The generic MMU layer is told that the L1 entries are 2MB. The ARM MMU code is aware of the physical hardware. The challenge is to map the ARM MMU hardware (different between ARM CPU families) to the generic Linux MMU code/API. It can't use 4MB sections because of the added detail of managing/emulating the *dirty*, *young* and *accessed* bits. – artless noise Jul 16 '16 at 14:37
  • @artlessnoise _"The generic MMU layer is told that the L1 entries are 2MB..."_ - quite; the arch code ["behave(s) so as to make it appear that (the section size is 2MB) when in fact it is not"](http://www.oxforddictionaries.com/definition/english/pretend). I don't see what's misleading about that... – Notlikethat Jul 17 '16 at 11:17
  • Yes, but *pretend* doesn't answer the question *If someone could explain the logic behind this division in detail would be great* and it is certainly not a computer science term (thanks for the dictionary reference anyway). Btw, I upvoted your half answer. – artless noise Jul 17 '16 at 14:11

The "ARM Linux and dirty bits" link should have all the answers. Mainly, the PTE tables carry extra info to emulate bits, which results in the layout you observe.

I think a misconception is the memory an L2 table occupies versus what it maps. You must allocate physical memory for an L2 table, and making it symmetric (4K in size) makes it the same as all other pages. Now this 4K page could hold four ARM MMU L2 page tables. However, we need some additional information to emulate the dirty, young and accessed bits that the Linux generic MMU code requires. So the layout of the Linux L2 (PTE directory) is:

  1. Linux PTE [n]
  2. Linux PTE [n+1]
  3. ARM PTE [n]
  4. ARM PTE [n+1]

At the L1 level each entry is paired (n/n+1) so that the pair points to items 3 and 4 above. The pgtable-2level.h file has detailed comments on the layout (which should be correct for your version of Linux).
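The layout above can be sketched as byte offsets within the single 4K page. This is only an illustration in Python: the names `LAYOUT`, `table_offsets` and `entry_offsets` are made up, and the sizes assume 256 four-byte entries per table as described in the question.

```python
PAGE_SIZE = 4096
ENTRY_BYTES = 4
TABLE_BYTES = 256 * ENTRY_BYTES   # 1 KB per table

# Order of the four tables inside the page, as listed above.
LAYOUT = ["Linux PTE [n]", "Linux PTE [n+1]", "ARM PTE [n]", "ARM PTE [n+1]"]

def table_offsets():
    """Byte offset of each 1 KB table within the 4 KB page."""
    return {name: i * TABLE_BYTES for i, name in enumerate(LAYOUT)}

def entry_offsets(pte_index):
    """Offsets of the Linux entry and its hardware twin for index 0..511."""
    linux_off = pte_index * ENTRY_BYTES
    hw_off = linux_off + 2 * TABLE_BYTES   # hardware tables start at 2048
    return linux_off, hw_off

print(table_offsets())
print(entry_offsets(0x123))
```

The two paired L1 entries would then point at offsets 2048 and 3072 of the page, while the generic code works with the Linux entries in the first half.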

See: Tim's notes on ARM MM; Page table entry (PTE) descriptor in Linux kernel for ARM

artless noise
  • Thanks to all for the valuable feedback. So, in summary, the ARM h/w and Linux PTEs are arranged this way because Linux needs extra bits for management, and because by placing them inside one 4KB page we only need to refer to one page. Otherwise we'd have to access two pages, one for the h/w PTEs and one for the Linux PTEs. But does that really matter, as kernel page tables are never swapped out? – nagla Jul 19 '16 at 10:29
  • One more thing is a bit unclear to me. The ARM hw will always use the hw PTE; does that mean the Linux PTEs are only used by Linux internally? Suppose my code refers to some memory location for which correct page tables, both hw and Linux, are set up. As soon as the CPU accesses this address, the ARM hw will automatically use the HW page tables and, based on the permissions in them, allow or disallow the access. The question is what use the Linux PTEs have in this case; they seem to be useless? – nagla Jul 19 '16 at 10:33
  • Linux uses only the Linux PTE values (mainly); that is why they are first. There is a *commit* operation in which the CPU MMU layer translates a Linux PTE to a hw PTE. Some of the referenced questions allude to this. Swap is not important (the MMU uses physical addresses, but the CPU uses virtual addresses to manage them), so they must be physically present. However, TLB access will be reduced when they are grouped together. And as said, it reduces code complexity. Everything deals with 4K pages. The Linux PTE values are used by the generic MM/MMU code. The Linux kernel needs them. The CPU needs the hw PTE. – artless noise Jul 19 '16 at 20:27