I'm trying to disable paging completely with an LKM (don't ask me why; I'm just experimenting).

I've tried changing the value of CR0 directly from the LKM.

void disable_paging(void)
{
    asm("movq %cr0, %rax\n\t"
        "movq $0xFFFFFFFEFFFFFFFF, %rbx\n\t"
        "and  %rbx, %rax\n\t"
        "movq %rax, %cr0\n\t");
}

The expected result is that the bit gets cleared. The actual result is a segfault.

Peter Cordes
Ace
  • 1
    Did you make sure you are executing in a page where the linear address matches the physical address? – prl Jul 20 '19 at 09:04
  • 1
    The code you showed tries to clear bit 32 of cr0, which is always 0, so it doesn’t do anything. PG is bit 31 of cr0. – prl Jul 20 '19 at 09:05
  • Are you running this in ring 0? – prl Jul 20 '19 at 09:05
  • Supposing that you got this to work... What would you expect to happen, other than #GP or processor shutdown? – prl Jul 20 '19 at 09:09
  • yes this is being run in ring zero – Ace Jul 20 '19 at 09:15
  • Try using asm volatile instead of asm. This prevents the compiler from optimizing away your inline assembly. – Roman Kwaśniewski Jul 20 '19 at 09:28
  • 1
    @RomanKwaśniewski : **basic** inline assembly statements are implicitly volatile. From the [GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html): _asm statements that have no output operands, including asm goto statements, are implicitly volatile._ – Michael Petch Jul 20 '19 at 10:38
  • 1
    Modifying registers without telling the compiler that they have been modified and clobbered can add subtle bugs to the code especially with optimizations on. In long mode paging needs to be enabled. – Michael Petch Jul 20 '19 at 10:44
  • 1
    Try using 32-bit instructions (CR0 is a 32-bit register and shoving 64 bits into a 32-bit register doesn't make much sense). Note that the expected result is that the computer resets/reboots (#PF then #DF then triple fault; because nothing is identity mapped including IDT). – Brendan Jul 20 '19 at 12:12
  • 1
    D'oh. I'm wrong. You can't disable paging while in long mode (you must disable long mode first); so #GP probably is the expected behavior. – Brendan Jul 20 '19 at 12:21
  • @Brendan : I believe control registers are 64-bit in long mode – Michael Petch Jul 20 '19 at 14:02
  • @MichaelPetch: GAS assembles both `mov %cr0, %eax` and `mov %cr0, %rax` to the same machine code, with no REX prefix. I think you were right about the missing clobber: the OP's mask clears bit 32 of RAX, leaving paging enabled with no change to the machine state. So any weirdness is just from lying to the compiler about RAX. – Peter Cordes Jul 20 '19 at 14:06
  • @MichaelPetch: Yes, when disassembling, GNU Binutils objdump prints it as RAX. Normally GAS doesn't optimize `mov $1, %rax` into 5-byte `mov r32, imm32`, but it does know that reading/writing control registers doesn't need a REX prefix. – Peter Cordes Jul 20 '19 at 14:14
  • @MichaelPetch: In 64-bit code; control registers `CR0` and `CR4` can be described as "64-bit where highest 32 bits must be zero" or "32 bit". There's no practical difference; but I prefer the latter because the former is stupid (it's like saying "`RAX` is a 256 bit register, you just can't use the highest 192 bits!"). – Brendan Jul 20 '19 at 14:28
  • @MichaelPetch: More specifically; I'd say that `CR2` and `CR3` had to be extended to 64 bit; and because the `MOV CRn` instruction ignores operand size they had to pretend that the other control registers are 64 bits when (internally) they aren't. – Brendan Jul 20 '19 at 14:30
  • @Brendan : Intel docs refers to all control registers between CR0 and CR7 (in long mode) as _extended control registers_ and take 64-bit registers. CR8 has the upper 60 bits considered reserved (and only the lower 4 bits used) – Michael Petch Jul 20 '19 at 14:36
  • @MichaelPetch: Intel docs also refer to an instruction that copies values as `mov` (even though the original value still exists after the value was "moved" elsewhere), refer to a flag that determines if IRQs are postponed or not as an "interrupt enable flag" (even though it does not affect other types of interrupts and only affects IRQs, and won't enable/disable IRQs), and say that some model specific registers are architectural (which makes as much sense as saying that wet things are dry). – Brendan Jul 20 '19 at 14:52

1 Answer

TL;DR: This can't work, and your attempt didn't even disable paging, because you cleared bit 32 instead of bit 31. I don't know why that would result in a SIGSEGV for any user-space process, though.

Any badness you get from this is from clobbering RAX + RBX without telling the compiler.


You're obviously building a module for x86-64 Linux which runs in long mode. But long mode requires paging to be enabled.

According to an OSDev forum thread, "x86_64 - disabling paging?":

> If you disable paging in long mode, you will no longer be in long mode.

If that's actually true (rather than just trapping with a #GP exception or something), then obviously it's a complete disaster!!

Code fetch from EIP instead of RIP is extremely unlikely to fetch anything, and REX prefixes would decode as inc/dec if you do happen to end up with EIP pointing at some 64-bit code somewhere in the low 4GiB of physical address space. (Kernel addresses are in the upper canonical range, but it's remotely possible that the low 32 bits of RIP could be the physical address of some code.)

Also related: Why does long mode require paging - probably because supporting unpaged 64-bit mode is an unnecessary hardware expense that would never get much real use.


I'm not sure why you'd get a segfault. That's what I'd expect if you tried to run this code in user-space, where mov %cr0, %rax faults because it's privileged, and the kernel delivers SIGSEGV in response to that user-space #GP exception.

If you are running this function from an LKM's init function then, as Brendan says, the expected result would be crashing the kernel on that core. Or possibly the kernel would catch the fault and deliver SIGSEGV to modprobe(1).


Also, you're using GNU C Basic asm (without any clobbers), so GCC's code-gen assumes that registers (including RAX and RBX) aren't modified. Of course, disabling paging is also effectively a jump when your code isn't in an identity-mapped page, so it doesn't really matter whether you tell other small lies to the compiler or not. If this function doesn't inline into anything, then in practice clobbering RAX won't hurt. But clobbering RBX definitely can; it's call-preserved in the x86-64 System V calling convention.
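To illustrate the point about telling the compiler what an asm statement modifies, here is a minimal user-space sketch. It clears bit 31 of an ordinary integer, not of CR0, so it is safe to run. The helper name `clear_bit31` is hypothetical (it is not from the question), and the block assumes GCC or Clang; on anything other than x86-64 it falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: clears bit 31 of a plain integer value.
 * It never touches CR0, so it can run in user space. */
static inline uint64_t clear_bit31(uint64_t value)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* "+r" declares value as both read and written, so the compiler
     * chooses the register and knows its contents change. Nothing is
     * hard-coded, so no call-preserved register such as RBX is touched
     * behind the compiler's back. "cc" documents that BTR writes flags. */
    __asm__("btr $31, %0" : "+r"(value) : : "cc");
#else
    /* Portable fallback with the same effect. */
    value &= ~(UINT64_C(1) << 31);
#endif
    return value;
}
```

For example, `clear_bit31(0x80000001)` returns `0x1`. Letting the compiler allocate the register through an operand avoids the undeclared RAX/RBX modifications that the original Basic asm statement makes.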

And BTW, CR0 only has 32 significant bits. You could use and $0x7fffffff, %eax to clear the PG bit, or btr $31, %rax if you'd rather clear bit 31 in a 64-bit register. https://wiki.osdev.org/CPU_Registers_x86

According to Section 2.5 of the Intel manual Volume 3 (January 2019):

> Bits 63:32 of CR0 and CR4 are reserved and must be written with zeros. Writing a nonzero value to any of the upper 32 bits results in a general-protection exception, #GP(0).

According to Section 3.1.1 of the AMD manual Volume 2 (December 2017):

> In long mode, bits 63:32 are reserved and must be written with zero, otherwise a #GP occurs.

So it would be fine to truncate RAX to EAX, at least for the foreseeable future. New stuff tends to get added to MSRs, not CR bits. Since there's no way to do this in Linux without crashing, you might as well just keep it simple for silly computer tricks.
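As a small sketch of the rule quoted from the manuals, the check below models only the "bits 63:32 must be zero" condition for a value about to be written to CR0. The function name is hypothetical, and the model deliberately ignores every other consistency check the CPU performs on CR0 writes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one rule only: a CR0 write raises #GP(0) if any
 * of bits 63:32 is set. Other CR0 checks are not modelled here. */
static bool cr0_upper_bits_violation(uint64_t value)
{
    return (value >> 32) != 0;
}

/* Truncating to 32 bits always satisfies that particular rule. */
static uint64_t cr0_truncate(uint64_t value)
{
    return (uint32_t)value;
}
```

Any value produced by `cr0_truncate` passes `cr0_upper_bits_violation`, which is why working with EAX instead of RAX is harmless here.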


0xFFFFFFFEFFFFFFFF clears bit 32, not bit 31

All of the above is predicated on the assumption that you were actually clearing the paging-enable bit. So maybe SIGSEGV is simply due to corrupting registers with GNU C basic asm without actually changing the control register at all.

https://wiki.osdev.org/CPU_Registers_x86 shows that Paging is bit 31 of CR0, and that there are no real bits in the high half. https://en.wikipedia.org/wiki/Control_register#CR0 says CR0 is a 64-bit register in long mode. (But there still aren't any bits that do anything in the high half.)

Your mask actually clears bit 32, the low bit of the high half. The right AND mask is 0x7FFFFFFF. Or btr $31, %eax. Truncating RAX to EAX is fine.
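The mask arithmetic can be checked in ordinary user-space C. This sketch uses a hypothetical helper, `cleared_bit_index`, that reports which single bit a 64-bit AND mask clears; it only does integer arithmetic and touches no control register.

```c
#include <stdint.h>

/* Hypothetical helper: returns the index of the one bit that a 64-bit
 * AND mask clears, or -1 if the mask clears no bits or several bits. */
static int cleared_bit_index(uint64_t and_mask)
{
    uint64_t cleared = ~and_mask;   /* bits that the AND forces to zero */

    /* Reject zero and anything that is not a power of two. */
    if (cleared == 0 || (cleared & (cleared - 1)) != 0)
        return -1;

    int index = 0;
    while ((cleared >>= 1) != 0)
        index++;
    return index;
}
```

`cleared_bit_index(0xFFFFFFFEFFFFFFFF)` returns 32, matching the diagnosis above, while the 64-bit mask that clears only PG, `0xFFFFFFFF7FFFFFFF`, gives 31. The `0x7FFFFFFF` mask mentioned above is meant for the 32-bit EAX; applied as a 64-bit mask it would clear bits 31 through 63, so the helper returns -1 for it.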

This will actually crash your kernel in long mode like you were trying to:

// disable paging, should crash
    asm volatile(
        "mov  %%cr0, %%rax        \n\t"   // assembles with no REX prefix, same as mov %cr0,%eax
        "btr  $31, %%eax          \n\t"   // reset (clear) bit 31
        "mov  %%rax, %%cr0        \n\t"
        ::
        : "rax", "memory"
     );
Hadi Brais
Peter Cordes
  • There is no way to disable paging in the Linux kernel at run-time without crashing. But there is a compile-time option called `CONFIG_MMU` that can be used to build the kernel with paging disabled and is supported on x86. There is very little documentation on how to actually do this (it's not enough to just tweak `CONFIG_MMU`). Regarding the segfault, in long mode, on both Intel and AMD processors, all reserved bits in the control registers must always be zero. The 4 instructions the OP has shown don't change the architectural state and I don't think they are the reason for the fault. – Hadi Brais Jul 20 '19 at 14:56
  • @HadiBrais: The OP's instructions don't change CR0, but they do change RAX (nitpick: which is also part of the architectural state). Without telling GCC, that's obviously a problem. :P – Peter Cordes Jul 20 '19 at 15:00
  • Oh yes. Isn't `rax` a call-clobbered register? But `rbx` isn't. Also like you said, gcc doesn't know that `rax` is modified. – Hadi Brais Jul 20 '19 at 15:03
  • @HadiBrais: oh right, I forgot they used 2 regs. But who knows what this inlines into? It's not called `sys_disable_paging` so it's probably not called directly from system-call dispatch. – Peter Cordes Jul 20 '19 at 15:08