The stack, as used in languages like C, is not a strict LIFO: a program can read and write values below the top without popping anything. It's called a stack because it is used in a LIFO-like way. When a procedure is called, a new frame is pushed onto the stack. The frame typically contains local variables and bookkeeping information such as the return address. When the procedure returns, its frame is popped off the stack.
There's nothing magical about this. The compiler (not the operating system) generates code that uses one register as a stack pointer; on many architectures the hardware or the calling convention fixes which register that is. Let's call it SP. In the convention used here, SP points to the memory location of the next free stack word:
+----------------+  (high address)
|   argument 0   |
+----------------+
|   argument 1   |
+----------------+
| return address |
+----------------+
|    local 0     |
+----------------+
|    local 1     |
+----------------+                  +----+
|   free slot    | <--------------- | SP |
+----------------+  (low address)   +----+
To push a value onto the stack, we do something like this (in pseudo-assembly):
STORE [SP], 42 ; store the value 42 at the address where SP points
SUB SP, 1 ; move down (the stack grows down!) to the next stack location
Here the notation [SP] is read as "the contents of the memory cell to which SP points". Some architectures, notably x86, provide a push instruction that does both the subtraction and the store (x86 subtracts first, so its stack pointer ends up pointing at the last value pushed rather than at the next free slot). To pop (and discard) the n top values on the stack, we just add n to SP*.
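To make the push and pop steps concrete, here is a minimal C sketch of a toy machine. Everything in it is an assumption for illustration: the array `memory`, the index `sp`, and the helpers `push`, `drop` and `pop` are hypothetical names, one array element stands for one stack word, and the asserts are only there to keep the sketch honest about bounds.

```c
#include <assert.h>

enum { STACK_WORDS = 16 };

/* Toy machine (hypothetical): one array element per stack word. */
static int memory[STACK_WORDS];

/* SP holds the index of the next free slot; the empty stack starts
   at the highest index because the stack grows down. */
static int sp = STACK_WORDS - 1;

/* STORE [SP], value ; SUB SP, 1 */
static void push(int value) {
    assert(sp >= 0);              /* no free slot left otherwise */
    memory[sp] = value;
    sp -= 1;
}

/* Pop and discard the n top values: ADD SP, n */
static void drop(int n) {
    assert(n >= 0 && sp + n <= STACK_WORDS - 1);
    sp += n;
}

/* Pop one value and return it: ADD SP, 1 ; LOAD result, [SP] */
static int pop(void) {
    assert(sp < STACK_WORDS - 1); /* stack must not be empty */
    sp += 1;
    return memory[sp];
}
```

Calling `push(42)` followed by `push(7)` moves `sp` down by two slots; `pop()` then returns 7, and `drop(1)` discards the 42 without reading it, which leaves `sp` back at its starting value. Note that dropping values only moves `sp`; the old contents stay in memory until something overwrites them.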
Now, suppose we want to access the local 0 field above. Easy enough if our CPU has a base+offset addressing mode! Assume SP points to the free slot as in the picture above.
LOAD R0, [SP+2] ; load "local 0" into register R0
Notice how we didn't need to pop local 0 off the stack first, because we can reference any field using its offset from the stack pointer.
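The same idea can be sketched in C. This is a toy model under stated assumptions, not how any particular compiler lays things out: `memory`, `sp`, `push`, `load_sp_relative` and `read_local0_demo` are hypothetical names, the pushed values are arbitrary placeholders, and one array element stands for one stack word.

```c
#include <assert.h>

enum { STACK_WORDS = 8 };

static int memory[STACK_WORDS];
static int sp = STACK_WORDS - 1;   /* index of the next free slot */

/* STORE [SP], value ; SUB SP, 1 */
static void push(int value) {
    assert(sp >= 0);
    memory[sp] = value;
    sp -= 1;
}

/* LOAD Rn, [SP+offset]: read a stack word without moving SP. */
static int load_sp_relative(int offset) {
    assert(offset >= 1 && sp + offset < STACK_WORDS);
    return memory[sp + offset];
}

/* Build the frame from the picture, then read "local 0" in place. */
static int read_local0_demo(void) {
    push(10);      /* argument 0 */
    push(20);      /* argument 1 */
    push(0x1234);  /* return address (placeholder value) */
    push(111);     /* local 0 */
    push(222);     /* local 1 */

    int sp_before = sp;
    int local0 = load_sp_relative(2);  /* LOAD R0, [SP+2] */
    assert(sp == sp_before);           /* nothing was popped */
    return local0;
}
```

Here `read_local0_demo()` returns 111, the value stored as local 0, and the stack pointer is unchanged by the read. Offset 1 would give local 1, and offset 5 would reach argument 0.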
Depending on the compiler and machine architecture, there may be another register pointing to the area between locals and arguments (or thereabouts). This register, typically called a frame pointer, stays fixed for the duration of the call while the stack pointer moves around, so locals and arguments keep the same offsets from it.
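A small C sketch shows why a fixed frame pointer is convenient. It is a simplification with assumed details: the names `memory`, `sp`, `fp`, `push`, `setup_frame`, `load_sp_relative` and `load_fp_relative` are hypothetical, and the frame pointer is set to the slot of the first local, whereas real calling conventions usually also save the caller's frame pointer in the frame.

```c
#include <assert.h>

enum { STACK_WORDS = 16 };

static int memory[STACK_WORDS];
static int sp = STACK_WORDS - 1;   /* index of the next free slot */
static int fp = 0;                 /* frame pointer, set on entry */

/* STORE [SP], value ; SUB SP, 1 */
static void push(int value) {
    assert(sp >= 0);
    memory[sp] = value;
    sp -= 1;
}

/* Lay out one frame; FP is fixed just below the return address. */
static void setup_frame(void) {
    push(10);      /* argument 0 */
    push(20);      /* argument 1 */
    push(0x1234);  /* return address (placeholder value) */
    fp = sp;       /* the first local will land at [FP] */
    push(111);     /* local 0, at [FP]   */
    push(222);     /* local 1, at [FP-1] */
}

/* LOAD Rn, [SP+offset] */
static int load_sp_relative(int offset) {
    assert(offset >= 1 && sp + offset < STACK_WORDS);
    return memory[sp + offset];
}

/* LOAD Rn, [FP-offset]: locals sit at or below FP in this sketch. */
static int load_fp_relative(int offset) {
    assert(offset >= 0 && fp - offset > sp);
    return memory[fp - offset];
}
```

After `setup_frame()`, both `load_sp_relative(2)` and `load_fp_relative(0)` yield local 0. Push one temporary with `push(999)` and the SP-relative offset of local 0 shifts from 2 to 3, while `load_fp_relative(0)` still finds it. With only a stack pointer, the compiler has to track that shift itself at every point in the function.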
I want to stress the fact that normally, the operating system isn't involved in stack manipulation at all. The kernel allocates the initial stack, and possibly monitors its growth, but leaves the pushing and popping of values to the user program.
*For simplicity, I've assumed that the machine word size is 1 byte, which is why we subtract 1 from SP. On a 32-bit machine, pushing a word onto the stack means subtracting (at least) 4 from SP, and the offsets scale the same way: [SP+2] above would become [SP+8].