The following is a general description and summary, based on general programming concepts rather than any specific implementation.
The call to printf starts with an ordinary subroutine call; kernel mode is not involved. In large part, printf is ordinary code and may be written in C. The bulk of the printf code itself is concerned with interpreting the format string, converting the arguments to strings to be written, and writing those strings to the output file. Much of this work will be done through subroutines that printf calls, such as subroutines to convert numbers (objects like int or float) to numerals (strings of characters that represent the numbers).
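For illustration, here is a minimal sketch of the kind of conversion helper such a subroutine might be; the function name and interface are invented, not taken from any real C library:

```c
#include <stddef.h>

/* Hypothetical helper of the kind printf might call internally: convert an
   unsigned int to its decimal numeral. Digits come out least significant
   first, so they are reversed into the caller's buffer. */
static size_t uint_to_decimal(unsigned value, char *buf)
{
    char tmp[32];
    size_t n = 0;

    do {
        tmp[n++] = (char)('0' + value % 10);   /* peel off the low digit */
        value /= 10;
    } while (value != 0);

    for (size_t i = 0; i < n; i++)             /* reverse into reading order */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';

    return n;                                  /* length, excluding the '\0' */
}
```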
printf also likely calls malloc or a related routine to get memory for a buffer where it prepares the strings. I will refrain from describing the malloc call in this answer.
All of the work of interpreting the format string, converting the arguments, and preparing strings to be written can be done in C, although high-quality libraries may use a variety of target-specific optimizations, including assembly language, for speed or efficiency.
At some point, when printf has a string to print, it will call a routine to write the string to stdout. This may be fwrite or some similar subroutine. For discussion, I will suppose it is fwrite.
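Under that assumption, the handoff itself is nothing more exotic than an ordinary call like the one below; real libraries typically go through internal entry points rather than the public fwrite, but the shape is the same:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Once the characters have been prepared, handing them to the stream
       layer is an ordinary subroutine call; nothing privileged has happened yet. */
    const char prepared[] = "x = 42\n";
    fwrite(prepared, 1, strlen(prepared), stdout);
    return 0;
}
```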
Usually, streams are buffered. So, when printf calls fwrite, fwrite checks how full its buffer is. If the new string from printf fits into the buffer, fwrite merely adds the string to the buffer and returns. If the buffer is full, then fwrite calls another routine to actually write the buffer contents to a file. (Typically, this involves filling the buffer with part of the incoming string, writing the buffer to a file [and marking the buffer empty], and then copying the rest of the incoming string into the newly empty buffer.) Certain other things might also trigger writing the buffer, such as detecting a newline character in the incoming string, depending on circumstances.
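The decision fwrite makes can be sketched roughly like this; the stream type and flush routine are simplified stand-ins for whatever the C library actually uses:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a stdio stream and its buffer. */
struct stream {
    char   buf[4096];
    size_t used;
};

/* Stand-in for the routine that hands the buffer to the operating system.
   A real library would call write() here; this stub only empties the
   buffer so the sketch stays self-contained. */
static void flush_to_file(struct stream *s)
{
    s->used = 0;
}

static void buffered_write(struct stream *s, const char *data, size_t len)
{
    while (len > 0) {
        size_t space = sizeof s->buf - s->used;
        size_t chunk = len < space ? len : space;

        memcpy(s->buf + s->used, data, chunk);  /* copy what fits */
        s->used += chunk;
        data    += chunk;
        len     -= chunk;

        if (s->used == sizeof s->buf)           /* buffer full: write it out */
            flush_to_file(s);
    }
}
```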
Let’s say that, to write the buffer, fwrite calls the system routine write. The visible face of write is an ordinary library routine; fwrite performs an ordinary subroutine call to reach it. System routines have some portion that is ordinary subroutine code, but, when they need to do the nitty-gritty work, they execute some sort of system-call instruction (sometimes called a trap).
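To make that concrete on one specific target: on x86-64 Linux the instruction is literally named syscall, write is system call number 1, and the arguments go in particular registers. A rough sketch of the trapping portion, with all of these details being target-specific assumptions, might look like:

```c
#include <stddef.h>
#include <sys/types.h>

/* Sketch of the trapping portion of write on x86-64 Linux: the system-call
   number goes in rax (1 means write), the arguments go in rdi, rsi, and rdx,
   and the syscall instruction switches the processor into kernel mode.
   Other systems differ in every one of these details. */
static ssize_t raw_write(int fd, const void *buf, size_t count)
{
    ssize_t ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                          /* result comes back in rax */
                      : "a"(1), "D"(fd), "S"(buf), "d"(count)
                      : "rcx", "r11", "memory");           /* syscall clobbers rcx and r11 */
    return ret;
}

int main(void)
{
    raw_write(1, "hello\n", 6);   /* file descriptor 1 is standard output */
    return 0;
}
```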
When you execute a system-call instruction, the processor does several things. It saves processor registers in specified locations. This includes both general registers and special registers that describe the state of the user process. Then the processor switches to kernel mode, which typically involves setting bits to indicate the new execution state is privileged (allowed to change special processor registers, execute special instructions, et cetera) and loading the registers from some other location, or setting them to known values. In particular, the program counter (the location where the processor reads instructions to execute) is set to point to a particular place, where the operating system has code to handle system calls.
Now the processor is executing in kernel mode. Usually, the job of the processor at this point is to get out of kernel mode as quickly as possible, so that it can resume time-sharing between processes and be ready for other work. Additionally, there are many layers to modern operating systems, so it is difficult to say precisely what happens at this point.
One scenario is that the system call handler (the software that is called when a system call occurs) reads the saved registers and memory of the user process to determine what the process asked for. On each system, some method of passing parameters to a system call is specified. For example, a certain register might contain a number that indicates what the request is (0 means write, 1 means read, 2 means get current time, 3 means change memory map, et cetera), and each request will have certain parameters passed in other registers or in memory (one register might contain an address in memory, while another contains the length to write).
So, the system call handler figures out what request is being made and dispatches to code to handle that. This might involve collecting parameters for the request and forming them into a description of the work to be done, then putting that work on a queue and leaving the system call handler.
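A toy version of that dispatch, with every name and number invented for illustration, might look like this:

```c
#include <stdint.h>

/* Registers saved by the trap entry code; names invented. */
struct saved_regs {
    uint64_t request;   /* which system call was asked for */
    uint64_t arg0;      /* e.g. file descriptor */
    uint64_t arg1;      /* e.g. address of the caller's buffer */
    uint64_t arg2;      /* e.g. number of bytes */
};

typedef int64_t (*syscall_fn)(uint64_t, uint64_t, uint64_t);

/* Stub: a real handler would queue the work for the right device driver. */
static int64_t do_write(uint64_t fd, uint64_t buf, uint64_t len)
{
    (void)fd; (void)buf;
    return (int64_t)len;
}

static const syscall_fn table[] = { do_write /* , do_read, ... */ };

static int64_t handle_system_call(const struct saved_regs *r)
{
    if (r->request >= sizeof table / sizeof table[0])
        return -1;                                /* unknown request */
    return table[r->request](r->arg0, r->arg1, r->arg2);
}
```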
While there is work to be done, the operating system probably does not return to a user process. As I mentioned before, there are many layers in modern operating systems. There are device drivers, kernel extensions, microkernels, libraries of software within the operating system, and more. However the operating system is organized, at some time, it decides to do the work requested by the system call.
In the case of a write to standard output, the work is sent to a “device driver”, which is a name for software that handles the work for a “device”. Originally, devices were pieces of hardware connected to the system. A device driver would copy the data to be written to a special place in memory and issue a command to the device (using special instructions) to read that data from memory and send it to wherever the device sends it (a terminal, a disk drive, whatever). Another part of the device driver would be a routine that is called when the work is done. (This call is similar to a system call but is usually called an interrupt.) When the work is done, the device driver would pass a message back to other parts of the operating system, and eventually information about the result of the system call would be written into the memory or registers of a user process, and execution of the user process would be resumed.
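A rough sketch of the "issue a command to the device" step, using an entirely invented device with memory-mapped registers, might look like:

```c
#include <stdint.h>

/* Invented memory-mapped registers for an imaginary output device. */
struct fake_dev_regs {
    volatile uint64_t dma_addr;   /* where in memory the data sits */
    volatile uint32_t dma_len;    /* how many bytes to send */
    volatile uint32_t control;    /* bit 0: start the transfer */
};

/* The driver points the device at the data and tells it to go; the device
   raises an interrupt later, and a separate routine handles completion. */
static void start_transfer(struct fake_dev_regs *regs,
                           uint64_t data_addr, uint32_t len)
{
    regs->dma_addr = data_addr;
    regs->dma_len  = len;
    regs->control |= 1u;
}
```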
Today, many “devices” are software that implement virtual devices. The standard output of a user process is likely some sort of pseudo-terminal. Since that pseudo-terminal has no actual hardware terminal, it has to handle write requests by asking other software to help.
When the pseudo-terminal is part of a terminal window on a graphic display, there is some software that implements the terminal window. That software accepts text being written to standard output, decides where in the window it should be placed, and calls other software to convert the characters into changes in pixels in the window. That is, some software is reading the characters, looking up descriptions of them in some tables and other data (descriptions of the typeface and so on), and drawing those characters in an image buffer.
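On the reading side, the terminal-window software's loop can be sketched like this; draw_char is an invented placeholder for the glyph-rendering machinery just described:

```c
#include <unistd.h>

/* Invented placeholder for the glyph-rendering machinery: look up the
   character's shape and update pixels in the window's image buffer. */
static void draw_char(char c)
{
    (void)c;
}

/* The terminal-window program reads whatever was written to the
   pseudo-terminal and turns each character into pixels. */
static void emulator_loop(int pty_master_fd)
{
    char    buf[1024];
    ssize_t n;

    while ((n = read(pty_master_fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            draw_char(buf[i]);
}
```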
When the image buffer is ready, more software is called to write the image buffer to the display. Again, this involves passing data to another device driver. Ultimately, it reaches an actual hardware device, which takes the data and makes it appear on the display.
To wrap up, there is a huge chain of events. Data goes up and down through multiple layers, likely involving several different user processes, several device drivers, and many software libraries. It is difficult to get a comprehensive view of the entire process. Generally, one would not want to try to understand the entire process all at once but would learn about each of the steps separately. For example, at times in my career, I have had to deal with the minute details of a system-call instruction. But, when thinking about how my entire system is working, I think about higher-level processes communicating with each other, without thinking about the details of how those communications are made to work.