0

I am taking the Operating Systems course and trying to understand how exactly fork() works. I know the basic definition: "fork() creates a new process by duplicating the calling process.".

I was told the child process will only execute the codes below the fork() statement. What I don't understand is, let's say we have a 10-lined code. If line 6 is fork(), Will the whole code from lines 1 to 10 run in the child process or only 6 to 10 ?

How will this code will work:

int main(){
pid_t smith;
int a=2; int b=3;
smith = fork();

if(smith == 0){
   fork();
   a++;
   fork();
}
else if(smith > 0){
   b++;
   fork();
}
printf("%d %d",a,b);
}

The fork()s in if blocks are confusing. How will they act? When a child process is created from the fork() in the else if, what code will it run?

Baran
  • 362
  • 2
  • 10
  • 2
    The child process will start off at exactly the same place as the parent process. It should not be confusing, because you should simply draw it out on paper. When you reach a `fork`, simply draw a fork in the road an continue reading the logic. – Cheatah Oct 29 '22 at 20:26
  • I was told the child process executes only the code below fork() is executed. Thats why I am confused. So you are saying that I miss understood that part, right? – Baran Oct 29 '22 at 20:29
  • 1
    [`fork`](https://man7.org/linux/man-pages/man2/fork.2.html#RETURN_VALUE) returns the child process's pid in the original process while it returns zero in the child process. Does that make things any clearer? – G.M. Oct 29 '22 at 20:30
  • Yes, it is clearer but what I want to know is, if the fork() statement is even at the end of the code, will the child process will run the whole code from start? – Baran Oct 29 '22 at 20:37
  • No, the child process will *continue* as if it were the original process. – Cheatah Oct 29 '22 at 20:39
  • Continue from where? – Baran Oct 29 '22 at 20:40
  • 2
    From where `fork` was called, of course. Did you draw it out on paper or not? – Cheatah Oct 29 '22 at 20:44
  • The print operation is woefully inadequate to help you understand what goes on. I'd use: `printf("PID = %d, PPID = %d: a = %d, b = %d\n", getpid(), getppid(), a, b):`. I'd also add a 'wait loop' — `int corpse; int status; while ((corpse = wait(&status)) > 0) print("PID = %d, PPID = %d: child %d exited with status 0x%.4X\n", getpid(), getppid(), corpse, status);` — immediately after the print. Studying that output will make things clearer. I might add `printf("PID = %d, PPID = %d\n", getpid(), getppid());` before the first call to `fork()` too. – Jonathan Leffler Oct 29 '22 at 21:06
  • The code shown doesn't print a newline. Often that will lead to a [`printf` anomaly after `fork`](https://stackoverflow.com/questions/2530663/printf-anomaly-after-fork/2530870#2530870), but since the only print operations are done after the forking is complete, the only issue is that the sequence in which the output appears is indeterminate. – Jonathan Leffler Oct 29 '22 at 21:10

3 Answers3

3

I was told the child process will only execute the codes below the fork() statement. What I don't understand is, let's say we have a 10-lined code. If line 6 is fork(), Will the whole code from lines 1 to 10 run in the child process or only 6 to 10 ?

Imagine I give you a piece of paper containing this list:

1. Apple
2. Banana
3. Cranberry
4. Daffodil
5. Eggplant
6. Fuchsia
7. Germanium
8. Hollyhock
9. Indigo
10. Jasmine

Imagine you start reading this list out loud, from top to bottom. Imagine you have just said "Five. Eggplant." Imagine that at this point I point a magic duplication ray at you, creating a second, identical copy of Baran Açıkgöz and a second identical list. The two Baran Açıkgözs are virtually identical: they have the same memories and the same thoughts in their heads. They're both doing the same thing that the original Baran Açıkgöz had been doing at the moment of duplication. So, what do the two Baran Açıkgözs do next?

That's how fork works.

Steve Summit
  • 45,437
  • 7
  • 70
  • 103
  • That was the best analogy for me to understand. Great thanks. Now I know how fork works – Baran Oct 29 '22 at 22:14
1

What you need to understand is that fork copies the entire memory space of the parent process and runs the child process into that copied environment. Of course when copying occurs, the instructions are copied as well, so the program that is run is exactly the same.

If you have difficulties seeing this through ifs and elses, looking at the assembly might help as the execution is more streamlined.

This code:

int main(){
    pid_t pid;
    int a=2; int b=3;
    pid = fork();

    if(pid == 0){
        a++;
    }
    else if(pid > 0){
        b++;
    }
    printf("%d %d",a,b);
    }

Compiles into this:


// String literal
.LC0:
        .ascii  "%d %d\000"

//    int a=2; int b=3;
        movs    r3, #2
        str     r3, [r7, #12]
        movs    r3, #3
        str     r3, [r7, #8]

// fork() 
        bl fork

// if pid == 0
// This code will be executed by the parent

        mov     r3, r0 // return is in r0
        str     r3, [r7, #4]
        ldr     r3, [r7, #4]
        cmp     r3, #0
        bne     .L2

// a++;
        ldr     r3, [r7, #12]
        adds    r3, r3, #1
        str     r3, [r7, #12]
// Look at how the next section is skipped.
        b       .L3

// This code will be executed by the child
.L2:
        ldr     r3, [r7, #4]
        cmp     r3, #0
        ble     .L3
        ldr     r3, [r7, #8]
        adds    r3, r3, #1
        str     r3, [r7, #8]

// Now both parents and child execute this last part
.L3:
        ldr     r2, [r7, #8]
        ldr     r1, [r7, #12]
        movw    r0, #:lower16:.LC0
        movt    r0, #:upper16:.LC0

// And then we have the printf
       bl printf

I hope it is clear now how the same assembly is executed by parent and child, the IFs determine which part of the code is executed.

If for instance you don't run conditionals on the return value of fork both child and parent will run exactly the same code.

Fra93
  • 1,992
  • 1
  • 9
  • 18
1

If you instrument the program as discussed in comment 1 and comment 2, you end up with code like this (source: fork67.c, executable: fork67):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    printf("PID = %5d, PPID = %5d\n", getpid(), getppid());
    int a = 2;
    int b = 3;
    pid_t smith = fork();

    if (smith == 0)
    {
        fork();
        a++;
        fork();
    }
    else if (smith > 0)
    {
        b++;
        fork();
    }
    printf("PID = %5d, PPID = %5d: a = %d, b = %d\n", getpid(), getppid(), a, b);

    int corpse;
    int status;
    int count = 0;
    while ((corpse = wait(&status)) > 0)
    {
        printf("PID = %5d, PPID = %5d: child %5d exited with status 0x%.4X\n",
               getpid(), getppid(), corpse, status);
        count++;
    }

    printf("PID = %5d, PPID = %5d: reported on %5d children\n",
           getpid(), getppid(), count);
    return count;
}

With one example run, I got the output:

PID = 93578, PPID =  1320
PID = 93578, PPID =  1320: a = 2, b = 4
PID = 93580, PPID = 93578: a = 2, b = 4
PID = 93580, PPID = 93578: reported on     0 children
PID = 93579, PPID = 93578: a = 3, b = 3
PID = 93578, PPID =  1320: child 93580 exited with status 0x0000
PID = 93582, PPID = 93579: a = 3, b = 3
PID = 93581, PPID = 93579: a = 3, b = 3
PID = 93582, PPID = 93579: reported on     0 children
PID = 93583, PPID = 93581: a = 3, b = 3
PID = 93583, PPID = 93581: reported on     0 children
PID = 93581, PPID = 93579: child 93583 exited with status 0x0000
PID = 93579, PPID = 93578: child 93582 exited with status 0x0000
PID = 93581, PPID = 93579: reported on     1 children
PID = 93579, PPID = 93578: child 93581 exited with status 0x0100
PID = 93579, PPID = 93578: reported on     2 children
PID = 93578, PPID =  1320: child 93579 exited with status 0x0200
PID = 93578, PPID =  1320: reported on     2 children

My login shell has the PID 1320. You can see the processes' execution paths fairly clearly. A still more instrumented version captures the return value from each fork() and prints that information too, leading to fork79.c and program fork79:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    printf("PID = %5d, PPID = %5d\n", getpid(), getppid());
    int a = 2;
    int b = 3;
    pid_t smith = fork();
    printf("PID = %5d, PPID = %5d: c0 = %5d\n", getpid(), getppid(), smith);

    if (smith == 0)
    {
        int c1 = fork();
        printf("PID = %5d, PPID = %5d: c1 = %5d\n", getpid(), getppid(), c1);
        a++;
        int c2 = fork();
        printf("PID = %5d, PPID = %5d: c2 = %5d\n", getpid(), getppid(), c2);
    }
    else if (smith > 0)
    {
        b++;
        int c3 = fork();
        printf("PID = %5d, PPID = %5d: c3 = %5d\n", getpid(), getppid(), c3);
    }
    printf("PID = %5d, PPID = %5d: a = %d, b = %d\n", getpid(), getppid(), a, b);

    int corpse;
    int status;
    int count = 0;
    while ((corpse = wait(&status)) > 0)
    {
        printf("PID = %5d, PPID = %5d: child %5d exited with status 0x%.4X\n",
               getpid(), getppid(), corpse, status);
        count++;
    }

    printf("PID = %5d, PPID = %5d: reported on %5d children\n",
           getpid(), getppid(), count);
    return count;
}

Sample run:

PID = 93985, PPID =  1320
PID = 93985, PPID =  1320: c0 = 93986
PID = 93985, PPID =  1320: c3 = 93987
PID = 93985, PPID =  1320: a = 2, b = 4
PID = 93986, PPID = 93985: c0 =     0
PID = 93987, PPID = 93985: c3 =     0
PID = 93987, PPID = 93985: a = 2, b = 4
PID = 93987, PPID = 93985: reported on     0 children
PID = 93986, PPID = 93985: c1 = 93988
PID = 93988, PPID = 93986: c1 =     0
PID = 93985, PPID =  1320: child 93987 exited with status 0x0000
PID = 93986, PPID = 93985: c2 = 93989
PID = 93986, PPID = 93985: a = 3, b = 3
PID = 93989, PPID = 93986: c2 =     0
PID = 93989, PPID = 93986: a = 3, b = 3
PID = 93988, PPID = 93986: c2 = 93990
PID = 93988, PPID = 93986: a = 3, b = 3
PID = 93989, PPID = 93986: reported on     0 children
PID = 93990, PPID = 93988: c2 =     0
PID = 93990, PPID = 93988: a = 3, b = 3
PID = 93986, PPID = 93985: child 93989 exited with status 0x0000
PID = 93990, PPID = 93988: reported on     0 children
PID = 93988, PPID = 93986: child 93990 exited with status 0x0000
PID = 93988, PPID = 93986: reported on     1 children
PID = 93986, PPID = 93985: child 93988 exited with status 0x0100
PID = 93986, PPID = 93985: reported on     2 children
PID = 93985, PPID =  1320: child 93986 exited with status 0x0200
PID = 93985, PPID =  1320: reported on     2 children

Because there is printing before and while the forking occurs, this code is vulnerable to the printf anomaly referenced in printf anomaly after fork(). Here's the output when its output is piped through cat:

$ fork79 | cat
PID = 94002, PPID =  1320
PID = 94002, PPID =  1320: c0 = 94004
PID = 94005, PPID = 94002: c3 =     0
PID = 94005, PPID = 94002: a = 2, b = 4
PID = 94005, PPID = 94002: reported on     0 children
PID = 94002, PPID =  1320
PID = 94004, PPID = 94002: c0 =     0
PID = 94004, PPID = 94002: c1 = 94006
PID = 94007, PPID = 94004: c2 =     0
PID = 94007, PPID = 94004: a = 3, b = 3
PID = 94007, PPID = 94004: reported on     0 children
PID = 94002, PPID =  1320
PID = 94004, PPID = 94002: c0 =     0
PID = 94006, PPID = 94004: c1 =     0
PID = 94008, PPID = 94006: c2 =     0
PID = 94008, PPID = 94006: a = 3, b = 3
PID = 94008, PPID = 94006: reported on     0 children
PID = 94002, PPID =  1320
PID = 94004, PPID = 94002: c0 =     0
PID = 94006, PPID = 94004: c1 =     0
PID = 94006, PPID = 94004: c2 = 94008
PID = 94006, PPID = 94004: a = 3, b = 3
PID = 94006, PPID = 94004: child 94008 exited with status 0x0000
PID = 94006, PPID = 94004: reported on     1 children
PID = 94002, PPID =  1320
PID = 94004, PPID = 94002: c0 =     0
PID = 94004, PPID = 94002: c1 = 94006
PID = 94004, PPID = 94002: c2 = 94007
PID = 94004, PPID = 94002: a = 3, b = 3
PID = 94004, PPID = 94002: child 94007 exited with status 0x0000
PID = 94004, PPID = 94002: child 94006 exited with status 0x0100
PID = 94004, PPID = 94002: reported on     2 children
PID = 94002, PPID =  1320
PID = 94002, PPID =  1320: c0 = 94004
PID = 94002, PPID =  1320: c3 = 94005
PID = 94002, PPID =  1320: a = 2, b = 4
PID = 94002, PPID =  1320: child 94005 exited with status 0x0000
PID = 94002, PPID =  1320: child 94004 exited with status 0x0200
PID = 94002, PPID =  1320: reported on     2 children
$

Note that the line PID = 94002, PPID = 1320 appears 6 times, once for each of the 6 processes that run: 94018, 94023, 94024, 94026, 94027, 94029.

There many variations that you can experiment with. On my machine, the original code (from the question) produces:

$ fork37
2 42 43 33 33 33 3$

because there's no newlines output, so the command prompt appears after the last 3 3. Curiously, I never got the prompt to appear before the last numbers — that surprises me. Adding the basic monitoring proposed in the comments gives output such as:

$ fork59
PID = 95109, PPID =  1320: a = 2 b = 4
PID = 95111, PPID =     1: a = 2 b = 4
PID = 95110, PPID =     1: a = 3 b = 3
PID = 95113, PPID = 95110: a = 3 b = 3
PID = 95112, PPID =     1: a = 3 b = 3
PID = 95114, PPID = 95112: a = 3 b = 3
$

When the parent PID (PPID) is 1, it means the original process that forked the current PID has already exited. With the wait loop, the processes are more synchronized.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278