Unix System Interface

This document was written by Yarden Kahanovitch.
It follows Marina's computer architecture course
and is based on the presentation
"Lecture 8 - System Calls" by Marina.

What is the Unix System Interface?

This part covers ONLY the lecture presentation. The book's material is at the end and is considered extra.

Imagine you're writing a program that saves data to a file. In your C code, you call fwrite() and the data gets written. But have you wondered how this actually works? How does your program communicate with the hard drive, and how does the operating system ensure that your program can't accidentally corrupt another program's files?

The answer lies in the Unix System Interface, a carefully designed layer sitting between your programs and the hardware. This interface provides system calls: special functions that allow programs to safely interact with files, processes, memory, and other system resources.

The Core Problem

Modern computers run multiple programs simultaneously. Allowing each program direct access to hardware would result in:

  • Security issues: Programs could read private files.
  • Data corruption: Programs could overwrite each other's data.
  • System crashes: A single buggy program could crash the entire system.
  • Resource conflicts: Multiple programs competing for the same hardware.

The Unix Solution

Unix solves these issues by creating a controlled interface:

  • System Calls: Programs request services from the OS through system calls.
  • Kernel Protection: The kernel controls all hardware access and enforces security.
  • Process Isolation: Each program runs within its own protected memory space.

Understanding Unix Processes

Every program you run becomes a process, a self-contained instance of that program. Understanding processes is essential to understanding the Unix System Interface.

A process includes:

  • Memory Space: Contains the program's code, data, and variables.
  • Process ID (PID): Unique identifier for the process.
  • File Descriptor Table: Tracks open files and their permissions.
  • Environment Variables: Configuration settings for the process.
  • Working Directory: The process's current operating directory.
  • Security Context: Permissions and access rights.

How a Process Is Assembled

When you run a program, the operating system:

  1. Allocates memory space for the program.
  2. Loads the program instructions into memory.
  3. Assigns a unique PID.
  4. Sets up file descriptors (stdin, stdout, stderr).
  5. Sets working directory and environment variables.
  6. Assigns permissions based on user context.
  7. Starts executing the program instructions.

Each process operates in isolation, ensuring system security and stability.

File Descriptors:

Now that we understand the basic flow, let's talk about how Unix organizes file access. Instead of letting programs directly manipulate files, Unix uses a system called file descriptors.

Think of file descriptors as simple "tickets" or "handles" that your program gets when it opens a file. Each process maintains a file descriptor table with unique integer IDs (0, 1, 2, 3...) that track which files are open and what permissions they have.

File Descriptor  Name         Default Stream   Flags  Description
0                stdin        Standard Input   R      Read input from keyboard/terminal
1                stdout       Standard Output  W      Write normal output to terminal
2                stderr       Standard Error   W      Write error messages to terminal
3                message.txt  Our File         W      The file we created in our example
4+               Other Files  User-defined     R/W/A  Additional files opened by the program

  • fd 0, 1, 2: Always pre-opened for every program (stdin, stdout, stderr)
  • fd 3+: Your program gets these when opening files with open() or fopen()
  • Flags: R=Read, W=Write, A=Append - determines what operations are allowed
  • Simple integers: Much easier than dealing with complex file paths and permissions

Process ID vs File Descriptor:

Don't confuse Process ID (PID) with File Descriptor (FD):

  • PID: Identifies the process itself (unique system-wide)
  • FD: Identifies an open file within that specific process

Each process has its own set of file descriptors (0, 1, 2, 3...), but only one PID. The same file descriptor number in different processes refers to completely different files.

Example - Two Text Editors Running Simultaneously:

  • Process A: TextEditor (PID 1234) opens "data.txt" → gets FD 3
    File descriptor table: 0=stdin, 1=stdout, 2=stderr, 3=data.txt
  • Process B: TextEditor (PID 5678) opens "config.txt" → also gets FD 3
    File descriptor table: 0=stdin, 1=stdout, 2=stderr, 3=config.txt

Key Point: Both processes have FD 3, but when Process A writes to FD 3, it modifies "data.txt", while Process B writing to FD 3 modifies "config.txt". The file descriptor numbers are process-local, not system-wide!

Flags:

  • O_RDONLY: Read-only access.
  • O_WRONLY: Write-only access.
  • O_RDWR: Read and write access.
  • O_CREAT: Create file if nonexistent.
  • O_APPEND: Append data to end of file.
  • O_TRUNC: Clear file contents upon opening.

Permissions (Who Can Do What):

Owner (User), Group, Others permissions with read (r), write (w), execute (x).

Common examples:

  • 0644: Owner read/write, others read-only.
  • 0755: Owner full access, others read/execute.
  • 0600: Owner read/write only.

How System Calls Work

Let us start with the following example:

#include <stdio.h>

int main() {
    char str[] = "Hi";
    FILE* fp = fopen("a.txt", "w");
    fwrite(str, 1, sizeof(str), fp);
    fclose(fp);
    return(0);
}

We will inspect the fwrite call in this code. Here is what happens:

  1. Program Call: You invoke a file-writing function (fwrite).
  2. C Library Translation: High-level calls internally call low-level system calls.
  3. Kernel Mode Switch: The CPU switches from user mode to kernel mode securely.
  4. Kernel Execution: Kernel routine performs the requested file operation safely.
  5. Hardware Access: Data is written safely to the storage device.

Understanding this low-level interaction provides precise control, performance improvements, deeper system knowledge, and better debugging capability.

Your Program
↓ File writing function call
C Library (glibc)
↓ write() wrapper function
USER / KERNEL BOUNDARY (security checkpoint)
↓ syscall instruction
System Call Table
↓ sys_write() lookup
Kernel Routine
↓ Safe hardware access
Hard Drive

The Three Essential System Calls

System Call Functions - The Building Blocks:

Unix file operations are built on three fundamental system calls: read, write, and open. Input and output use the read and write system calls, which C programs access through two functions named read and write.

read() - Reading Data from Files:
int n_read = read(int fd, char *buf, int n);
  • fd: File descriptor (which file to read from)
  • buf: Character array in your program where the data will go
  • n: Number of bytes to be transferred
  • Returns: Count of bytes actually read. Zero indicates end of file, -1 indicates an error

Note: The number of bytes returned may be less than the number requested.

write() - Writing Data to Files:
int n_written = write(int fd, char *buf, int n);
  • fd: File descriptor (which file to write to)
  • buf: Character array in your program where the data comes from
  • n: Number of bytes to be transferred
  • Returns: Number of bytes actually written. An error has occurred if this isn't equal to the number requested

open() - Opening Files for Access:
int open(char *name, int flags, int perms);
  • name: Pathname of the file to open
  • flags: How you want to access the file (read, write, create, etc.)
  • perms: File permissions used when O_CREAT creates a new file (ignored otherwise; pass zero for existing files)
  • Returns: File descriptor (non-negative integer) on success, -1 on error

Important: Other than the default standard input, output and error, you must explicitly open files in order to read or write them. There are two system calls for this, open and creat. However, creat is rarely used these days since it can be fully replaced with open.

Every file operation on Unix ultimately uses these system calls, no matter how high-level your programming language!

Example

Now that you understand the basics, let's see a more advanced example that demonstrates the power of low-level system calls. This example shows operations that would be difficult or impossible with high-level functions:

#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main() {
    // Step 1: Create file and write initial content
    int fd = open("demo.txt", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    write(fd, "Hello World\n", 12);
    printf("Created file with: Hello World\n");

    // Step 2: Use lseek() to move to position 6 (after "Hello ")
    off_t pos = lseek(fd, 6, SEEK_SET);
    printf("Moved file pointer to position: %ld\n", (long)pos);

    // Step 3: Overwrite the first 4 bytes of "World" with "Unix"
    int bytes = write(fd, "Unix", 4);
    printf("Overwrote %d bytes at position 6\n", bytes);

    // Step 4: Close file
    close(fd);
    // Only 4 of the 5 bytes of "World" were replaced, so the file
    // actually holds "Hello Unixd\n"; the message below is approximate
    printf("Final result: Hello Unix\n");
    return 0;
}

What This Advanced Example Demonstrates:

  1. Initial Write: Creates "demo.txt" and writes "Hello World\n" (12 bytes)
  2. Precise Positioning: lseek() moves file pointer to byte 6 (after "Hello ")
  3. Selective Overwriting: Overwrites the first 4 bytes of "World" with "Unix" without affecting the rest
  4. Result: "World" is 5 bytes long, so its final 'd' stays in place and the file contains "Hello Unixd\n" rather than "Hello Unix\n". An overwrite never shortens a file; it only replaces the bytes it covers.

This type of precise file manipulation is essential for databases, editors, and other applications that need to modify specific parts of files efficiently.

Understanding lseek() - The File Position Controller:

  • SEEK_SET - Position from beginning of file: lseek(fd, 6, SEEK_SET) = "go to byte 6"
  • SEEK_CUR - Position relative to current location: lseek(fd, 5, SEEK_CUR) = "move forward 5 bytes"
  • SEEK_END - Position from end of file: lseek(fd, 0, SEEK_END) = "go to end of file"

lseek() is like a cursor in a text editor - it determines where the next read or write operation will happen

Why close() is Critical:

Always remember to close your file descriptors! Here's why:

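  • Data Safety: For stdio streams, fclose() flushes the library's buffered data to the kernel; close() by itself does not guarantee that the data has reached the disk (fsync() exists for that)
  • Resource Management: Frees the file descriptor number so this process can reuse it, and releases the kernel resources tied to the open file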
  • System Limits: Each process has a limit (~1024 file descriptors by default)
  • File Locking: Releases any locks on the file so other programs can access it

Forgetting to close files can lead to data loss, resource exhaustion, and system instability!

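Here's how the advanced example looks in the terminal:

user@vm:~/unix-demo $ gcc advanced_demo.c -o advanced_demo
user@vm:~/unix-demo $ ./advanced_demo
Created file with: Hello World
Moved file pointer to position: 6
Overwrote 4 bytes at position 6
Final result: Hello Unix
user@vm:~/unix-demo $ cat demo.txt
Hello Unixd
user@vm:~/unix-demo $

Note that cat shows "Hello Unixd": "Unix" replaced only the first four of the five bytes of "World", so the trailing 'd' survives even though the program's own final message says otherwise.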

Assembly Level: System Calls in Their Purest Form

Now that you understand how system calls work at the C level, let's see what happens at the lowest level - assembly language. This is where system calls directly communicate with the kernel using CPU instructions.

When your C program calls write(), it eventually becomes assembly instructions that load values into specific CPU registers and execute the syscall instruction. Let's see exactly how this works.

sys_read

system call number (in rax): 0

arguments:

  • rdi: file descriptor to read from
  • rsi: pointer to the buffer that receives the data
  • rdx: maximum number of bytes to read (at most the buffer size)

return value (in rax):

  • number of bytes received
  • On errors: negative number

.section .bss
buffer: .space 1

.section .text
.global _start
_start:

movq $0, %rax # system call number (sys_read)
movq $0, %rdi # file descriptor (stdin)
leaq buffer(%rip), %rsi # buffer to keep the read data
movq $1, %rdx # bytes to read
syscall # call kernel

movq $60, %rax # system call number (sys_exit)
movq $0, %rdi # exit status
syscall # call kernel

sys_write

system call number (in rax): 1

arguments:

  • rdi: file descriptor (to write to it)
  • rsi: pointer to buffer (data to write)
  • rdx: number of bytes to write

return value (in rax):

  • number of bytes written
  • On errors: negative number

.section .data
msg: .ascii "Hello\n" # string to print
len = . - msg # length of string

.section .text
.global _start
_start:

movq $1, %rax # system call number (sys_write)
movq $1, %rdi # file descriptor (stdout)
leaq msg(%rip), %rsi # message to write
movq $len, %rdx # message length
syscall # call kernel

movq $60, %rax # system call number (sys_exit)
movq $0, %rdi # exit status
syscall # call kernel

Assembly Code to Print a Positive Integer

What this code does:

Converts the integer 234 to its ASCII string representation and prints it to stdout.

Key techniques:

  • Backward string building - Builds the string from right to left
  • Division by 10 - Extracts digits one by one
  • ASCII conversion - Adds '0' (48) to convert digit to ASCII
  • Dynamic length - Calculates string length during conversion

Register usage:

  • %rax - Holds the number being converted (quotient after division)
  • %rdx - Contains remainder after division (current digit)
  • %rcx - Holds divisor (10) for division operation
  • %rdi - Buffer pointer, moves backward as digits are stored
  • %rsi - Points to start of converted string for sys_write

Algorithm:

  1. Load number (234) into %rax register
  2. Point %rdi to end of buffer
  3. Divide %rax by 10, get remainder in %rdx (last digit)
  4. Convert digit in %rdx to ASCII and store at (%rdi)
  5. Repeat until %rax becomes 0
  6. Print using %rsi pointer to start of digits

.section .data
x: .long 234 # Define the number
buflen: .long 0 # Length of the converted string
.section .bss
buffer: .space 10 # Buffer to store the ASCII representation
# 10 bytes are enough for the digits of any 32-bit unsigned value

.section .text
.global _start
_start:
# Load the number into rax (movl zero-extends into the full register)
xorq %rax, %rax
movl x(%rip), %eax

# Point rdi to end of buffer (we'll build string backwards)
leaq buffer(%rip), %rdi
addq $10, %rdi # Point to end of buffer

convert_loop:
# Divide by 10 to get last digit
movq $0, %rdx # Clear rdx for division
movq $10, %rcx # Divisor
divq %rcx # Divide rdx:rax by 10 (quotient in rax, remainder in rdx)

# Convert remainder to ASCII and store
addq $'0', %rdx # Convert to ASCII
decq %rdi # Move buffer pointer back
movb %dl, (%rdi) # Store digit

# Increment length
incl buflen(%rip)

# Continue if quotient is not zero
testq %rax, %rax
jnz convert_loop

# Write the number to stdout
movq $1, %rax # sys_write system call number
movq %rdi, %rsi # Pointer to start of digits
movq $1, %rdi # File descriptor 1 (stdout)
movl buflen(%rip), %edx # Length of string (buflen is 32 bits, so load it with movl)
syscall

# Exit program
movq $60, %rax # sys_exit system call number
movq $0, %rdi # Exit status 0
syscall

Print the .text Section Byte by Byte

What this code does:

Prints each byte of the .text section (the executable code) one by one to stdout.

Key concepts:

  • Section boundaries - Uses labels to mark start and end of .text section
  • Byte-by-byte iteration - Loops through each byte in the section
  • Pointer arithmetic - Increments memory address to access next byte
  • Memory inspection - Reads and displays raw executable code

Register usage:

  • %rbx - Current byte pointer, starts at label1 (section start)
  • %rax - System call number for sys_write
  • %rdi - File descriptor (stdout = 1)
  • %rsi - Pointer to current byte to print
  • %rdx - Number of bytes to write (always 1)

Note: the syscall instruction overwrites %rcx and %r11, so the loop pointer must live in a register that survives the call, such as %rbx.

Algorithm:

  1. Set %rbx to label1 (start of .text section)
  2. Print the byte at address %rbx
  3. Increment %rbx to point to next byte
  4. Check if %rbx reached label2 (end of section)
  5. If not at end, repeat from step 2
  6. Exit when entire section is printed

.section .text
.global _start
label1:
_start:
# Load start address of .text section
leaq label1(%rip), %rbx

print_loop:
# Check if we reached the end
leaq label2(%rip), %rax
cmpq %rax, %rbx
jae exit_program # unsigned comparison, since these are addresses

# Print current byte
movq $1, %rax # sys_write system call number
movq $1, %rdi # file descriptor (stdout)
movq %rbx, %rsi # pointer to current byte
movq $1, %rdx # print 1 byte
syscall # clobbers rcx and r11, rbx is preserved

# Move to next byte
incq %rbx # increment pointer
jmp print_loop # continue loop

exit_program:
# Exit program
movq $60, %rax # sys_exit system call number
movq $0, %rdi # exit status 0
syscall

label2:
# End of .text section marker

sys_open

system call number (in rax): 2

arguments:

  • rdi: pathname of the file to open/create
  • rsi: file access bits (bitwise OR'ed together)
    • O_RDONLY (0) - read only
    • O_WRONLY (1) - write only
    • O_RDWR (2) - read and write
    • O_APPEND (1024) - append to end
    • O_TRUNC (512) - truncate existing content
    • O_CREAT (64) - create if doesn't exist
  • rdx: file permissions (when O_CREAT is set)
    • S_IRWXU (0700) - RWX mask for owner
    • S_IRUSR (0400) - Read for owner
    • S_IWUSR (0200) - Write for owner
    • S_IXUSR (0100) - Execute for owner

return value (in rax):

  • file descriptor (non-negative integer)
  • On errors: negative number

Example breakdown:

$66 = O_RDWR | O_CREAT (2 + 64)
$0700 = S_IRWXU (owner: read/write/execute)

.section .data
fileName: .string "file.txt" # file name
fd: .quad 0 # file descriptor

.section .text
.global main
main:

movq $2, %rax # system call number (sys_open)
leaq fileName(%rip), %rdi # set file name
movq $66, %rsi # flags: O_RDWR|O_CREAT (read+write, create if needed)
movq $0700, %rdx # permissions: S_IRWXU (owner read+write+execute)
syscall # call kernel
movq %rax, fd(%rip) # save file descriptor

movq $60, %rax # system call number (sys_exit)
movq $0, %rdi # exit status
syscall # call kernel

sys_close

system call number (in rax): 3

arguments:

  • rdi: file descriptor (obtained from sys_open)

return value (in rax):

  • 0 on success
  • On errors: negative number

What sys_close does:

  • Releases file descriptor - Makes it available for reuse
  • Does not force data to disk - Data already handed to the kernel is written out later; use fsync before closing when durability matters
  • Frees kernel resources - Cleans up internal file structures
  • Prevents resource leaks - Essential for long-running programs

Example workflow:

  1. file_open: Opens "file.txt" with O_RDONLY
  2. file_close: Closes the file using saved descriptor
  3. exit_program: Clean program termination

Important notes:

Always close files when done! Each process has a limited number of file descriptors (~1024 by default).
xorq %rdx, %rdx is a compact way to set %rdx to 0.

.section .data
fileName: .string "file.txt" # file name
fd: .quad 0 # file descriptor

.section .text
.global main
main:

file_open:
movq $2, %rax # system call number (sys_open)
leaq fileName(%rip), %rdi # set file name
movq $0, %rsi # flags: O_RDONLY (read-only access)
xorq %rdx, %rdx # permissions: not needed for O_RDONLY (read-only)
syscall # call kernel
movq %rax, fd(%rip) # save file descriptor

file_close:
movq $3, %rax # system call number (sys_close)
movq fd(%rip), %rdi # file descriptor
syscall # call kernel

exit_program:
movq $60, %rax # system call number (sys_exit)
xorq %rdi, %rdi # exit status 0
syscall # call kernel

sys_lseek

system call number (in rax): 8

arguments:

  • rdi: file descriptor
  • rsi: offset (number of bytes to move)
  • rdx: where to move from
    • SEEK_SET (0) - beginning of the file
    • SEEK_CUR (1) - current position of the file pointer
    • SEEK_END (2) - end of file

return value (in rax):

  • Resulting position of the file pointer, measured in bytes from the beginning of the file
  • On errors: negative number

What sys_lseek does:

  • Repositions file pointer - Changes where next read/write will occur
  • Non-destructive operation - Doesn't modify file content, only pointer position
  • Enables random access - Jump to any position in file without reading sequentially
  • Essential for file editing - Allows overwriting specific parts of files

Example workflow:

  1. file_open: Opens "file.txt" with O_RDONLY (read-only)
  2. file_lseek: Moves file pointer 15 bytes from beginning (SEEK_SET)
  3. exit_program: Clean program termination

Important notes:

SEEK_SET (0) = absolute positioning from file start.
SEEK_CUR (1) = relative positioning from current location.
SEEK_END (2) = positioning relative to file end.
Moving beyond file end with write operations extends the file.

.section .data
fileName: .string "file.txt" # file name
fd: .quad 0 # file descriptor

.section .text
.global main
main:

file_open:
movq $2, %rax # system call number (sys_open)
leaq fileName(%rip), %rdi # set file name
movq $0, %rsi # flags: O_RDONLY (read-only access)
xorq %rdx, %rdx # permissions: not needed for O_RDONLY (read-only)
syscall # call kernel
movq %rax, fd(%rip) # save file descriptor

file_lseek:
movq $8, %rax # system call number (sys_lseek)
movq fd(%rip), %rdi # file descriptor
movq $15, %rsi # offset: move 15 bytes
movq $0, %rdx # whence: SEEK_SET (from beginning)
syscall # call kernel

exit_program:
movq $60, %rax # system call number (sys_exit)
xorq %rdi, %rdi # exit status 0
syscall # call kernel

Unix Processes

Unix processes are independent programs running in memory. Each process has its own memory space, file descriptors, and process ID (PID). The operating system manages these processes and provides system calls to create, control, and communicate between them.

fork

The fork() system call creates a new process by duplicating the current process. The new process (child) is an exact copy of the original process (parent), except for the return value of fork().

How fork() works:

  • Returns 0 - In the child process
  • Returns child PID - In the parent process
  • Returns -1 - On error (fork failed)

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    // Create a new process (duplicate current process)
    pid_t pid = fork();

    if (pid == 0) {
        // Child process - fork() returned 0
        printf("I'm the child! PID: %d\n", getpid());
    } else if (pid > 0) {
        // Parent process - fork() returned child's PID
        printf("I'm the parent! Child PID: %d\n", pid);
        wait(NULL); // Wait for child to finish
    } else {
        // Fork failed - fork() returned -1
        printf("Fork failed!\n");
    }
    return 0;
}

What happens step by step:

  1. fork() is called - Creates an identical copy of the current process
  2. Two processes exist - Parent and child, both continue from the fork() line
  3. Different return values - Child gets 0, parent gets child's PID
  4. Both execute if-else - Each process takes a different branch
  5. wait() synchronizes - Parent waits for child to complete

Here's how it looks when you run it:

user@vm:~/SPLab $ gcc fork_demo.c -o fork_demo
user@vm:~/SPLab $ ./fork_demo
I'm the parent! Child PID: 1234
I'm the child! PID: 1234
user@vm:~/SPLab $

Key Points:

  • Memory isolation - Child has its own copy of variables
  • File descriptors - Child inherits parent's open files
  • Process hierarchy - Parent-child relationship is established
  • Concurrent execution - Both processes run simultaneously

#include <stdio.h>
#include <unistd.h>

int main() {
    // Initialize counter variable (shared initial value)
    int counter = 0;
    printf("Before fork: counter = %d\n", counter);

    // Fork creates two separate processes with their own memory
    pid_t pid = fork();

    if (pid == 0) {
        // Child process - modify its own copy of counter
        counter += 10;
        printf("Child: counter = %d\n", counter);
    } else if (pid > 0) {
        // Parent process - modify its own copy of counter
        counter += 5;
        printf("Parent: counter = %d\n", counter);
    }
    return 0;
}

Memory Isolation Demo:

This example shows that each process has its own memory space. When the child modifies counter, it doesn't affect the parent's copy.

Output: Parent shows counter = 5, Child shows counter = 10 (the order of the two lines may vary between runs)

Understanding wait() - Process Synchronization

The wait() system call is crucial for parent-child process coordination. It allows the parent to wait for a child process to terminate and retrieve its exit status.

How wait() works:

  • Blocks the parent - Parent stops executing until a child terminates
  • Returns child PID - The PID of the terminated child process
  • Collects exit status - Stores termination information in the status parameter
  • Prevents zombie processes - Cleans up terminated child processes

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Create child process
    int pid = fork();
    // Variable to store child's exit status
    int status;

    if(pid) {
        // Parent process - wait for child to finish
        printf("Parent: waiting for child (PID %d) to finish...\n", pid);
        // Block until child terminates, get its PID back
        int child_pid = wait(&status);
        // Check if child exited normally (not killed by signal)
        if (WIFEXITED(status))
            printf("Parent: child PID %d exited with status %d\n", child_pid, WEXITSTATUS(status));
    } else {
        // Child process - do some work then exit
        printf("Child: I'm about to exit with status 18\n");
        _exit(18); // Exit immediately with status 18
    }
    return 0;
}

Process Flow:

  1. Start: main() executes
  2. pid = fork()
  3. Check pid value

  Parent (pid > 0):
    4. printf("Parent: waiting...")
    5. child_pid = wait(&status)
    7. WIFEXITED(status) check
    8. printf("child PID %d exited...")

  Child (pid == 0):
    4. printf("Child: I'm about...")
    6. _exit(18)

  9. Program ends: return 0 (reached by the parent only; the child has already terminated in _exit)

Step-by-Step Execution:

  1. fork() creates child - Parent gets child PID, child gets 0
  2. Child executes else block - Prints message and exits with status 18
  3. Parent calls wait() - Blocks until child terminates
  4. wait() returns - Returns the PID of the terminated child
  5. Status is analyzed - WIFEXITED() checks if child exited normally
  6. Exit status extracted - WEXITSTATUS() gets the actual exit code (18)

Here's the terminal output showing the synchronization:

user@vm:~/SPLab $ gcc wait_demo.c -o wait_demo
user@vm:~/SPLab $ ./wait_demo
Parent: waiting for child (PID 1234) to finish...
Child: I'm about to exit with status 18
Parent: child PID 1234 exited with status 18
user@vm:~/SPLab $

Why wait() Returns the Child PID:

Think of it logically: a parent process might have multiple children running simultaneously. When wait() returns, the parent needs to know which specific child just terminated. The return value (child PID) identifies exactly which child process finished.

  • Multiple children scenario: Parent can distinguish between different child processes
  • Process tracking: Parent can maintain records of which children are still running
  • Selective waiting: Parent can take different actions based on which child terminated

Process Termination Status Explained:

  • status parameter: Contains packed information about how the child terminated
  • WIFEXITED(status): Returns true if child exited normally (not killed by signal)
  • WEXITSTATUS(status): Extracts the actual exit code passed to _exit() or return
  • _exit(18): Child terminates immediately with exit status 18

The parent can use this information to determine if the child completed successfully or encountered an error.

Advanced Process Management Examples

Now let's see practical examples of the three scenarios mentioned above. Each example demonstrates a different aspect of process management with accompanying flowcharts.

1. Multiple Children Scenario

This example shows how a parent process can create and distinguish between multiple child processes:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t child1, child2;

    // Create first child
    child1 = fork();
    if (child1 == 0) {
        printf("Child 1: PID %d, doing task A\n", getpid());
        sleep(2);
        return 10; // Exit code 10
    }

    // Create second child
    child2 = fork();
    if (child2 == 0) {
        printf("Child 2: PID %d, doing task B\n", getpid());
        sleep(1);
        return 20; // Exit code 20
    }

    // Parent waits for children
    int status;
    pid_t finished = wait(&status);
    if (finished == child1) {
        printf("Child 1 finished with code %d\n", WEXITSTATUS(status));
    } else if (finished == child2) {
        printf("Child 2 finished with code %d\n", WEXITSTATUS(status));
    }

    // Wait for remaining child
    finished = wait(&status);
    if (finished == child1) {
        printf("Child 1 finished with code %d\n", WEXITSTATUS(status));
    } else {
        printf("Child 2 finished with code %d\n", WEXITSTATUS(status));
    }

    printf("All children finished!\n");
    return 0;
}

Multiple Children Process Flow:

  1. Parent process starts
  2. fork() → Child 1
  3. fork() → Child 2

  Child 1:
    4a. Task A
    5a. Exit(10)

  Child 2:
    4b. Task B
    5b. Exit(20)

  Parent:
    4c. wait() × 2
    5c. Identify finished child by PID

Terminal - Multiple Children Output
user@vm:$ gcc multiple_children.c -o multiple_children
user@vm:$ ./multiple_children
Child 1: PID 1234, doing task A
Child 2: PID 1235, doing task B
Child 2 finished with code 20
Child 1 finished with code 10
All children finished!
user@vm:$

2. Process Tracking Scenario

This example shows how a parent can maintain records of which children are still running:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t children[3];
    int active_children = 3;

    // Create 3 children with different run times
    for (int i = 0; i < 3; i++) {
        children[i] = fork();
        if (children[i] == 0) {
            printf("Child %d started (PID: %d)\n", i+1, getpid());
            sleep((i+1) * 2); // Different sleep times
            printf("Child %d finishing\n", i+1);
            return i+1;
        }
    }

    // Track children as they finish
    while (active_children > 0) {
        int status;
        pid_t finished_pid = wait(&status);

        // Find which child finished
        for (int i = 0; i < 3; i++) {
            if (children[i] == finished_pid) {
                printf("Tracked: Child %d (PID %d) finished\n", i+1, finished_pid);
                children[i] = -1; // Mark as finished
                active_children--;
                break;
            }
        }
        printf("Remaining active children: %d\n", active_children);
    }

    printf("All children tracked and finished!\n");
    return 0;
}

Process Tracking Flow:

  1. Parent creates 3 children
  2. Initialize tracking array: children[3] = {PID1, PID2, PID3}
  3. Children run concurrently:
       3a. Child 1: sleep(2s)
       3b. Child 2: sleep(4s)
       3c. Child 3: sleep(6s)
  4. Parent waits in loop; wait() returns a PID
  5. Find PID in array, mark as finished (-1), update counter
  6. active_children = 0?
       7a. No: continue loop (back to step 4)
       7b. Yes: all children finished

Terminal - Process Tracking Output
user@vm:$ gcc process_tracking.c -o process_tracking
user@vm:$ ./process_tracking
Child 1 started (PID: 1234)
Child 2 started (PID: 1235)
Child 3 started (PID: 1236)
Child 1 finishing
Tracked: Child 1 (PID 1234) finished
Remaining active children: 2
Child 2 finishing
Tracked: Child 2 (PID 1235) finished
Remaining active children: 1
Child 3 finishing
Tracked: Child 3 (PID 1236) finished
Remaining active children: 0
All children tracked and finished!
user@vm:$

3. Selective Waiting Scenario

This example shows how a parent can take different actions based on which child terminated:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t worker_pid, monitor_pid;

    // Create worker child
    worker_pid = fork();
    if (worker_pid == 0) {
        printf("Worker: Processing data...\n");
        sleep(3);
        printf("Worker: Data processing complete\n");
        return 0; // Success
    }

    // Create monitor child
    monitor_pid = fork();
    if (monitor_pid == 0) {
        printf("Monitor: Watching system health...\n");
        sleep(5);
        printf("Monitor: System check complete\n");
        return 1; // Different exit code
    }

    // Parent handles children based on which finishes first
    int status;
    pid_t finished_pid = wait(&status);

    if (finished_pid == worker_pid) {
        printf("Action: Worker finished first\n");
        printf("Action: Proceeding with next phase\n");
        // Kill monitor since worker is done
        // (this demo only prints the message; no signal is actually sent,
        // so the monitor keeps running until its own sleep ends)
        printf("Action: Terminating monitor\n");
    } else if (finished_pid == monitor_pid) {
        printf("Action: Monitor finished first\n");
        printf("Action: Checking if worker needs help\n");
        // Could send signal to worker or take other action
    }

    // Wait for remaining child
    wait(&status);
    printf("Parent: All tasks completed\n");
    return 0;
}

Selective Waiting Flow:

  1. Parent process starts
  2a. fork() → Worker child: process data (3s)
  3a. Worker: Exit(0)
  2b. fork() → Monitor child: watch system (5s)
  3b. Monitor: Exit(1)
  4. Parent: wait() returns PID of first child to finish
  5. Which child finished?
       6a. Worker: proceed to next phase
       6b. Monitor: check worker status
  7. Wait for remaining child and continue

Terminal - Selective Waiting Output
user@vm:$ gcc selective_waiting.c -o selective_waiting
user@vm:$ ./selective_waiting
Worker: Processing data...
Monitor: Watching system health...
Worker: Data processing complete
Action: Worker finished first
Action: Proceeding with next phase
Action: Terminating monitor
Monitor: System check complete
Parent: All tasks completed
user@vm:$ █

execvp() - Running External Programs

The execvp() call is how Unix processes run external commands and programs. Strictly speaking it is a C library function built on top of the execve() system call. Think of it as "replacing" the current process with a completely different program. It's like taking over someone's body - the process ID stays the same, but the code, data, and stack all change.

How execvp() works:

  • Completely replaces the process - The current program is thrown away and replaced with a new one
  • Same process ID (PID) - The process keeps its ID, but becomes a totally different program
  • Passes command-line arguments - The new program receives the arguments you specify
  • Never returns on success - If it works, your original program is gone forever
  • Returns only on failure - If the command can't be run (for example, it doesn't exist), execvp() returns -1, sets errno, and your original program continues

Real-world example:

Imagine you're a manager (parent process) and you need to send an employee (child process) to deliver a presentation. You make a copy of the employee (fork), then completely transform that copy into a presentation-delivery specialist (execvp). The specialist delivers the presentation and then disappears (process ends). Meanwhile, you (the original manager) wait for the job to be done, then continue with your normal work.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Create command array: program name + arguments + NULL terminator
    char* argv[3] = { "ls", "-l", 0 };

    // Fork to create child process
    int pid = fork();

    if (pid)
        // Parent: wait for child to complete
        wait(NULL);
    else
        // Child: replace self with "ls -l" command
        execvp(argv[0], argv);

    return 0;
}
Process Flow (execvp Success)
  1. Start: main() executes
  2. argv[3] = {"ls", "-l", 0}
  3. pid = fork()
  4. Check pid value
  Parent (pid > 0):
    5. wait(NULL)
    8. Parent wakes up
    9. return 0
  Child (pid == 0):
    6. execvp("ls", argv)
    7. Child BECOMES "ls -l"
    8. ls runs & exits
  10. Program ends

Code Breakdown - Step by Step:

  • argv[3] = {"ls", "-l", 0} - This creates the command "ls -l" (list files in long format). The array must end with 0 to mark the end of arguments.
  • fork() creates two processes - Parent (original program) and child (copy of the program)
  • Parent process (if pid > 0) - Calls wait() and sleeps until the child finishes
  • Child process (if pid == 0) - Calls execvp() and transforms into the "ls" program
  • execvp(argv[0], argv) - Child is completely replaced by the program named in argv[0] ("ls") with all arguments in argv
  • ls runs and exits - The transformed child process lists files and then terminates
  • Parent wakes up - wait() returns, parent continues and exits normally

Important: Understanding argv[0] and the argv array

argv[0] is NOT a process ID! It's the program name that the new process should identify itself as.

  • argv[0] = "ls" - The name of the program to execute
  • argv[1] = "-l" - First command-line argument
  • argv[2] = 0 - NULL terminator (marks end of arguments)

When execvp(argv[0], argv) runs, it searches the directories listed in the PATH environment variable for a program named "ls". When that program starts, it receives the entire argv array as its command-line arguments. The "ls" program will see argv[0]="ls" and argv[1]="-l", just as if you had typed "ls -l" in the terminal!

Key Point:

The child process doesn't "run the ls command" - it becomes the ls command! It's like a complete personality change. The child process stops being your program and starts being the ls program instead.

Here's the terminal output showing the ls command execution:
Terminal
user@vm:~/SPLab $ gcc execvp_demo.c -o execvp_demo
user@vm:~/SPLab $ ./execvp_demo
total 24
-rwxr-xr-x 1 user user 8760 Jan 15 10:30 execvp_demo
-rw-r--r-- 1 user user 245 Jan 15 10:29 execvp_demo.c
-rw-r--r-- 1 user user 123 Jan 15 09:45 demo.txt
user@vm:~/SPLab $ █

What Actually Happened - The Complete Story:

  1. Program starts - Your main() function begins executing
  2. fork() creates identical twin - Now there are two copies of your program running simultaneously
  3. Parent says "I'll wait" - Parent process calls wait() and goes to sleep
  4. Child transforms completely - Child calls execvp() and becomes the "ls -l" program (your original program in the child is gone!)
  5. ls does its job - The transformed child (now ls) lists directory contents and prints them
  6. ls finishes and dies - The ls program completes and the child process terminates
  7. Parent wakes up and exits - wait() returns, parent realizes child is done, then parent exits too

Important: This is exactly how your shell works when you type "ls -l" in the terminal! The shell forks itself, transforms the child into ls, waits for ls to finish, then shows you the prompt again.

What if we change the code like this:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Try to run non-existent command "junk"
    char* argv[2] = { "junk", 0 };

    // Create child process
    int pid = fork();

    if (pid)
        // Parent: wait for child (even if execvp fails)
        wait(NULL);
    else
        // Child: try to exec "junk" - this will fail!
        execvp(argv[0], argv);

    return 0;
}
Terminal
user@vm:~/SPLab $ gcc junk_demo.c -o junk_demo
user@vm:~/SPLab $ ./junk_demo
user@vm:~/SPLab $ █
(The program prints nothing: execvp() fails silently because the code never checks its return value.)
Process Flow (execvp Failure)
  1. Start: main() executes
  2. pid = fork()
  3. Check pid value
  Parent (pid > 0):
    4. wait(NULL)
    6. Parent resumes after child exits
    7. return 0
  Child (pid == 0):
    4. execvp("junk", argv)
    5. execvp() FAILS! "junk" not found → child continues with original process → return 0 (child exits)
  8. Program ends

Understanding Command Failure with "junk":

We intentionally use "junk" - a command that doesn't exist - to demonstrate what happens when execvp() fails. This is educational because it shows the difference between success and failure.

What happens when execvp() fails:

  1. Child tries to become "junk" - execvp() looks for a program called "junk"
  2. System can't find "junk" - No such program exists on the system
  3. execvp() fails and returns - Instead of transforming, the child continues as your original program
  4. Child reaches return 0 - Since execvp() failed, the child process continues and exits normally
  5. Parent wakes up - wait() returns when child exits, parent continues and exits

Key insight: When execvp() succeeds, it never returns (the process becomes something else). When it fails, it returns -1 with errno set, and your original program continues running from the line after the call.

Back to our working example:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Command array: "ls -l" (program + argument + NULL)
    char* argv[3] = { "ls", "-l", 0 };

    // Fork creates child process
    int pid = fork();

    if (pid)
        // Parent waits for child to finish
        wait(NULL);
    else
        // Child becomes the "ls" program
        execvp(argv[0], argv);

    return 0;
}

🤔 Critical Question:

In the last example, the parent and the child do not run in parallel: the parent simply waits while the child does the work. So why bother creating a child process at all? Wouldn't it be simpler to do everything in a single process, for example by adding a few more loops?

Let's find out...

Unix Shell

A Unix shell is a command-line interpreter that provides a user interface for the Unix operating system. It acts as an intermediary between the user and the kernel, allowing users to execute commands, run programs, and manage files. The shell reads commands from the user, interprets them, and executes the appropriate programs.

💡 Why does the Shell need a child process, if they do not work in parallel?

The Answer:

If the shell called execvp() directly on itself, we would lose the shell process entirely! The shell would be replaced by the command being executed, and when that command finishes, there would be no shell left to return to.

Here's how the shell solves this problem:

  1. Shell displays prompt - Waits for user input
  2. User types command - Shell parses the command and parameters
  3. Shell forks a child - Creates a copy of itself
  4. Child executes command - Child process image is replaced by the command
  5. Parent waits - Shell waits for command to complete
  6. Command finishes - Child process terminates
  7. Shell continues - Returns to step 1, ready for next command

Key insight:

The shell preserves itself by using a child process as a "sacrificial" process that gets replaced by the user's command. This way, the original shell process remains alive and can continue accepting new commands after each one completes.

// Simplified Unix Shell Implementation (pseudocode: typePrompt() and
// getCommand() stand for the shell's own input routines)
while (TRUE) {
    // Display shell prompt (e.g., "$ ")
    typePrompt();

    // Read user input and parse command + arguments
    getCommand(&command, &parameters);

    if (fork() > 0)
        /* Parent (shell): wait for command to complete */
        wait(NULL);
    else
        /* Child: replace self with user's command */
        execvp(command, parameters);
}

Assembly and Fork

Understanding how fork() and execve() work at the assembly level provides deep insight into Unix process management. This low-level implementation shows exactly how system calls interact with the kernel, how processes are created and managed, and how program execution is transferred. Assembly code reveals the raw mechanics behind C library functions, demonstrating the direct syscall interface and register usage patterns that high-level languages abstract away.

Equivalent C Code:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

int main() {
    // xorq %rax, %rax
    // movq $57, %rax
    // syscall
    pid_t pid = fork();

    // testq %rax, %rax
    // js fork_error
    if (pid < 0) {
        // movq $60, %rax
        // movq $2, %rdi
        // syscall
        exit(2); // Fork error
    }
    // jz child_process
    else if (pid == 0) {
        // movq $1, %rdi
        // leaq child_msg(%rip), %rsi
        // movq $6, %rdx
        // movq $1, %rax
        // syscall
        write(1, "child\n", 6);

        // leaq bin_ls(%rip), %rdi
        // leaq argv(%rip), %rsi
        // leaq envp(%rip), %rdx
        // movq $59, %rax
        // syscall
        char* argv[] = { "/bin/ls", NULL };
        char* envp[] = { NULL };
        execve("/bin/ls", argv, envp);

        // movq $60, %rax
        // movq $1, %rdi
        // syscall
        exit(1); // execve failed
    }
    else {
        // movq %rax, %rdi
        // xorq %rax, %rax
        // movq $61, %rax
        // xorq %rsi, %rsi
        // xorq %rdx, %rdx
        // xorq %r10, %r10
        // syscall
        wait(NULL);

        // movq $1, %rdi
        // leaq parent_msg(%rip), %rsi
        // movq $7, %rdx
        // movq $1, %rax
        // syscall
        write(1, "parent\n", 7);

        // movq $60, %rax
        // xorq %rdi, %rdi
        // syscall
        exit(0);
    }
    return 0;
}
Terminal
user@vm:~/assembly $ gcc -o fork_demo fork_demo.c
user@vm:~/assembly $ ./fork_demo
child
demo.txt fork_demo fork_demo.c fork_demo.s
parent
user@vm:~/assembly $ █

Assembly Implementation:

.section .data
bin_ls: .string "/bin/ls" # .string appends the terminating NUL itself
argv:
# Pointer to filename
.quad bin_ls
# Null-terminated argument list
.quad 0
# Null-terminated environment pointer array for execve
envp: .quad 0
child_msg: .string "child\n"
parent_msg: .string "parent\n"

.section .bss
pid: .skip 4

.section .text
.globl main
main:
# fork() system call
xorq %rax, %rax # Clear rax (redundant: the next movq overwrites it)
movq $57, %rax # syscall number for fork()
syscall

# Check fork result
testq %rax, %rax # Set flags from fork's return value
js fork_error # Jump to error handler if rax < 0
jz child_process # Jump if rax == 0 (child process)

parent_process:
# wait4() system call (the kernel call behind wait() and waitpid())
movq %rax, %rdi # rdi = child's PID (the process to wait for)
xorq %rax, %rax # Zero out rax (redundant before the movq below)
movq $61, %rax # syscall number for wait4()
xorq %rsi, %rsi # rsi = NULL (do not store the exit status)
xorq %rdx, %rdx # rdx = 0 (no options)
xorq %r10, %r10 # r10 = NULL (no rusage requested)
syscall

# Print a message using write() syscall
movq $1, %rdi # file descriptor 1 (stdout)
leaq parent_msg(%rip), %rsi # pointer to parent message
movq $7, %rdx # size of the parent message
movq $1, %rax # syscall number for write()
syscall

exit: # Exit parent process
movq $60, %rax # syscall number for exit()
xorq %rdi, %rdi # rdi = 0 (exit code)
syscall

child_process:
# Print a message using write() syscall
movq $1, %rdi # file descriptor 1 (stdout)
leaq child_msg(%rip), %rsi # pointer to child message
movq $6, %rdx # size of the child message
movq $1, %rax # syscall number for write()
syscall

# execve("/bin/ls", argv, envp) system call
leaq bin_ls(%rip), %rdi # rdi = pointer to filename
leaq argv(%rip), %rsi # rsi = pointer to argv array
leaq envp(%rip), %rdx # rdx = pointer to envp array
movq $59, %rax # syscall number for execve()
syscall

# If execve returns, it failed
execve_error:
movq $60, %rax # syscall number for exit()
movq $1, %rdi # rdi = 1 (exit code)
syscall

fork_error:
movq $60, %rax # syscall number for exit()
movq $2, %rdi # rdi = 2 (exit code)
syscall

How This Assembly Implementation Works:

  1. Data Section Setup: The assembly defines string literals and data structures in memory, including the program path ("/bin/ls"), argument array, and environment pointer - all stored in the .data section with proper null termination.
  2. Direct System Call Interface: Instead of using C library functions, this code makes direct syscalls to the kernel using the syscall instruction and Linux x86-64 syscall numbers (57 for fork, 61 for wait4, which is the kernel call behind wait(), 59 for execve, 60 for exit).
  3. Register-Based Parameter Passing: System call arguments are passed through specific registers (%rdi, %rsi, %rdx, %r10, in that order) following the Linux x86-64 system call convention, with %rax holding the syscall number and receiving the return value.
  4. Conditional Branching Logic: The assembly uses conditional jumps (js, jz) to handle fork's three possible outcomes: error (negative), child process (zero), and parent process (positive PID).
  5. Process Transformation: In the child process, execve() completely replaces the process image with the new program (/bin/ls), while the parent waits and then prints its message, demonstrating the fundamental Unix process model at the lowest level.

Key Insight: This assembly code reveals the raw mechanics that C library functions abstract away - every high-level operation like fork(), wait(), and execve() ultimately translates to these precise register manipulations and syscall invocations, providing direct communication with the kernel.