Unix System Interface

What is the Unix System Interface?

This part covers ONLY the lecture's presentation. The book's material is at the end and is considered extra.

Imagine you're writing a program that saves data to a file. In your C code, you call fwrite() and the data gets written. But have you wondered how this actually works? How does your program communicate with the hard drive, and how does the operating system ensure that your program can't accidentally corrupt another program's files?

The answer lies in the Unix System Interface, a carefully designed layer sitting between your programs and the hardware. This interface provides system calls—special functions that allow programs to safely interact with files, processes, memory, and other system resources.

The Core Problem

Modern computers run multiple programs simultaneously. Allowing each program direct access to hardware would result in:

Security issues: Programs could read private files.
Data corruption: Programs could overwrite each other's data.
System crashes: A single buggy program could crash the entire system.
Resource conflicts: Multiple programs competing for the same hardware.

The Unix Solution

Unix solves these issues by creating a controlled interface:

System Calls: Programs request services from the OS through system calls.
Kernel Protection: The kernel controls all hardware access and enforces security.
Process Isolation: Each program runs within its own protected memory space.

Understanding Unix Processes

Every program you run becomes a process—a self-contained instance of that program. Understanding processes is essential to understanding the Unix System Interface.

A process includes:

Memory Space: Contains the program's code, data, and variables.
Process ID (PID): Unique identifier for the process.
File Descriptor Table: Tracks open files and their permissions.
Environment Variables: Configuration settings for the process.
Working Directory: The process's current operating directory.
Security Context: Permissions and access rights.

How a Process Is Assembled

When you run a program, the operating system:

Allocates memory space for the program.
Loads the program instructions into memory.
Assigns a unique PID.
Sets up file descriptors (stdin, stdout, stderr).
Sets working directory and environment variables.
Assigns permissions based on user context.
Starts executing the program instructions.

Each process operates in isolation, ensuring system security and stability.
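As a minimal sketch, a process can query several of the attributes listed above through standard POSIX calls (getpid, getppid, getcwd). The helper name describe_process and its output format are hypothetical and not part of the lecture.

```c
#include <stdio.h>
#include <unistd.h>

/* Format the PID, parent PID and working directory of the calling
   process into out. Returns the number of characters snprintf produced,
   or -1 if the working directory could not be read. */
int describe_process(char *out, size_t size) {
    char cwd[4096];
    if (getcwd(cwd, sizeof cwd) == NULL)
        return -1;
    return snprintf(out, size, "pid=%d ppid=%d cwd=%s",
                    (int)getpid(), (int)getppid(), cwd);
}
```

Calling describe_process(buf, sizeof buf) from two different programs gives different PIDs, since each running program is its own process.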

File Descriptors:

Now that we understand the basic flow, let's talk about how Unix organizes file access. Instead of letting programs directly manipulate files, Unix uses a system called file descriptors.

Think of file descriptors as simple "tickets" or "handles" that your program gets when it opens a file. Each process maintains a file descriptor table with unique integer IDs (0, 1, 2, 3...) that track which files are open and what permissions they have.

File Descriptor	Name	Default Stream	Flags	Description
0	`stdin`	Standard Input	R	Read input from keyboard/terminal
1	`stdout`	Standard Output	W	Write normal output to terminal
2	`stderr`	Standard Error	W	Write error messages to terminal
3	message.txt	Our File	W	The file we created in our example
4+	Other Files	User-defined	R/W/A	Additional files opened by the program

fd 0, 1, 2: Always pre-opened for every program (stdin, stdout, stderr)
fd 3+: Your program gets these when opening files with open() or fopen()
Flags: R=Read, W=Write, A=Append - determines what operations are allowed
Simple integers: Much easier than dealing with complex file paths and permissions

Process ID vs File Descriptor:

Don't confuse Process ID (PID) with File Descriptor (FD):

PID: Identifies the process itself (unique system-wide)
FD: Identifies an open file within that specific process

Each process has its own set of file descriptors (0, 1, 2, 3...), but only one PID. The same file descriptor number in different processes refers to completely different files.

Example - Two Text Editors Running Simultaneously:

Process A: TextEditor (PID 1234) opens "data.txt" → gets FD 3
File descriptor table: 0=stdin, 1=stdout, 2=stderr, 3=data.txt
Process B: TextEditor (PID 5678) opens "config.txt" → also gets FD 3
File descriptor table: 0=stdin, 1=stdout, 2=stderr, 3=config.txt

Key Point: Both processes have FD 3, but when Process A writes to FD 3, it modifies "data.txt", while Process B writing to FD 3 modifies "config.txt". The file descriptor numbers are process-local, not system-wide!
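A small sketch of why descriptor numbers are just per-process table indexes: POSIX open() returns the lowest number not currently in use, so closing a file and opening it again normally yields the same number. The helper name descriptor_is_reused is hypothetical, and the sketch assumes no other thread opens files in between.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path read-only, close it, and open it again.
   Returns 1 if both opens produced the same descriptor number,
   0 if they differed, and -1 if either open failed. */
int descriptor_is_reused(const char *path) {
    int first = open(path, O_RDONLY);
    if (first < 0) return -1;
    close(first);                      /* slot becomes free again */
    int second = open(path, O_RDONLY);
    if (second < 0) return -1;
    close(second);
    return first == second;
}
```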

Flags:

O_RDONLY: Read-only access.
O_WRONLY: Write-only access.
O_RDWR: Read and write access.
O_CREAT: Create file if nonexistent.
O_APPEND: Append data to end of file.
O_TRUNC: Clear file contents upon opening.

Permissions (Who Can Do What):

Owner (User), Group, Others permissions with read (r), write (w), execute (x).

Common examples:

0644: Owner read/write, others read-only.
0755: Owner full access, others read/execute.
0600: Owner read/write only.
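To make the octal notation concrete, here is a sketch that expands the nine permission bits into the familiar rwx text. The function name mode_to_string is an assumption made for illustration.

```c
/* Render the nine permission bits of mode as "rwxrwxrwx"-style text.
   out must have room for 10 bytes (9 characters plus the NUL). */
void mode_to_string(unsigned mode, char out[10]) {
    const char symbols[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++) {
        /* Bit 8 is owner-read, bit 0 is others-execute. */
        out[i] = (mode & (1u << (8 - i))) ? symbols[i] : '-';
    }
    out[9] = '\0';
}
```

With this helper, 0644 becomes "rw-r--r--", 0755 becomes "rwxr-xr-x", and 0600 becomes "rw-------".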

How System Calls Work

Let us start with the following example:

#include <stdio.h>

int main() {
    char str[] = "Hi";
    FILE* fp = fopen("a.txt", "w");
    fwrite(str, 1, sizeof(str), fp); // sizeof(str) is 3: 'H', 'i' and the terminating '\0'
    fclose(fp);
    return(0);
}

We will inspect the call to fwrite in this code. Here's what happens:

Program Call: You invoke a file-writing function (fwrite).
C Library Translation: High-level calls internally call low-level system calls.
Kernel Mode Switch: The CPU switches from user mode to kernel mode securely.
Kernel Execution: Kernel routine performs the requested file operation safely.
Hardware Access: Data is written safely to the storage device.

Understanding this low-level interaction provides precise control, performance improvements, deeper system knowledge, and better debugging capability.

👤 Your Program

↓ File writing function call

📚 C Library (glibc)

↓ write() wrapper function

🚧 USER / KERNEL BOUNDARY 🚧
Security checkpoint

↓ syscall instruction

🗂️ System Call Table

↓ sys_write() lookup

⚙️ Kernel Routine

↓ Safe hardware access

💾 Hard Drive

The Three Essential System Calls

System Call Functions - The Building Blocks:

Unix file operations are built on three fundamental system calls. Input and output use the read and write system calls, which C programs access through two functions called read and write.

read() - Reading Data from Files:

int n_read = read(int fd, char *buf, int n);

fd: File descriptor (which file to read from)
buf: Character array in your program where the data will go
n: Number of bytes to be transferred
Returns: Count of bytes actually read. Zero indicates end of file, -1 indicates an error

Note: The number of bytes returned may be less than the number requested.

write() - Writing Data to Files:

int n_written = write(int fd, char *buf, int n);

fd: File descriptor (which file to write to)
buf: Character array in your program where the data comes from
n: Number of bytes to be transferred
Returns: Number of bytes actually written. An error has occurred if this isn't equal to the number requested

open() - Opening Files for Access:

int open(char *name, int flags, int perms);

name: Pathname of the file to open
flags: How you want to access the file (read, write, create, etc.)
perms: File permissions when creating new files (always zero for existing files)
Returns: File descriptor (non-negative integer) on success, -1 on error

Important: Other than the default standard input, output and error, you must explicitly open files in order to read or write them. There are two system calls for this, open and creat. However, creat is rarely used these days since it can be fully replaced with open.

Every file operation on Unix ultimately uses these system calls, no matter how high-level your programming language!
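As an illustration of how flags and permissions combine in one open() call, the sketch below appends text to a file and creates it with mode 0644 if it does not exist yet. The function name append_text is an assumption for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append text to the file at path, creating it with mode 0644 if needed.
   Returns 0 on success and -1 on failure. */
int append_text(const char *path, const char *text) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    size_t len = strlen(text);
    ssize_t written = write(fd, text, len);
    close(fd);
    return (written == (ssize_t)len) ? 0 : -1;
}
```

Because of O_APPEND, calling the function twice leaves both pieces of text in the file, one after the other.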

Example

Now that you understand the basics, let's see a more advanced example that demonstrates the power of low-level system calls. This example shows operations that would be difficult or impossible with high-level functions:

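#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main() {
    // Step 1: Create (or empty) the file and write initial content
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    write(fd, "Hello World\n", 12);
    printf("Created file with: Hello World\n");

    // Step 2: Use lseek() to move to position 6 (after "Hello ")
    off_t pos = lseek(fd, 6, SEEK_SET);
    printf("Moved file pointer to position: %ld\n", (long)pos);

    // Step 3: Overwrite 4 bytes ("Worl") with "Unix"
    int bytes = write(fd, "Unix", 4);
    printf("Overwrote %d bytes at position 6\n", bytes);

    // Step 4: "Unix" is one byte shorter than "World", so replace the
    // leftover 'd' with the newline and cut the file to 11 bytes
    write(fd, "\n", 1);
    ftruncate(fd, 11);

    // Step 5: Close file
    close(fd);
    printf("Final result: Hello Unix\n");
    return 0;
}

What This Advanced Example Demonstrates:

Initial Write: Creates "demo.txt" and writes "Hello World\n" (12 bytes)
Precise Positioning: lseek() moves file pointer to byte 6 (after "Hello ")
Selective Overwriting: write() replaces exactly 4 bytes at that position with "Unix"; bytes outside that range are untouched
Length Fix-up: "World" is 5 bytes but "Unix" is only 4, so a plain overwrite would leave "Hello Unixd\n". The program writes the newline over the leftover 'd' and uses ftruncate() to shorten the file to 11 bytes
Result: File now contains "Hello Unix\n" instead of "Hello World\n"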

This type of precise file manipulation is essential for databases, editors, and other applications that need to modify specific parts of files efficiently.

Understanding lseek() - The File Position Controller:

SEEK_SET - Position from beginning of file: lseek(fd, 6, SEEK_SET) = "go to byte 6"
SEEK_CUR - Position relative to current location: lseek(fd, 5, SEEK_CUR) = "move forward 5 bytes"
SEEK_END - Position from end of file: lseek(fd, 0, SEEK_END) = "go to end of file"

lseek() is like a cursor in a text editor - it determines where the next read or write operation will happen

Why close() is Critical:

Always remember to close your file descriptors! Here's why:

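Data Safety: With stdio, fclose() flushes buffered data before closing; with raw descriptors, close() can report a deferred write error (use fsync() if the data must reach the disk)
Resource Management: Frees the descriptor number so this process can reuse it, and lets the kernel release the open file once nothing refers to it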
System Limits: Each process has a limit (~1024 file descriptors by default)
File Locking: Releases any locks on the file so other programs can access it

Forgetting to close files can lead to data loss, resource exhaustion, and system instability!

Here's how the advanced example looks in the terminal:

Terminal

user@vm:~/unix-demo $ gcc advanced_demo.c -o advanced_demo

user@vm:~/unix-demo $ ./advanced_demo

Created file with: Hello World

Moved file pointer to position: 6

Overwrote 4 bytes at position 6

Final result: Hello Unix

user@vm:~/unix-demo $ cat demo.txt

Hello Unix

user@vm:~/unix-demo $ █

Assembly Level: System Calls in Their Purest Form

Now that you understand how system calls work at the C level, let's see what happens at the lowest level - assembly language. This is where system calls directly communicate with the kernel using CPU instructions.

When your C program calls write(), it eventually becomes assembly instructions that load values into specific CPU registers and execute the syscall instruction. Let's see exactly how this works.

sys_read

system call number (in rax): 0

arguments:

rdi: file descriptor to read from
rsi: pointer to the buffer that receives the data
rdx: maximum number of bytes to read (at most the buffer size)

return value (in rax):

number of bytes received
On errors: negative number

.section .bss

buffer: .space 1

.section .text

.global _start

_start:

movq $0, %rax # system call number (sys_read)

movq $0, %rdi # file descriptor (stdin)

leaq buffer(%rip), %rsi # buffer to keep the read data

movq $1, %rdx # bytes to read

syscall # call kernel

movq $60, %rax # system call number (sys_exit)

movq $0, %rdi # exit status

syscall # call kernel

sys_write

system call number (in rax): 1

arguments:

rdi: file descriptor (to write to it)
rsi: pointer to buffer (data to write)
rdx: number of bytes to write

return value (in rax):

number of bytes written
On errors: negative number

.section .data

msg: .ascii "Hello\n" # string to print

len = . - msg # length of string

.section .text

.global _start

_start:

movq $1, %rax # system call number (sys_write)

movq $1, %rdi # file descriptor (stdout)

leaq msg(%rip), %rsi # message to write

movq $len, %rdx # message length

syscall # call kernel

movq $60, %rax # system call number (sys_exit)

movq $0, %rdi # exit status

syscall # call kernel

Assembly code to print a positive integer

What this code does:

Converts the integer 234 to its ASCII string representation and prints it to stdout.

Key techniques:

Backward string building - Builds the string from right to left
Division by 10 - Extracts digits one by one
ASCII conversion - Adds '0' (48) to convert digit to ASCII
Dynamic length - Calculates string length during conversion

Register usage:

%rax - Holds the number being converted (quotient after division)
%rdx - Contains remainder after division (current digit)
%rcx - Holds divisor (10) for division operation
%rdi - Buffer pointer, moves backward as digits are stored
%rsi - Points to start of converted string for sys_write

Algorithm:

Load number (234) into %rax register
Point %rdi to end of buffer
Divide %rax by 10, get remainder in %rdx (last digit)
Convert digit in %rdx to ASCII and store at (%rdi)
Repeat until %rax becomes 0
Print using %rsi pointer to start of digits

.section .data

x: .long 234 # Define the number

buflen: .long 0 # Length of the converted string

.section .bss

buffer: .space 10 # Buffer to store the ASCII representation

# 10 bytes are for digits (according to maximum int value)

.section .text

.global _start

_start:

# Load the number into rax

xorq %rax, %rax

movl x, %eax

# Point rdi to end of buffer (we'll build string backwards)

leaq buffer, %rdi

addq $10, %rdi # Point to end of buffer

convert_loop:

# Divide by 10 to get last digit

movq $0, %rdx # Clear rdx for division

movq $10, %rcx # Divisor

divq %rcx # Divide rax by 10

# Convert remainder to ASCII and store

addq $'0', %rdx # Convert to ASCII

decq %rdi # Move buffer pointer back

movb %dl, (%rdi) # Store digit

# Increment length

incl buflen

# Continue if quotient is not zero

testq %rax, %rax

jnz convert_loop

# Write the number to stdout

movq $1, %rax # sys_write system call number

movq %rdi, %rsi # Pointer to start of digits

movq $1, %rdi # File descriptor 1 (stdout)

movl buflen, %edx # Length of string (buflen is a 32-bit .long; writing %edx zero-extends into %rdx)

syscall

# Exit program

movq $60, %rax # sys_exit system call number

movq $0, %rdi # Exit status 0

syscall
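The same right-to-left conversion can be sketched in C, which may make the register-level steps easier to follow. The function name uint_to_ascii is hypothetical, and the 10-byte buffer assumes a 32-bit unsigned int (at most 10 decimal digits).

```c
#include <stddef.h>

/* Write the decimal digits of value into the END of buf (10 bytes, no
   NUL), working right to left. Stores the digit count in *len and
   returns a pointer to the first digit. */
char *uint_to_ascii(unsigned int value, char buf[10], size_t *len) {
    char *p = buf + 10;                   /* one past the end, like %rdi after addq $10 */
    *len = 0;
    do {
        *--p = (char)('0' + value % 10);  /* remainder is the last digit */
        value /= 10;                      /* quotient carries on */
        (*len)++;
    } while (value != 0);
    return p;
}
```

The returned pointer and length correspond to what the assembly passes in %rsi and %rdx, so they could be handed directly to write(1, p, len).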

Print the .text section byte by byte

What this code does:

Prints each byte of the .text section (the executable code) one by one to stdout.

Key concepts:

Section boundaries - Uses labels to mark start and end of .text section
Byte-by-byte iteration - Loops through each byte in the section
Pointer arithmetic - Increments memory address to access next byte
Memory inspection - Reads and displays raw executable code

Register usage:

%rbx - Current byte pointer, starts at label1 (section start). The syscall instruction overwrites %rcx and %r11, so the pointer must live in a register the kernel preserves
%rax - System call number for sys_write (also used briefly to hold the end address)
%rdi - File descriptor (stdout = 1)
%rsi - Pointer to current byte to print
%rdx - Number of bytes to write (always 1)

Algorithm:

Set %rbx to label1 (start of .text section)
Print the byte at address %rbx
Increment %rbx to point to next byte
Check if %rbx reached label2 (end of section)
If not at end, repeat from step 2
Exit when entire section is printed

.section .text

.global _start

label1:

_start:

# Load start address of .text section

leaq label1(%rip), %rbx

print_loop:

# Check if we reached the end

leaq label2(%rip), %rax

cmpq %rax, %rbx

jae exit_program # unsigned comparison, since these are addresses

# Print current byte

movq $1, %rax # sys_write system call number

movq $1, %rdi # file descriptor (stdout)

movq %rbx, %rsi # pointer to current byte

movq $1, %rdx # print 1 byte

syscall # clobbers %rcx and %r11, %rbx is preserved

# Move to next byte

incq %rbx # increment pointer

jmp print_loop # continue loop

exit_program:

# Exit program

movq $60, %rax # sys_exit system call number

movq $0, %rdi # exit status 0

syscall

label2:

# End of .text section marker

sys_open

system call number (in rax): 2

arguments:

rdi: pathname of the file to open/create
rsi: file access bits (bitwise OR'ed together)
- O_RDONLY (0) - read only
- O_WRONLY (1) - write only
- O_RDWR (2) - read and write
- O_APPEND (1024) - append to end
- O_TRUNC (512) - truncate existing content
- O_CREAT (64) - create if doesn't exist
rdx: file permissions (when O_CREAT is set)
- S_IRWXU (0700) - RWX mask for owner
- S_IRUSR (0400) - Read for owner
- S_IWUSR (0200) - Write for owner
- S_IXUSR (0100) - Execute for owner

return value (in rax):

file descriptor (non-negative integer)
On errors: negative number

Example breakdown:

$66 = O_RDWR | O_CREAT (2 + 64)
$0700 = S_IRWXU (owner: read/write/execute)

.section .data

fileName: .string "file.txt" # file name

fd: .quad 0 # file descriptor

.section .text

.global main

main:

movq $2, %rax # system call number (sys_open)

leaq fileName(%rip), %rdi # set file name

movq $66, %rsi # flags: O_RDWR|O_CREAT (read+write, create if needed)

movq $0700, %rdx # permissions: S_IRWXU (owner read+write+execute)

syscall # call kernel

movq %rax, fd(%rip) # save file descriptor

movq $60, %rax # system call number (sys_exit)

movq $0, %rdi # exit status

syscall # call kernel
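The flag word is a bitwise OR of an access mode and optional modifier bits, which is why the value 66 on Linux x86-64 reads as O_RDWR plus O_CREAT. The sketch below decodes a flag word back into names using the constants from fcntl.h, so it does not depend on the numeric values; describe_open_flags is a hypothetical helper and covers only the flags listed in this section.

```c
#include <fcntl.h>
#include <stdio.h>

/* Write a "NAME|NAME" description of an open() flag word into out. */
void describe_open_flags(int flags, char *out, size_t size) {
    /* The access mode is a small field, not independent bits. */
    const char *access = "O_RDONLY";
    if ((flags & O_ACCMODE) == O_WRONLY) access = "O_WRONLY";
    if ((flags & O_ACCMODE) == O_RDWR)   access = "O_RDWR";
    snprintf(out, size, "%s%s%s%s", access,
             (flags & O_CREAT)  ? "|O_CREAT"  : "",
             (flags & O_TRUNC)  ? "|O_TRUNC"  : "",
             (flags & O_APPEND) ? "|O_APPEND" : "");
}
```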

sys_close

system call number (in rax): 3

arguments:

rdi: file descriptor (obtained from sys_open)

return value (in rax):

0 on success
On errors: negative number

What sys_close does:

Releases file descriptor - Makes it available for reuse
Reports pending errors - May return an error from an earlier deferred write (it does not force data to disk; that is the job of sys_fsync)
Frees kernel resources - Cleans up internal file structures
Prevents resource leaks - Essential for long-running programs

Example workflow:

file_open: Opens "file.txt" with O_RDONLY
file_close: Closes the file using saved descriptor
exit_program: Clean program termination

Important notes:

Always close files when done! Process has limited file descriptors (~1024).
xorq %rdx, %rdx efficiently sets %rdx to 0.

.section .data

fileName: .string "file.txt" # file name

fd: .quad 0 # file descriptor

.section .text

.global main

main:

file_open:

movq $2, %rax # system call number (sys_open)

leaq fileName(%rip), %rdi # set file name

movq $0, %rsi # flags: O_RDONLY (read-only access)

xorq %rdx, %rdx # permissions: not needed for O_RDONLY (read-only)

syscall # call kernel

movq %rax, fd(%rip) # save file descriptor

file_close:

movq $3, %rax # system call number (sys_close)

movq fd(%rip), %rdi # file descriptor

syscall # call kernel

exit_program:

movq $60, %rax # system call number (sys_exit)

xorq %rdi, %rdi # exit status 0

syscall # call kernel

sys_lseek

system call number (in rax): 8

arguments:

rdi: file descriptor
rsi: offset (number of bytes to move)
rdx: where to move from
- SEEK_SET (0) - beginning of the file
- SEEK_CUR (1) - current position of the file pointer
- SEEK_END (2) - end of file

return value (in rax):

Current position of the file pointer
On errors: negative number

What sys_lseek does:

Repositions file pointer - Changes where next read/write will occur
Non-destructive operation - Doesn't modify file content, only pointer position
Enables random access - Jump to any position in file without reading sequentially
Essential for file editing - Allows overwriting specific parts of files

Example workflow:

file_open: Opens "file.txt" with O_RDONLY (read-only)
file_lseek: Moves file pointer 15 bytes from beginning (SEEK_SET)
exit_program: Clean program termination

Important notes:

SEEK_SET (0) = absolute positioning from file start.
SEEK_CUR (1) = relative positioning from current location.
SEEK_END (2) = positioning relative to file end.
Moving beyond file end with write operations extends the file.

.section .data

fileName: .string "file.txt" # file name

fd: .quad 0 # file descriptor

.section .text

.global main

main:

file_open:

movq $2, %rax # system call number (sys_open)

leaq fileName(%rip), %rdi # set file name

movq $0, %rsi # flags: O_RDONLY (read-only access)

xorq %rdx, %rdx # permissions: not needed for O_RDONLY (read-only)

syscall # call kernel

movq %rax, fd(%rip) # save file descriptor

file_lseek:

movq $8, %rax # system call number (sys_lseek)

movq fd(%rip), %rdi # file descriptor

movq $15, %rsi # offset: move 15 bytes

movq $0, %rdx # whence: SEEK_SET (from beginning)

syscall # call kernel

exit_program:

movq $60, %rax # system call number (sys_exit)

xorq %rdi, %rdi # exit status 0

syscall # call kernel

Unix Processes

Unix processes are independent programs running in memory. Each process has its own memory space, file descriptors, and process ID (PID). The operating system manages these processes and provides system calls to create, control, and communicate between them.

fork

The fork() system call creates a new process by duplicating the current process. The new process (child) is a nearly exact copy of the original process (parent): it gets its own PID and its own copy of memory, and the two are told apart by the return value of fork().

How fork() works:

Returns 0 - In the child process
Returns child PID - In the parent process
Returns -1 - On error (fork failed)

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    // Create a new process (duplicate current process)
    pid_t pid = fork();

    if (pid == 0) {
        // Child process - fork() returned 0
        printf("I'm the child! PID: %d\n", getpid());
    } else if (pid > 0) {
        // Parent process - fork() returned child's PID
        printf("I'm the parent! Child PID: %d\n", pid);
        wait(NULL); // Wait for child to finish
    } else {
        // Fork failed - fork() returned -1
        printf("Fork failed!\n");
    }
    return 0;
}

What happens step by step:

fork() is called - Creates an identical copy of the current process
Two processes exist - Parent and child, both continue from the fork() line
Different return values - Child gets 0, parent gets child's PID
Both execute if-else - Each process takes a different branch
wait() synchronizes - Parent waits for child to complete

Here's how it looks when you run it:

Terminal

user@vm:~/SPLab $ gcc fork_demo.c -o fork_demo

user@vm:~/SPLab $ ./fork_demo

I'm the parent! Child PID: 1234

I'm the child! PID: 1234

user@vm:~/SPLab $ █

Key Points:

Memory isolation - Child has its own copy of variables
File descriptors - Child inherits parent's open files
Process hierarchy - Parent-child relationship is established
Concurrent execution - Both processes run simultaneously

#include <stdio.h>
#include <unistd.h>

int main() {
    // Initialize counter variable (shared initial value)
    int counter = 0;
    printf("Before fork: counter = %d\n", counter);

    // Fork creates two separate processes with their own memory
    pid_t pid = fork();

    if (pid == 0) {
        // Child process - modify its own copy of counter
        counter += 10;
        printf("Child: counter = %d\n", counter);
    } else if (pid > 0) {
        // Parent process - modify its own copy of counter
        counter += 5;
        printf("Parent: counter = %d\n", counter);
    }
    return 0;
}

Memory Isolation Demo:

This example shows that each process has its own memory space. When the child modifies counter, it doesn't affect the parent's copy.

Output: Parent shows counter = 5, Child shows counter = 10

Understanding wait() - Process Synchronization

The wait() system call is crucial for parent-child process coordination. It allows the parent to wait for a child process to terminate and retrieve its exit status.

How wait() works:

Blocks the parent - Parent stops executing until a child terminates
Returns child PID - The PID of the terminated child process
Collects exit status - Stores termination information in the status parameter
Prevents zombie processes - Cleans up terminated child processes

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Create child process
    int pid = fork();
    // Variable to store child's exit status
    int status;

    if (pid) {
        // Parent process - wait for child to finish
        printf("Parent: waiting for child (PID %d) to finish...\n", pid);
        // Block until child terminates, get its PID back
        int child_pid = wait(&status);
        // Check if child exited normally (not killed by signal)
        if (WIFEXITED(status))
            printf("Parent: child PID %d exited with status %d\n",
                   child_pid, WEXITSTATUS(status));
    } else {
        // Child process - do some work then exit
        printf("Child: I'm about to exit with status 18\n");
        _exit(18); // Exit immediately with status 18
    }
    return 0;
}

Process Flow

Start: main() executes

↓

pid = fork()

↓

Check pid value

pid > 0 (Parent)

printf("Parent: waiting...")

↓

child_pid = wait(&status)

↓

WIFEXITED(status) check

↓

printf("child PID %d exited...")

pid == 0 (Child)

printf("Child: I'm about...")

↓

_exit(18)

↓

Program ends: return 0

Step-by-Step Execution:

fork() creates child - Parent gets child PID, child gets 0
Child executes else block - Prints message and exits with status 18
Parent calls wait() - Blocks until child terminates
wait() returns - Returns the PID of the terminated child
Status is analyzed - WIFEXITED() checks if child exited normally
Exit status extracted - WEXITSTATUS() gets the actual exit code (18)

Here's the terminal output showing the synchronization:

Terminal

user@vm:~/SPLab $ gcc wait_demo.c -o wait_demo

user@vm:~/SPLab $ ./wait_demo

Parent: waiting for child (PID 1234) to finish...

Child: I'm about to exit with status 18

Parent: child PID 1234 exited with status 18

user@vm:~/SPLab $ █

Why wait() Returns the Child PID:

Think of it logically: a parent process might have multiple children running simultaneously. When wait() returns, the parent needs to know which specific child just terminated. The return value (child PID) identifies exactly which child process finished.

Multiple children scenario: Parent can distinguish between different child processes
Process tracking: Parent can maintain records of which children are still running
Selective waiting: Parent can take different actions based on which child terminated

Process Termination Status Explained:

status parameter: Contains packed information about how the child terminated
WIFEXITED(status): Returns true if child exited normally (not killed by signal)
WEXITSTATUS(status): Extracts the actual exit code passed to _exit() or return
_exit(18): Child terminates immediately with exit status 18

The parent can use this information to determine if the child completed successfully or encountered an error.
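The status macros can be wrapped in a small helper that forks a child, lets it exit with a chosen code, and reports what the parent observes. This is a sketch only: child_exit_code is a hypothetical name, and it uses waitpid(), the POSIX variant of wait() that waits for one specific child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with code, then return the code the parent
   observes. Returns -1 if fork/waitpid fails or the child did not
   exit normally. */
int child_exit_code(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(code);            /* child: terminate immediately */
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only the low 8 bits of the exit code reach the parent, so a child that calls _exit(256) is seen as exiting with 0.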

Advanced Process Management Examples

Now let's see practical examples of the three scenarios mentioned above. Each example demonstrates a different aspect of process management with accompanying flowcharts.

1. Multiple Children Scenario

This example shows how a parent process can create and distinguish between multiple child processes:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t child1, child2;

    // Create first child
    child1 = fork();
    if (child1 == 0) {
        printf("Child 1: PID %d, doing task A\n", getpid());
        sleep(2);
        return 10; // Exit code 10
    }

    // Create second child
    child2 = fork();
    if (child2 == 0) {
        printf("Child 2: PID %d, doing task B\n", getpid());
        sleep(1);
        return 20; // Exit code 20
    }

    // Parent waits for children
    int status;
    pid_t finished = wait(&status);
    if (finished == child1) {
        printf("Child 1 finished with code %d\n", WEXITSTATUS(status));
    } else if (finished == child2) {
        printf("Child 2 finished with code %d\n", WEXITSTATUS(status));
    }

    // Wait for remaining child
    finished = wait(&status);
    if (finished == child1) {
        printf("Child 1 finished with code %d\n", WEXITSTATUS(status));
    } else {
        printf("Child 2 finished with code %d\n", WEXITSTATUS(status));
    }

    printf("All children finished!\n");
    return 0;
}

Multiple Children Process Flow

Parent Process starts

↓

fork() → Child 1

↓

fork() → Child 2

↓

Child 1

Task A

↓

Exit(10)

Child 2

Task B

↓

Exit(20)

Parent

wait() × 2

↓

Identify finished child by PID

Terminal - Multiple Children Output

user@vm:$ gcc multiple_children.c -o multiple_children

user@vm:$ ./multiple_children

Child 1: PID 1234, doing task A

Child 2: PID 1235, doing task B

Child 2 finished with code 20

Child 1 finished with code 10

All children finished!

user@vm:$ █

2. Process Tracking Scenario

This example shows how a parent can maintain records of which children are still running:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t children[3];
    int active_children = 3;

    // Create 3 children with different run times
    for (int i = 0; i < 3; i++) {
        children[i] = fork();
        if (children[i] == 0) {
            printf("Child %d started (PID: %d)\n", i+1, getpid());
            sleep((i+1) * 2); // Different sleep times
            printf("Child %d finishing\n", i+1);
            return i+1;
        }
    }

    // Track children as they finish
    while (active_children > 0) {
        int status;
        pid_t finished_pid = wait(&status);

        // Find which child finished
        for (int i = 0; i < 3; i++) {
            if (children[i] == finished_pid) {
                printf("Tracked: Child %d (PID %d) finished\n", i+1, finished_pid);
                children[i] = -1; // Mark as finished
                active_children--;
                break;
            }
        }
        printf("Remaining active children: %d\n", active_children);
    }

    printf("All children tracked and finished!\n");
    return 0;
}

Process Tracking Flow

Parent creates 3 children

↓

Initialize tracking array
children[3] = {PID1, PID2, PID3}

↓

Child 1

sleep(2s)

Child 2

sleep(4s)

Child 3

sleep(6s)

↓

Parent waits in loop
wait() returns PID

↓

Find PID in array
Mark as finished (-1)
Update counter

↓

active_children = 0?

No

Continue loop (back to step 4)

Yes

All children finished

Terminal - Process Tracking Output

user@vm:$ gcc process_tracking.c -o process_tracking

user@vm:$ ./process_tracking

Child 1 started (PID: 1234)

Child 2 started (PID: 1235)

Child 3 started (PID: 1236)

Child 1 finishing

Tracked: Child 1 (PID 1234) finished

Remaining active children: 2

Child 2 finishing

Tracked: Child 2 (PID 1235) finished

Remaining active children: 1

Child 3 finishing

Tracked: Child 3 (PID 1236) finished

Remaining active children: 0

All children tracked and finished!

user@vm:$ █

3. Selective Waiting Scenario

This example shows how a parent can take different actions based on which child terminated:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t worker_pid, monitor_pid;

    // Create worker child
    worker_pid = fork();
    if (worker_pid == 0) {
        printf("Worker: Processing data...\n");
        sleep(3);
        printf("Worker: Data processing complete\n");
        return 0; // Success
    }

    // Create monitor child
    monitor_pid = fork();
    if (monitor_pid == 0) {
        printf("Monitor: Watching system health...\n");
        sleep(5);
        printf("Monitor: System check complete\n");
        return 1; // Different exit code
    }

    // Parent handles children based on which finishes first
    int status;
    pid_t finished_pid = wait(&status);

    if (finished_pid == worker_pid) {
        printf("Action: Worker finished first\n");
        printf("Action: Proceeding with next phase\n");
        // This demo only prints the message and lets the monitor run to
        // completion; a real program could stop it with kill()
        printf("Action: Terminating monitor\n");
    } else if (finished_pid == monitor_pid) {
        printf("Action: Monitor finished first\n");
        printf("Action: Checking if worker needs help\n");
        // Could send signal to worker or take other action
    }

    // Wait for remaining child
    wait(&status);
    printf("Parent: All tasks completed\n");
    return 0;
}

Selective Waiting Flow

Parent Process starts

↓

fork() → Worker

Worker Child
Process Data (3s)

↓

Exit(0)

fork() → Monitor

Monitor Child
Watch System (5s)

↓

Exit(1)

↓

Parent: wait()
Returns PID of first to finish

↓

Which child finished?

Worker

Proceed to next phase

Monitor

Check worker status

↓

Wait for remaining child and continue

Terminal - Selective Waiting Output

user@vm:$ gcc selective_waiting.c -o selective_waiting

user@vm:$ ./selective_waiting

Worker: Processing data...

Monitor: Watching system health...

Worker: Data processing complete

Action: Worker finished first

Action: Proceeding with next phase

Action: Terminating monitor

Monitor: System check complete

Parent: All tasks completed

user@vm:$ █
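In the output above the monitor still runs to completion, because the demo only prints its "Terminating monitor" message. A sketch of actually stopping a child, under the assumption that SIGTERM has its default action in the child, could look like this; spawn_and_terminate is a hypothetical helper that uses the POSIX kill() and waitpid() calls.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a child that would sleep for a long time, terminate it with
   SIGTERM, and return the signal number that ended it (or -1 on any
   failure). */
int spawn_and_terminate(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        sleep(60);                /* stand-in for the monitor's work */
        _exit(0);
    }
    if (kill(pid, SIGTERM) < 0) return -1;
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;  /* reap the child */
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Here WIFEXITED(status) would be false, which is how the parent can tell a killed child from one that exited on its own.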

execvp() - Running External Programs

The execvp() function (a C library wrapper around the execve system call) is how Unix processes run external commands and programs. Think of it as "replacing" the program running inside the current process with a completely different one. It's like taking over someone's body: the process ID stays the same, but the code and data are replaced.

How execvp() works:

Completely replaces the process - The current program is thrown away and replaced with a new one
Same process ID (PID) - The process keeps its ID, but becomes a totally different program
Passes command-line arguments - The new program receives the arguments you specify
Never returns on success - If it works, your original program is gone forever
Returns only on failure - If the command can't be run (for example, it doesn't exist), execvp() returns -1, sets errno, and your original program continues

Real-world example:

Imagine you're a manager (parent process) and you need to send an employee (child process) to deliver a presentation. You make a copy of the employee (fork), then completely transform that copy into a presentation-delivery specialist (execvp). The specialist delivers the presentation and then disappears (process ends). Meanwhile, you (the original manager) wait for the job to be done, then continue with your normal work.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Create command array: program name + arguments + NULL terminator
    char* argv[3] = { "ls", "-l", 0 };

    // Fork to create child process
    int pid = fork();

    if (pid)
        // Parent: wait for child to complete
        wait(NULL);
    else
        // Child: replace self with "ls -l" command
        execvp(argv[0], argv);

    return 0;
}

Process Flow (execvp Success)

Start: main() executes
↓
argv[3] = {"ls", "-l", 0}
↓
pid = fork()
↓
Check pid value

pid > 0 (Parent): wait(NULL) → parent wakes up → return 0
pid == 0 (Child): execvp("ls", argv) → child BECOMES "ls -l" → ls runs & exits

Program ends

Code Breakdown - Step by Step:

argv[3] = {"ls", "-l", 0} - This creates the command "ls -l" (list files in long format). The array must end with 0 to mark the end of arguments.
fork() creates two processes - Parent (original program) and child (copy of the program)
Parent process (if pid > 0) - Calls wait() and sleeps until the child finishes
Child process (if pid == 0) - Calls execvp() and transforms into the "ls" program
execvp(argv[0], argv) - Child is completely replaced by the program named in argv[0] ("ls") with all arguments in argv
ls runs and exits - The transformed child process lists files and then terminates
Parent wakes up - wait() returns, parent continues and exits normally

Important: Understanding argv[0] and the argv array

argv[0] is NOT a process ID! It's the program name that the new process should identify itself as.

argv[0] = "ls" - The name of the program to execute
argv[1] = "-l" - First command-line argument
argv[2] = 0 - NULL terminator (marks end of arguments)

When execvp(argv[0], argv) runs, it searches the directories listed in the PATH environment variable for a program named "ls", and when that program starts, it receives the entire argv array as its command-line arguments. The "ls" program will see argv[0]="ls" and argv[1]="-l", just as if you had typed "ls -l" in the terminal!

Key Point:

The child process doesn't "run the ls command" - it becomes the ls command! It's like a complete personality change. The child process stops being your program and starts being the ls program instead.

Here's the terminal output showing the ls command execution:

Terminal

user@vm:~/SPLab $ gcc execvp_demo.c -o execvp_demo
user@vm:~/SPLab $ ./execvp_demo
total 24
-rwxr-xr-x 1 user user 8760 Jan 15 10:30 execvp_demo
-rw-r--r-- 1 user user 245 Jan 15 10:29 execvp_demo.c
-rw-r--r-- 1 user user 123 Jan 15 09:45 demo.txt
user@vm:~/SPLab $

What Actually Happened - The Complete Story:

Program starts - Your main() function begins executing
fork() creates identical twin - Now there are two copies of your program running simultaneously
Parent says "I'll wait" - Parent process calls wait() and goes to sleep
Child transforms completely - Child calls execvp() and becomes the "ls -l" program (your original program in the child is gone!)
ls does its job - The transformed child (now ls) lists directory contents and prints them
ls finishes and dies - The ls program completes and the child process terminates
Parent wakes up and exits - wait() returns, parent realizes child is done, then parent exits too

Important: This is exactly how your shell works when you type "ls -l" in the terminal! The shell forks itself, transforms the child into ls, waits for ls to finish, then shows you the prompt again.

What if we change the code like this:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Try to run non-existent command "junk"
    char* argv[2] = { "junk", 0 };

    // Create child process
    int pid = fork();

    if (pid) {
        // Parent: wait for child (even if execvp fails)
        wait(NULL);
    } else {
        // Child: try to exec "junk" - this will fail!
        execvp(argv[0], argv);
        // Only reached if execvp() failed: report why
        perror("execvp");
    }

    return 0;
}

Terminal

user@vm:~/SPLab $ gcc junk_demo.c -o junk_demo
user@vm:~/SPLab $ ./junk_demo
execvp: No such file or directory
user@vm:~/SPLab $

Process Flow (execvp Failure)

Start: main() executes
↓
pid = fork()
↓
Check pid value

pid > 0 (Parent): wait(NULL) → parent resumes after child exits → return 0
pid == 0 (Child): execvp("junk", argv) → execvp() FAILS! "junk" not found → child continues with original process → return 0 (child exits)

Program ends

Understanding Command Failure with "junk":

We intentionally use "junk" - a command that doesn't exist - to demonstrate what happens when execvp() fails. This is educational because it shows the difference between success and failure.

What happens when execvp() fails:

Child tries to become "junk" - execvp() looks for a program called "junk"
System can't find "junk" - No such program exists on the system
execvp() fails and returns - Instead of transforming, the child continues as your original program
Child reaches return 0 - Since execvp() failed, the child process continues and exits normally
Parent wakes up - wait() returns when child exits, parent continues and exits

Key insight: When execvp() succeeds, it never returns (the process becomes something else). When it fails, it returns -1 with errno set, and your original program continues running.

Back to our working example:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main() {
    // Command array: "ls -l" (program + argument + NULL)
    char* argv[3] = { "ls", "-l", 0 };

    // Fork creates child process
    int pid = fork();

    if (pid)
        // Parent waits for child to finish
        wait(NULL);
    else
        // Child becomes the "ls" program
        execvp(argv[0], argv);

    return 0;
}

🤔 Critical Question:

In the last example, the parent does nothing but wait while the child runs, so the two processes never do useful work in parallel. So why bother having a child process at all? Isn't there a simpler solution, such as doing all the work in the same process?

Let's find out...

Unix Shell

A Unix shell is a command-line interpreter that provides a user interface for the Unix operating system. It acts as an intermediary between the user and the kernel, allowing users to execute commands, run programs, and manage files. The shell reads commands from the user, interprets them, and executes the appropriate programs.

💡 Why does the Shell need a child process, if they do not work in parallel?

The Answer:

If the shell called execvp() directly on itself, we would lose the shell process entirely! The shell would be replaced by the command being executed, and when that command finishes, there would be no shell left to return to.

Here's how the shell solves this problem:

Shell displays prompt - Waits for user input
User types command - Shell parses the command and parameters
Shell forks a child - Creates a copy of itself
Child executes command - Child process image is replaced by the command
Parent waits - Shell waits for command to complete
Command finishes - Child process terminates
Shell continues - Returns to step 1, ready for next command

Key insight:

The shell preserves itself by using a child process as a "sacrificial" process that gets replaced by the user's command. This way, the original shell process remains alive and can continue accepting new commands after each one completes.

// Simplified Unix Shell Implementation
// (pseudocode: typePrompt() and getCommand() are placeholders)
while (TRUE) {
    // Display shell prompt (e.g., "$ ")
    typePrompt();

    // Read user input and parse command + arguments
    getCommand(&command, &parameters);

    if (fork() > 0)
        /* Parent (shell): wait for command to complete */
        wait(NULL);
    else
        /* Child: replace self with user's command */
        execvp(command, parameters);
}

Assembly and Fork

Understanding how fork() and execve() work at the assembly level provides deep insight into Unix process management. This low-level implementation shows exactly how system calls interact with the kernel, how processes are created and managed, and how program execution is transferred. Assembly code reveals the raw mechanics behind C library functions, demonstrating the direct syscall interface and register usage patterns that high-level languages abstract away.

Equivalent C Code:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

int main() {
    // xorq %rax, %rax
    // movq $57, %rax
    // syscall
    pid_t pid = fork();

    // testq %rax, %rax
    // js fork_error
    if (pid < 0) {
        // movq $60, %rax
        // movq $2, %rdi
        // syscall
        exit(2);  // Fork error
    }
    // jz child_process
    else if (pid == 0) {
        // movq $1, %rdi
        // leaq child_msg(%rip), %rsi
        // movq $6, %rdx
        // movq $1, %rax
        // syscall
        write(1, "child\n", 6);

        // leaq bin_ls(%rip), %rdi
        // leaq argv(%rip), %rsi
        // leaq envp(%rip), %rdx
        // movq $59, %rax
        // syscall
        char* argv[] = { "/bin/ls", NULL };
        char* envp[] = { NULL };
        execve("/bin/ls", argv, envp);

        // movq $60, %rax
        // movq $1, %rdi
        // syscall
        exit(1);  // execve failed
    } else {
        // movq %rax, %rdi
        // xorq %rax, %rax
        // movq $61, %rax
        // xorq %rsi, %rsi
        // xorq %rdx, %rdx
        // xorq %r10, %r10
        // syscall
        waitpid(pid, NULL, 0);  // the assembly passes the child's PID to wait4

        // movq $1, %rdi
        // leaq parent_msg(%rip), %rsi
        // movq $7, %rdx
        // movq $1, %rax
        // syscall
        write(1, "parent\n", 7);

        // movq $60, %rax
        // xorq %rdi, %rdi
        // syscall
        exit(0);
    }
    return 0;
}

Terminal

user@vm:~/assembly $ gcc -o fork_demo fork_demo.c
user@vm:~/assembly $ ./fork_demo
child
demo.txt fork_demo fork_demo.c fork_demo.s
parent
user@vm:~/assembly $

Assembly Implementation:

.section .data
bin_ls:     .string "/bin/ls"       # .string appends the terminating NUL byte
argv:
    .quad bin_ls                    # argv[0]: pointer to the program path
    .quad 0                         # NULL terminator of the argument list
envp:       .quad 0                 # empty, NULL-terminated environment for execve
child_msg:  .string "child\n"
parent_msg: .string "parent\n"

.section .bss
pid:        .skip 4

.section .text
.globl main
main:
    # fork() system call
    xorq %rax, %rax                 # Clear rax (redundant: the next movq overwrites it)
    movq $57, %rax                  # syscall number for fork()
    syscall

    # Check fork result
    testq %rax, %rax                # Set the flags from the return value in rax
    js fork_error                   # Jump to error handler if rax < 0
    jz child_process                # Jump if rax == 0 (child process)

parent_process:
    # wait4(pid, NULL, 0, NULL) system call
    movq %rax, %rdi                 # rdi = child's PID (the child to wait for)
    xorq %rax, %rax                 # Zero out rax before loading the syscall number
    movq $61, %rax                  # syscall number for wait4()
    xorq %rsi, %rsi                 # rsi = NULL (do not store the exit status)
    xorq %rdx, %rdx                 # rdx = 0 (no options)
    xorq %r10, %r10                 # r10 = NULL (no rusage)
    syscall

    # Print a message using write() syscall
    movq $1, %rdi                   # file descriptor 1 (stdout)
    leaq parent_msg(%rip), %rsi     # pointer to parent message
    movq $7, %rdx                   # size of the parent message
    movq $1, %rax                   # syscall number for write()
    syscall

exit:                               # Exit parent process
    movq $60, %rax                  # syscall number for exit()
    xorq %rdi, %rdi                 # rdi = 0 (exit code)
    syscall

child_process:
    # Print a message using write() syscall
    movq $1, %rdi                   # file descriptor 1 (stdout)
    leaq child_msg(%rip), %rsi      # pointer to child message
    movq $6, %rdx                   # size of the child message
    movq $1, %rax                   # syscall number for write()
    syscall

    # execve("/bin/ls", argv, envp) system call
    leaq bin_ls(%rip), %rdi         # rdi = pointer to filename
    leaq argv(%rip), %rsi           # rsi = pointer to argv array
    leaq envp(%rip), %rdx           # rdx = pointer to envp array
    movq $59, %rax                  # syscall number for execve()
    syscall

    # If execve returns, it failed
execve_error:
    movq $60, %rax                  # syscall number for exit()
    movq $1, %rdi                   # rdi = 1 (exit code)
    syscall

fork_error:
    movq $60, %rax                  # syscall number for exit()
    movq $2, %rdi                   # rdi = 2 (exit code)
    syscall

How This Assembly Implementation Works:

Data Section Setup: The assembly defines string literals and data structures in memory, including the program path ("/bin/ls"), argument array, and environment pointer - all stored in the .data section with proper null termination.
Direct System Call Interface: Instead of using C library functions, this code makes direct syscalls to the kernel using specific syscall numbers (57 for fork, 61 for wait4, which is the system call behind wait(), 59 for execve, 60 for exit, and 1 for write) and the syscall instruction.
Register-Based Parameter Passing: System call arguments are passed through specific registers (%rdi, %rsi, %rdx, %r10) following the Linux x86-64 system call convention, with %rax holding the syscall number and receiving the return value.
Conditional Branching Logic: The assembly uses conditional jumps (js, jz) to handle fork's three possible outcomes: error (negative), child process (zero), and parent process (positive PID).
Process Transformation: In the child process, execve() completely replaces the process image with the new program (/bin/ls), while the parent waits and then prints its message, demonstrating the fundamental Unix process model at the lowest level.

Key Insight: This assembly code reveals the raw mechanics that C library functions abstract away - every high-level operation like fork(), wait(), and execve() ultimately translates to these precise register manipulations and syscall invocations, providing direct communication with the kernel.

Unix System Interface

This document follows Marina's computer architecture course.

This file is based on the presentation "lecture 8 - system calls" by Marina.
