C Programming Guide for xv6

Introduction

Welcome to the C Programming Guide for xv6! This guide is designed to help you learn the essential C language concepts needed to understand and modify the user programs in the xv6-riscv operating system from MIT. We’ll cover everything from basic types and variables all the way through pointers, arrays, structs, strings, and command-line argument processing.

While xv6 is a teaching operating system, its user-space programs are still real C programs running on a Unix-like environment. By the end of this guide, you should feel comfortable reading, understanding, and writing C code in xv6 user space.


1. Basic Concepts

1.1 Variables and Types

In C, a variable is a named space in memory that can store data. Each variable has a type, which determines the kind of data it can hold (e.g., integer, character, floating-point, etc.). In xv6 user programs, you will frequently see the following types:

  • int: Stores integer values (e.g., 1, 2, 3…). Often 32 bits on modern systems.
  • char: Stores a single character (e.g., ‘a’, ‘b’, ‘c’). Also used for small integers or buffers.
  • uint or unsigned int: Stores non-negative integer values.
  • short, long: Smaller and larger integer types, respectively; both appear less often in xv6 user code.
  • void: Used in functions that do not return a value or for pointers to an unknown type.

You will also see fixed-width typedefs such as uint64, along with shorthand names defined in header files (e.g., typedef unsigned int uint; in kernel/types.h). They make the width of a value explicit, which matters for kernel and hardware-facing code.
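
As a rough sketch of what such typedefs look like and why the width matters, the block below declares a few names in the style of kernel/types.h. It assumes a 64-bit target on which unsigned long is 64 bits wide, and make_u64 is a hypothetical helper written for this illustration, not part of xv6.

```c
/* Shorthand integer typedefs in the style of xv6's kernel/types.h.
   Assumption: unsigned int is 32 bits and unsigned long is 64 bits. */
typedef unsigned char  uint8;
typedef unsigned int   uint32;
typedef unsigned long  uint64;

/* Hypothetical helper: join two 32-bit halves into one 64-bit value.
   The cast widens hi before the shift so no bits are lost. */
uint64 make_u64(uint32 hi, uint32 lo)
{
    return ((uint64)hi << 32) | lo;
}
```

Without the cast to uint64, the shift would be performed on a 32-bit value, which is exactly the kind of mistake explicit widths help you spot.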

Here is a simple example of defining variables in an xv6 user program:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
    int number = 42;
    char letter = 'x';
    printf("Number: %d\n", number);
    printf("Letter: %c\n", letter);
    exit(0);
}

Explanation:

  1. The header files types.h, stat.h, and user.h are commonly included in xv6 user programs. types.h supplies the basic type names, and user.h declares the system calls (e.g., exit) and the small user library (e.g., printf).
  2. int main(int argc, char *argv[]) is the standard signature for main in a C program, indicating the program can receive command-line arguments.
  3. We declare an integer and a character, then print them.
  4. exit(0); terminates the program, returning status 0 to the shell.

1.2 Functions

A function in C is a reusable block of code that performs a specific task. You’ve already seen the main() function. Functions generally have the form:

type function_name(type parameter1, type parameter2, ...)
{
    // function body
}

The return type (e.g., int, char, void) specifies the type of data the function will return. In xv6, user programs often define their own helper functions to keep the code organized.

Below is an example function:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

// function that returns the sum of two integers
int sum(int a, int b)
{
    return a + b;
}

int
main(int argc, char *argv[])
{
    int result = sum(10, 32);
    printf("Sum: %d\n", result);
    exit(0);
}

2. Arrays

An array in C is a sequence of elements of the same type, stored in contiguous memory. The syntax for declaring an array is:

type array_name[size];

For example:

int arr[5];

arr[0] = 10;
arr[1] = 20;
// etc.

Arrays are zero-indexed, meaning their first element is at index 0 and the last element is at index size - 1. When you pass an array to a function, it decays into a pointer to its first element. The function receiving the array can treat it as a pointer, but it has no way to recover the array's length, so the size is usually passed as a separate argument.
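
The decay rule can be sketched with a small helper. find_max is a hypothetical function written for this illustration; it uses no headers, so it should compile unchanged in an xv6 user program or on a host system.

```c
/* Hypothetical helper: return the largest element of arr.
   arr arrives as a pointer to the first element, so the caller
   must pass the element count separately. Assumes size >= 1. */
int find_max(const int *arr, int size)
{
    int i;
    int max = arr[0];

    for (i = 1; i < size; i++) {
        if (arr[i] > max)
            max = arr[i];
    }
    return max;
}
```

In the caller, where the array itself is still visible, the count can be computed as sizeof(values) / sizeof(values[0]). The same expression inside find_max would divide the size of a pointer by the size of an int, which is not the array length.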

2.1 Array Initialization

You can initialize an array at the time of declaration:

int arr[5] = {10, 20, 30, 40, 50};

Or, if you omit the size, the compiler will count:

int arr[] = {10, 20, 30, 40, 50}; // automatically size 5

2.2 Using Arrays in xv6

In xv6 user programs, you might see arrays used for:

  • Buffering input and output (e.g., reading data from a file).
  • Storing command-line arguments in argv[].
  • Temporary storage of data in small utilities.

Since the xv6 environment is minimal, you typically see small fixed-size arrays rather than dynamic allocations for simple tasks.

2.3 Array Example Function

Below is a simple example of a function that manipulates an integer array, computing its sum and average. Notice how we pass the array and its size to the function:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

// This function returns the sum of elements in arr and places the average in *avg.
int array_sum_and_average(int arr[], int size, float *avg)
{
    int i;
    int sum = 0;

    for (i = 0; i < size; i++) {
        sum += arr[i];
    }

    // We must cast to float to avoid integer division for the average.
    *avg = (float)sum / (float)size;
    return sum;
}

int
main(int argc, char *argv[])
{
    int myarr[5] = {2, 4, 6, 8, 10};
    float average;

    int sum = array_sum_and_average(myarr, 5, &average);
    printf("Sum: %d\n", sum);
    printf("Average: %d.%d\n", (int)average, (int)((average - (int)average) * 10));

    exit(0);
}

Explanation:

  1. The function array_sum_and_average() takes a pointer to the array’s first element (written int arr[]), the number of elements, and a pointer to a float where it will store the computed average.
  2. We iterate over the array to compute a running sum.
  3. We then store the computed average in *avg.
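  4. In main(), we print the sum and the average. The xv6 printf has no %f conversion, so we print the integer part and the first decimal digit as two separate integers.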
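
If you would rather avoid float entirely, one possible alternative is to keep the average as an integer scaled by 10. The sketch below is an assumption-laden variant, not the method used above: average_tenths is a hypothetical name, the result is truncated rather than rounded, and it is meant for non-negative data.

```c
/* Hypothetical helper: average of arr multiplied by 10, using only
   integer arithmetic. Returns 0 for an empty array so that we never
   divide by zero. The result is truncated, not rounded. */
int average_tenths(const int *arr, int size)
{
    int i;
    int sum = 0;

    if (size <= 0)
        return 0;
    for (i = 0; i < size; i++)
        sum += arr[i];
    return (sum * 10) / size;
}
```

A caller could then print the value with printf("Average: %d.%d\n", t / 10, t % 10), where t is the returned number.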

3. Pointers

A pointer is a variable that stores the address of another variable. In C, a pointer is declared by placing * between the type and the variable name. For example:

int x = 100;
int *p = &x; // p points to x

  • &x means “the address of x.”
  • p is now holding the memory address of x.
  • *p refers to the value pointed to by p. For example, *p = 200; will change x to 200.

Pointers are powerful and heavily used for:

  • Dynamic memory allocation
  • Passing arrays to functions
  • Working with strings (null-terminated character arrays)

In xv6 user programs, you’ll see pointers in system calls (like char *buf when calling read) and command-line arguments (char *argv[]).
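
To make the link between pointers and strings concrete, here is a minimal sketch that walks a string with a pointer instead of an index. count_char is a hypothetical helper invented for this example.

```c
/* Hypothetical helper: count how many times c occurs in the string s.
   s is advanced one character at a time until it reaches the
   null terminator that ends every C string. */
int count_char(const char *s, char c)
{
    int count = 0;

    while (*s != '\0') {
        if (*s == c)
            count++;
        s++;    /* move the pointer to the next character */
    }
    return count;
}
```

Because s is a copy of the caller's pointer, advancing it inside the function does not change the pointer the caller holds.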

3.1 Pointer Example

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

void increment(int *n)
{
    // increments the value pointed to by n
    *n = *n + 1;
}

int
main(int argc, char *argv[])
{
    int value = 0;
    printf("Before: %d\n", value);
    increment(&value);
    printf("After: %d\n", value);
    exit(0);
}

Explanation:

  1. increment(int *n) receives a pointer to an integer and modifies the integer’s value in place.
  2. We print the value before and after the function call to see the effect.

4. Structs

A struct is a collection of variables (called members) under one name. You can store different types of data in a single struct. In xv6, structs are used for more complex data structures, such as process information or file system metadata.

4.1 Defining a Struct

To define a struct:

typedef struct {
    int pid;
    char name[16];
} Process;

This example defines a Process struct with an integer pid and a character array name of size 16. You can create an instance of it:

Process p;
p.pid = 123;

// Copy a string into p.name. The stock xv6 user library provides strcpy
// but no bounded copy such as strncpy, so we copy by hand and stop one
// byte early to leave room for the null terminator:
int i;
char *src = "init";
for(i = 0; i < (int)sizeof(p.name) - 1 && src[i] != '\0'; i++) {
    p.name[i] = src[i];
}
p.name[i] = '\0'; // null-terminate

printf("PID: %d, Name: %s\n", p.pid, p.name);

4.2 Passing Structs Around

You can pass structs to functions by value or by reference (pointer). Passing by pointer is typically more efficient if the struct is large.

Example of passing a struct by pointer:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

typedef struct {
    int pid;
    char name[16];
} Process;

void print_process_info(Process *proc)
{
    printf("Process info -> PID: %d, Name: %s\n", proc->pid, proc->name);
}

int
main(int argc, char *argv[])
{
    Process p;
    p.pid = 202;

    // Let's say we want the name to be "shell"
    // We'll copy it manually for demonstration
    int i;
    char *src = "shell";
    for(i = 0; i < (int)sizeof(p.name) - 1 && src[i] != '\0'; i++) {
        p.name[i] = src[i];
    }
    p.name[i] = '\0';

    print_process_info(&p);
    exit(0);
}

Explanation:

  1. We define Process with a pid and name.
  2. We use a function print_process_info() that takes a pointer to Process.
  3. Inside the function, we use the -> operator to access struct fields via the pointer.
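
The difference between passing by value and passing by pointer can be sketched as follows. Both function names are hypothetical and exist only for this comparison.

```c
typedef struct {
    int pid;
    char name[16];
} Process;

/* Receives a copy of the struct: the assignment changes only the copy. */
int set_pid_by_value(Process proc, int pid)
{
    proc.pid = pid;
    return proc.pid;
}

/* Receives the address of the struct: the caller's struct is changed. */
void set_pid_by_pointer(Process *proc, int pid)
{
    proc->pid = pid;
}
```

After set_pid_by_value(p, 7) the caller's p.pid is unchanged, while after set_pid_by_pointer(&p, 7) it is 7. The by-value call also copies all of the struct's bytes, which is why pointers are preferred for large structs.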

4.3 Nested Structs

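You may also encounter nested structs, where one struct contains another as a member. Access works one level at a time: use . on a struct variable and -> on a pointer to a struct, chaining the operators as needed (for example, a.b.c or p->b.c).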
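
As an illustration only, the sketch below nests one struct inside another. StartTime, Task, and start_minutes are made-up names and do not correspond to anything in xv6.

```c
typedef struct {
    int hour;
    int minute;
} StartTime;

typedef struct {
    int pid;
    char name[16];
    StartTime started;   /* a struct nested inside another struct */
} Task;

/* t is a pointer, so the outer level uses ->; started is a plain
   member (not a pointer), so the inner level uses the dot operator. */
int start_minutes(const Task *t)
{
    return t->started.hour * 60 + t->started.minute;
}
```

With a Task variable rather than a pointer, the same field would be reached as task.started.hour.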


5. Strings and String Manipulation in xv6

A string in C is represented as a sequence of characters terminated by a null character ('\0'). xv6 does not link against a standard C library, so it re-implements the few string functions it needs: the user library (user/ulib.c) provides strlen, strcmp, strcpy, and strchr, while the kernel keeps its own copies, including strncmp and strncpy, in kernel/string.c.

Here are some commonly used string functions in xv6:

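  • strlen(const char *str): Returns the length of the string (not counting the null terminator).
  • strcmp(const char *s1, const char *s2): Compares two strings. Returns 0 if they are identical, a negative value if s1 < s2, and a positive value otherwise.
  • strcpy(char *dst, const char *src): Copies the string src, including its null terminator, to dst. It does not check the size of dst.
  • strncmp(const char *s1, const char *s2, uint n): Compares up to n characters of two strings. In stock xv6 this function is defined in the kernel only, so a user program that needs it must supply its own.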
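
To show that there is nothing magical about these functions, here is a minimal sketch of how a length and a comparison routine might be written. The names str_length and str_compare are hypothetical, chosen so they do not clash with the real library functions, and the code is an illustration rather than a copy of xv6's source.

```c
/* Count characters up to, but not including, the null terminator. */
int str_length(const char *s)
{
    int n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

/* Walk both strings while they agree, then report the difference
   between the first pair of characters that do not match. */
int str_compare(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* compare as unsigned char so bytes above 127 order consistently */
    return (unsigned char)*a - (unsigned char)*b;
}
```

The result of str_compare is 0 for equal strings, negative when a sorts before b, and positive otherwise, which matches the convention described for strcmp.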

Below is a simple example that shows how to use these functions in an xv6 user program:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Usage: strdemo <some_string>\n");
        exit(1);
    }

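    char buffer[32];

    // strcpy does not check sizes, so reject arguments that would not fit
    if (strlen(argv[1]) >= sizeof(buffer)) {
        printf("Error: argument is too long\n");
        exit(1);
    }

    // Copy argument to buffer
    strcpy(buffer, argv[1]);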

    // Print original and copy
    printf("Original: %s\n", argv[1]);
    printf("Copied: %s\n", buffer);

    // Compare strings
    if (strcmp(argv[1], buffer) == 0) {
        printf("Strings match!\n");
    } else {
        printf("Strings do not match!\n");
    }

    // Print length
    printf("Length of '%s' is %d\n", buffer, strlen(buffer));

    exit(0);
}

Explanation:

  1. We include the usual xv6 headers (types.h, stat.h, and user.h).
  2. We ensure the user provided at least one argument.
  3. We declare a character buffer of fixed size (32) and copy argv[1] into buffer using strcpy.
  4. We compare the two strings with strcmp and print the result.
  5. Finally, we compute the length of the string using strlen.

Important Note: Always be mindful of buffer sizes. strcpy keeps copying until it reaches the null terminator of src, no matter how small dst is, and xv6 has no built-in protection against out-of-bounds writes. Your program must make sure the data fits before it copies.
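
One way to follow this advice is to wrap the copy in a helper that knows the destination size. The sketch below is a hypothetical function, copy_bounded, written for this guide; it is not part of the xv6 user library.

```c
/* Hypothetical helper: copy src into dst, writing at most dstsize
   bytes including the null terminator. Returns 0 if all of src fit,
   or -1 if the copy was truncated or dstsize was not positive. */
int copy_bounded(char *dst, int dstsize, const char *src)
{
    int i;

    if (dstsize <= 0)
        return -1;
    for (i = 0; i < dstsize - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return src[i] == '\0' ? 0 : -1;
}
```

A call such as copy_bounded(buffer, sizeof(buffer), argv[1]) always leaves buffer null-terminated, and the return value tells the caller whether the argument was cut short.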


6. Command-Line Arguments

In xv6 user programs, command-line arguments are received by the main function as int argc, char *argv[]. Here:

  • argc is the number of command-line arguments (including the program name itself).
  • argv is an array of character pointers (strings). Each argv[i] is one argument.

6.1 Simple Command-Line Arguments

Here’s a program that simply prints out the arguments passed to it:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
    int i;
    for (i = 0; i < argc; i++) {
        printf("Argument %d: %s\n", i, argv[i]);
    }
    exit(0);
}

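xv6 has no compiler of its own, so build this program as part of xv6 (for example, save it as user/echoarg.c and add $U/_echoarg to the UPROGS list in the Makefile), then run it from the xv6 shell: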

$ echoarg one two three

You’ll see:

Argument 0: echoarg
Argument 1: one
Argument 2: two
Argument 3: three

6.2 Handling Optional Flags Without getopt()

In standard Unix/C, you might use the library function getopt() to parse command-line options. The xv6 user library does not provide getopt(), so programs parse their arguments by hand.

Below is an example that handles the following options:

  • -a (no argument)
  • -r num (requires an integer argument)

We want to handle commands like:

foo -a
foo -r num
foo -a -r num
foo -r num -a

We can do something like this:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
    int i = 1;
    int have_a = 0;
    int r_value = -1;

    // We'll iterate through argv[] starting from index 1 (since index 0 is the program name)
    while (i < argc) {
        if (argv[i][0] == '-') {
            // this is an option
            switch (argv[i][1]) {
            case 'a':
                // -a option
                have_a = 1;
                i++;
                break;
            case 'r':
                // -r option, must have another argument if present
                if (i + 1 < argc) {
                    i++;
                    r_value = atoi(argv[i]);
                } else {
                    printf("Error: -r requires an argument\n");
                    exit(1);
                }
                i++;
                break;
            default:
                printf("Unknown option: %s\n", argv[i]);
                exit(1);
            }
        } else {
            // It's not an option, maybe it's a filename or something else
            printf("Non-option argument: %s\n", argv[i]);
            i++;
        }
    }

    // now we can do something with have_a and r_value
    if (have_a) {
        printf("-a was provided\n");
    } else {
        printf("-a was not provided\n");
    }

    if (r_value != -1) {
        printf("-r was provided with value %d\n", r_value);
    } else {
        printf("-r was not provided\n");
    }

    exit(0);
}

Explanation:

  1. We keep track of whether we saw -a (have_a) and the value of -r (r_value). r_value starts at -1 and keeps that value if -r never appears.
  2. We loop through the arguments, checking if the first character is '-'. If it is, we look at the second character to decide which option it is.
  3. For -r, we consume an additional argument to parse the num value.
  4. We handle any unknown options with an error message.
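
Note that atoi reports no errors, so an input such as -r abc cannot be told apart from a legitimate number. If you want stricter checking, a small validating parser is one option. parse_uint below is a hypothetical helper sketched for this purpose; it accepts only decimal digits and does not guard against overflow.

```c
/* Hypothetical helper: convert a string of decimal digits to an int.
   Returns 0 and stores the value in *out on success. Returns -1 and
   leaves *out untouched if s is empty or contains a non-digit.
   Overflow is not checked in this sketch. */
int parse_uint(const char *s, int *out)
{
    int n = 0;

    if (*s == '\0')
        return -1;
    while (*s != '\0') {
        if (*s < '0' || *s > '9')
            return -1;
        n = n * 10 + (*s - '0');
        s++;
    }
    *out = n;
    return 0;
}
```

In the option loop, the -r case could call parse_uint(argv[i], &r_value) and print an error when it returns -1.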

7. Reading from a File or from stdin

Sometimes your program needs to read from a file specified on the command line, or from standard input (stdin) if no file is provided. Below is a simple example that demonstrates this logic. It reads from a file if a filename is given, or from stdin otherwise.

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"
#include "kernel/fcntl.h"

// Read everything from fd and copy it to stdout, one buffer at a time.
void
read_and_print(int fd)
{
    char buf[128];
    int n;

    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        // in a real program, you might process the buffer
        // for simplicity, let's just write it back out to stdout
        write(1, buf, n);
    }
}

int
main(int argc, char *argv[])
{
    int fd;

    if (argc > 1) {
        // if a filename is provided, open it read-only
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            // report errors on file descriptor 2 (stderr)
            fprintf(2, "Error: cannot open %s\n", argv[1]);
            exit(1);
        }
    } else {
        // read from stdin
        fd = 0; // file descriptor 0 is stdin in xv6
    }

    read_and_print(fd);

    // if we opened a file, close it
    if (argc > 1) {
        close(fd);
    }

    exit(0);
}

Explanation:

  1. read_and_print(int fd) repeatedly calls read() on the file descriptor and writes the contents to stdout (file descriptor 1) until there’s no more data.
  2. In main(), if the user supplied a filename, we open() it. Otherwise, we default to stdin (fd = 0).
  3. We close the file if we opened one.
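
As a small example of processing the buffer instead of just echoing it, the sketch below counts newline characters in one chunk of data. count_newlines is a hypothetical helper; it takes the byte count explicitly because the data returned by read() is not null-terminated.

```c
/* Hypothetical helper: count the '\n' bytes in the first n bytes of buf.
   n should be the value returned by read(), since buf holds raw bytes
   rather than a null-terminated string. */
int count_newlines(const char *buf, int n)
{
    int i;
    int lines = 0;

    for (i = 0; i < n; i++) {
        if (buf[i] == '\n')
            lines++;
    }
    return lines;
}
```

Inside the read loop, adding the result of count_newlines(buf, n) to a running total on each iteration gives the number of lines in the whole input, even when a line is split across two reads.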

Conclusion

With these topics covered (variables, types, functions, arrays, pointers, structs, strings, command-line processing, and basic file input), you should have the core C knowledge needed to explore and modify xv6 user programs. Here are some final tips:

  • Spend time reading through the existing xv6 user programs (like ls.c, echo.c, grep.c, etc.) to see how these concepts are used.
  • Practice writing small programs that utilize these features.
  • Always remember that the xv6 environment is minimal, so you may not have all the usual library functions available.
  • Pay attention to memory usage, especially when working with arrays and strings, to avoid out-of-bounds issues.

We hope this guide serves as a helpful reference as you dive deeper into xv6. Keep experimenting, and have fun learning!