Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Project01 Heap Allocator (10%)

Tests 01 through 04 due Mon Feb 24 by 11:59pm in your Project01 GitHub repo

Tests 01 through 10 due Mon Mar 3 by 11:59pm in your Project01 GitHub repo

Interactive Grading on Tue Mar 4

  • A copy of xv6-riscv with your new implementation of malloc() and free() in user/umalloc.c
  • Your implemenation should pass all of the memtest tests from the Project01 Autograder tests.
  • You source code should conform to xv6 formatting conventions.
  • Your Project01 repo should not have any extraneous files or build artifacts
  • Make sure to test your repo by cloning from GitHub to a different location and run the Autograder. This will catch problems like missing files.

Contents

  1. Overview
  2. Preliminaries
  3. Heap Allocator
  4. Grading Rubric
  5. Code and Repo Quality
  6. Autograder Tests

Overview

To get a better idea of how user-level heap memory is managed and how a user-level process can request more memory from the kernel you will implement your own versions of malloc() and free() that conform the the requirements specified below. Note, that xv6 comes with its own implemenations of malloc() and free() which come from the book “The C Programming Language”, 2nd Edition by Dennis Ritchie and Brian Kernighan. In the project01 starter code I’ve moved the xv6 implemenation of malloc()/free() to umalloc_xv6.c. You are welcome to study this implemenation, but we will be doing things a bit differently. For one, the current implemenation uses a singly linked list for keeping track of free blocks, we are going to use a doubly linked list implemenation that will be provided. We are also going to add code and data to make it easier to visualize used and free blocks.

Preliminaries

When xv6 starts a user-level process the memory layout looks like this:

HEAP (size 0)
STACK (size 4096)
GUARD (size 4096)
DATA (size 4096*n)
CODE (size 4096*n, addr 0)

See Figure 2.3 on page 26 and Figure 3.4 on page 36 in the xv6 Book. The CODE (also called TEXT) section starts at virtual address 0 and is a multiple of 4096 bytes. 4096 is the native page size of the RISC-V processor and is a common page size in other computer processors. This means that when mapping virtual addresses the processor and the kernel do so in multiples of 4096. The DATA will start on a 4096 page and will also be a multiple of 4096. The GUARD page is unmapped and will help catch programs that try to use too much stack space (that is more than 4096 bytes). The STACK section also starts on a multiple of 4096 and is 4096 bytes in size. Initially, there is no actual HEAP memory allocated when a process starts. If a user process needs heap memory it must request memory from the kernel.

sbrk()

The xv6 kernel, as well as other Unix kernels, provides the sbrk() system call to request more memory for a process and this memory is generally used for heap memory. Here is the prototype for sbrk():

char* sbrk(int nbytes);

The nbytes argument is the number additional bytes of memory you want the kernel to extend the process’s address space with usable memory. The name sbrk() comes from set break where the break is the end of the processes address space. Interestly, nbytes can both be a positive value (grow the process address space) and a negative value (shrink the process address space). In practice, it is difficult to shrink the address space of a process because once a C program has a pointer to memory the memory must exist as long as the code thinks it should be allocated. That is, in C, we do not have any form of garbage collection. The return value of p = sbrk(nbytes) is a pointer (address) to the beginning of the allocated region of memory, and all addresses from p to p + (nbytes - 1) are valid.

Given this description of sbrk() we can create a very simple implementation of malloc() and free() that uses sbrk() for new malloc() requests, but free() does nothing. That is, free() won’t actually free up any used memory:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"
#include "kernel/param.h"

void
free(void *ap)
{
  /* This allocator does not support freeing allocated memory */
  return;
}

void*
malloc(uint nbytes)
{
  char *p;
  p = sbrk(nbytes);
  return p;
}

This will work, but is very wasteful and a long running program could run out of memory. Instead, we want to request heap memory from the kernel, but manage this heap memory so that memory blocks that are deallocated can eventually be reused by future calls to malloc(). In order to do this, we need a way to keep track of free memory and used memory. When we need more memory from the kernel, we will call sbrk(). When we free memory, we not only want to mark the block as free, but also see if this free block can be merged with neighboring free blocks to create a larger free block.

sbrksz()

To support testing of your memory allocator, I’ve added a new system call to xv6:

long sbrksz(void);

This system call will return the current size of the process address space, the break. This will be used by some of the tests to make sure you are not requesting more memory from the kernel than necessary for a particular sequence of allocations and deallocations. For example, consider the following code snippet:

uint64 s1, s2, diff;
s1 = sbrksz();
sbrk(16)
s2 = sbrksz();
diff = s2 - s1;

In this case diff should be 16.

sh.c Modifcations

The started code for Project01 inludes a modified sh.c to allow more than the default of 10 arguments passed to a commond. This has been increased to 64 to support the heap allocator test program (memtest.c) described below.

param.h Modifcation

I increased the MAXARG macro from 32 to 64 to allow for more command line arguments to be passed to the memtest program.

user.h Modification

I added a print_debug() macro to help with debug messages:

#define DEBUG 0
#define debug_print(fmt, ...) \
    do { if (DEBUG) fprintf(2, "%s:%d:%s(): " fmt, __FILE__, \
         __LINE__, __func__, __VA_ARGS__); } while (0)

In order to use this, just add something like:

debug_print("heap size request = %d\n", heap_req_size);

Then change DEBUG to 1:

#define DEBUG 1

Heap Allocator

The basic idea behind our heap allocator is that we will request memory from the kernel (via sbrk()) as needed to extend the heap space. We will manage all the heap space that we allocate from the kernel. That is, we will keep track of used memory and free memory. We will try to minimize calls to sbrk() by looking for free blocks in the existing heap space and only extend the heap if we cannot find a free block. As blocks are freed we need to see if we can combine neighboring free blocks to create a larger free block.

Here are some basic parameters we will work with:

  • The minimum size of a free block will be 4 bytes.
    • If malloc() is called with a value less that 4 bytes, you will bump this up to 4 bytes.
  • The minimum size for a request to sbrk() will be 4096.
  • If we need more than 4096 bytes, then you need to request a multple of 4096 bytes from sbrk()
  • That is, the process size as returned by sbrksz() should always be a multiple of 4096, sbrkzs() % 4096 == 0 (this is called an invariant).
  • When a free(p) call is made, you should merge the new free block with neighboring free blocks. (Hint: do this one pair of blocks at a time).
  • If a call to malloc(n) requires a call to sbrk() you should factor the space in a free block that is at the end of the current heap space.
  • When looking for a free block, you should use a first fit approach. That is, you should find the first free block that can accommodate the malloc() request.
  • When splitting a free block, you should convert the free block to a used block, then create a new free block in the remained space of the original free block, then add this to the block list. Another way to say this is that when splitting a free block, the used block should go first and the new free block should come second, right after the the new used block.
  • If you find a free block, but if spitting won;t result in enough space for the used block AND a new free block, then you just allocate all of the free block space to the used block.

Memory Blocks

To keep track of memory blocks, we are going to use a memory block header:

struct block_hdr {
  struct list_elem elem;
  char name[8];
  uint used;
  uint size;
};

Every block of memory that we manage will have a block header (both used blocks and free blocks). Since we are implementing malloc()/free(), we can’t use malloc()/free() for block headers. Instead we will embed the block headers in the heap space. That is, each block of memory will have a header associated with it. So, a block will have a header and then the usable space in the block.

The size field represents the size of the useable data in the block. Note: it does not include the size of the block header. So the total size for a block is: sizeof(struct block_hdr) + size.

Understanding memtest commands

The memtest program can create different memory allocation scenarios. It is used to test your heap allocator.

(m)alloc

$ memtest m <idx> <size> <name>

The m command means (m)alloc, the idx is the index of an internal pointer table used to keep track of malloc() pointers. The size is the number of bytes to allocate and you can give the block a name (7 characters max).

(f)ree

$ memtest f <idx>

The f command means (f)ree. It will attempt to deallocate the block pointed to by idx.

(p)rint

The p command prints the current block list.

(s)ummary

The s command prints a summary of the heap allocator. Here is an example of summary output:

(c)heck

$ memtest c <maxsize>

The c command performs and internal memory check to ensure all the block headers and data areas are intact and that the heap is less than or equal to heapstart plus maxsize.

For example:

$ memtest m 0 3000 A0 p s c 4096
[USED] name = A0 boff = 0 bhsz = 32 bsz = 3032 doff = 32 dsz = 3000
[FREE] name = NONE boff = 3032 bhsz = 32 bsz = 1064 doff = 3064 dsz = 1032
malloc_summary():
used  : 3000
free  : 1032
over  : 64
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

Used is the amount of useable used data. Free is the amount of useable free data. Overhead the the sum of all of the memory on the heap used by the block headers (struct block_hdr).

Understanding memtest output

Here is an diagram of the memtest for a single block:

Diagram

Terminology:

  • A block consists of a block header and a data section.
  • A block can have a 7 character name (this is optional).
  • heapstart is the address of where the heap starts in user memory. This will always be a multiple of 4096.
  • boff is the block offset relative to heapstart. A boff == 0 means this is the first block in the heap.
  • doff is the data offset relative to heapstart.
  • bhsize is the size in bytes of the block_hdr. This is 32 bytes and should never change. It is included for completeness of explaining different values.
  • dsz is the size of the usable data section.
  • bsz is the total size of the block including the block_hdr and the data section.

Here is a diagram of the output of the first test (Test01):

Diagram

Grading Rubric

Your Project01 score will come from two parts: functionaly that is determined by passing the autograder tests and repo and code quality.

  • Tests (80%)
  • Quality (10%)
  • Interactive Grading (10%)

You should plan to complete the project on time. If you don’t complete functionality on team, that is, if you don’t pass all the tests, push what you have working.

Code and Repo Quality

Code

Writing consice, consistent, and well structured code takes practice. Here are things we are looking for in terms of code quality:

  • Consistent formatting of code
    • In our case, we will follow the xv6 formatting conventions
    • You should get uncrustified setup to help with this
  • Consistent indentation
  • Consistent vertical spacing (newlines)
  • Consistent spacing around expressions and statemments
  • Consistent naming of functions and variables
    • We are using Snake Case (e.g., malloc_name())
  • Consistent comments, such as capitalization and spacing
  • No commented out code
  • No redundant code
  • No excessively long functions
    • If a function becomes long, break up the code in to multiple functions
  • Pick easier to understand code over shorter, but more complicated code

Repo

Your repo should be complete in that if we clone it, it should build and run as exepected. However, your repo should not include an extra files or generated files like object files and executables. Please check that your repo is complete by cloning it from GitHub to a new location and run the Autograder. Also, check that you don’t have any extra files.

Autograder Tests

Here is an overview of each of the tests and the type of functionality that is being tested.

Test 01 - Simple allocation

$ memtest m 0 3000 A0 p s c 4096
[USED] name = A0 boff = 0 bhsz = 32 bsz = 3032 doff = 32 dsz = 3000
[FREE] name = NONE boff = 3032 bhsz = 32 bsz = 1064 doff = 3064 dsz = 1032
malloc_summary():
used  : 3000
free  : 1032
over  : 64
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

This test does a single malloc(3000). You need to request at least 4096 (BLOCK_MIN_INCR) amount of heap space using sbrk() then create a USED block followed by a FREE block. You could do this directly like I have shown in class, but I recommend that you first create a large FREE block, then use your code to walk the block list to find a FREE block. The idea is that you only want one piece of code that looks for a FREE block, then splits a found FREE block into a FREE block and a USED block.

Test 02 - Two allocations and no room for a free block

$ memtest m 0 3000 A0 m 1 1000 A1 p s c 4096
[USED] name = A0 boff = 0 bhsz = 32 bsz = 3032 doff = 32 dsz = 3000
[USED] name = A1 boff = 3032 bhsz = 32 bsz = 1064 doff = 3064 dsz = 1032
malloc_summary():
used  : 4032
free  : 0
over  : 64
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

This test is doing two calls to malloc(). However, in the second allocation (A1) there is not enought space to split FREE block into a USED block and a FREE block. That is there is not enough room for sizeof(struct block_hdr) + BLOCK_MIN_SIZE, which is 32 + 4 = 36 bytes. In this case you just give the remaining amout to the second allocation by adding this amount to its size field and you do not create a new FREE block.

Test 03 - Three allocations with a remaining free block

$ memtest m 0 2000 A0 m 1 1000 A1 m 2 500 A2 p s c 4096
[USED] name = A0 boff = 0 bhsz = 32 bsz = 2032 doff = 32 dsz = 2000
[USED] name = A1 boff = 2032 bhsz = 32 bsz = 1032 doff = 2064 dsz = 1000
[USED] name = A2 boff = 3064 bhsz = 32 bsz = 532 doff = 3096 dsz = 500
[FREE] name = NONE boff = 3596 bhsz = 32 bsz = 500 doff = 3628 dsz = 468
malloc_summary():
used  : 3500
free  : 468
over  : 128
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

This test does 3 calls to malloc() and leaves a FREE block at the end of the block list.

Test 04 - Free with no merges

$ memtest m 0 500 A0 m 1 500 A1 m 2 500 A2 m 3 500 A3 f 2 p s c 4096
[USED] name = A0 boff = 0 bhsz = 32 bsz = 532 doff = 32 dsz = 500
[USED] name = A1 boff = 532 bhsz = 32 bsz = 532 doff = 564 dsz = 500
[FREE] name = NONE boff = 1064 bhsz = 32 bsz = 532 doff = 1096 dsz = 500
[USED] name = A3 boff = 1596 bhsz = 32 bsz = 532 doff = 1628 dsz = 500
[FREE] name = NONE boff = 2128 bhsz = 32 bsz = 1968 doff = 2160 dsz = 1936
malloc_summary():
used  : 1500
free  : 2436
over  : 160
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

In this case we are calling free() on A2, which has two USED neighbors. So, in this case you just need to convert the A2 USED block into a FREE block by setting the used field to 0 and changing name to be the empty string. No merging is necessary.

Test 05 - Free with one merge

$ memtest m 0 1000 A0 f 0 p s c 4096
[FREE] name = NONE boff = 0 bhsz = 32 bsz = 4096 doff = 32 dsz = 4064
malloc_summary():
used  : 0
free  : 4064
over  : 32
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

This test does a single allocation following by a free(). In this case the USED block needs to be converted to a FREE block, then the FREE block needs to be merged with the existing FREE block above it. You will need to implement block merging for this test to work.

Test 06 - Free with two merges

$ memtest m 0 500 A0 m 1 500 A1 m 2 500 A2 m 3 500 A3 f 0 f 2 f 1 p s c 4096
[FREE] name = NONE boff = 0 bhsz = 32 bsz = 1596 doff = 32 dsz = 1564
[USED] name = A3 boff = 1596 bhsz = 32 bsz = 532 doff = 1628 dsz = 500
[FREE] name = NONE boff = 2128 bhsz = 32 bsz = 1968 doff = 2160 dsz = 1936
malloc_summary():
used  : 500
free  : 3500
over  : 96
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

This test create a situation in which a USED block is surrounded by two FREE blocks. That is A1 will have a FREE block below and a FREE block above. So, when we free A1, it will require two merges to make one large FREE block by combining both neighbors.

Test 07 - Multiple malloc()s and free()s

$ memtest m 0 1000 A0 m 1 1000 A1 m 2 1000 A2 f 0 f 1 f 2 m 0 1000 A3 m 1 1000 A4 m 2 1000 A5 f 0 f 1 f 2 m 0 1000 A6 m 1 1000 A7 m 2 1000 A8 p s c 4096
[USED] name = A6 boff = 0 bhsz = 32 bsz = 1032 doff = 32 dsz = 1000
[USED] name = A7 boff = 1032 bhsz = 32 bsz = 1032 doff = 1064 dsz = 1000
[USED] name = A8 boff = 2064 bhsz = 32 bsz = 1032 doff = 2096 dsz = 1000
[FREE] name = NONE boff = 3096 bhsz = 32 bsz = 1000 doff = 3128 dsz = 968
malloc_summary():
used  : 3000
free  : 968
over  : 128
u+f+o : 4096
total : 4096
mem_check() passed with heap <= 4096

The goal of this test is to make sure you are properly converting USED blocks into FREE blocks, then reusing these FREE blocks for future malloc() calls. The test makes sure you don’t grow the heap with sbrk() when we don’t need to.

Test 08 - Large malloc() greater than 4096

name = "08"
input = ["python3", "runxv6.py", "memtest m 0 10000 A0 p s c 12288"]
expected = """
$ memtest m 0 10000 A0 p s c 12288
[USED] name = A0 boff = 0 bhsz = 32 bsz = 10032 doff = 32 dsz = 10000
[FREE] name = NONE boff = 10032 bhsz = 32 bsz = 2256 doff = 10064 dsz = 2224
malloc_summary():
used  : 10000
free  : 2224
over  : 64
u+f+o : 12288
total : 12288
mem_check() passed with heap <= 12288

This test makes sure you can request more than 4096 bytes from the kernel with sbrk(). That is, you need to request a multiple of 4096 that can satisify the malloc() request.

Test 09 - Merging last FREE block with sbrk() memory


$ memtest m 0 3000 a0 m 1 5000 a1 p s c 8192
[USED] name = a0 boff = 0 bhsz = 32 bsz = 3032 doff = 32 dsz = 3000
[USED] name = a1 boff = 3032 bhsz = 32 bsz = 5032 doff = 3064 dsz = 5000
[FREE] name = NONE boff = 8064 bhsz = 32 bsz = 128 doff = 8096 dsz = 96
malloc_summary():
used  : 8000
free  : 96
over  : 96
u+f+o : 8192
total : 8192
mem_check() passed with heap <= 8192

In this case, there is not enough space in the heap for the second malloc(). However, you don’t just want to increase the heap by a multiple of 4096. You need to only increase the heap by the amount needed if we combine the last FREE block with a new FREE block from extending the heap. That is, this test make sure you don’t allocate more heap space than needed for the request.

Test 10 - Large malloc()s and free()s

$ memtest m 0 5000 A0 m 1 5000 A1 m 2 5000 A2 m 3 5000 A3 f 2 f 3 p s c 20480
[USED] name = A0 boff = 0 bhsz = 32 bsz = 5032 doff = 32 dsz = 5000
[USED] name = A1 boff = 5032 bhsz = 32 bsz = 5032 doff = 5064 dsz = 5000
[FREE] name = NONE boff = 10064 bhsz = 32 bsz = 10416 doff = 10096 dsz = 10384
malloc_summary():
used  : 10000
free  : 10384
over  : 96
u+f+o : 20480
total : 20480
mem_check() passed with heap <= 20480

This test does 4 large (greater than 4096) malloc() calls and two free() calls to make sure you are managing the heap (block list) properly with larger requests.