Lab: page tables

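This lab is a first look at page tables. Page tables provide the hardware support that virtualization depends on and are an indispensable part of isolation. The lab requires some understanding of the PTE layout and of how the kernel allocates physical memory for a process and fills in the page table to establish address mappings.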
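To make the PTE layout concrete, here is a minimal userspace sketch of how an Sv39 PTE packs a physical page number together with flag bits, and how a virtual address splits into three 9-bit indices. The macros follow the ones in xv6's riscv.h, rewritten with uint64_t and ULL constants so they compile outside the kernel; make_pte is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PGSHIFT 12         // 4096-byte pages
#define PTE_V (1ULL << 0)  // valid
#define PTE_R (1ULL << 1)  // readable
#define PTE_W (1ULL << 2)  // writable
#define PTE_X (1ULL << 3)  // executable
#define PTE_U (1ULL << 4)  // accessible from user mode
#define PTE_A (1ULL << 6)  // accessed, set by hardware

// The physical page number sits above the 10 flag bits.
#define PA2PTE(pa) ((((uint64_t)(pa)) >> 12) << 10)
#define PTE2PA(pte) ((((uint64_t)(pte)) >> 10) << 12)
#define PTE_FLAGS(pte) ((pte) & 0x3FF)

// 9-bit index into the page table at a given level (2 is the root).
#define PXMASK 0x1FF
#define PXSHIFT(level) (PGSHIFT + 9 * (level))
#define PX(level, va) ((((uint64_t)(va)) >> PXSHIFT(level)) & PXMASK)

// Hypothetical helper: build a valid leaf PTE for a page-aligned
// physical address.
static pte_t make_pte(uint64_t pa, uint64_t flags) {
  assert((pa & 0xFFF) == 0);  // must be page-aligned
  return PA2PTE(pa) | flags | PTE_V;
}
```

For example, the virtual address 0x3fffffd000 decomposes into indices 255, 511, and 509 at levels 2, 1, and 0.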

Speed up system calls

When each process is created, map one read-only page at USYSCALL (a virtual address defined in memlayout.h). At the start of this page, store a struct usyscall (also defined in memlayout.h), and initialize it to store the PID of the current process. For this lab, ugetpid() has been provided on the userspace side and will automatically use the USYSCALL mapping. You will receive full credit for this part of the lab if the ugetpid test case passes when running pgtbltest.
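The idea behind the speed-up is that the user side only performs an ordinary memory read. The sketch below simulates this in a single userspace program: a page-sized buffer stands in for the page mapped at USYSCALL, and publish_pid and ugetpid_sim are hypothetical names for the kernel-side write and the user-side read. In real xv6 the user library reads through a pointer cast from the USYSCALL constant instead.

```c
#include <assert.h>
#include <string.h>

struct usyscall {
  int pid;  // process ID, written by the kernel
};

// Simulation only: one page-sized object standing in for the page
// that the kernel maps read-only at USYSCALL.
static union {
  struct usyscall u;
  unsigned char bytes[4096];
} shared_page;

// Kernel side (simulated): store the PID at the start of the page.
static void publish_pid(int pid) {
  assert(pid > 0);  // xv6 PIDs start at 1
  memset(shared_page.bytes, 0, sizeof(shared_page.bytes));
  shared_page.u.pid = pid;
}

// User side: a plain load from the shared page, no trap into the kernel.
static int ugetpid_sim(void) {
  const struct usyscall *u = &shared_page.u;
  return u->pid;
}
```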

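Although this part is rated easy, the implementation has quite a few details that take time to think through. When should physical memory for the shared page (USYSCALL) be allocated? When should it be mapped into the page table, and with what permissions? When should it be freed? The kernel has to be strict about memory: any page that is not freed in time can have serious consequences.
Tip: USYSCALL resembles TRAPFRAME in several respects, so the trapframe handling is a good model for this exercise.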

First, process creation in allocproc:

static struct proc*
allocproc(void)
{
    ...
found:
  p->pid = allocpid();
  p->state = USED;

  // Allocate a trapframe page.
  if((p->trapframe = (struct trapframe *)kalloc()) == 0){
    freeproc(p);
    release(&p->lock);
    return 0;
  }

  // Allocate a usyscall page.
  if((p->usyscall = (struct usyscall *)kalloc()) == 0){
    freeproc(p);
    release(&p->lock);
    return 0;
  }

  // An empty user page table.
  p->pagetable = proc_pagetable(p);
  if(p->pagetable == 0){
    freeproc(p);
    release(&p->lock);
    return 0;
  }

  // Set up new context to start executing at forkret,
  // which returns to user space.
  memset(&p->context, 0, sizeof(p->context));
  p->context.ra = (uint64)forkret;
  p->context.sp = p->kstack + PGSIZE;

  p->usyscall->pid = p->pid;

  return p;
}

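When a process is allocated, kalloc provides one page of physical memory for usyscall. The pid is already known at this point, so it can be stored in usyscall->pid right away. It is worth reading the implementations of kalloc and kfree to deepen your understanding of memory management.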
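As a reading aid for kalloc and kfree, the following is a simplified userspace sketch of the free-list idea they are built on: every free page stores a pointer to the next free page in its own first bytes. It assumes a tiny static pool instead of real physical memory and omits the spinlock; the _sim names and free_pages are hypothetical, not xv6 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PGSIZE 4096
#define NPAGES 4  // size of the simulated physical memory (arbitrary)

// A free page holds the link to the next free page in its first bytes.
struct run {
  struct run *next;
};

union page {
  struct run run;
  unsigned char bytes[PGSIZE];
};

static union page pool[NPAGES];
static struct run *freelist;

// Return a page to the free list.
static void kfree_sim(void *pa) {
  assert(pa != NULL);
  memset(pa, 1, PGSIZE);  // junk fill to expose dangling references
  struct run *r = (struct run *)pa;
  r->next = freelist;
  freelist = r;
}

// Take a page off the free list, or return NULL when none is left.
static void *kalloc_sim(void) {
  struct run *r = freelist;
  if (r) {
    freelist = r->next;
    memset(r, 5, PGSIZE);  // junk fill so callers do not rely on zeroes
  }
  return r;
}

// Put every page of the pool on the free list.
static void kinit_sim(void) {
  freelist = NULL;
  for (int i = 0; i < NPAGES; i++)
    kfree_sim(&pool[i]);
}

// Count the pages currently on the free list.
static int free_pages(void) {
  int n = 0;
  for (struct run *r = freelist; r; r = r->next)
    n++;
  return n;
}
```

Comparing a free-page count before and after an allocate/free round trip is a cheap way to notice a leaked page, such as a usyscall page that is never released.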

When process creation fails partway or a process is released, freeproc is called, which is where usyscall has to be freed:

static void
freeproc(struct proc *p)
{
  if(p->trapframe)
    kfree((void*)p->trapframe);
  p->trapframe = 0;
  if(p->usyscall)
    kfree((void*)p->usyscall);
  p->usyscall = 0;
  if(p->pagetable)
    proc_freepagetable(p->pagetable, p->sz);
  p->pagetable = 0;
    ...
}

Next, the page has to be entered into the page table and later removed from it:

// Create a user page table for a given process, with no user memory,
// but with trampoline, trapframe, and usyscall pages.
pagetable_t
proc_pagetable(struct proc *p)
{
    ...
  if(mappages(pagetable, TRAPFRAME, PGSIZE,
              (uint64)(p->trapframe), PTE_R | PTE_W) < 0){
    uvmunmap(pagetable, TRAMPOLINE, 1, 0);
    uvmfree(pagetable, 0);
    return 0;
  }

  // map the usyscall page below the trapframe.
  // user program can read it, but not write it.
  if(mappages(pagetable, USYSCALL, PGSIZE,
              (uint64)(p->usyscall), PTE_R | PTE_U) < 0){
    uvmunmap(pagetable, TRAPFRAME, 1, 0);
    uvmunmap(pagetable, TRAMPOLINE, 1, 0);
    uvmfree(pagetable, 0);
    return 0;
  }

  return pagetable;
}

// Free a process's page table, and free the
// physical memory it refers to.
void
proc_freepagetable(pagetable_t pagetable, uint64 sz)
{
  uvmunmap(pagetable, TRAMPOLINE, 1, 0);
  uvmunmap(pagetable, TRAPFRAME, 1, 0);
  uvmunmap(pagetable, USYSCALL, 1, 0);
  uvmfree(pagetable, sz);
}
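Two details in the mapping code are easy to check with a small sketch: where USYSCALL sits in the address space, and why PTE_R | PTE_U makes the page readable but not writable from user mode. The address macros follow memlayout.h; user_can_access is a hypothetical, deliberately simplified model of the hardware permission check (mappages adds PTE_V itself).

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096ULL
// xv6 uses one bit less than the full Sv39 range to avoid sign extension.
#define MAXVA (1ULL << (9 + 9 + 9 + 12 - 1))
#define TRAMPOLINE (MAXVA - PGSIZE)      // highest page
#define TRAPFRAME (TRAMPOLINE - PGSIZE)  // just below the trampoline
#define USYSCALL (TRAPFRAME - PGSIZE)    // just below the trapframe

#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_U (1ULL << 4)

// Simplified model: user mode may perform the access in `want`
// (PTE_R or PTE_W) only if the PTE is valid, user-accessible,
// and carries the matching permission bit.
static int user_can_access(uint64_t flags, uint64_t want) {
  assert((flags & ~0x3FFULL) == 0);  // only the 10 flag bits are expected
  return (flags & PTE_V) && (flags & PTE_U) && ((flags & want) == want);
}
```

Under this model the usyscall mapping (PTE_R | PTE_U) allows user reads and rejects user writes, while the trapframe mapping (PTE_R | PTE_W, no PTE_U) is not reachable from user mode at all.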

The follow-up question is also fairly interesting:
Which other xv6 system call(s) could be made faster using this shared page? Explain how.
Here is my answer:

Any system call that only needs to read non-sensitive data from kernel space into user space could be made faster using the shared page.
This would reduce the overhead of copying data between user and kernel space.
For example, the pgaccess system call needs to copy a mask from kernel to user space,
so the mask could be added to the definition of struct usyscall and stored in the shared page USYSCALL.

Print a page table

Define a function called vmprint(). It should take a pagetable_t argument, and print that pagetable in the format described below. Insert if(p->pid==1) vmprint(p->pagetable) in exec.c just before the return argc, to print the first process’s page table. You receive full credit for this part of the lab if you pass the pte printout test of make grade.

This part is rather dull, so I am skipping the write-up.
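For readers who still want a starting point, the recursive walk can be sketched roughly as follows. This is a userspace simulation, not the lab solution: three static page-aligned arrays play the role of page-table pages, their host addresses stand in for physical addresses, build_demo is a hypothetical helper, and the output format only approximates the one the lab expects. Unlike a kernel vmprint, this version returns the number of valid PTEs so the result can be checked.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pte_t;
typedef pte_t *pagetable_t;

#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_X (1ULL << 3)
#define PA2PTE(pa) ((((uint64_t)(pa)) >> 12) << 10)
#define PTE2PA(pte) ((((uint64_t)(pte)) >> 10) << 12)

// Simulation only: host addresses of these tables act as physical addresses.
static _Alignas(4096) pte_t root[512], mid[512], leaf[512];

// Print every valid PTE at this level, then descend into child tables.
static int vmprint_level(pagetable_t pt, int depth) {
  int count = 0;
  for (int i = 0; i < 512; i++) {
    pte_t pte = pt[i];
    if ((pte & PTE_V) == 0)
      continue;
    count++;
    for (int d = 0; d <= depth; d++)
      printf(" ..");
    printf("%d: pte 0x%016" PRIx64 " pa 0x%016" PRIx64 "\n", i, pte,
           PTE2PA(pte));
    // A valid PTE without R/W/X points to a lower-level table.
    if (depth < 2 && (pte & (PTE_R | PTE_W | PTE_X)) == 0)
      count += vmprint_level((pagetable_t)(uintptr_t)PTE2PA(pte), depth + 1);
  }
  return count;
}

static int vmprint(pagetable_t pt) {
  assert(pt != NULL);
  printf("page table %p\n", (void *)pt);
  return vmprint_level(pt, 0);
}

// Hypothetical helper: a three-level table with two leaf mappings.
static pagetable_t build_demo(void) {
  root[0] = PA2PTE((uintptr_t)mid) | PTE_V;
  mid[0] = PA2PTE((uintptr_t)leaf) | PTE_V;
  leaf[0] = PA2PTE(0x87f6b000ULL) | PTE_R | PTE_X | PTE_V;
  leaf[1] = PA2PTE(0x87f6c000ULL) | PTE_R | PTE_W | PTE_V;
  return root;
}
```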

Detect which pages have been accessed

Your job is to implement pgaccess(), a system call that reports which pages have been accessed. The system call takes three arguments. First, it takes the starting virtual address of the first user page to check. Second, it takes the number of pages to check. Finally, it takes a user address to a buffer to store the results into a bitmask (a datastructure that uses one bit per page and where the first page corresponds to the least significant bit). You will receive full credit for this part of the lab if the pgaccess test case passes when running pgtbltest.

I omit the steps for adding the system call. Assuming at most 32 pages are checked, the key functions are as follows:

int
sys_pgaccess(void)
{
    ...
  uint mask = checkaccess(myproc()->pagetable, PGROUNDDOWN(addr), n);
  // copyout fails if buf_addr is not a valid user address.
  if(copyout(myproc()->pagetable, buf_addr, (char*)&mask, sizeof(mask)) < 0)
    return -1;
    ...
}

// Inspect the access bits for n pages from virtual address(n <= 32)
// Return the mask of access bits while clearing the access bits
uint checkaccess(pagetable_t pagetable, uint64 addr, int num)
{
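    // The syscall wrapper should reject n > 32 before calling this;
    // panicking on a user-supplied value lets any process halt the kernel.
    if (num > 32)
        panic("pgaccess: num > 32");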

    uint mask = 0;

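    for(int i = 0; i < num; i++) {
        pte_t *pte = walk(pagetable, addr + i * PGSIZE, 0);

        // walk returns 0 when no mapping exists; skip unmapped pages.
        if (pte && (*pte & PTE_V) && (*pte & PTE_A)) {
            mask |= 1U << i;  // unsigned shift, since i can reach 31
            *pte &= ~PTE_A;   // clear the access bit
        }
    }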

    return mask;
}
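The collect-and-clear logic can be tried outside the kernel with a flat array of PTEs in place of a real page-table walk. The sketch below makes that simplification; collect_accessed and page_accessed are hypothetical names, and page_accessed shows how user code might interpret the returned bitmask.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_V (1ULL << 0)  // valid
#define PTE_A (1ULL << 6)  // accessed, set by hardware on use

// Collect the accessed bits of up to 32 PTEs into a mask
// (page 0 is the least significant bit) and clear them.
static uint32_t collect_accessed(pte_t *ptes, int n) {
  assert(n >= 0 && n <= 32);
  uint32_t mask = 0;
  for (int i = 0; i < n; i++) {
    if ((ptes[i] & PTE_V) && (ptes[i] & PTE_A)) {
      mask |= 1U << i;
      ptes[i] &= ~PTE_A;  // so the next call reports only new accesses
    }
  }
  return mask;
}

// User side: was page i reported as accessed?
static int page_accessed(uint32_t mask, int i) {
  return (mask >> i) & 1U;
}
```

Clearing the bit after reading it is what makes a second call meaningful: without it, a page touched once would be reported as accessed forever.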

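Honestly, this one did not feel hard (the compilers lab was harder). Maybe too much time has passed and I have forgotten the tricky parts. Lesson learned: write the blog post right after finishing the lab. There are just so many labs, though (sob).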

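Meeting each lab requirement matters, but spending more time thinking beyond the requirements is more rewarding. How should this be done? What are all the ways it could be done? How would each approach affect things now and later? These questions are worth thinking about.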