Assignment 5: Memory-Mapped Encrypted Files
In this assignment you will implement memory-mapped access to data that is stored in encrypted files. Here's how this will work:
- A region of the virtual address space of a process will be allocated, large enough to hold the contents of a file.
- The file won't initially be read into memory, and there won't even be physical pages assigned to this region.
- If the process attempts to read or write from the file's region in virtual memory, page faults will occur.
- Your code will catch the page faults, allocate physical pages, and read in the file. Only the pages of the file that are actually accessed will be read into memory. Once a page has been loaded into memory, the process will be able to access data in that page without additional faults.
- The process can also modify the file's bytes in memory. When this happens, your code will remember which pages are dirty and write those pages back to the file when the memory-mapped file is closed (or "flushed"). Only pages that were actually modified will be written.
- As an additional twist, the data in the file is stored in encrypted form to thwart eavesdroppers. Information is decrypted when read into memory and re-encrypted when flushed back to disk.
Normally, code like this would be written
inside the operating system kernel. For this assignment, we have used the
Linux mmap facility to emulate features that are typically available
in the kernel (such as page faults), so you can write your code at user level.
The goal of this assignment is to teach you
about the following concepts:
- How demand paging works
- How page protection can be used to emulate hardware features such as dirty bits
Getting Started
Log in to the myth cluster and clone the starter repo with this command:
git clone /afs/ir/class/archive/cs/cs111/cs111.1246/repos/assign5/$USER assign5
This will create a new directory assign5 in your current
directory and clone a Git starter repository into that
directory. Do your work for the assignment
in this directory.
The files mcryptfile.hh and mcryptfile.cc contain a skeleton for all
of the code you need to write. You will add to the declarations in
mcryptfile.hh and fill in the bodies of the methods in
mcryptfile.cc to implement
the facilities described below. You can also create additional methods or
classes as needed to implement your solution.
The directory also contains a Makefile; if you type make, it
will compile your code for both problems together with
a test program test.cc, producing an executable file
test. You can invoke ./test with an argument giving the
name of a test to run (invoke it with no arguments to see a list
of available tests). The same test program is used for both this
assignment and Assignment 6; this assignment will use only the
tests up through update_three_files.
You can also invoke the command tools/sanitycheck to run a series of basic
tests on your solution.
Try this now: the starter code should compile but almost all the tests will
fail.
As usual, we do not guarantee that the tests we have provided are exhaustive, so passing all of the tests is not necessarily sufficient to ensure a perfect score (CAs may discover other problems in reading through your code).
Assignment Overview
You will implement a class MCryptFile that provides the following
methods:
-
MCryptFile(Key key, std::string path)
Constructs an MCryptFile object that can be used to access data in the file given by path. The file is encrypted. When data is read from the file into memory, it will be decrypted using key; when data is written from memory back to the file, it will be encrypted using key. Key objects can be constructed from strings. If the file doesn't currently exist, a new file will be created. Throws a std::system_error exception if the file cannot be opened or created. Most of the functionality of this constructor (such as throwing exceptions) is actually implemented in the CryptFile superclass constructor described below.
-
char *map_file(size_t min_size = page_size)
Maps the associated file into a contiguous region of virtual memory and returns the virtual address of the first byte of that region. Byte i of the decrypted file contents will henceforth be directly accessible at offset i into the region. The min_size argument can be used to create a new file or grow an existing file. The actual size of the virtual region will be the larger of min_size and the original file length; if bytes are written in the region beyond the original file length, the file will be extended when flush_file is called. If you want to grow a file beyond the current size of the region, invoke unmap_file to unmap the file, then invoke map_file to map it again with a larger size. Note: file sizes are always rounded up to page boundaries. This method does not allocate physical memory for the region or read information in from the file; that should happen later, on demand, as pages in the region are accessed.
-
void flush_file()
Encrypts all pages that have been modified since the last call to flush_file and writes them back to the associated file. All pages currently in memory should remain in memory. This operation may grow the size of the encrypted file in the file system.
-
void unmap_file()
Flushes any dirty pages and removes the mapping created by map_file. After this method returns, the caller must no longer use any references into the previously mapped region.
-
char *map_base()
Returns the address of the first byte of the memory-mapped file, or nullptr if the associated file is not currently mapped.
-
size_t map_size()
Returns the current size of the mapped region, or 0 if the associated file is not currently mapped.
-
static void set_memory_size(size_t npages)
Invoked to specify how many pages should be in the pool of physical memory that is shared by all MCryptFile objects. This method should only be invoked before the first MCryptFile is created; it will have no effect after that. The test infrastructure will invoke this method as appropriate for the tests being run.
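The sizing rule described for map_file (the larger of min_size and the file length, rounded up to a page boundary) can be sketched as plain arithmetic. The helper names round_up_to_page and mapped_region_size below are hypothetical and are not part of the starter code; the sketch also assumes that the page size is a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the starter code): rounds n up to the next
// multiple of page_size. Assumes page_size is a nonzero power of two.
inline std::size_t round_up_to_page(std::size_t n, std::size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (n + page_size - 1) & ~(page_size - 1);
}

// Hypothetical helper: the region size implied by the map_file description,
// i.e. the larger of the file length and min_size, rounded up to a page.
inline std::size_t mapped_region_size(std::size_t file_len,
                                      std::size_t min_size,
                                      std::size_t page_size) {
    return round_up_to_page(std::max(file_len, min_size), page_size);
}
```

For example, with 4096-byte pages a 10000-byte file mapped with the default min_size would occupy three pages under this rule.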
Supporting Code
We have written several classes for you to use in implementing MCryptFile:
CryptFile
The CryptFile class provides basic mechanisms for reading and writing
encrypted files (but not for memory-mapping them). Your MCryptFile class
will be a subclass of CryptFile.
The CryptFile class has the following methods:
-
CryptFile(Key key, std::string path)
The API for this method is identical to that for the MCryptFile constructor (see above).
-
size_t file_size()
Returns the number of bytes in the file (which is the same as the number of bytes required to hold the decrypted file in memory).
-
int aligned_pread(void *dst, size_t len, size_t offset)
Reads information from the file into memory. More precisely, reads len bytes of data at position offset in the file, decrypts it, and stores the unencrypted information at dst. Both len and offset must be multiples of the AES encryption algorithm's block size (16), which is accessible via CryptFile::blocksize. The block size will not be an issue for this assignment, because you will only be reading and writing full pages aligned on page-size boundaries. Returns the number of bytes read or -1 on error.
-
int aligned_pwrite(const void *src, size_t len, size_t offset)
Writes information from memory to the associated file. Encrypts len bytes starting at src and writes them to position offset in the associated file. Returns len on success and -1 on error. Both len and offset must be multiples of CryptFile::blocksize.
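To see why the block-size requirement is automatically met by full-page transfers, here is a minimal sketch of the arithmetic. The constant kBlockSize mirrors the value 16 stated above, and the function names are hypothetical; none of this calls the real CryptFile class.

```cpp
#include <cassert>
#include <cstddef>

// Assumption taken from the text: the AES block size is 16 bytes.
constexpr std::size_t kBlockSize = 16;

// True if len and offset are both multiples of the block size, which is
// the stated precondition for aligned_pread and aligned_pwrite.
constexpr bool block_aligned(std::size_t len, std::size_t offset) {
    return len % kBlockSize == 0 && offset % kBlockSize == 0;
}

// Hypothetical helper: byte offset in the file of the page with the
// given index.
constexpr std::size_t page_file_offset(std::size_t page_index,
                                       std::size_t page_size) {
    return page_index * page_size;
}

// Returns the file offset for a page after checking that a full-page
// transfer at that offset satisfies the alignment rule.
inline std::size_t checked_page_offset(std::size_t page_index,
                                       std::size_t page_size) {
    std::size_t off = page_file_offset(page_index, page_size);
    assert(block_aligned(page_size, off));
    return off;
}
```

Because a page size such as 4096 is itself a multiple of 16, every page-sized transfer at a page-aligned offset passes the check.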
VMRegion
The VMRegion class provides basic mechanisms for mapping pages into
a region of virtual memory, taking page faults, and managing permissions.
A VMRegion corresponds roughly to a contiguous range of page map
entries for one process in an operating system.
You will create one VMRegion object for each mapped MCryptFile.
The VMRegion class is defined in the header file vm.hh and
has the following methods:
-
VMRegion(size_t nbytes, std::function<void(char *fault_address)> handler)
Constructs a VMRegion object and allocates a currently unused region of virtual memory in the process. The size of the region will be nbytes; if nbytes isn't a multiple of the page size, the VMRegion will behave as if nbytes were rounded up to the next higher multiple of the page size. If nbytes is zero, it will be rounded up to one full page. The handler parameter specifies a function that will be invoked whenever a page fault occurs in the region. Page faults occur whenever an unmapped page is referenced or an attempt is made to write a page that is currently read-only. handler is invoked with a single argument, fault_address, giving the virtual address that triggered the page fault.
-
VPage get_base()
Returns the address of the first page in the virtual region. VPage is a type that refers to the first byte of a virtual page in a VMRegion. It is equivalent to char *, so you can add an offset to it to get the address of a value in the middle of a page. You can also add multiples of the page size to the value returned by get_base to produce VPages for other pages in the virtual region.
-
size_t get_size()
Returns the total number of bytes in the virtual region.
-
void map_page(VPage va, PPage pa, Prot prot)
Sets the mapping for a particular VPage inside a VMRegion, so that accesses to that page will be directed to pa (PPages are obtained using the PhysMem class discussed below). If a different page was previously mapped at va, the old mapping is removed. The prot argument specifies what sort of accesses are allowed; it should be either PROT_NONE to prohibit both loads and stores, PROT_READ to allow loads but not stores, or PROT_READ|PROT_WRITE (bitwise OR of two values) to allow both loads and stores. This function's behavior is equivalent to setting the contents of a page map entry.
-
void unmap_page(VPage va)
Removes the mapping for va, if there is one; future references to the page will cause page faults. This function's behavior is roughly equivalent to clearing the present bit in a page map entry.
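The page-map analogy for map_page and unmap_page can be illustrated with a small simulation. This is a toy model only: ToyPageMap, ToyEntry, and the kNone/kRead/kWrite constants are hypothetical stand-ins, pages are identified by index rather than by address, and nothing here touches real memory protection.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-ins for PROT_NONE, PROT_READ, and PROT_WRITE.
enum ToyProt : unsigned { kNone = 0, kRead = 1, kWrite = 2 };

struct ToyEntry {
    std::size_t ppage;  // index of the physical page backing this page
    unsigned prot;      // bitwise OR of ToyProt values
};

// Toy page map: virtual page index -> entry. A missing key plays the
// role of a cleared "present" bit.
class ToyPageMap {
public:
    void map_page(std::size_t vpage, std::size_t ppage, unsigned prot) {
        assert((prot & ~(kRead | kWrite)) == 0);
        entries_[vpage] = ToyEntry{ppage, prot};  // replaces any old mapping
    }

    void unmap_page(std::size_t vpage) { entries_.erase(vpage); }

    // True if an access of the given kind would cause a page fault:
    // either the page is unmapped or the protection forbids the access.
    bool would_fault(std::size_t vpage, bool is_write) const {
        auto it = entries_.find(vpage);
        if (it == entries_.end()) return true;
        unsigned needed = is_write ? kWrite : kRead;
        return (it->second.prot & needed) == 0;
    }

private:
    std::unordered_map<std::size_t, ToyEntry> entries_;
};
```

In this model a page mapped with kRead satisfies loads but faults on stores, which matches the two fault conditions listed for the VMRegion handler.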
In addition to these methods, VMRegion also exports a variable
page_size, which contains the number of bytes in each page on
the current machine. Page sizes are 4096 bytes on the myth cluster
and on many laptops (some machines, such as Apple Silicon MacBooks, use
larger pages), so you should use the page_size variable to ensure
portability; you can assume page_size will always be a power of two.
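Because the page size is a power of two, page arithmetic reduces to bit masking. The sketch below works on plain integers; page_floor and offset_in_page are hypothetical names, not functions from vm.hh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: address of the first byte of the page containing
// addr. Assumes page_size is a nonzero power of two.
inline std::uintptr_t page_floor(std::uintptr_t addr,
                                 std::uintptr_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

// Hypothetical helper: distance of addr from the start of its page.
inline std::uintptr_t offset_in_page(std::uintptr_t addr,
                                     std::uintptr_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & (page_size - 1);
}
```

With 4096-byte pages, the address 0x12345 lies in the page starting at 0x12000, at offset 0x345.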
PhysMem and PPages
The PhysMem class provides a mechanism for allocating and freeing
pages of physical memory. Each physical page is identified with a
PPage object, which you can pass to VMRegion::map_page to map the page
into a VMRegion.
In addition, a PPage is a valid virtual address (it is a char *
pointer), which you can use to access the bytes of the page.
Unlike VPages, a PPage is always accessible and writable; references
to it will never generate page faults.
This is useful because it allows your MCryptFile implementation
to access physical pages that are not currently mapped as VPages,
such as when transferring page contents to or from encrypted files.
PPages are mapped into virtual memory by the PhysMem class,
at virtual addresses different from those in VMRegions.
This is an example of aliasing, where a single physical page
appears at multiple virtual addresses. The first alias for each
physical page is its PPage; the second alias is the corresponding
VPage (if the page has been mapped). In principle you could map a
single PPage as multiple different VPages, but we won't do that
for this assignment: there will be only one VPage per PPage.
The PhysMem class is defined in the header file vm.hh and
has the following methods:
-
PhysMem(size_t npages)
Allocates npages physical memory pages, each of which may be mapped into any VMRegion.
-
PPage page_alloc()
Allocates a page and returns its PPage, or nullptr if there are no free pages.
-
void page_free(PPage p)
Returns p to the free page pool for this PhysMem. The caller must ensure that this page is not mapped in any VMRegion.
-
size_t npages()
Returns the total number of pages in this object (free or allocated).
-
size_t nfree()
Returns the number of pages that are not currently allocated.
-
PPage pool_base()
Returns the address of the first page in the pool (the pages in the pool occupy a range of contiguous PPage addresses).
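The allocation behavior described for PhysMem (a fixed pool, nullptr on exhaustion, pages returned with page_free) can be mimicked with a simple free list. ToyPool below is a hypothetical illustration, not the real PhysMem: it carves pages out of a std::vector, so its pages are not page-aligned and cannot be mapped into a VMRegion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy fixed-size page pool with the same method names as PhysMem.
class ToyPool {
public:
    explicit ToyPool(std::size_t npages, std::size_t page_size = 4096)
        : page_size_(page_size), storage_(npages * page_size) {
        // Push pages in reverse so the lowest address is handed out first.
        for (std::size_t i = npages; i > 0; --i)
            free_.push_back(storage_.data() + (i - 1) * page_size);
    }

    // Returns a free page, or nullptr if the pool is exhausted.
    char *page_alloc() {
        if (free_.empty()) return nullptr;
        char *p = free_.back();
        free_.pop_back();
        return p;
    }

    // Returns p to the free list; p must have come from page_alloc.
    void page_free(char *p) {
        assert(p != nullptr);
        free_.push_back(p);
    }

    std::size_t npages() const { return storage_.size() / page_size_; }
    std::size_t nfree() const { return free_.size(); }

private:
    std::size_t page_size_;
    std::vector<char> storage_;   // backing bytes for every page
    std::vector<char *> free_;    // pages that are not currently allocated
};
```

A caller that sees nullptr from page_alloc has run out of pages, which is the condition this assignment allows you to treat as fatal.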
Exercise 1: page_fault and map_page
(You will go through most of this exercise in section with your CA)
The file page_fault.cc contains a simple program that illustrates
how to create a VMRegion and then take a page fault in the region,
but it doesn't actually allocate physical memory or set up a
virtual-to-physical mapping.
The file map_page.cc adds functionality to allocate a physical
page when the page fault occurs and map it into the VMRegion, so
that memory accesses to the VMRegion can complete.
Read through the code of both files to familiarize yourself with them, then run the programs and observe their output:
make
./page_fault
./map_page
Once you have run the programs, answer the questions in questions.txt.
The fault_handler function in map_page is currently a bit hacky,
in that it uses region.get_base() to determine the VPage at which
to map the PPage. This only works because the VMRegion in this example
contains only a single page. A better approach is to compute the VPage
from the faulting address: this is just the first byte of the page
containing fault_addr (in general, fault_addr could point anywhere
in a page, but the VPage must refer to the first byte of the page).
This change would allow the fault handler to work with
regions containing multiple pages. Modify map_page.cc so that fault_handler
computes a VPage rather than calling region.get_base() and
make sure that the program still runs.
C++ Proficiency: Deleting While Iterating
At some point in this assignment you will need to scan a C++ container and
delete some of the entries in it. This is tricky in C++ because object
deletion is not generally safe while iterating. For example, suppose
you try to iterate over an std::unordered_map and delete some of its
entries using code like this:
    std::unordered_map<int, Foo*> foo_map;
    for (auto it = foo_map.begin(); it != foo_map.end(); ++it) {
        if (...) {
            foo_map.erase(it);   // unsafe: it is invalid after this call
        }
    }
This code is unsafe, because deleting the element leaves the iterator
it in an undefined state; bad things will happen if you keep
using that iterator.
However, the following code is safe:
    std::unordered_map<int, Foo*> foo_map;
    for (auto it = foo_map.begin(); it != foo_map.end(); ) {
        if (...) {
            it = foo_map.erase(it);   // erase returns the next valid iterator
        } else {
            ++it;                     // advance only when nothing was erased
        }
    }
The erase method returns a new iterator that refers to the next element
of the map after the deleted one, so it is safe to continue iterating.
Notice in this case that the for statement no longer increments
it: that happens only if the element isn't deleted.
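For reference, here is a complete, compilable instance of the safe pattern. The function erase_even_keys and its "key is even" condition are made up for illustration; substitute whatever condition your code needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Removes every entry whose key is even and returns how many entries
// were removed. The loop advances the iterator only when nothing is erased.
inline std::size_t erase_even_keys(std::unordered_map<int, std::string> &m) {
    std::size_t removed = 0;
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->first % 2 == 0) {
            it = m.erase(it);   // erase hands back the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```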
Implementation Milestones
Milestone 1: Faults Forever
Implement the MCryptFile constructor and map_file method. You will
need to create a VMRegion object in the map_file method to manage the
virtual addresses for this mapped file. We have defined a skeleton page
fault handler function fault_handler in the MCryptFile class,
which you should pass to the VMRegion constructor as the handler argument.
In the starter repo, fault_handler just prints the virtual address that
caused the page fault (you will replace this body in a later milestone).
In addition, you should implement the map_base and map_size methods.
Be sure to change the return value of the map_file method (it should not
return nullptr). The map_size test should now pass.
Now run the read test (./test read). You should see that the file is
successfully mapped, but the test will hang because fault_handler
doesn't actually make the page accessible; thus page faults will
happen repeatedly (as soon as fault_handler returns, the
application retries the faulting instruction, which causes another
page fault).
Milestone 2: Map Pages
Create enough new functionality to load pages into memory on demand.
First, add code to allocate a PhysMem object during the first
call to map_file (don't allocate the PhysMem until the first call
to map_file). A single PhysMem will be shared across all MCryptFiles and
used to allocate physical memory pages, just as an operating system
uses a single physical memory to allocate pages for all processes.
MCryptFile::set_memory_size may have been invoked to specify
how large the pool of physical memory should be. If
MCryptFile::set_memory_size has not been invoked by the time
the PhysMem is created, use 1000 pages for the PhysMem object.
Then replace the code in fault_handler with code to allocate a
physical page, fill it with the appropriate information from the
file, and make it accessible at the correct virtual address.
If physical memory runs out, PhysMem::page_alloc will return
nullptr; if this happens you can print an error message and
exit the application (once you implement page replacement in Assignment
6 this error should never occur).
For now, set the permissions
on each page to be PROT_READ|PROT_WRITE. If you run the read test
again, you should see that 3 pages are successfully read, but the test
will generate an error because pages are not being unmapped.
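The "create the pool lazily, with a default of 1000 pages" rule from this milestone can be sketched in isolation. Everything below is hypothetical: StubPool stands in for PhysMem, and LazyPoolHolder is an ordinary object rather than the static state that MCryptFile would need. The sketch also assumes that a size request arriving after the pool exists is simply ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for PhysMem that only remembers how many pages it was given.
struct StubPool {
    explicit StubPool(std::size_t n) : npages(n) {}
    std::size_t npages;
};

// Creates the pool on first use, honoring a size requested beforehand.
class LazyPoolHolder {
public:
    // Records the desired pool size; has no effect once the pool exists.
    void set_memory_size(std::size_t npages) {
        if (!pool_) requested_ = npages;
    }

    // Returns the shared pool, creating it on the first call.
    StubPool &get() {
        if (!pool_) {
            assert(requested_ > 0);
            pool_.reset(new StubPool(requested_));
        }
        return *pool_;
    }

    bool created() const { return pool_ != nullptr; }

private:
    std::size_t requested_ = 1000;     // default from the milestone text
    std::unique_ptr<StubPool> pool_;   // empty until first use
};
```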
Milestone 3: Supplemental Page Map, Destructor, and Unmap
Implement the destructor and the unmap_file method. In order to do this, you
will need to unmap all of the VPages that have been mapped for
that file and return their PPages to the PhysMem. This will
require you to define an additional data
structure called a supplemental page map. The supplemental page
map will provide information for each VPage, such as whether
it is mapped and, if so, the associated PPage. As you work through the
assignment you'll discover other information that needs to be
stored in the supplemental page map. It's up to you to determine
the structure of the supplemental page map; it should be implemented
so that you can easily look up the information for a page given
its VPage (such as when a page fault occurs for that page).
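As one possible starting point, and by no means the required design, a supplemental page map can be a hash table keyed by page index within the region. The names ToySupplementalMap and PageInfo are hypothetical, and the single ppage field is only the minimum the text mentions; you will likely need to record more.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Per-page bookkeeping; only the field named in the text is shown.
struct PageInfo {
    char *ppage = nullptr;   // backing physical page for this virtual page
};

// Toy lookup table from any address in a region to that page's info.
class ToySupplementalMap {
public:
    ToySupplementalMap(char *base, std::size_t page_size)
        : base_(base), page_size_(page_size) {}

    // Index of the page containing addr; addr must lie inside the region.
    std::size_t index_of(const char *addr) const {
        assert(addr >= base_);
        return static_cast<std::size_t>(addr - base_) / page_size_;
    }

    // Records that the page containing addr is backed by ppage.
    void record(const char *addr, char *ppage) {
        info_[index_of(addr)].ppage = ppage;
    }

    // Returns the info for the page containing addr, or nullptr if
    // nothing has been recorded for that page.
    const PageInfo *lookup(const char *addr) const {
        auto it = info_.find(index_of(addr));
        return it == info_.end() ? nullptr : &it->second;
    }

    void forget(const char *addr) { info_.erase(index_of(addr)); }
    std::size_t resident() const { return info_.size(); }

private:
    char *base_;
    std::size_t page_size_;
    std::unordered_map<std::size_t, PageInfo> info_;
};
```

Keying by page index means that a fault address anywhere inside a page finds the same entry, which is the lookup property the paragraph above asks for.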
Once you have an initial implementation of the supplemental page map,
you should be able to implement the destructor and the unmap_file
method. At this point, the read test will pass except for a mismatch
in protections (this will be fixed in Milestone 5).
Milestone 4: Flush
Implement the flush_file method. For now, flush all pages in physical memory that
belong to the MCryptFile without considering whether they are dirty.
Remember to flush in the unmap_file method.
All the tests should now complete, but you will get errors because
pages that aren't dirty are getting written back to the file.
Milestone 5: Tracking Dirty Pages
Make flush_file more efficient by keeping track of the dirty pages
and only writing dirty pages back to the file (clean pages should
not be written back).
Paging hardware usually provides a "dirty" bit in page map entries,
which is set by the hardware when a page is written.
Unfortunately, this information is not passed through by the mmap
mechanism we are using for this assignment, so you will have to
use clever software to emulate a dirty bit for each page.
You can use page protections for this:
if a page's protection is set to PROT_READ, then a page fault will
occur the first time the page is written (and a page fault will not
occur unless the page is written). Given this, you should be able
to emulate dirty bits for each page. Use the emulated dirty bits to avoid
writing back clean pages during flushes.
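The protection trick can be simulated without any real paging. In the toy model below, all names are hypothetical and the model assumes that the fault handler learns only the faulting address, not whether the access was a load or a store. Under that assumption a page is first made readable, and a later store to it takes a second fault that marks it dirty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Access { Read, Write };

struct ToyPage {
    bool readable = false;   // page is resident and loads succeed
    bool writable = false;   // stores succeed without faulting
    bool dirty = false;      // emulated dirty bit
    int loads = 0;           // times the page was filled from the file
};

class ToyDirtyTracker {
public:
    explicit ToyDirtyTracker(std::size_t npages) : pages_(npages) {}

    // Simulates one access and returns how many faults it caused.
    int access(std::size_t i, Access a) {
        assert(i < pages_.size());
        ToyPage &p = pages_[i];
        int faults = 0;
        if (!p.readable) {            // unmapped: fault, load, map read-only
            ++faults;
            ++p.loads;
            p.readable = true;
        }
        if (a == Access::Write && !p.writable) {
            ++faults;                 // store to a read-only page faults
            p.writable = true;
            p.dirty = true;           // the fault is what reveals the write
        }
        return faults;
    }

    bool is_dirty(std::size_t i) const { return pages_[i].dirty; }
    int loads(std::size_t i) const { return pages_[i].loads; }

    // Indices of pages that would need to be written back.
    std::vector<std::size_t> dirty_pages() const {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < pages_.size(); ++i)
            if (pages_[i].dirty) out.push_back(i);
        return out;
    }

private:
    std::vector<ToyPage> pages_;
};
```

In this model a page that is only read never becomes dirty, and each page is loaded from the file at most once no matter how many faults it takes.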
Page faults may now happen multiple times on the same virtual page, but you should only read each page from the file once (unless the file is unmapped and re-mapped).
Your dirty bit should behave like a dirty bit in a page map, i.e.
it should be reset when pages are flushed (and therefore no longer
dirty). Since you are notified about writes by page faults, this
implies something about the permissions after flush_file is called.
At this point all of the tests should pass. Congratulations!
Milestone 6: Odds and Ends
If you haven't already done so, implement set_memory_size. Also,
go over the Miscellaneous Notes below and implement anything else
that is needed.
Miscellaneous Notes
-
For this assignment you may assume that there are enough physical pages to accommodate all of the pages in all of the open MCryptFiles: you need not worry about page replacement. You can exit the program (with an informative message, of course) if PhysMem::page_alloc returns nullptr.
-
Load pages into memory on demand, so that no memory is wasted on pages that are never accessed. You should only read pages from disk when responding to page faults.
-
You may assume that your code is used only in single-threaded environments; you do not need to worry about synchronization for this assignment.
-
-
Your solution must support multiple MCryptFile objects mapped at the same time, with one VMRegion per MCryptFile. All of the MCryptFiles must share the same PhysMem.
-
If you use gdb to debug your assignment, you will notice that it catches the SIGSEGV signals used to signal page faults and stops the application before it can handle those page faults. If you type continue then the signal will be transmitted to your application so the page fault will be handled. If you get tired of typing continue you can change gdb's behavior with the following command:
handle SIGSEGV noprint nostop pass
The arguments indicate that, when SIGSEGV signals occur, they should be passed to the application; gdb will not stop the application or print any indication that the signal occurred. You may find other combinations of argument values convenient as well.
Submitting Your Work
Once you are finished working and have saved all your changes, submit by
running tools/submit. Make sure that you have answered the
questions in questions.txt before submitting.
We recommend you do a trial submission in advance of the deadline to allow time to work through any snags. You may submit as many times as you like; we will grade the latest submission. Submitting a stable but unpolished or unfinished version is like an insurance policy: if the unexpected happens and you miss the deadline to submit your final version, the earlier submission will earn points. Without a submission, we cannot grade your work. You can confirm the timestamp of your latest submission in your course gradebook.
Grading
Here is a recap of the work that will be graded on this assignment:
- questions.txt: answer all of the questions.
- map_page.cc: modify fault_handler as described in Exercise 1.
- mcryptfile.hh and mcryptfile.cc: flesh out the MCryptFile class.
We will grade your code using the provided sanity check tests and possibly additional autograder tests. We will also review your code for additional errors as well as style and complexity. Check out our course style guide for tips and guidelines for writing code with good style!