{"componentChunkName":"component---src-templates-md-page-js","path":"/proj_mcrypt/","result":{"data":{"markdownRemark":{"rawMarkdownBody":"\nIn this project you will implement memory-mapped access to data that\nis stored in encrypted files. Normally, code like this would be written\ninside the operating system kernel; for this project, we have used the\nLinux `mmap` facility to simulate features that are typically available\nin the kernel, so you can write your code at user level.\nThe goal of this project is to teach you\nabout the following concepts:\n\n* How demand paging works\n* How page protection can be used to emulate hardware features such\n  as dirty bits\n* How to use _intrusive containers_, in contrast to the non-intrusive\n  containers found in the C++ library.\n\n## Project Overview\nYou will implement a class `MCryptFile` that provides the following\nmethods:\n\n* `MCryptFile(Key key, std::string path)`<br />\n  Constructs an `MCryptFile` object that can be used to access data in\n  the file given by `path`. The file is encrypted. When data is read\n  from the file into memory, it will be decrypted using `key`; when\n  data is written from memory back to the file, it will be encrypted\n  using `key`. `Key` objects can be constructed from strings.\n  If the file doesn't currently exist, a new file will\n  be created. Throws `std::system_error` if the file cannot be opened\n  or created. Most of the functionality of this constructor is actually\n  implemented in the `CryptFile` superclass constructor (see below).\n\n* `char *map(std::size_t min_size = 0)`<br />\n  Maps the associated file into a contiguous region\n  of virtual memory and returns the virtual address of the first byte\n  of that region. Byte `i` of the file will henceforth be directly\n  accessible at offset `i` into the region (in plaintext).\n  The `min_size` argument can be used to create a new file\n  or grow an existing file. 
The actual size of the virtual region\n  will be the larger of `min_size` and the original file length;\n  if bytes are written in the region beyond the original file length,\n  the file will be extended when `flush` is called.\n  If you want to grow a file beyond the current size of the region,\n  invoke `unmap` to unmap the region, then invoke `map` to map it\n  again with a larger size. Note: file sizes are always rounded up\n  to full-page boundaries.\n\n* `void flush()`<br />\n  Encrypts all pages that have been modified since the last call to\n  `flush` and writes them back to the associated file. All pages\n  currently in memory should remain in memory. This operation\n  may grow the size of the encrypted file in the file system.\n\n* `void unmap()`<br />\n  Writes back any dirty pages (as `flush` does) and removes the mapping\n  created by `map`. After this method returns, the caller must not use\n  any pointers into the previously mapped region.\n\n* `char *map_base()`<br />\n  Returns the address of the first byte of the memory-mapped file, or\n  `nullptr` if the associated file is not currently mapped.\n\n* `std::size_t map_size()`<br />\n  Returns the current size of the mapped region, or 0 if the associated\n  file is not currently mapped.\n\n* `static void set_memory_size(std::size_t npages)`<br />\n  Specifies how many pages should be in the pool of physical\n  memory that is shared by all `MCryptFile` objects. This method should\n  only be invoked before the first `MCryptFile` is created; it has\n  no effect after that.\n\n## Supporting Code\nWe have written several classes for you to use in implementing `MCryptFile`:\n\n### CryptFile\nThe `CryptFile` class provides basic mechanisms for reading and writing\nencrypted files (but not for memory-mapping them). 
Your `MCryptFile` class\nshould be a subclass of `CryptFile`.\nThe `CryptFile` class has the following methods:\n\n* `CryptFile(Key key, std::string path)`<br />\n  The API for this method is identical to that for the `MCryptFile`\n  constructor (see above).\n\n* `int aligned_pread(void *dst, std::size_t len, std::size_t offset)`<br />\n  Reads and decrypts `len` bytes of data at position `offset` in the\n  file and stores them at `dst`.  Both `len` and `offset` must be multiples\n  of the AES encryption algorithm's block size (16), which is accessible via\n  `CryptFile::blocksize`.\n  Returns the number of bytes read or -1 on error.\n  The block size will not be an issue for this project, because you will\n  only be reading and writing full pages aligned on page-size boundaries.\n\n* `int aligned_pwrite(const void *src, std::size_t len, std::size_t offset)`<br />\n  Encrypts `len` bytes starting at `src` and writes them to the\n  associated file at position `offset`.\n  Returns `len` on success and -1 on error.\n  Both `len` and `offset` must be multiples of `CryptFile::blocksize`.\n\n### VMRegion\nThe `VMRegion` class provides basic mechanisms for mapping pages into\na region of virtual memory, taking page faults, and managing permissions.\nA `VMRegion` corresponds roughly to a contiguous range of page table\nentries for one process in an operating system.\nYou will create one `VMRegion` object for each mapped `MCryptFile`.\nThe `VMRegion` class is defined in the header file `vm.hh` and\nhas the following methods:\n\n* `VMRegion(std::size_t nbytes, std::function<void(char *)> handler)`<br />\n  Constructs a `VMRegion` object and allocates a currently unused region\n  of virtual memory in the process.\n  The size of the region will be `nbytes`; if `nbytes` isn't a multiple\n  of the page size, the `VMRegion` will behave as if\n  `nbytes` were rounded up to the next higher multiple of the page size.\n  The `handler` parameter specifies a function to invoke for page faults\n  
in the region. Page faults occur whenever an unmapped page is referenced\n  or an attempt is made to write a page that is currently read-only.\n  `handler` is invoked with a single argument: the address whose access\n  triggered the page fault.\n\n* `VPage get_base()`<br />\n  Returns the address of the first page in the virtual region.\n  `VPage` is a type that refers to the first byte of a virtual\n  page in a `VMRegion`. It is equivalent to `char *`, so you can add\n  an offset to it to get the address of a value in the middle of a page.\n  You can also add multiples of the page size to the value returned\n  by `get_base` to produce `VPage`s for other pages in the virtual region.\n\n* `static void map(VPage va, PPage pa, Prot prot)`<br />\n  Sets the mapping for virtual page `va` inside a `VMRegion`, so that\n  accesses to that page will be directed to `pa` (`PPage`s are\n  obtained using the `PhysMem` class discussed below). If a\n  different page was previously mapped at `va`, the old mapping\n  is removed.\n  The `prot` argument specifies what sort of accesses are allowed;\n  it should be either `PROT_NONE` to prohibit both loads and stores,\n  `PROT_READ` to allow loads but not stores, or `PROT_READ|PROT_WRITE`\n  (the bitwise OR of the two flags) to allow both loads and stores.\n  This function's behavior is equivalent to setting the contents of\n  a page table entry.\n\n* `static void unmap(VPage va)`<br />\n  Removes the mapping for `va`, if there is one; future references to\n  the page will cause page faults. This function's behavior is roughly\n  equivalent to clearing the `present` bit in a page table entry.\n\nIn addition to these methods, `VMRegion` also exports a variable\n`page_size`, which contains the number of bytes in each page on\nthe current machine. 
Page sizes are 4096 bytes on the `myth` cluster\nand on most x86-64 laptops, but you should use the\n`page_size` variable to ensure portability; you can assume that\n`page_size` is always a power of two.\n\n### PhysMem and PPages\nThe `PhysMem` class provides a mechanism for allocating and freeing\npages of physical memory. It implements `PPage` objects. A `PPage` is a token\nfor a physical page, which you can pass to `VMRegion::map`.\nIn addition, you can use a `PPage` to access the bytes\nof the page: a `PPage` is actually a `char *` pointer referring\nto the first byte of the page, and you can add offsets to this to\naccess other values in the page.\nUnlike `VPage`s, a `PPage` is always accessible and writable; accesses\nthrough it will never generate page faults.\n`PPage`s are useful because they allow your `MCryptFile` implementation\nto access physical pages that may not currently be mapped, such as when\ntransferring page contents to or from encrypted files.\n`PPage`s are mapped into virtual memory by the `PhysMem` class,\nat virtual addresses different from those in `VMRegion`s.\nThis is an example of *aliasing*, where the same physical page can appear\nat multiple virtual addresses.\n\nThe `PhysMem` class is defined in the header file `vm.hh` and\nhas the following methods:\n\n* `PhysMem(std::size_t npages)`<br />\n  Allocates `npages` physical memory pages, each of which may be\n  mapped into any `VMRegion`.\n\n* `PPage page_alloc()`<br />\n  Allocates a page and returns its `PPage`, or `nullptr` if there\n  are no free pages.\n\n* `void page_free(PPage p)`<br />\n  Returns `p` to the free page pool for this `PhysMem`. 
The caller\n  must ensure that this page is not mapped in any `VMRegion`.\n\n* `std::size_t npages()`<br />\n  Returns the total number of pages in this object (free or allocated).\n\n* `std::size_t nfree()`<br />\n  Returns the number of pages that are not currently allocated.\n\n* `PPage pool_base()`<br />\n  Returns the address of the first page in the pool (the pages in\n  the pool occupy a range of contiguous `PPage` addresses).\n\n## Intrusive Containers\n\nAn _intrusive container_ is a data structure such as a list or a map\nwhose elements must embed fields specifically designed for that\ncontainer; the container \"intrudes\" into the design\nof the element type.  By contrast, a _non-intrusive container_ can\ncontain any type meeting some basic requirements (such as being able\nto be moved).  Because non-intrusive containers are more general, the\nC++ standard template library (STL) provides only non-intrusive\ncontainers.  For example, you can create a `std::list<int>`, even\nthough integers were not designed to be on a list and don't contain a\n\"next pointer.\"\n\nThere are several reasons why operating systems often use\nintrusive data structures.  One is that intrusive data structures\nrequire no additional memory allocation:  all the links needed to insert an\nobject in a list or map are already members of the object.  By contrast, when\ninserting an element in a `std::list<int>`, the library must\ndynamically allocate memory for an object that contains the `int` as\nwell as some next and previous pointers.  
Because dynamic memory\nallocation can fail, there are places in the operating system that\ncannot or should not allocate memory dynamically, and hence\ncannot use non-intrusive containers.\n\nA second benefit of intrusive data structures is that the items are\nthe iterators; this simplifies some operations.\nFor example, suppose an object has been inserted in a list and\nyou wish to remove the object from that list.\nIf the list is non-intrusive and all you have is a pointer to\nthe object, you must first get an iterator to the object;\nthis requires searching the list, which takes time proportional to its length.\nWith an intrusive list, the object is the iterator,\nso it can be unlinked without searching the list.\n\nWe would like you to use intrusive containers in this project, and\nwe have provided an intrusive list in\n`ilist.hh` and an intrusive map in `itree.hh`.  To use these\ncontainers, your structures must include a field of type `ilist_entry`\nor `itree_entry`, respectively.  The type of the container specifies a\nmember pointer to this field, rather than just the type of the\nstructure.  For example:\n\n~~~ {.cc}\n#include \"ilist.hh\"\n#include \"itree.hh\"\n\nstruct MyStruct {\n    int val_;                 // key used by my_map\n    ilist_entry list_link_;   // links this object into an ilist\n    itree_entry itree_link_;  // links this object into an itree\n};\n\n// An ilist threaded through list_link_.\nilist<&MyStruct::list_link_> my_list;\n// An itree keyed on val_, threaded through itree_link_.\nitree<&MyStruct::val_, &MyStruct::itree_link_> my_map;\n~~~\n\nSince the items are the iterators, most methods return pointers rather\nthan references, where `nullptr` indicates a missing item (e.g., the\nend of a list).  When you delete an object, the destructors of any\n\"entry\" fields automatically remove the object from any containers it\nmight be in.  
(The flip side of this is that it is an error to delete\na container that still contains elements.)\n\nTo make your life easier, we designed `ilist` and `itree` to resemble\nSTL containers.\nIn particular, `ilist` has the following methods, most of which behave just\nlike the corresponding methods in `std::list`: `back`, `begin`,\n`delete_all`, `empty`, `end`, `front`, `insert`, `pop_back`,\n`pop_front`, `push_back`, `push_front`, `remove`, and `remove_all`.\n`itree` objects have the following methods, most of which behave just like\nthe corresponding methods in `std::map`: `begin`,\n`delete_all`, `empty`, `end`, `find`, `insert`, `lower_bound`,\n`operator[]`, `pop_back`, `pop_front`, `push_back`, `push_front`,\n`remove`, `remove_all`, and `upper_bound`.\n\nThere are also a few differences.  In particular, since\nthe items are the iterators, there is no real `end()` iterator.\nThere is an `end()` method that returns `nullptr`, just to make\nrange-for syntax work, but you can't walk an `end()` iterator\nbackwards to find the last item.  Use `back()` to get the last item in\nan `ilist` and `max()` to get the greatest element in an `itree`.\n`itree` also provides `min()` to get the smallest element.\n\n## Error Handling\n\nIn general you should use exceptions to handle errors.  Exceptions make code\nmore readable by separating error handling from the normal flow of control.\nMore importantly, they reduce the number of bugs where you forget to check\nan error return value.  However, make sure that you don't leak\nresources when an exception may be thrown.  A good way to avoid\nleaking resources is to rely exclusively on destructors for reclaiming\nresources.  
For example, in the implementation of `CryptFile` we have a\n`unique_fd` that automatically closes a file descriptor on destruction.\n\nThe one place where exceptions must _not_ escape is your page\nfault handler: if code in the handler may throw, catch the exceptions\ninside the handler.\nIf an exception propagates out of your page fault handler,\n`VMRegion` will crash the program.\n\n## Developing and Testing\nTo get started on this project, log in to the `myth` cluster and clone\nthe starter repo with this command:\n```\ngit clone @gitRepo@/cmap.git cs111_p5\n```\nThis will create a new directory `cs111_p5`. Do your work for the project\nin this directory.\nThe files `mcryptfile.hh` and `mcryptfile.cc` contain a skeleton for all\nof the code you need to write. Add to the declarations in `mcryptfile.hh`\nand fill in the bodies of the methods in `mcryptfile.cc` to implement\nthe facilities described above. You can also add other methods or\nclasses if needed to implement your solution.\n\nThe directory also contains a `Makefile`; if you type `make`, it\nwill compile your code along with a test program, creating an\nexecutable `test`.\nYou can then invoke `./run_tests map_tests`, which will run a simple set of\ntests on your code.\nYou can also invoke `test` with an argument specifying a particular\ntest name; this will run a single test and print out its results.\nInvoke `test` with an invalid test name to print out all of the\navailable test names.\n\nAs usual, we do not guarantee that the tests we have provided are\nexhaustive, so passing all of the tests is not necessarily sufficient\nto ensure a perfect score (CAs may discover other problems when reading\nthrough your code).\n\n## Miscellaneous Notes\n* For this project you may assume that there are enough physical pages\n  to accommodate all of the pages in all of the open `MCryptFile`s:\n  you need not worry about page replacement. 
You can abort the program\n  if `PhysMem::page_alloc` returns `nullptr`.\n\n* Use an `itree` for each `MCryptFile` to keep track of the `VPage`s\n  in the region for that file. The `itree` should contain entries only\n  for `VPage`s that have associated `PPage`s.\n\n* Create a single `PhysMem` object and share it across all `MCryptFile`s.\n  Don't create this object until the first time an `MCryptFile` is mapped:\n  `MCryptFile::set_memory_size` may be invoked before then to specify\n  how large the pool of physical memory should be. If\n  `MCryptFile::set_memory_size` has not been invoked, use 1000 pages\n  in the `PhysMem` object.\n\n* Load pages into memory on demand, so that no memory is wasted on\n  pages that are never accessed. You should only read pages from disk\n  when responding to page faults.\n\n* You must ensure that modified pages eventually get written back to\n  the file. To do this, you will need to keep track of which pages are\n  dirty (clean pages should not be written back).\n  Paging hardware usually provides a \"dirty\" bit in page table entries,\n  which is set by the hardware when a page is written.\n  Unfortunately, this information is not passed through by the `mmap`\n  mechanism we are using for this project, so you will have to\n  detect writes yourself.\n  You can use page protections for this:\n  if a page's protection is set to `PROT_READ`, then a page fault will\n  occur the first time the page is written (and reads alone will never\n  cause a fault).\n\n* You may assume that your code is used only in single-threaded\n  environments; you do not need to worry about synchronization for this\n  project.\n\n* Your dirty bit should behave like a dirty bit in a page table, i.e.,\n  it should be reset when pages are flushed (and therefore no longer\n  dirty). 
Since you are notified about writes by page faults, this\n  implies something about the permissions after `flush` is called.\n\n* To use a non-static member function as a handler, you can pass it\n  using a lambda as follows (here `fault_handler` stands for your own\n  member function, which receives the faulting address):\n  ```\n  [this](char *addr) { fault_handler(addr); }\n  ```\n\n* If you use `gdb` to debug your project, you will notice that by default\n  it intercepts the SIGSEGV signals used to signal page faults and stops\n  the program each time one occurs, which makes it very tedious to debug\n  code that takes page faults. You can disable this behavior with the following\n  `gdb` command:\n  ```\n  handle SIGSEGV noprint nostop pass\n  ```\n  The arguments indicate that, when SIGSEGV signals occur, they should\n  be passed to the application; `gdb` will not stop the application\n  or print any indication that the signal occurred.\n\n## Submitting Your Work\nTo submit your solution, `cd` to the directory containing your\nwork and invoke the command\n```\n./submit\n```\nIf you discover problems or improvements after you have submitted,\nyou may resubmit; only your latest submission will be graded.\n","frontmatter":{"title":"Project 5: Memory-Mapped Encrypted Files"}}},"pageContext":{"slug":"/proj_mcrypt/"}},"staticQueryHashes":[]}