CS107 Lab 4 Extra Problems (Solution)

Lab 4 Extra Problems (Solution)

Solutions by Nick Troccoli and Lisa Yan

Go back to this week's extra problem (without solutions) here.

Go back to this week's lab solutions here.

These optional problems are an opportunity to further exercise your understanding of generics and function pointers.

1) Code Study: `memmove`

About 25 minutes

The C library provides a handful of raw memory routines (e.g. memcpy, memset, ...) that operate on data of an unspecified type. Let's take a look inside memmove (version below based on code from musl) to better understand how these kind of functions are implemented. This code is included in the memmove.c file as well for you to experiment with.

 1    void *musl_memmove(void *dest, const void *src, size_t nbytes) {
 2        char *dest_copy = dest;
 3        const char *src_copy = src;
 4
 5        if (dest_copy == src_copy) {
 6            return dest_copy;
 7        }
 8
 9        if (src_copy + nbytes <= dest_copy || dest_copy + nbytes <= src_copy) {
10            return memcpy(dest_copy, src_copy, nbytes);
11        }
12
13        if (dest_copy < src_copy) {
14            for (int i = 0; i < nbytes; i++) {
15                dest_copy[i] = src_copy[i];
16            }
17        } else {
18            for (int i = nbytes - 1; i >= 0; i--) {
19                dest_copy[i] = src_copy[i];
20            }
21        }
22
23        return dest;
24    }

Go over the code and discuss these questions:

The function's interface declares its parameters as void* pointers, but internally it manipulates these pointers as char*. Q: Why the inconsistency? What would be the consequence of trying to reconcile the discrepancy by declaring the interface as char* or changing the implementation to use void*?
It casts to char * so we interpret the memory at the source and destination addresses in 1 byte increments. If we made the parameters char *s, then you would have to cast each pointer you pass to memmove as that type. If we used only void * in the implementation, we couldn't dereference it!
Note that there is no typecast on lines 2 and 3 when assigning from an untyped pointer to a typed pointer. A void* is the universal donor/recipient and can be freely exchanged with other pointer types, no cast necessary. That being said, there is a forever-ongoing online discussion about the contentious issue of whether or not to cast here; the perspective employed here is to not cast, as it is not required.
Q: What special case is being handled on line 5?
When src and dst are equal, meaning no copying is needed.
Review the man pages for memcpy and memmove to understand the differences between these two functions. Q: What special case is being handled on line 9?
This checks whether the memory regions actually overlap. memcpy stipulates that the regions of memory should not overlap, while memmove can handle that case.
Q: What two cases are being divided by the if/else on Lines 13/17? Why are both cases necessary?
These cases separate out logic for when the sources overlap in different ways. There needs to be special handling because in some cases just copying byte over one by one could overwrite bytes in the source that have not yet been copied over. If the destination comes before the source, then we should copy bytes in order from the start of source. Even if we overwrite a byte in the source while writing to the destination, that byte will have already been copied over earlier. We can't copy bytes in reverse order, because this might overwrite earlier bytes of source that we have not copied over yet. However, if the destination comes after the source, then we need to copy bytes in the reverse order. If we iterated from the first byte and copied there, then we would overwrite a later source byte before copying it over.
The man page for memmove states that the operation takes place as though it copies the data twice (src->temp, temp->dst), which implies that call to memmove might take twice as long as memcpy. However, the musl implementation doesn't operate in this literal manner. It does correctly handle overlap, but not by copying twice. What does it do instead? Q: Take a look at lines 14-16 and lines 18-20 in particular - what is going on in each of these loops? In this implementation, what then is the expected added cost of using memmove over memcpy?
memmove copies in place! If the destination comes before the source, then we should copy bytes in order from the start of source. Even if we overwrite a byte in the source while writing to the destination, that byte will have already been copied over earlier. We can't copy bytes in reverse order, because this might overwrite earlier bytes of source that we have not copied over yet. However, if the destination comes after the source, then we need to copy bytes in the reverse order. If we iterated from the first byte and copied there, then we would overwrite a later source byte before copying it over. So it's in reality not much more inefficient (if at all) than memcpy. The only difference is using memcpy in your code conveys to the reader that you know the memory regions do not overlap.
Trace the call musl_memmove(NULL, "cs107", 0). Q1: Will it result in a segmentation fault from trying to read/write an invalid pointer? Why or why not? Q2: What about the call musl_memmove(NULL, "cs107", -1)? Verify your understanding by running the memmove.c program.
Q1: memcpy will correctly handle the call with NULL and "cs107" and n = 0. Q2: When you pass in -1 the size_t will interpret that as a large unsigned value and crash when trying to dereference NULL.
Why not always use memmove? The man page seems to imply that some implementations (though not musl) do suffer a performance hit when using memmove as opposed to memcpy. Moreover, appropriately using memmove and memcpy can communicate to code readers when data may or may not overlap. For these reasons, we default to memcpy, and use memmove only when necessary.

The implementation of memmove may remind you of the strncpy function we saw in lecture. The memxxx functions have much in common with their strxxx equivalents, just without the special case to stop at a null byte. In fact, the memxxx functions are declared as part of the <string.h> module and quite possibly written by the same author.

1) Code Study: memmove

1) Code Study: `memmove`