Solutions by Nick Troccoli and Lisa Yan
Go back to this week's extra problem (without solutions) here.
Go back to this week's lab solutions here.
These optional problems are an opportunity to further exercise your understanding of generics and function pointers.
1) Code Study: memmove
About 25 minutes
The C library provides a handful of raw memory routines (e.g. memcpy, memset, ...) that operate on data of an unspecified type. Let's take a look inside memmove (version below based on code from musl) to better understand how these kind of functions are implemented. This code is included in the memmove.c file as well for you to experiment with.
1 void *musl_memmove(void *dest, const void *src, size_t nbytes) {
2 char *dest_copy = dest;
3 const char *src_copy = src;
4
5 if (dest_copy == src_copy) {
6 return dest_copy;
7 }
8
9 if (src_copy + nbytes <= dest_copy || dest_copy + nbytes <= src_copy) {
10 return memcpy(dest_copy, src_copy, nbytes);
11 }
12
13 if (dest_copy < src_copy) {
14 for (int i = 0; i < nbytes; i++) {
15 dest_copy[i] = src_copy[i];
16 }
17 } else {
18 for (int i = nbytes - 1; i >= 0; i--) {
19 dest_copy[i] = src_copy[i];
20 }
21 }
22
23 return dest;
24 }
Go over the code and discuss these questions:
- The function's interface declares its parameters as
void*pointers, but internally it manipulates these pointers aschar*. Q: Why the inconsistency? What would be the consequence of trying to reconcile the discrepancy by declaring the interface aschar*or changing the implementation to usevoid*?It casts to
char *so we interpret the memory at the source and destination addresses in 1 byte increments. If we made the parameterschar *s, then you would have to cast each pointer you pass tomemmoveas that type. If we used onlyvoid *in the implementation, we couldn't dereference it! - Note that there is no typecast on lines 2 and 3 when assigning from an untyped pointer to a typed pointer. A
void*is the universal donor/recipient and can be freely exchanged with other pointer types, no cast necessary. That being said, there is a forever-ongoing online discussion about the contentious issue of whether or not to cast here; the perspective employed here is to not cast, as it is not required. - Q: What special case is being handled on line 5?
When src and dst are equal, meaning no copying is needed.
- Review the man pages for
memcpyandmemmoveto understand the differences between these two functions. Q: What special case is being handled on line 9?This checks whether the memory regions actually overlap.
memcpystipulates that the regions of memory should not overlap, whilememmovecan handle that case. - Q: What two cases are being divided by the if/else on Lines 13/17? Why are both cases necessary?
These cases separate out logic for when the sources overlap in different ways. There needs to be special handling because in some cases just copying byte over one by one could overwrite bytes in the source that have not yet been copied over. If the destination comes before the source, then we should copy bytes in order from the start of source. Even if we overwrite a byte in the source while writing to the destination, that byte will have already been copied over earlier. We can't copy bytes in reverse order, because this might overwrite earlier bytes of source that we have not copied over yet. However, if the destination comes after the source, then we need to copy bytes in the reverse order. If we iterated from the first byte and copied there, then we would overwrite a later source byte before copying it over.
- The man page for
memmovestates that the operation takes place as though it copies the data twice (src->temp, temp->dst), which implies that call tomemmovemight take twice as long asmemcpy. However, the musl implementation doesn't operate in this literal manner. It does correctly handle overlap, but not by copying twice. What does it do instead? Q: Take a look at lines 14-16 and lines 18-20 in particular - what is going on in each of these loops? In this implementation, what then is the expected added cost of usingmemmoveovermemcpy?memmovecopies in place! If the destination comes before the source, then we should copy bytes in order from the start of source. Even if we overwrite a byte in the source while writing to the destination, that byte will have already been copied over earlier. We can't copy bytes in reverse order, because this might overwrite earlier bytes of source that we have not copied over yet. However, if the destination comes after the source, then we need to copy bytes in the reverse order. If we iterated from the first byte and copied there, then we would overwrite a later source byte before copying it over. So it's in reality not much more inefficient (if at all) thanmemcpy. The only difference is usingmemcpyin your code conveys to the reader that you know the memory regions do not overlap. - Trace the call
musl_memmove(NULL, "cs107", 0). Q1: Will it result in a segmentation fault from trying to read/write an invalid pointer? Why or why not? Q2: What about the callmusl_memmove(NULL, "cs107", -1)? Verify your understanding by running thememmove.cprogram.Q1:
memcpywill correctly handle the call withNULLand"cs107"andn = 0. Q2: When you pass in -1 thesize_twill interpret that as a large unsigned value and crash when trying to dereferenceNULL. - Why not always use
memmove? The man page seems to imply that some implementations (though not musl) do suffer a performance hit when usingmemmoveas opposed tomemcpy. Moreover, appropriately usingmemmoveandmemcpycan communicate to code readers when data may or may not overlap. For these reasons, we default tomemcpy, and usememmoveonly when necessary.
The implementation of memmove may remind you of the strncpy function we saw in lecture. The memxxx functions have much in common with their strxxx equivalents, just without the special case to stop at a null byte. In fact, the memxxx functions are declared as part of the <string.h> module and quite possibly written by the same author.