Fixing two copy-on-write bugs in the NBD server.
What is What?
NBD is a network block device protocol. It has some overlap with iSCSI,
and a little with NFS. The protocol is much simpler than either, and has
one extra feature: copy-on-write.
Copy-on-write allows for sharing the same file to multiple machines, and
only writing changes back to disk. This can save a lot of storage space
in certain situations.
There are two bugs, and two one-line fixes presented here.
The Bugs
Sequential Read after Write
For the first case, it’s enough to read and write more than one
sequential block. The second and subsequent blocks read will read into
the wrong offset of the buffer, and copy invalid data to the client.
I use a 4096 block size in this example, but I’ve used others. I did
that to match the filesystem, but for the test I don’t even need a filesystem.
The Testexport OFFSET=0
export COUNT=3 # anything >= 1
dd if=/dev/urandom of=testdata bs=4096 count=$COUNT # random data
dd if=testdata of=/dev/nbd0 bs=4096 seek=$OFFSET count=$COUNT
dd if=/dev/nbd0 of=compdata bs=4096 skip=$OFFSET count=$COUNT
sum testdata compdata
The data, testdata and compdata, will be different.
If the kernel does a partition check when /dev/nbd0 is mounted, this
test will fail with COUNT=1 as well.
Sparse Write at the Wrong Offset
For the second case, with sparse_cow=true, we need to repeat the test
with an offset > 0. expwrite() calls write() instead of pwrite().
The Testexport OFFSET=100
export COUNT=3 # anything >= 1
dd if=/dev/urandom of=testdata bs=4096 count=$COUNT # random data
dd if=testdata of=/dev/nbd0 bs=4096 seek=$OFFSET count=$COUNT
dd if=/dev/nbd0 of=compdata bs=4096 skip=$OFFSET count=$COUNT
sum testdata compdata
The first time it’s run, it will result in an Input/Output error.
The second time it’s run, it will work.
The Patches
diff --git a/nbd-orig/nbd-server.c b/nbd-patched/nbd-server.c
index 92fd141..18e5ddd 100644
--- a/nbd-orig/nbd-server.c
+++ b/nbd-patched/nbd-server.c
@@ -1582,6 +1582,7 @@ int expread(READ_CTX *ctx, CLIENT *client) {
if (pread(client->difffile, buf, rdlen, client->difmap[mapcnt]*DIFFPAGESIZE+offset) != rdlen) {
goto fail;
}
+ ctx->current_offset += rdlen;
confirm_read(client, ctx, rdlen);
} else { /* the block is not there */
if ((client->server->flags & F_WAIT) &&
(client->export == NULL)){
diff --git a/nbd-orig/nbd-server.c b/nbd-patched/nbd-server.c
index 92fd141..9a57ad5 100644
--- a/nbd-orig/nbd-server.c
+++ b/nbd-patched/nbd-server.c
@@ -1669,7 +1669,7 @@ int expwrite(off_t a, char *buf, size_t len,
CLIENT *client, int fua) {
if(ret < 0 ) goto fail;
}
memcpy(pagebuf+offset,buf,wrlen) ;
- if (write(client->difffile, pagebuf, DIFFPAGESIZE) != DIFFPAGESIZE)
+ if (pwrite(client->difffile, pagebuf, DIFFPAGESIZE, client->difmap[mapcnt]*DIFFPAGESIZE) != DIFFPAGESIZE)
goto fail;
}
if (!(client->server->flags & F_COPYONWRITE))
What Question Am I Answering?
A question came up in a coding class, about why in Rust there’s a borrow,
and there’s a an error when borrowing.
You may have seen it like (copied from the rust book):
error[E0499]: cannot borrow `s` as mutable more than once at a time
--> src/main.rs:5:14
|
4 | let r1 = &mut s;
| ------ first mutable borrow occurs here
5 | let r2 = &mut s;
| ^^^^^^ second mutable borrow occurs here
6 |
7 | println!("{}, {}", r1, r2);
| -- first borrow later used here
For more information about this error, try `rustc --explain E0499`.
error: could not compile `ownership` due to previous error
The Rust book does explain what borrow is, and the theory behind
why it’s allowed and not allowed.
I’m showing the risks practical example.
How Am I Answering It?
With as little computer theory as possible.
- No assembler
- No diagrams
- No multiple languages
- short, simple examples
- No theory until after the bug is shown
Caveat
The answer is in C. It’s in C because Rust won’t let me break
borrow, not even in an unsafe {}
block.
It’s all in plain C, though. No assembler, and no C++. As
straightforward C as I can make it, specifically to allow for
plain examples. As few as possible shortcuts are taken.
I’m not trying to write a C-programmer’s C code. I’m trying to
write an example that shows the problem, for non-C programmers.
It should not be difficult to follow coming from Rust.
Inspiration
The inspiration comes from the C max()
function, which is actually
a macro.
That is great for explaining the differences between functions and macros,
and it’s great for explaining pass-by-code instead of value or reference.
It’s also great for explaining surprises.
First Examples
First, we give example code for pass-by-value and pass-by-reference.
Pass-by-reference is also called borrow in Rust. The examples are simple,
and we do not yet discuss the difference.
For now, it’s simply about C syntax.
#include <stdio.h>
int double_by_value(int x) {
x = x * 2;
return x;
}
int double_by_reference(int *x) {
*x = *x * 2;
return *x;
}
int main() {
int x;
x = 5;
printf("double_by_value(x) = %d\n", double_by_value(x));
/* double_by_value(x) = 10 */
x = 5;
printf("double_by_reference(x) = %d\n", double_by_reference(&x));
/* double_by_reference(x) = 10 */
return 0;
}
The code is almost identical. You can see the syntax difference, but no
functional difference yet.
The logic is the same, the input is the same, and the output is the same.
Let’s make it a tiny bit more challenging, and add a second variable.
#include <stdio.h>
int add_by_value(int x, int y) {
x = x + y;
return x;
}
int add_by_reference(int *x, int *y) {
*x = *x + *y;
return *x;
}
int main() {
int x, y;
x = y = 5;
printf("add_by_value(x, y) = %d\n", add_by_value(x, y));
/* add_by_value(x, y) = 10 */
x = y = 5;
printf("add_by_reference(x, y) = %d\n", add_by_reference(&x, &y));
/* add_by_reference(x, y) = 10 */
return 0;
}
The logic is the same, the input is the same, and the output is the same.
Put it together
Add the two and we get…
#include <stdio.h>
/* Previous example code here */
int add_and_double_by_value(int x, int y) {
x = double_by_value(x);
y = double_by_value(y);
x = x + y;
return x;
}
int add_and_double_by_reference(int *x, int *y) {
*x = double_by_reference(x);
*y = double_by_reference(y);
*x = *x + *y;
return *x;
}
int main() {
int x, y;
x = y = 5;
printf("add_and_double_by_value(x, y) = %d\n", add_and_double_by_value(x, y));
/* add_and_double_by_value(x, y) = 20 */
x = y = 5;
printf("add_and_double_by_reference(x, y) = %d\n", add_and_double_by_reference(&x, &y));
/* add_and_double_by_reference(x, y) = 20 */
return 0;
}
No surprised yet. Both examples print the same, correct answer.
The logic is the same, the input is the same, and the output is the same.
Surprise
Now let’s make it simpler…
#include <stdio.h>
/* Previous example code here */
int main() {
int x;
x = 5;
printf("add_and_double_by_value(x, x) = %d\n", add_and_double_by_value(x, x));
/* add_and_double_by_value(x, x) = 20 */
x = 5;
printf("add_and_double_by_reference(x, x) = %d\n", add_and_double_by_reference(&x, &x));
/* add_and_double_by_reference(x, x) = 40 */
return 0;
}
The logic is the same, the input is the same, but the output is different.
Why is it 40?
Double borrow
So what happened? Why did removing a variable, y, result in the “wrong” answer?
Pass by value copies the value, and passes the value, not the variable. The original is
never modified.
Pass by reference (or borrow), passes a reference, not a copy. The original is modified.
Look at
int add_and_double_by_reference(int *x, int *y) {
*x = double_by_reference(x);
*y = double_by_reference(y);
*x = *x + *y;
return *x;
}
When x is doubled, at *x = double_by_reference(x);
, and *x
is also *y
, y is
also doubled. X is now 10, as expected, but y is also 10.
Then y is doubled. And since *y
is _also *x
, x is doubled again. Now both variables
are 20.
Tada!
This is (one of the reasons) why Rust won’t let you borrow the same variable twice.
What do you do instead?
Borrow once, or .clone() /* copy */
.
Bonus time
So why have pass by reference or borrow at all?
- It’s faster, when the data is large.
- It’s faster for streamed data.
- It’s not possible in assembler pass complicated variables by value. Manual copies
must be made, and the language you use might not implement implicit copies.
For extra points, do the same with threads.