
For an Operating Systems homework assignment I have to write programs that are I/O-bound, so that they stress my I/O scheduler like never before, but I have no idea how to do it. I've tried writing a simple C program that counts the lines of big text files, but it executes too fast and I can't measure the effectiveness of my scheduler with it. This is worth 25% of my grade, so any help would be much appreciated.

Fagundes
  • use a bigger file? – Karoly Horvath Oct 17 '16 at 23:12
  • By *I/O bound*, do you specifically mean *hard disk I/O*, or are other types of I/O acceptable? – user3386109 Oct 17 '16 at 23:19
  • @user3386109 Other types of I/O are acceptable, I just assumed hard disk I/O is more expensive, and I need things to be very expensive here – Fagundes Oct 17 '16 at 23:21
  • 1
    The advantage of network I/O is that it just pushes electrons around, and doesn't destroy your hard drive. [Here's a starting point for network I/O](https://stackoverflow.com/questions/35568996/socket-programming-udp-client-server-in-c/35570418#35570418). In that example, the client and server stop after 4 messages, but nothing stops you from sending/receiving forever. – user3386109 Oct 17 '16 at 23:27
  • The advantage of using network IO is that you can deliberately make the other end really slow. – user253751 Oct 17 '16 at 23:29
  • Why not generate fake data as fast as possible and pipe it to /dev/null? Add additional threads as needed. Has the advantage of not destroying your HD, _and_ not "cheating" by waiting on a network. – Ray Hamel Oct 18 '16 at 00:20
  • I'll be surprised if anyone destroys any hard disks just by writing lots of data to them, unless they were already failing. – user253751 Oct 18 '16 at 00:28
  • Linux is good at caching, so only the first time you read a file will be slow. Writes are never slow because they go into memory and only get written to disk in the background. – stark Oct 18 '16 at 01:26

2 Answers


Try this:

find / | head -n 1000000 | xargs -P 10 wc >/dev/null 2>&1

It should generate a pretty heavy load. The `xargs -P` option makes it run in parallel; you can adjust the options to get a suitable load. `wc` isn't doing much other than consuming each file, so I think it should be chiefly I/O-bound. Of course there is still disk caching.
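On the disk-caching caveat: on Linux you can drop the page cache between runs so that repeat runs still hit the disk. This is the standard `/proc/sys/vm/drop_caches` interface and requires root:

```shell
# flush dirty pages first, then drop pagecache, dentries and inodes
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
```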

jforberg

The previous answer appears more CPU-bound than I/O-bound in my tests.

I suggest opening many files and seeking like crazy within each file using the low-level non-cached C routines. Here is the C code that performs the seeks:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {

    if (argc != 2) {
        fprintf(stderr, "specify a file to read!\n");
        return EXIT_FAILURE;
    }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open error");
        return EXIT_FAILURE;
    }

    off_t size = lseek(fd, 0, SEEK_END);
    if (size <= 0) {  /* avoid rand() % 0 on empty files */
        close(fd);
        return EXIT_SUCCESS;
    }

    for (int i = 0; i < 1000000; i++)
        lseek(fd, rand() % size, SEEK_SET);

    close(fd);

    return EXIT_SUCCESS;
}

Then, in a shell, run it through find for every file in the filesystem:

find / -exec ./io_bound {} \; 2>/dev/null

On my system it works quite well; you can spot the rcu_sched task:

top - 20:44:48 up 57 min,  1 user,  load average: 0.84, 0.76, 0.59
Tasks: 266 total,   2 running, 264 sleeping,   0 stopped,   0 zombie
%Cpu0  : 10.0 us, 11.3 sy,  0.0 ni, 78.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 10.3 us,  8.3 sy,  0.0 ni, 81.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 15.3 us, 15.9 sy,  0.0 ni, 68.4 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 16.4 us, 14.7 sy,  0.0 ni, 68.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16314192 total,  9431208 free,  3312716 used,  3570268 buff/cache
KiB Swap: 15624188 total, 15624188 free,        0 used. 12630464 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
13087 gluckf    20   0    4224    784    708 R   3.3  0.0   0:00.10 io_bound    
    7 root      20   0       0      0      0 S   0.3  0.0   0:03.54 rcu_sched   
 1586 root      20   0  455832  74824  62736 S   0.3  0.5   0:41.49 Xorg        
 2160 gluckf    20   0 1389916 137096  52548 S   0.3  0.8   0:41.27 cinnamon    
 2285 gluckf    20   0  498388  46752  28632 S   0.3  0.3   0:14.15 gnome-term+ 
zorglub23872
  • `lseek` by itself generally doesn't generate IO. It likely [merely changes an offset value in the kernel's file descriptor table for the process](http://lxr.free-electrons.com/source/fs/read_write.c?v=2.0.40#L19). – Andrew Henle Oct 18 '16 at 19:11
  • You're right, thanks. I had intended to read a few sectors at each random position (to avoid caching) but totally forgot! I'm not sure how to insert the code here, though. – zorglub23872 Oct 18 '16 at 20:18
  • Just use `pread()` to read a 512-byte buffer from a random offset - no `lseek()` needed. You can mask the random offset with `~(0x1FF)` (complement of 511) to get a 512-byte aligned offset, and hopefully grab a single random disk block (assuming 512-byte disk blocks). And to *really* become IO bound, do it with multiple threads. – Andrew Henle Oct 18 '16 at 20:33
  • Ok @AndrewHenle, so is this program really I/O bound? I kinda got lost in the middle of your conversation with flg and don't know what I should do – Fagundes Oct 20 '16 at 19:46
  • @Fagundes It should be - unless the `rand()` call spends too much time blocking, waiting for more entropy. And `rand()` might not produce enough range - it's limited to `RAND_MAX`, which might be too small. If that's the case, [you can use `lrand48()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/drand48.html). – Andrew Henle Oct 20 '16 at 20:50