I must write programs that are I/O-bound and that will make my I/O scheduler work harder than it ever has, for an Operating Systems homework, but I have no idea how to do it. I've tried writing a simple C program that counts the lines of big text files, but it executes too fast and I can't measure the effectiveness of my scheduler with it. This is worth 25% of my grade, so any help would be much appreciated.
- use a bigger file? – Karoly Horvath Oct 17 '16 at 23:12
- By *I/O bound*, do you specifically mean *hard disk I/O*, or are other types of I/O acceptable? – user3386109 Oct 17 '16 at 23:19
- @user3386109 Other types of I/O are acceptable, I just assumed hard disk I/O is more expensive, and I need things to be very expensive here – Fagundes Oct 17 '16 at 23:21
- The advantage of network I/O is that it just pushes electrons around, and doesn't destroy your hard drive. [Here's a starting point for network I/O](https://stackoverflow.com/questions/35568996/socket-programming-udp-client-server-in-c/35570418#35570418). In that example, the client and server stop after 4 messages, but nothing stops you from sending/receiving forever. – user3386109 Oct 17 '16 at 23:27
- The advantage of using network I/O is that you can deliberately make the other end really slow. – user253751 Oct 17 '16 at 23:29
- Why not generate fake data as fast as possible and pipe it to /dev/null? Add additional threads as needed. Has the advantage of not destroying your HD, _and_ not "cheating" by waiting on a network. – Ray Hamel Oct 18 '16 at 00:20
- I'll be surprised if anyone destroys any hard disks just by writing lots of data to them, unless they were already failing. – user253751 Oct 18 '16 at 00:28
- Linux is good at caching, so only the first time you read a file will be slow. Writes are never slow because they go into memory and only get written to disk in the background. – stark Oct 18 '16 at 01:26
2 Answers
Try this:
find / | head -n 1000000 | xargs -P 10 wc >/dev/null 2>&1
It should be pretty bad. The xargs -P option causes it to run in parallel. You can adjust the options to get a suitable load. wc isn't doing much other than consuming each file, so I think it should be chiefly I/O-bound. Of course, there is still disk caching.

jforberg
In my tests, the previous answer appears more CPU-bound than I/O-bound.
I suggest opening many files and seeking like crazy within each file using the low-level non-cached C routines. Here is the C code that performs the seeks:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "specify a file to read!\n");
        return EXIT_FAILURE;
    }
    /* OR in O_RDONLY explicitly; O_DIRECT alone only works because O_RDONLY is 0 */
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open error");
        return EXIT_FAILURE;
    }
    off_t size = lseek(fd, 0, SEEK_END);
    if (size <= 0) {    /* skip empty files: rand() % 0 would crash */
        close(fd);
        return EXIT_SUCCESS;
    }
    for (int i = 0; i < 1000000; i++)
        lseek(fd, rand() % size, SEEK_SET);
    close(fd);
    return EXIT_SUCCESS;
}
Then, in a shell, run it through find for every file in the filesystem:
find / -exec ./io_bound {} \; 2>/dev/null
On my system it works quite well; one can spot the rcu_sched task in the top output:
top - 20:44:48 up 57 min, 1 user, load average: 0.84, 0.76, 0.59
Tasks: 266 total, 2 running, 264 sleeping, 0 stopped, 0 zombie
%Cpu0 : 10.0 us, 11.3 sy, 0.0 ni, 78.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 10.3 us, 8.3 sy, 0.0 ni, 81.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 15.3 us, 15.9 sy, 0.0 ni, 68.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 16.4 us, 14.7 sy, 0.0 ni, 68.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16314192 total, 9431208 free, 3312716 used, 3570268 buff/cache
KiB Swap: 15624188 total, 15624188 free, 0 used. 12630464 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13087 gluckf 20 0 4224 784 708 R 3.3 0.0 0:00.10 io_bound
7 root 20 0 0 0 0 S 0.3 0.0 0:03.54 rcu_sched
1586 root 20 0 455832 74824 62736 S 0.3 0.5 0:41.49 Xorg
2160 gluckf 20 0 1389916 137096 52548 S 0.3 0.8 0:41.27 cinnamon
2285 gluckf 20 0 498388 46752 28632 S 0.3 0.3 0:14.15 gnome-term+

zorglub23872
- `lseek` by itself generally doesn't generate I/O. It likely [merely changes an offset value in the kernel's file descriptor table for the process](http://lxr.free-electrons.com/source/fs/read_write.c?v=2.0.40#L19). – Andrew Henle Oct 18 '16 at 19:11
- You're right, thanks. I had intended to read a few sectors at each random position (to avoid caching) but totally forgot! I'm not sure how to insert the code here, though. – zorglub23872 Oct 18 '16 at 20:18
- Just use `pread()` to read a 512-byte buffer from a random offset - no `lseek()` needed. You can mask the random offset with `~(0x1FF)` (complement of 511) to get a 512-byte-aligned offset, and hopefully grab a single random disk block (assuming 512-byte disk blocks). And to *really* become I/O-bound, do it with multiple threads. – Andrew Henle Oct 18 '16 at 20:33
- OK @AndrewHenle, so is this program really I/O-bound? I kind of got lost in the middle of your conversation with flg and don't know what I should do. – Fagundes Oct 20 '16 at 19:46
- @Fagundes It should be - unless the `rand()` call spends too much time blocking, waiting for more entropy. And `rand()` might not produce enough range - it's limited to `RAND_MAX`, which might be too small. If that's the case, [you can use `lrand48()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/drand48.html). – Andrew Henle Oct 20 '16 at 20:50