Thursday, March 5, 2009

The Mouse That Roared: Killing A Sun Server During Lunch Break

Hey there,

We continue today with our fast-and-furious post week (brought on by a spate of work and family obligations like we've never seen before!). Like they say: "It's all right if your work gets in the way of your hobbies, but if it gets in the way of itself..." No, wait. That saying goes the other way and is completely inappropriate ;)

Today we decided to test the age-old proposition that any Goliath can be brought down by a David. In this case, we pitted a simple shell script against a SunBlade 150 with 1 GB of RAM (half of what it can support) and a 650 MHz UltraSPARC IIi CPU (about as good as it can get). The entire experiment took about 30 minutes, because we only ran it until we got sick and tired of waiting.

The end result was a machine that was completely unusable from the console and unreachable via telnet, ftp or ssh, yet somehow managed not to panic or crash after about 1000 faults of every kind and color. We're confident it would have eventually gone down, but it was surprisingly robust (in that respect). If we have hours and hours to spend at some time in the future, we'll give the same test another shot and see if the machine just comes back to its senses over time (and how long that takes). Our bet is that it will (like most Unix and Linux OSes running on halfway decent hardware). Only time will tell.

What follows is a quick walkthrough of the experiment, and then the tiny little shell script (run only once) that caused all the damage. We called it tree, for some reason; perhaps because it reminded us of that old saying "If a tree falls in a forest, no one can hear you scream." ...damn! Got that one wrong, too! It's amazing what a few basic shell commands and a simple double-exec can do ;)

THE WALKTHROUGH:

1. BASELINE READING:
These are some basic statistics from when nothing much is going on.

host # date;uptime;vmstat 1 1;iostat 1 1
Tue Mar 3 16:56:17 CST 2009
4:56pm up 41 day(s), 3:25, 3 users, load average: 0.71, 1.36, 1.41
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 rm s1 in sy cs us sy id
0 0 0 1240680 389376 1 10 1 0 0 0 0 0 -0 -0 -0 402 381 276 71 3 26
tty dad1 ramdisk1 sd1 nfs1 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
1 136 1 0 3 0 0 0 0 0 1 0 0 1 71 3 0 26


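2. INITIAL EXECUTION OF THE SCRIPT WITH TIME/DATE STAMPING: Notice how high the load averages reported by uptime jump, and how the time between iterations of the simple loop keeps growing. In each vmstat and iostat pair, the first line of numbers is the average since boot; the second line is the live reading.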

host # ./tree
host # while :;do date;uptime;vmstat 1 2;iostat 1 2;done
Tue Mar 3 17:01:22 CST 2009
5:01pm up 41 day(s), 3:30, 2 users, load average: 119.02, 27.25, 10.11
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 rm s1 in sy cs us sy id
0 0 0 1240680 389376 1 10 1 0 0 0 0 0 -0 -0 -0 402 381 276 71 3 26
1038 0 0 981184 260848 547 6123 17 174 174 0 0 56 0 0 0 513 6623 336 20 80 0
tty dad1 ramdisk1 sd1 nfs1 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
1 136 1 0 3 0 0 0 0 0 1 0 0 1 71 3 0 26
0 11800 0 0 0 0 0 0 0 0 0 0 0 0 50 50 0 0

Tue Mar 3 17:02:02 CST 2009
5:02pm up 41 day(s), 3:31, 2 users, load average: 445.61, 118.66, 42.41
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 rm s1 in sy cs us sy id
0 0 0 1240672 389376 1 10 1 0 0 0 0 0 -0 -0 -0 402 381 276 71 3 26
1271 1 0 891840 189784 581 6149 4 239 235 0 0 95 0 0 0 591 6517 359 20 80 0
tty dad1 ramdisk1 sd1 nfs1 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
1 136 1 0 3 0 0 0 0 0 1 0 0 1 71 3 0 26
0 550 241 31 1 0 0 0 0 0 0 0 0 0 17 83 0 0

Tue Mar 3 17:02:55 CST 2009
5:02pm up 41 day(s), 3:32, 2 users, load average: 851.80, 278.54, 102.82
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 rm s1 in sy cs us sy id
0 0 0 1240664 389368 1 10 1 0 0 0 0 0 -0 -0 -0 402 381 276 71 3 26
870 1 0 753176 54624 159 2286 260 1085 1236 0 758 346 0 0 0 1090 5003 1533 13 87 0
tty dad1 ramdisk1 sd1 nfs1 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
1 136 1 0 3 0 0 0 0 0 1 0 0 1 71 3 0 26
0 170 1392 319 44 0 0 0 0 0 0 0 0 0 16 84 0 0
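
If you'd rather not eyeball those numbers, here's a minimal sketch that pulls the 1-minute load average out of an uptime line and works out the gap between two of the timestamps above. The helper names (load1 and to_seconds) are our own inventions for illustration, and the sed pattern assumes the "load average: x, y, z" layout shown in this output.

```shell
#!/bin/sh
# Helper names (load1, to_seconds) are hypothetical, made up for this sketch.

# load1: read uptime output on stdin, print the 1-minute load average
load1() {
    sed -n 's/.*load average: *\([0-9.]*\),.*/\1/p'
}

# to_seconds HH:MM:SS: print seconds since midnight
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1
    IFS=$old_ifs
    # strip one leading zero so values like 08 are not read as octal
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Sample values copied from the output above
echo '5:01pm up 41 day(s), 3:30, 2 users, load average: 119.02, 27.25, 10.11' | load1
echo "$(( $(to_seconds 17:02:02) - $(to_seconds 17:01:22) )) seconds between the first two iterations"
echo "$(( $(to_seconds 17:02:55) - $(to_seconds 17:02:02) )) seconds between the next two"
```

On a quiet machine, one pass through that loop should take only a couple of seconds, since vmstat 1 2 and iostat 1 2 each wait about one second between their two reports.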


3. 6 MINUTES IN AND WE'VE GOT PROBLEMS: It only gets worse from here. Our script has multiplied itself exponentially so many times already that the number of failed execs is negligible in comparison to the number of system faults.

Tue Mar 3 17:07:45 CST 2009
ld.so.1: cp: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: cp: fatal: libc.so.1: open failed: No such file or directory
Killed
ld.so.1: sh: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: sh: fatal: libc.so.1: open failed: No such file or directory
ld.so.1: sh: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: sh: fatal: libc.so.1: open failed: No such file or directory
ld.so.1: cp: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: cp: fatal: libc.so.1: open failed: No such file or directory
Killed
tree: tree: cannot open
ld.so.1: cp: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: cp: fatal: libc.so.1: open failed: No such file or directory
Killed
ld.so.1: cp: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: cp: fatal: libc.so.1: open failed: No such file or directory
Killed
ld.so.1: cp: fatal: /usr/lib/libc.so.1: mmap failed: Resource temporarily unavailable
ld.so.1: cp: fatal: libc.so.1: open failed: No such file or directory
Killed
cp: Cannot map /lib/ld.so.1
Killed
tree: cannot fork: no swap space
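
To put a rough number on the exponential growth mentioned in step 3, here's a bit of arithmetic in shell form. It spawns nothing. It assumes the ideal case in which every copy of the script manages to start exactly two children, which clearly stops being true once the forks begin to fail as shown above; the function names are ours, for illustration only.

```shell
#!/bin/sh
# Illustrative arithmetic only: this starts no processes.
# Assumption: every copy succeeds in starting exactly two children.

# instances_at_generation N: copies started at generation N (2^N)
instances_at_generation() {
    n=$1
    count=1
    while [ "$n" -gt 0 ]; do
        count=$((count * 2))
        n=$((n - 1))
    done
    echo "$count"
}

# total_instances N: copies started in generations 0 through N (2^(N+1) - 1)
total_instances() {
    next=$(instances_at_generation $(( $1 + 1 )))
    echo $(( next - 1 ))
}

for gen in 0 5 10; do
    echo "generation $gen: $(instances_at_generation "$gen") new, $(total_instances "$gen") total"
done
```

Under that assumption, ten generations is already past a thousand new copies, which is at least the same order of magnitude as the run queue vmstat reported in step 2 (though we wouldn't read too much into that).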


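4. 16 MINUTES IN AND THE CONSOLE IS TOAST: Still, the brave little toaster, I mean SunBlade, stays on the net and doesn't go down. In fact, its replies to ping are instant, which makes sense: ICMP echo replies are generated inside the kernel and don't require forking a new process.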

host # while :;do date;ping -c 10 10.99.99.99;date;done
Tue Mar 3 17:17:56 CST 2009
PING 10.99.99.99: (10.99.99.99): 56 data bytes
64 bytes from 10.99.99.99: icmp_seq=0 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=1 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=2 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=3 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=4 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=5 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=6 ttl=254 time=0 ms

Tue Mar 3 17:19:22 CST 2009
PING 10.99.99.99: (10.99.99.99): 56 data bytes
64 bytes from 10.99.99.99: icmp_seq=0 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=1 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=2 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=3 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=4 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=5 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=6 ttl=254 time=0 ms


5. 28 MINUTES IN AND WE'RE GETTING TIRED OF WAITING: The little SunBlade never tapped out (at least not in the first half hour). Even though it has been rendered completely useless, the ping responses are still sitting at 0ms. Time for a reboot.

host # while :;do date;ping -c 10 10.99.99.99;date;done
Tue Mar 3 17:30:32 CST 2009
PING 10.99.99.99: (10.99.99.99): 56 data bytes
64 bytes from 10.99.99.99: icmp_seq=0 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=1 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=2 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=3 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=4 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=5 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=6 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=7 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=8 ttl=254 time=0 ms
64 bytes from 10.99.99.99: icmp_seq=9 ttl=254 time=0 ms


6. AND THIS IS HOW EASY IT WAS TO CRIPPLE THE MACHINE: The script itself is very simple. Given a larger machine with more horsepower, we think you'd see the same results in roughly the same amount of time. Hopefully, someday, we'll be able to find out without losing our jobs ;)

If there was a lesson in today's post, it would probably read something like this: "Be mindful of the small things. They can grow into big things very fast!" (We got that one right, since we just paraphrased something that may or may not be a famous expression ;)

Cheers,


Creative Commons License


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/bash

#
# tree - POC only - don't run this - it will spawn a lot of processes and slow your machine down severely.
#
# 2009 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

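# Make two subdirectories and drop a copy of this script into each one
mkdir a b
cp tree a/
cp tree b/
# Launch each copy in the background from its own directory; each copy
# then repeats the same steps, so the process count doubles every round
(cd a;exec sh tree &)
(cd b;exec sh tree &)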
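
For anyone who wants to see the branching structure without taking a machine down, here's a depth-limited sketch. To be clear, this is not the script we ran: the grow function and its depth argument are our additions for illustration, it only creates directories, and it runs serially in a throwaway temp directory instead of backgrounding copies of itself.

```shell
#!/bin/sh
# Depth-limited illustration of the a/b directory tree. Not the original
# script: "grow" and its depth argument are hypothetical additions.

# grow DIR DEPTH: create DIR/a and DIR/b, then recurse until DEPTH hits 0.
# Uses $1 and $2 directly because plain sh has no local variables.
grow() {
    [ "$2" -le 0 ] && return 0
    mkdir -p "$1/a" "$1/b"
    grow "$1/a" $(( $2 - 1 ))
    grow "$1/b" $(( $2 - 1 ))
}

root=$(mktemp -d)
grow "$root" 3
# Count the directories created (subtract 1 for the temp root itself)
echo "$(( $(find "$root" -type d | wc -l) - 1 )) directories created"
rm -rf "$root"
```

With a depth of 3 that comes to 2 + 4 + 8 = 14 directories. The real script has no such limit, and every one of those directories also gets its own running process.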


, Mike






