Aug 09 2011
 

grep, sed, and awk for fun and profit

I recently moved this website to a new server. After doing that, I noticed a lot more captcha failures. I don’t think there are more automated attempts; rather, logcheck on the new server is not configured to ignore the log messages. Tonight, I thought I’d do something about them.

The log messages

The log messages look like this:
Aug  6 23:09:06 gelt FreeBSDDiary[43547]: captcha failure: user='Marina' IP='95.65.75.160' email='marinapetrova08@gmail.com'
Aug  6 23:21:18 gelt FreeBSDDiary[43870]: captcha failure: user='MigoneWoorope' IP='173.242.116.186' email='jeffer.s.o.n.v.3.v@gmail.com'
Aug  6 23:50:19 gelt FreeBSDDiary[47046]: captcha failure: user='apelealcoxy' IP='109.230.251.74' email='southfan@southpark.jatsu.pl'
These messages are created by some custom code I have added to Phorum. Granted, the message isn’t exactly easy to parse. If I were doing this again, I would rethink my output.
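Because every value in the message is wrapped in single quotes, one way to pull out a field is to split each line on the quote character. The following is a minimal sketch, not the code used on the server; the function names `sample_lines` and `extract_ips` are made up for illustration, and it assumes the user name never contains a single quote.

```shell
# Three lines copied from the log excerpt above, used as test input.
sample_lines() {
    cat <<'EOF'
Aug  6 23:09:06 gelt FreeBSDDiary[43547]: captcha failure: user='Marina' IP='95.65.75.160' email='marinapetrova08@gmail.com'
Aug  6 23:21:18 gelt FreeBSDDiary[43870]: captcha failure: user='MigoneWoorope' IP='173.242.116.186' email='jeffer.s.o.n.v.3.v@gmail.com'
Aug  6 23:50:19 gelt FreeBSDDiary[47046]: captcha failure: user='apelealcoxy' IP='109.230.251.74' email='southfan@southpark.jatsu.pl'
EOF
}

# Split on single quotes: field 2 is the user, field 4 the IP, field 6 the email.
# Assumption: no quoted value contains a single quote itself.
extract_ips() {
    awk -F"'" '/captcha failure/ { print $4 }'
}

sample_lines | extract_ips
```

On the three sample lines this prints one IP address per line, in log order.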

Parsing the output

My goal: get a list of IP addresses and the number of failed captchas for each. This is my first attempt:
$ bunzip2 -c  /var/log/messages.* | grep 'captcha failure' | sed "s/.*IP=\'\(.*\)\' email.*/\1/g" | sort | uniq -c | sort -r
  25 95.65.75.160
  13 77.65.48.239
  12 195.162.68.141
  10 68.185.116.91
   8 89.76.212.140
   7 188.92.77.196
   6 199.15.234.226
   5 95.64.12.21
   5 193.105.210.64
   5 190.202.87.131
   4 67.205.96.23
   4 204.124.182.82
   4 182.50.141.198
   4 110.85.4.110
   4 109.73.76.45
   3 79.133.133.158
   3 78.108.79.129
   3 77.127.24.215
   3 50.7.240.10
   3 46.118.42.143
   3 31.192.105.125
   3 188.165.254.157
   3 182.50.142.66
   3 173.242.122.112
   3 121.97.116.217
   3 117.88.218.15
   3 109.230.251.74
   3 109.230.244.101
   3 109.230.222.225
   3 109.230.220.230
   2 98.126.54.66
   2 96.45.173.2
   2 95.84.137.56
   2 91.224.160.90
   2 91.210.157.234
   2 91.207.8.46
   2 89.28.124.238
   2 89.105.246.13
   2 87.70.43.111
   2 77.87.32.102
   2 77.70.41.8
   2 77.125.105.161
   2 68.223.157.219
   2 58.8.220.166
   2 58.8.154.111
   2 46.118.42.41
   2 46.118.229.58
   2 31.184.236.44
   2 31.184.236.31
   2 221.206.40.125
   2 209.226.31.161
   2 208.76.55.91
   2 195.218.182.253
   2 184.106.170.252
   2 178.210.32.201
   2 178.168.47.154
   2 119.93.74.238
   2 109.230.246.238
   2 109.230.244.129
   2 109.186.24.10
   1 98.254.20.226
   1 98.220.58.252
   1 95.79.39.180
   1 95.78.65.113
   1 95.69.216.91
   1 95.220.216.191
   1 95.168.183.233
   1 95.135.16.67
   1 95.133.71.169
   1 95.133.207.140
   1 95.132.98.135
   1 94.232.65.104
   1 94.23.248.199
   1 94.190.47.108
   1 94.179.167.19
   1 94.179.148.219
   1 94.141.37.123
   1 93.182.133.94
   1 93.166.121.107
   1 93.157.169.18
   1 93.127.27.110
   1 92.60.232.11
   1 92.39.76.212
   1 91.224.160.132
   1 91.214.186.131
   1 91.210.104.246
   1 89.208.32.87
   1 89.139.10.153
   1 88.196.166.18
   1 88.190.26.16
   1 87.69.95.86
   1 87.68.52.89
   1 87.249.3.2
   1 85.122.23.124
   1 84.251.45.75
   1 83.21.213.143
   1 83.149.44.243
   1 83.139.165.126
   1 80.98.175.191
   1 80.87.145.14
   1 80.240.203.100
   1 79.133.140.123
   1 77.92.233.198
   1 72.64.185.93
   1 72.46.131.108
   1 71.241.146.196
   1 71.188.61.102
   1 71.184.168.232
   1 69.181.42.19
   1 68.41.239.107
   1 67.169.121.228
   1 65.78.173.203
   1 61.90.31.61
   1 59.58.154.44
   1 58.8.116.16
   1 58.8.100.13
   1 50.56.95.138
   1 46.251.237.188
   1 46.21.144.176
   1 46.17.96.12
   1 46.146.95.26
   1 41.190.16.17
   1 23.19.39.197
   1 222.165.130.214
   1 221.7.159.224
   1 218.92.8.165
   1 218.24.196.122
   1 217.77.222.158
   1 217.196.164.35
   1 216.24.192.168
   1 213.87.136.220
   1 212.87.241.135
   1 207.204.243.16
   1 204.145.80.57
   1 203.148.95.71
   1 202.181.176.3
   1 195.191.55.204
   1 195.190.13.54
   1 194.11.24.156
   1 193.105.210.113
   1 192.251.226.206
   1 188.92.76.221
   1 188.27.105.149
   1 188.26.145.208
   1 188.233.18.255
   1 188.232.72.141
   1 188.163.64.194
   1 188.143.233.14
   1 188.143.233.111
   1 188.143.232.164
   1 188.143.232.157
   1 188.143.232.109
   1 188.134.30.212
   1 187.76.192.186
   1 184.171.170.75
   1 184.107.41.143
   1 183.16.116.7
   1 178.162.155.241
   1 178.137.17.213
   1 178.122.42.126
   1 178.122.40.252
   1 178.121.103.57
   1 175.42.82.224
   1 174.36.42.78
   1 174.142.19.206
   1 173.234.229.179
   1 173.0.59.196
   1 141.105.65.153
   1 125.39.93.39
   1 125.120.185.56
   1 125.109.198.150
   1 123.121.216.142
   1 122.193.26.244
   1 121.54.84.26
   1 118.97.164.78
   1 117.41.235.212
   1 116.252.185.10
   1 115.87.242.24
   1 112.111.184.192
   1 109.95.196.34
   1 109.87.152.237
   1 109.230.251.99
   1 109.230.251.228
   1 109.230.251.184
   1 109.230.251.121
   1 109.230.244.111
   1 109.230.217.104
   1 109.230.216.123
   1 109.172.78.18
   1 108.48.26.155
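The ordering above comes from `sort -r`, which compares the lines as text. That works here because `uniq -c` pads the counts to a fixed width, but it would stop working once a count outgrows the padding. A slightly more defensive variant is sketched below, assuming the same log format. The function name `count_failures` and the sample data (documentation-range addresses) are made up for illustration. It uses `sort -rn` to compare counts numerically and `[^']*` so the match stops at the closing quote of the IP field.

```shell
# Made-up log lines in the same format; the addresses are from the 192.0.2.0/24 documentation range.
fake_log() {
    cat <<'EOF'
Aug  6 23:09:06 host app[1]: captcha failure: user='a' IP='192.0.2.10' email='a@example.com'
Aug  6 23:10:06 host app[2]: captcha failure: user='b' IP='192.0.2.20' email='b@example.com'
Aug  6 23:11:06 host app[3]: captcha failure: user='c' IP='192.0.2.10' email='c@example.com'
Aug  6 23:12:06 host app[4]: something unrelated
EOF
}

# Read log lines on stdin, print "count ip" pairs, highest count first.
count_failures() {
    grep 'captcha failure' |
        sed -n "s/.*IP='\([^']*\)'.*/\1/p" |
        sort | uniq -c | sort -rn
}

fake_log | count_failures
```

Against the real logs it would be fed the same way as before: `bunzip2 -c /var/log/messages.* | count_failures`.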

That is 190 distinct IP addresses. Well, I don’t mind. I’ll redirect them all… Or perhaps just the top 10 offenders:
$ bunzip2 -c  /var/log/messages.* | grep 'captcha failure' | sed "s/.*IP=\'\(.*\)\' email.*/\1/g" | sort | uniq -c | sort -r | head -10 | awk '{print $2}' | sort
188.92.77.196
190.202.87.131
193.105.210.64
195.162.68.141
199.15.234.226
68.185.116.91
77.65.48.239
89.76.212.140
95.64.12.21
95.65.75.160
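A fixed `head -10` cuts the list at an arbitrary point if several addresses are tied around tenth place. An alternative, sketched here as one possible approach rather than what was actually done, is to pick offenders by a minimum failure count. The function name `offenders_at_least` is hypothetical; it expects the `count ip` lines produced by `uniq -c` on stdin.

```shell
# Print the IPs whose count (column 1) is at least the threshold passed as $1.
offenders_at_least() {
    awk -v min="$1" '$1 >= min { print $2 }' | sort
}

# Three lines taken from the counts listed earlier.
printf '%s\n' '  25 95.65.75.160' '   5 95.64.12.21' '   4 67.205.96.23' |
    offenders_at_least 5
```

With a threshold of 5, the address with only 4 failures is dropped and the other two are printed.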
Hmmm, now I’ll add those IP addresses to my virtual host definition. But first, some help:
$ bunzip2 -c  /var/log/messages.* | grep 'captcha failure' | sed "s/.*IP=\'\(.*\)\' email.*/\1/g" | sort | uniq -c | sort -r | head -10 | awk '{print "RewriteCond %{REMOTE_ADDR} " $2 " [OR]"}' | sort
RewriteCond %{REMOTE_ADDR} 188.92.77.196 [OR]
RewriteCond %{REMOTE_ADDR} 190.202.87.131 [OR]
RewriteCond %{REMOTE_ADDR} 193.105.210.64 [OR]
RewriteCond %{REMOTE_ADDR} 195.162.68.141 [OR]
RewriteCond %{REMOTE_ADDR} 199.15.234.226 [OR]
RewriteCond %{REMOTE_ADDR} 68.185.116.91 [OR]
RewriteCond %{REMOTE_ADDR} 77.65.48.239 [OR]
RewriteCond %{REMOTE_ADDR} 89.76.212.140 [OR]
RewriteCond %{REMOTE_ADDR} 95.64.12.21 [OR]
RewriteCond %{REMOTE_ADDR} 95.65.75.160 [OR]
Using that, I redirect those IP addresses to another URL, where they cannot log in or register.
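Two details of the generated lines are worth tidying before pasting them into a virtual host. The pattern in a `RewriteCond` is a regular expression, so an unescaped, unanchored `95.64.12.21` would also match an address such as `195.64.12.210`. And as far as I understand mod_rewrite, the final condition in a chain should not carry `[OR]`. The sketch below is one way to handle both; the function name `rewrite_conds` is made up for illustration, and it expects one IP address per line on stdin.

```shell
# Turn a list of IPs into RewriteCond lines:
#  - escape the dots and anchor each pattern with ^ and $
#  - append [OR] to every line except the last
rewrite_conds() {
    sed 's/\./\\./g' |
        awk '
            NR > 1 { print prev " [OR]" }
            { prev = "RewriteCond %{REMOTE_ADDR} ^" $0 "$" }
            END { if (NR > 0) print prev }
        '
}

printf '%s\n' 95.64.12.21 95.65.75.160 | rewrite_conds
```

The output can go directly above whatever `RewriteRule` performs the redirect, in place of the awk step in the earlier pipeline.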

  One Response to “grep, sed, and awk for fun and profit”

  1. Cool, this looks like the parsing I do on Apache log files when
    people try (or succeed) to DDoS Apache.

    Wouldn’t it be cool if people like you, who collect data
    on offenders, could forward the IPs they find to some sort of collecting station
    on the internet, where those IPs would be diminished in value?
    In an easy way, like just piping the found IPs to a locally running daemon.