grep, sed, and awk for fun and profit
I recently moved this website to a new server. After doing that, I noticed a lot more captcha failures.
I don’t think there are more automated attempts; logcheck on the new server is simply not configured to ignore
the log messages. Tonight, I thought I’d do something about them.
The log messages
The log messages look like this:
Aug 6 23:09:06 gelt FreeBSDDiary[43547]: captcha failure: user='Marina' IP='95.65.75.160' email='marinapetrova08@gmail.com'
Aug 6 23:21:18 gelt FreeBSDDiary[43870]: captcha failure: user='MigoneWoorope' IP='173.242.116.186' email='jeffer.s.o.n.v.3.v@gmail.com'
Aug 6 23:50:19 gelt FreeBSDDiary[47046]: captcha failure: user='apelealcoxy' IP='109.230.251.74' email='southfan@southpark.jatsu.pl'
These messages are created by some custom code I have added to Phorum.
Granted, the message isn’t exactly easy to parse. If I were doing this again, I would rethink my output.
Parsing the output
My goal: get a list of IP addresses and the number of failed captchas for each. This is my first attempt:
$ bunzip2 -c /var/log/messages.* | grep 'captcha failure' | sed "s/.*IP=\'\(.*\)\' email.*/\1/g" | sort | uniq -c | sort -r
  25 95.65.75.160
  13 77.65.48.239
  12 195.162.68.141
  10 68.185.116.91
   8 89.76.212.140
   7 188.92.77.196
   6 199.15.234.226
   5 95.64.12.21
   5 193.105.210.64
   5 190.202.87.131
   4 67.205.96.23
   4 204.124.182.82
   4 182.50.141.198
   4 110.85.4.110
   4 109.73.76.45
   3 79.133.133.158
   3 78.108.79.129
   3 77.127.24.215
   3 50.7.240.10
   3 46.118.42.143
   3 31.192.105.125
   3 188.165.254.157
   3 182.50.142.66
   3 173.242.122.112
   3 121.97.116.217
   3 117.88.218.15
   3 109.230.251.74
   3 109.230.244.101
   3 109.230.222.225
   3 109.230.220.230
   2 98.126.54.66
   2 96.45.173.2
   2 95.84.137.56
   2 91.224.160.90
   2 91.210.157.234
   2 91.207.8.46
   2 89.28.124.238
   2 89.105.246.13
   2 87.70.43.111
   2 77.87.32.102
   2 77.70.41.8
   2 77.125.105.161
   2 68.223.157.219
   2 58.8.220.166
   2 58.8.154.111
   2 46.118.42.41
   2 46.118.229.58
   2 31.184.236.44
   2 31.184.236.31
   2 221.206.40.125
   2 209.226.31.161
   2 208.76.55.91
   2 195.218.182.253
   2 184.106.170.252
   2 178.210.32.201
   2 178.168.47.154
   2 119.93.74.238
   2 109.230.246.238
   2 109.230.244.129
   2 109.186.24.10
   1 98.254.20.226
   1 98.220.58.252
   1 95.79.39.180
   1 95.78.65.113
   1 95.69.216.91
   1 95.220.216.191
   1 95.168.183.233
   1 95.135.16.67
   1 95.133.71.169
   1 95.133.207.140
   1 95.132.98.135
   1 94.232.65.104
   1 94.23.248.199
   1 94.190.47.108
   1 94.179.167.19
   1 94.179.148.219
   1 94.141.37.123
   1 93.182.133.94
   1 93.166.121.107
   1 93.157.169.18
   1 93.127.27.110
   1 92.60.232.11
   1 92.39.76.212
   1 91.224.160.132
   1 91.214.186.131
   1 91.210.104.246
   1 89.208.32.87
   1 89.139.10.153
   1 88.196.166.18
   1 88.190.26.16
   1 87.69.95.86
   1 87.68.52.89
   1 87.249.3.2
   1 85.122.23.124
   1 84.251.45.75
   1 83.21.213.143
   1 83.149.44.243
   1 83.139.165.126
   1 80.98.175.191
   1 80.87.145.14
   1 80.240.203.100
   1 79.133.140.123
   1 77.92.233.198
   1 72.64.185.93
   1 72.46.131.108
   1 71.241.146.196
   1 71.188.61.102
   1 71.184.168.232
   1 69.181.42.19
   1 68.41.239.107
   1 67.169.121.228
   1 65.78.173.203
   1 61.90.31.61
   1 59.58.154.44
   1 58.8.116.16
   1 58.8.100.13
   1 50.56.95.138
   1 46.251.237.188
   1 46.21.144.176
   1 46.17.96.12
   1 46.146.95.26
   1 41.190.16.17
   1 23.19.39.197
   1 222.165.130.214
   1 221.7.159.224
   1 218.92.8.165
   1 218.24.196.122
   1 217.77.222.158
   1 217.196.164.35
   1 216.24.192.168
   1 213.87.136.220
   1 212.87.241.135
   1 207.204.243.16
   1 204.145.80.57
   1 203.148.95.71
   1 202.181.176.3
   1 195.191.55.204
   1 195.190.13.54
   1 194.11.24.156
   1 193.105.210.113
   1 192.251.226.206
   1 188.92.76.221
   1 188.27.105.149
   1 188.26.145.208
   1 188.233.18.255
   1 188.232.72.141
   1 188.163.64.194
   1 188.143.233.14
   1 188.143.233.111
   1 188.143.232.164
   1 188.143.232.157
   1 188.143.232.109
   1 188.134.30.212
   1 187.76.192.186
   1 184.171.170.75
   1 184.107.41.143
   1 183.16.116.7
   1 178.162.155.241
   1 178.137.17.213
   1 178.122.42.126
   1 178.122.40.252
   1 178.121.103.57
   1 175.42.82.224
   1 174.36.42.78
   1 174.142.19.206
   1 173.234.229.179
   1 173.0.59.196
   1 141.105.65.153
   1 125.39.93.39
   1 125.120.185.56
   1 125.109.198.150
   1 123.121.216.142
   1 122.193.26.244
   1 121.54.84.26
   1 118.97.164.78
   1 117.41.235.212
   1 116.252.185.10
   1 115.87.242.24
   1 112.111.184.192
   1 109.95.196.34
   1 109.87.152.237
   1 109.230.251.99
   1 109.230.251.228
   1 109.230.251.184
   1 109.230.251.121
   1 109.230.244.111
   1 109.230.217.104
   1 109.230.216.123
   1 109.172.78.18
   1 108.48.26.155
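The sed expression above depends on the sequence \' being read as a literal quote, which FreeBSD's sed does. GNU sed, as far as I know, treats \' as an end-of-buffer anchor, so the same command may pass lines through unchanged there. Below is a minimal sketch of a more portable variant, not a replacement for the command actually run. The function name captcha_ip_counts is made up for illustration. It uses [^']* so the match stops at the closing quote, -n with the p flag so lines without an IP field are dropped, and sort -rn so the counts are compared as numbers rather than as text.

```shell
# Hypothetical helper (name is an assumption): read syslog lines on stdin
# and print "count IP" pairs, highest count first.
captcha_ip_counts() {
    grep 'captcha failure' |
        # Keep only the text between IP=' and the next single quote.
        sed -n "s/.*IP='\([^']*\)'.*/\1/p" |
        sort | uniq -c | sort -rn
}

# Usage, with the same input as above:
#   bunzip2 -c /var/log/messages.* | captcha_ip_counts
```

Reading from stdin keeps the helper independent of where the logs live, so it can be tried on a handful of pasted lines before running it over the rotated files.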
That is 190 distinct IP addresses. Well, I don’t mind. I’ll redirect them all… Or perhaps just
the top 10 offenders:
$ bunzip2 -c /var/log/messages.* | grep 'captcha failure' | sed "s/.*IP=\'\(.*\)\' email.*/\1/g" | sort | uniq -c | sort -r | head -10 | awk '{print $2}' | sort
188.92.77.196
190.202.87.131
193.105.210.64
195.162.68.141
199.15.234.226
68.185.116.91
77.65.48.239
89.76.212.140
95.64.12.21
95.65.75.160
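The selection step at the end of that pipeline can be pulled out into a small function, which makes it easy to try a cutoff other than 10. This is only a sketch: the name top_ips and its optional argument are assumptions, and it expects "count IP" lines on stdin, as printed by uniq -c. It uses head -n rather than the older head -10 spelling, and sort -rn so that the ordering does not depend on how the counts are padded.

```shell
# Hypothetical helper (name is an assumption): read "count IP" lines on
# stdin and print the IPs with the N highest counts (default 10).
top_ips() {
    n=${1:-10}
    # Order by count, keep the first N, drop the count column, then sort
    # the surviving addresses so the list is easier to scan.
    sort -rn | head -n "$n" | awk '{ print $2 }' | sort
}

# Usage, fed by the counting part of the pipeline above:
#   ... | sort | uniq -c | top_ips 10
```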
Hmmm, now I’ll add those IP addresses to my virtual host definition. But first, some help:
$ bunzip2 -c /var/log/messages.* | grep 'captcha failure' | sed "s/.*IP=\'\(.*\)\' email.*/\1/g" | sort | uniq -c | sort -r | head -10 | awk '{print "RewriteCond %{REMOTE_ADDR} " $2 " [OR]"}' | sort
RewriteCond %{REMOTE_ADDR} 188.92.77.196 [OR]
RewriteCond %{REMOTE_ADDR} 190.202.87.131 [OR]
RewriteCond %{REMOTE_ADDR} 193.105.210.64 [OR]
RewriteCond %{REMOTE_ADDR} 195.162.68.141 [OR]
RewriteCond %{REMOTE_ADDR} 199.15.234.226 [OR]
RewriteCond %{REMOTE_ADDR} 68.185.116.91 [OR]
RewriteCond %{REMOTE_ADDR} 77.65.48.239 [OR]
RewriteCond %{REMOTE_ADDR} 89.76.212.140 [OR]
RewriteCond %{REMOTE_ADDR} 95.64.12.21 [OR]
RewriteCond %{REMOTE_ADDR} 95.65.75.160 [OR]
Using that, I redirect those IP addresses to another URL, where they cannot log in or register.
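Two details of the generated lines usually need a manual touch before they go into a virtual host. The last RewriteCond normally should not carry [OR], and the pattern is a regular expression, so an unescaped, unanchored address such as 95.65.75.160 would also match 195.65.75.160. The sketch below handles both. The function name rewrite_conds is hypothetical, and the RewriteRule that follows the conditions is not shown in the post, so it is left out here too.

```shell
# Hypothetical helper (name is an assumption): read one IP address per line
# on stdin and print anchored, dot-escaped RewriteCond lines. Every line
# except the last ends in [OR].
rewrite_conds() {
    sed -e 's/\./\\./g' \
        -e 's/.*/RewriteCond %{REMOTE_ADDR} ^&$ [OR]/' \
        -e '$s/ \[OR\]$//'
}

# Usage, given a list of addresses such as the top 10 above:
#   printf '%s\n' 95.65.75.160 77.65.48.239 | rewrite_conds
```

The first sed expression escapes the dots, the second wraps each address in the directive, and the third strips the flag from the final line only.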
Cool, this looks like the parsing I do on Apache log files when
people try (or succeed) to DDoS Apache.
Wouldn’t it be cool if people like you, collecting data
on offenders, could forward the IPs they find to a sort of collecting station
on the internet, where the IPs would be diminished in value?
It could be done in an easy way, like just piping the found IPs to a locally running daemon.