Google blocking searches from unknown browsers

Google appears to be responding to all search queries submitted from unrecognized browsers with a 403 Forbidden page. I noticed the problem tonight when using the links browser. Similar queries from the same IP with Opera, IE, or lynx succeeded. Search attempts by other links users from other IPs produced the same message, and further tests showed that fetch and wget also failed.

The 403 Forbidden page refers you to their terms and conditions. I’m sure Google are trying to block automated queries, which do not comply with those terms. I’ve written to them about this. I’m also confident the problem will be fixed soon and that they are not targeting any particular tool directly.

The original page (with my IP address removed) is available at
google-links-forbidden-original.php.

You can get around the problem in links by changing the user-agent field via Setup->Network Options->HTTP options->Fake User-Agent. I used Opera/7.10 (UNIX; U) [en] and was able to get past the block.
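
Under the hood, this just changes the User-Agent header that links sends with each request, so a query goes out looking roughly like this (illustrative request; the query string is made up):

GET /search?hl=en&q=things HTTP/1.0
Host: www.google.com
User-Agent: Opera/7.10 (UNIX; U) [en]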

For wget, you can do something like:

wget --user-agent="Opera/7.10 (UNIX; U) [en]" "http://www.google.com/search?hl=en&ie=ISO-8859-1&q=things&btnG=Google+Search"

And for fetch, you can set the HTTP_USER_AGENT environment variable:

export HTTP_USER_AGENT="Opera/7.10 (UNIX; U) [en]"
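
With that set, a search can be saved directly (assuming FreeBSD’s fetch(1), which reads HTTP_USER_AGENT from its environment; the output filename is just an example):

fetch -o results.html "http://www.google.com/search?hl=en&ie=ISO-8859-1&q=things"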

It’s not hard to set the user agent for most tools. The block will break some things for a while, but hopefully most tools can be adjusted.
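
curl, for instance, takes the user agent on the command line via -A (the query string is again illustrative):

curl -A "Opera/7.10 (UNIX; U) [en]" "http://www.google.com/search?hl=en&q=things"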

Fixed! (added on 13 May 2003)

It’s been fixed, for links at least. You no longer need a fake user-agent entry.

13 thoughts on “Google blocking searches from unknown browsers”

  1. Sounds about right for them. I wonder if it’s got anything to do with the IRC GoogleBot TCLs. :S Hope they fix it soon, because it’s annoying having to open up a remote Opera to quickly search for something from a browser on a shell.

    chaz

    1. John Meredith

      Google have been doing this for a little while – I think to cut down on automated search queries against its database, i.e. Perl scripts etc. That said, it is simple to change the browser identification string and continue as normal.

      John

      1. John Meredith wrote:
        >
        > Google have been doing this for a little while

        A very little while. I’ve often used links in the past with Google.

        > I think to cut down on automated search queries against its
        > database, i.e. Perl scripts etc.

        Agreed.

        > That said, it is simple to change the browser
        > identification string and continue as normal.

        As noted in the article.

    2. chaz wrote:
      >
      > Sounds about right for them. I wonder if it’s got
      > anything to do with the IRC GoogleBot TCLs. :S

      What is that?

      > Hope they fix it soon, because it’s annoying having to open up
      > a remote Opera to quickly search for something from a browser
      > on a shell.

      Pardon?

    3. Dan wrote:
      > chaz wrote:
      > > Sounds about right for them. I wonder if it’s got
      > > anything to do with the IRC GoogleBot TCLs. :S
      >
      > What is that?

      Eggdrop has a TCL script which can query the Google database with a trigger like !google blah, and I’m assuming it has an issue with the browser identity.

      > > Hope they fix it soon, because it’s annoying having to open up
      > > a remote Opera to quickly search for something from a browser
      > > on a shell.
      >
      > Pardon?

      I’m talking about when I have to use VNC to search for something due to internet restrictions on the local machine at my college, i.e. it has most search engines blocked due to "pornography searches" – as you can tell, my administrator is slightly … "lost in space". I need to use VNC onto my remote unix box just to be able to surf when I’m in college. I used to just use lynx to do most of the browsing because it was simpler.

      chaz

    4. Cristian Burneci

      The campaign is specifically targeted against links and wget, which can "dump" the content of a remote page into a text file. (Should this be a starting point for performing automated queries?) Anyway, note that links 2.x can’t do this anymore, so banning this browser is hilarious.

    5. Sniffy McNickles

      > The campaign is specifically targeted against
      > links and wget, which can "dump" the content
      > of a remote page into a text file.

      What are you talking about? This statement
      makes no sense.

      Any browser can save the contents of a page
      to a text file.

      If you’re trying to say they’re targeting automated
      tools, you may be right, although blocking on UA is a
      silly way to do it.

      Much better would be to throttle repetitive looking
      requests, which is pretty easy to do.

      My guess is they’re being annoyed by something specific
      which happens not to set the UA, and this is a stopgap
      until something else is in place.
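
      For what it’s worth, per-source throttling can be sketched at the
      packet level with iptables’ recent match (the rule name and the
      thresholds below are made up):

      # drop new connections from any source IP that opened more than 10
      # connections to port 80 within the last 60 seconds
      iptables -A INPUT -p tcp --dport 80 --syn -m recent --name throttle \
        --update --seconds 60 --hitcount 10 -j DROP
      iptables -A INPUT -p tcp --dport 80 --syn -m recent --name throttle --set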

    6. mjl

      I can’t see why you would want to use a non-interactive browser for any reason other than violating their TOS. Google provide a SOAP API, which I have used quite successfully for programmatic searching. They even provide excellent documentation and sample scripts.

      HTH

    7. Eli the Bearded

      I noticed the same thing happening with a page download tool I wrote (bget, available at CPAN in the scripts section). When I used browser emulation, I could get access.

      As for mjl, I was doing it because I wanted to save an article I found in Google Groups in the same place that I save all my other news posts. So I copied the URL of the "view original format" link and tried to fetch the page.

      By the way, when I did it the forbidden message I got had just a simple base64-encoded block in the "code below" section, but the one here is doubly base64-encoded.

    8. [1] What’s a TCL?
      [2] Yahoo is not as good as Google 😉 but they have improved
      [3] Sniffy McNickles makes a great point – "Much better would be to throttle repetitive looking requests, which is pretty easy to do." Could you provide a URL which explains how this is done and at what level (e.g. as a daemon? in hardware…?)
      [4] Most browsers and spiders allow the user to spoof the UA (UserAgent); What is this coming to? A fixed browser ID? As trustable as an IP? 🙂
      [5] What is mjl?
      [6] To ‘chaz’ with the college sysadmin who has "search engines blocked due to ‘pornography searches’": That sysadmin needs to be fired and expelled. This makes about as much sense as closing down a city because a criminal lives within its boundaries!
      [7] Is there a good write up (URL) about the Google SOAP API and what can be done using it?
      [8] Sniffy McNickles is incorrect in his/its argument: "Any browser can save the contents of a page to a text file." There’s more to the story! wget is non-interactive, whereas most browsers require clicking or scheduling through a GUI. Also, wget can have several instances running and acts in a more linear, consecutive, "robotic" manner than PACUs’ (point-and-click users’) requests of an HTTP or FTP site.
      [9] There is no nine. Please email me if this thread changes, it is hostmaster then an at symbol then Video2Video is the dot com domain. Thanks.
