Why Google Allows Target.com to Spam Results

by Greg Niland on December 10, 2009

Please Note – I did not want to expose Target’s flaws but I feel it is very unfair how they are being rewarded by Google at the expense of the mom & pop retailer stores.

If you have ever worked on improving a website’s ranking in Google, you know that there are rules you must follow. If you break any of Google’s rules (either intentionally or accidentally), you run the risk of your website being penalized or possibly even banned from Google. Since Google has a quasi-monopoly over online search, most people would never dream of doing anything that might attract the wrath of Google. Target, one of the biggest retailers in the U.S., does not share that point of view.

Target.com is currently flooding the Google search results with millions of near identical error pages.

Why does this matter? Because when you have a powerful site like Target.com and you start hanging millions of pages off of it, you are bound to get some decent rankings regardless of how terrible your pages are. For example, Target.com is currently ranking #1 for Exercise Bike Clearance.

Imagine if each page generates just one visitor each day.  We are talking millions of Google users being tricked into visiting Target.com.  Does this page really live up to Google’s rhetoric about delivering a good user experience?

Instead of removing these pages, which are obviously error pages, Google is rewarding Target’s spam attempt with high rankings and online holiday traffic. In case you think for one second that you should do this on your website – THINK TWICE.

Google’s algorithm gives preferential treatment to big brand websites.  Big brands have more links and more trustworthy websites referring to them.  That link popularity is quite powerful in the Google algorithm.  I am not even going to get into the quality signals that Google sees from the high level of toolbar usage coming from people visiting Target.com.  The Target.com domain has so much power in the Google algorithm they can bend the Google quality rules more than any small mom & pop website can.

Another thing to remember is that users searching on Google expect to see big brands in the search results.  If they don’t see them they think that Google is broken.  Google is in a difficult position.  They need to balance user expectation of seeing big brands in the serps while still controlling the big brands.

I just hope that something is done, because Google users deserve higher quality SERPs and smaller retailers deserve more equality when it comes to Google’s quality standards.

{ 1 trackback }

SearchCap: The Day In Search, December 14, 2009
December 14, 2009 at 3:20 pm

{ 50 comments }

Jon Payne December 14, 2009 at 1:09 pm

I’m not sure you can automatically blame Target here for intentionally trying to spam. Their site is coded such that any search you put in there will spur a unique URL, with whatever your search query was in the Title tag. For instance, if you search on “your mom is hot” you get:

http://www.target.com/gp/search/188-1977530-4602238?field-keywords=your+mom+is+hot&url=index%3Dtarget%26search-alias%3Dtgt-index

Now if you link to that page from a few pages on some third party site you can get it indexed.

In fact, we can have a little fun with this :) I’m going to see if I can get this page below to rank for “jon payne is so hot”.

http://www.target.com/gp/search/188-1977530-4602238?field-keywords=jon+payne+is+so+hot&url=index%3Dtarget%26search-alias%3Dtgt-index&ref=sr_bx_1_1&x=0&y=0

If someone wants to drop a link or two to that it would be much appreciated.
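
To make the mechanism in this comment concrete: the search phrase travels in the `field-keywords` query parameter of the URL, so every distinct phrase yields a distinct URL. The following is a minimal sketch using only Python's standard library; the helper name `search_keywords` is made up for illustration, and the URL is the one quoted above.

```python
from urllib.parse import urlsplit, parse_qs


def search_keywords(url: str) -> str:
    """Return the decoded search phrase from the field-keywords parameter.

    Returns an empty string when the parameter is absent.
    """
    query = urlsplit(url).query
    params = parse_qs(query)  # decodes '+' as space and %XX escapes
    return params.get("field-keywords", [""])[0]


url = (
    "http://www.target.com/gp/search/188-1977530-4602238"
    "?field-keywords=your+mom+is+hot"
    "&url=index%3Dtarget%26search-alias%3Dtgt-index"
)
print(search_keywords(url))
```

Because the phrase is part of the URL itself, anyone can mint a new crawlable address simply by changing the query string and linking to it.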

David December 14, 2009 at 8:34 pm

Thanks for the article. There are several leading travel and tourism companies who rely on this spam. As you pointed out, it’s extra traffic, and even if a small percentage converts, it’s icing on the cake.

The problem is that some of these trusted sites can easily use this advantage to fill the results with crap.

mevans05 December 15, 2009 at 6:55 am

I would think that the negative experience that users are having by landing on these pages is more damaging to Target than the benefit of the empty traffic they’re receiving. I’m sure Target isn’t meaning to spam, but they should address the issue.

@MartinSEM December 15, 2009 at 9:30 am

This is clearly blackhat. A *simple technique to gain more traffic and a poor attempt to drive sales.
So, why is Google indexing 404 pages? And, if Target has so many of them, why aren’t they getting penalized for it? Isn’t there a red flag going up at Google’s headquarters to check it out? Well for one, if the 404 page eventually takes the user to the actual product they were looking for (advertised from the indexed link) then it’s not a horrible user experience and perhaps that’s why Google is allowing it to get indexed.
Doesn’t make sense to me why Google would EVER index 404 pages… isn’t that like publishing a book about (Topic X) with no information/blank pages…

The 404 tactic Target and several travel companies are using IS taking advantage of a loophole not yet fixed in Google (imo). Similar to meta keyword stuffing, or placing white text on a white background, which no longer work. It’s just a matter of time until this loophole gets fixed (I hope)…. and another one opens up *sigh*.
There’s a fine line between working ethically and making money online. Some companies weigh one far greater than the other.

So, was this an honest mistake made by Target’s developers, or a marketing ploy to increase revenue and site traffic?

Andriy Moraru December 15, 2009 at 1:47 pm

MartinSEM, those aren’t 404 error pages; they are just empty search results pages that answer a GET request with OK (200). For Google they are just common pages with some not very unique content but still very optimized for their search keywords. And that’s certainly not intentional on Target’s side, but nevertheless they should close their search results from indexing (with robots.txt, for example) and Google should keep its index clean of such results.
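
As a rough illustration of the robots.txt suggestion, the sketch below checks a URL against a hypothetical rule set with Python's standard `urllib.robotparser`. The `Disallow: /gp/search/` rule is an assumption based on the URL pattern quoted earlier in the thread, not Target's actual robots.txt, and `crawl_allowed` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every crawler from the on-site search path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gp/search/
"""


def crawl_allowed(url: str, robots_txt: str = ROBOTS_TXT,
                  agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


search_url = (
    "http://www.target.com/gp/search/188-1977530-4602238"
    "?field-keywords=your+mom+is+hot"
)
print(crawl_allowed(search_url))                # search page
print(crawl_allowed("http://www.target.com/"))  # home page
```

Note that a robots.txt rule only asks crawlers not to fetch a path; whether already-known URLs drop out of an index is up to the search engine.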

Shiju Alex December 15, 2009 at 11:30 pm

@Andriy Moraru – That was a good update and technically correct. But after reading through the entire thing, we can’t fault someone for saying that it is a 404 page :)

But could someone clarify about how these pages get into Google’s index? I guess there should be no natural links to such pages, of course unless someone writes a post like this :) .
Is it that Google bot randomly searches using the search form?
Any insights?

Doug December 16, 2009 at 5:38 am

If you think that’s bad, you should see what the shopping comparison engines allow Amazon, Amazon Marketplace and Zappo’s to do on their search results. They SPAM those search engines by loading their product file with each and every size as an individual product item. What does that do? Well if you sort by least expensive, and they are the least expensive for that item (XYZ Shoe for example), then they push everyone else off the page. Not only is this unfair to Mom-n-Pop stores, but it’s pretty lame on the part of the shopping comparison search engines (NexTag, Shopping, Shopzilla, et al).

It is a fine example of how the rich get richer and the poor keep on struggling. Money is the driving force in this instance and “money talks.”

@MartinSEM December 16, 2009 at 7:30 am

My mistake, they are not 404 pages.
Still, seems odd these pages would get indexed. It shouldn’t be up to the developer to decide if these pages should or should not get indexed – Google should take the initiative here and not let this happen.

Shiju Alex December 16, 2009 at 9:27 pm

@Jon Payne
For “jon payne is so hot” with quotes, this article is the only result :)

@MartinSEM
But we can think that in essence, they are 404 – a page for something that could not be found.

Anyone got some idea about how they get to index all these result pages?

PPC Guru December 22, 2009 at 2:25 pm

It’s a simple fix for Target – to put a noindex tag on the search results pages.

For Google it’s a simple fix of considering it duplicate content, and just manually raising the dupe content filter.

Josh Driver December 23, 2009 at 12:48 am

@Shiju Alex
Somewhere on the internet they would have to be linked to. Maybe spamming by target, or their site gives these links as ads to search for a product.

Jbaker December 23, 2009 at 1:58 am

The pages to index are likely datamined from google toolbar usage. For example, someone somewhere once searched for exercise bike clearance on the target site, which caused their browser to request the URL in question. The google toolbar collects this info and sends it home, perhaps as a source of urls to index.

Or, as someone else mentioned, some site somewhere may have linked to an existing sale that was going on but is gone, or worse, some site is spamming links on their site for some affiliate prog with target and that site is just autogenerating tons of nonsense links perhaps… Google indexes that spam page and in turn indexes all of the bogus target search urls.

As someone else has mentioned, it is possible for a vendor such as target to put ‘noindex’ tags in search result pages to keep google from indexing that page, but what is the incentive for a retailer to do that? In many cases google is more reliable at finding a piece of info on a site than the site’s own nav or search, so why not let google index the whole catalog, right?

I tend to agree though, the no search results page should likely have the ‘noindex’ tag at target, so these nonsense urls will drop from google indexing over time, but links such as a search result on target for ‘kitchenaid’ would stick around.
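
A minimal sketch of that idea, assuming a search template that knows how many results it found: emit the standard robots meta tag only when the result list is empty. The function name `search_page_head` and the title format are hypothetical, not taken from Target's site.

```python
from html import escape


def search_page_head(query: str, result_count: int) -> str:
    """Build the <head> fragment for a search results page.

    Adds a robots 'noindex' meta tag only when the search found nothing,
    so pages with real results stay indexable.
    """
    head = f"<title>{escape(query)} : search results</title>"
    if result_count == 0:
        head += '\n<meta name="robots" content="noindex">'
    return head


print(search_page_head("exercise bike clearance", 0))
print(search_page_head("kitchenaid", 12))
```

With this split, an empty page for ‘exercise bike clearance’ asks to be left out of the index while a populated page for ‘kitchenaid’ does not.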

Kenton Varda December 23, 2009 at 2:00 am

This is obviously not intentional. If it were intentional, Target would be providing decent landing pages. For instance, Target actually sells exercise bikes. If they were intentionally spamming the term “exercise bike”, why on earth would they be doing it with an error page rather than provide an actual exercise bike page? That doesn’t make any sense.

As for Google, I think it’s a safe bet that they have zero interest in having these crappy results in their result list. There’s probably some sort of bug affecting this. Perhaps Target recently changed their site and, in so doing, broke a ton of links that were perfectly valid before? If so then my guess is that these will disappear after a short time, once the ranking system catches up.

Never attribute to malice that which is better explained by incompetence.

Andy Chapman December 23, 2009 at 2:08 am

Interesting, shows how poor Google’s duplicate content detection is. I wonder if sites with a higher PR get more leniency on this kind of thing, as they are more “trusted”. Any SEM experts here care to comment?

James December 23, 2009 at 2:12 am

@Shiju Alex

No, they’re _NOT_ 404 pages. 404 = web server returns 404 error. They are 200 pages.

That’s like saying your sink is in essence a toilet

ivucica December 23, 2009 at 2:24 am

Consider that Google indexes URLs collected through other means. Once, we had a bug in our webapp where a friend modified his profile via GET request instead of ordinary POST. But that is not a problem. Problem is that Google picked up the URL and ‘indexed’ it, changing its profile page. Privileges were promptly adjusted! :-)
Thus, someone searching on target.com produces a unique URL, which is picked up via toolbar or so, annnd… :-)

Ben December 23, 2009 at 2:37 am

bing (ducks!) doesn’t seem to pick up these pages.

ok, not trolling here, but seriously – where are the linking pages that trigger this spam?

Dejan December 23, 2009 at 2:57 am

For all of those wondering how Google ended up following those links:

It’s enough to put them in sitemap.xml. Google will crawl through that file and index all pages in it instead of (or in addition to, I’m not completely sure) following links.
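
For readers unfamiliar with the format, a sitemap is an XML list of URLs in the sitemaps.org namespace. The sketch below builds one with Python's standard library; the `build_sitemap` helper and the example.com URLs are placeholders, and nothing here shows that Target actually listed its search pages this way.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialize a list of URLs as a minimal sitemap.xml document string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u  # '&' is escaped on output
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://www.example.com/search?q=exercise+bike&page=1",
    "http://www.example.com/search?q=kitchenaid",
])
print(sitemap_xml)
```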

Andy Beard December 23, 2009 at 3:55 am

Seems to me they have changed the way they do their search queries as if you hit search again you come up with a very appropriate landing page.

It is a bug for Target not to have content on that page more than a bug to Google for still indexing the page.

It is also fair to point out that of those millions of pages indexed, all but 2000 are in Google’s supplemental index – if a mom & pop really wanted to rank for that term because it brought them conversions, it wouldn’t be very hard to do.

Josh Adams December 23, 2009 at 4:14 am
Daniel December 23, 2009 at 4:16 am

Suspicious? Google gets a slice of the cake too. Just guess how much revenue Google is making from the adsense traffic on target.com’s “can’t be found” pages. Same black text style, meh.

JimD December 23, 2009 at 4:44 am

This isn’t the fault of target. It’s a Google issue. I’m sorry but there are a lot of sites out there that can return a “term cannot be found” error. The question is why Google is allowing these error pages to be indexed. They obviously filter these types of results out for hundreds of other sites that act the same way. Talk about freaking out over nothing.

Digitivity | Digital Productivity December 23, 2009 at 5:44 am

Jon Payne, the problem is that while small sites struggle and struggle to get their legitimate pages indexed all the while in fear of the Google juggernaut, Google has indexed 14.9 *million* auto-generated error pages for a big site like target.com.

Wendy December 23, 2009 at 6:19 am

The same reason Google allows these sites to spam.

Surchur.com
Filechaos.com
Jumptags.com

Just search any google trends term and these websites show up on the first page. They add “tag” pages with no content and they rank high.

Also Askville.com from Amazon is spamming every single google trends keyword. When you click on the result, you get to see all the page’s content is added from other websites (they copy it) by a script.

Don’t know why google is allowing all this google trends spam.

Communibus Locis December 23, 2009 at 7:48 am

This is an old trick and I wish I could remember the name for it but check THIS part of the trick out!!

http://www.target.com/gp/search/183-6976899-0507622?field-keywords=http%3A%2F%2Fwww.theminorityreport.org&url=index%3Dtarget%26search-alias%3Dtgt-index

Link juice for all! Huzzah!!

Communibus Locis

Justin December 23, 2009 at 7:59 am

This is black hat SEO 101. If mom and pop did this, they could easily be banned. Great post!

Jeff December 23, 2009 at 8:13 am

Google does index HTML forms. http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html
If you begin to type in your search for “exercise bike” at Google, by the time you reach “exer” Google has already suggested “exercise bike clearance” as being popular searches for what you are typing. Based on that information, it is possible that Google would use those terms for a shopping site such as Target to fill in a search form and index the resulting page. While there are additional factors as to the page being listed in the top results, this shows that Google will enter information into a site search.

Barry Hunter December 23, 2009 at 8:29 am

Nobody seems to have noted

http://www.mattcutts.com/blog/empty-review-sites/

which was Google looking for feedback on this type of issue. Given time I imagine this one will be sorted too.

unity100 December 23, 2009 at 8:50 am

this is just like the link directory issue. this ‘dummy search page no results result with keyword titled as bait’ thing is not only exclusive to target. a lot of online sites exploit that. google needs to come up with a way to penalize them out of the sky.

David Beard December 23, 2009 at 9:23 am

Great article and I’ve noticed it as well.

It’s Black Hat, Unethical and should be taken off. Google babbles of wanting to eliminate spam sites such as Target’s Ranking #1 for Exercise Bike Clearance because end users are reaching unwanted information, error pages and plain old junk which butchers the user experience. They jabber about how they are doing their best to eliminate spam such as Target’s Ranking #1 for Exercise Bike Clearance scenario and yet it doesn’t seem to be happening.

Could it be that Target’s Ranking #1 for Exercise Bike Clearance is not as important as all other spam websites and their intentions? It’s a good argument, but Google does inform us that spam is spam, just like Target’s Ranking #1 for Exercise Bike Clearance.

What’s next? Walmart, Kohls, Costco, and other big name companies with error pages like Target’s Ranking #1 for Exercise Bike Clearance?

I’m embarrassed for Google, to say one thing then show another.

Eric December 23, 2009 at 9:34 am

It’s very interesting that BING doesn’t suffer the same problem… Google’s arrogance is about to catch up with them.

SB December 23, 2009 at 9:38 am

I think it hurts both Google and Target. I stopped using Google last year because I got tired of wading through pages of spam links and related time wasters such as pricegrabber, shopping@yahoo, dealtime, etc. until I found a legitimate match for my search, and I’ve never shopped at Target online because every time Target came up with a match for my search, it was a false lead. And I don’t think I’m very unique.

Eamon Nerbonne December 23, 2009 at 9:56 am

From some adhoc testing, it looks like people have the bad habit of linking to the search forms of not just target, but amazon, ebay, and google itself. So, datamining from the toolbar may be an issue, but it’s definitely not only that; there really are sites out there linking to this kind of page.

I suspect the real issue isn’t any kind of evil conspiracy, it’s just that people are too lazy to find “real” pages @target or amazon or whatnot, and just link to their search form instead. This particularly makes sense if the process is somewhat automated; imagine, for instance, a site which always links “keywords” to the appropriate target.com page – there’s no easy way for the site to determine the best url for that, so it just links to target’s search form – but, in doing so, it’s making a link that google and other bots follow and assume is a real page (after all, there’s again no easy way to see that the “no results” page is actually a 404 – it’s not, target’s web server is returning a normal 200 status code indicating a normal web page.)

There’s (almost certainly) no evil SEO conspiracy here, just a difficult technical issue – albeit one google should solve.

Matt Cutts December 23, 2009 at 10:11 am

Hey Greg, someone just pointed this post to me–thanks for mentioning it, and we’ll dig into it; our quality guidelines specifically mention search results in our search results.

Nithin December 23, 2009 at 10:24 am

These pages provide no value; this is a well known trick where new pages are created on the fly for searches. I doubt that this was an “honest” mistake on Target’s part.

In any case, Matt Cutts is on it: http://twitter.com/mattcutts/status/6970359594

Daemon_ZOGG December 23, 2009 at 10:36 am

Monopoly?.. YEP! Google is to search engines, what M$ is to operating systems. They act only in “their own” best interest. Although I’m really not too surprised by what Google and big companies are doing these days.. Thanks for outing that little factoid. ;)

KorbenDallas December 23, 2009 at 10:43 am

Apparently you have a rather poor understanding of how the Target website works and how the Google search engine works. The “error” pages you are referring to do not exist on Target’s web site. These pages are temporaries generated on-the-fly when you submit a search request to Target’s web site. They disappear entirely afterwards. These pages do not exist physically, they cannot and will not be indexed by Google and they cannot and will not appear in Google searches, unless one specific thing happens. That specific thing is when someone else creates a link to that [non-existing] Target search page on some other web page. The Target page still doesn’t exist, however it will be regenerated for you every time you visit the link (and, again, destroyed afterwards). When Google encounters such links on other web pages, Google does not understand that the link leads to a temporary generated-on-the-fly page, and includes that page into its search index. This is why you see them in your search results. It is certainly not Target’s fault. It is a consequence of third-party sites linking to Target. It is a consequence of Google not being able to tell a real page from a temporary one.

Lisa Thayer December 23, 2009 at 11:20 am

I have long thought that Target is using black hat to dominate. I never click on the SERPs when Target comes up because it is very rare that it has anything to do with my search. ALWAYS click on their PPC ads. ;-)

Robert December 23, 2009 at 11:59 am

These are not ERROR pages. That’s like calling this an error page too: http://www.google.com/search?hl=en&q=TEST_1222211&aq=f&oq=&aqi= and then saying that is duplicate content because the structure of the page is the same as this one http://www.google.com/search?hl=en&q=PIZZA_25612&aq=f&oq=&aqi=

Most website search engines will display a “product not found” message when you search for something that isn’t found.

This isn’t black hat seo, or spamming the search engines.

What Target is doing wouldn’t even come close to hurting someone’s SEO/SEM campaign. A lot more goes into ranking than just creating a bunch of pages.

The reason Target ranks for the “emery shallow” phrase is because it is LINKED TO on this website: http://www.purevolume.com/emery

It counts as a backlink, you’ll notice that this website also links over to BestBuy and a few other places where you can purchase the CD. It appears that Target’s website no longer has the CD available, thus produces the search result saying it is not available.

Ryan December 23, 2009 at 12:23 pm

@KorbenDallas – not to troll but I suspect ALL pages on target.com are more or less dynamically generated. What makes a page temporary? You are explaining away target’s behavior when it appears likely intentional. If they wanted those pages out of serps they would just throw in a meta tag or disallow /s/. Clearly that isn’t happening.

Secondly, how does anyone know whether or not an rss/xml feed of search terms isn’t floating out there? 3.9 million search pages are indexed and by my count only 900k legit pages are indexed.

Results 1 – 10 of about 920,000 from target.com for -”target search results”

If you don’t believe that was intentional then I have some great land in florida for sale you should check out. Everyone does this that has the pagerank to spare:

http://www.google.com/search?hl=en&esrch=FT1&q=site:digg.com/search/&aq=f&oq=&aqi=

http://www.google.com/search?hl=en&esrch=FT1&q=site:docstoc.com/search/&aq=f&oq=&aqi=

Justin December 23, 2009 at 2:46 pm

Correct, those are not 404 pages. However, where and how are those search pages getting added to Google’s crawler? Somewhere, Target is leaving links to those empty search result pages. My guess is that recent searches are saved/stored someplace on Target’s site and possibly even shown to users under a “Recent searches” area. I think and would hope Google would be smart enough to see that all but a few words differ on each of those pages and eliminate those pages as duplicated content pages.

Justin December 23, 2009 at 2:48 pm

KorbenDallas makes sense. Others may be responsible for the links getting added to Google’s crawler. However, why is Google not smart enough to remove the pages as duplicated content?
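
One common way to spot pages where “all but a few words differ” is to compare their sets of overlapping word sequences (shingles) with Jaccard similarity. The sketch below is a generic illustration of that technique, not a description of how Google's duplicate filter works; the page texts are invented examples and the shingle size of 3 is an arbitrary choice.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    # Texts shorter than k words yield a single (shorter) shingle.
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


# Invented page texts that differ only in the searched phrase.
template = "we could not find matches for {} please check your spelling or try a broader search"
page_a = template.format("exercise bike clearance")
page_b = template.format("emery shallow")
unrelated = "stainless steel stand mixer with five quart bowl in empire red"

print(jaccard(page_a, page_b))     # high score: near-duplicate pages
print(jaccard(page_a, unrelated))  # no shared shingles
```

Pages that score above some chosen threshold could be treated as one page for indexing purposes; picking that threshold is the hard part.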

Bruce December 24, 2009 at 4:10 am

You can be sure that Target did not “spam” a search like “Jon Payne is so hot” into Google.

What looks like is happening is that Target’s product catalog isn’t searchable from the web except through Target’s own search, probably in order to enforce site security on the database. In order to get their results out in Google search and get customers, they must have a deal where Google will pass search terms to the target.com internal search engine and then format the result. The trick on Target’s side is that rather than returning an error page, they are returning a “user friendly” empty search results page.

This is a Google error interpreting the results they get back from Target, not Target spam. Now if Google starts to filter out results from target.com containing the string “We could not find matches…”, and Target changes their empty search page to use different text, then I’ll go along with the spam allegation.
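
The filter described here amounts to what is often called a “soft 404” check: the server says 200, but the body says nothing was found. Below is a minimal sketch of such a heuristic; the marker is only the visible prefix of the message quoted above, and `looks_like_soft_404` is a hypothetical helper, not anything Google is known to run.

```python
# Visible prefix of the "no results" message quoted in the comment above.
NO_RESULTS_MARKER = "We could not find matches"


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Flag pages that return 200 OK but whose text says nothing was found."""
    return status_code == 200 and NO_RESULTS_MARKER.lower() in body.lower()


print(looks_like_soft_404(200, "<p>We could not find matches for your search.</p>"))
print(looks_like_soft_404(200, "<p>12 results for kitchenaid</p>"))
```

As the comment points out, a string match like this is brittle: changing the wording of the empty page would defeat it.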

Outtanames999 December 24, 2009 at 4:59 am

Those of you mentioning favoritism to Amazon will not be surprised by the fine print at the bottom of Target’s pages that says POWERED BY AMAZON.COM.

Sjan Evardsson December 26, 2009 at 12:28 pm

I think the error here is in how Target is treating its search results. Because they are relying on GET searches (which result in unique URLs) they could (should) close the loop by being RESTful in answer by returning an actual 404 (with a useful search page as the result body.) This way, the experience looks the same to the end user, unless that end user is a crawler, which will see the 404 header and know that it should not be indexed.

This is my opinion, yours may vary.
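
A sketch of this proposal, assuming a handler that returns a status code together with the page body: an empty result set gets a 404 status while the visitor still sees a normal search page. The function and its markup are illustrative only and do not reflect Target's code.

```python
from html import escape
from http import HTTPStatus


def search_response(query: str, results: list) -> tuple:
    """Return (status_code, html_body) for an on-site search.

    An empty result list yields 404 NOT FOUND, but the body is still a
    usable search page, so human visitors see no difference.
    """
    safe_query = escape(query)
    if results:
        items = "".join(f"<li>{escape(r)}</li>" for r in results)
        return HTTPStatus.OK.value, f"<h1>Results for {safe_query}</h1><ul>{items}</ul>"
    body = (
        f"<h1>No matches for {safe_query}</h1>"
        '<form action="/search"><input name="q"><button>Search again</button></form>'
    )
    return HTTPStatus.NOT_FOUND.value, body


print(search_response("exercise bike clearance", []))
print(search_response("kitchenaid", ["Stand mixer"]))
```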

Joe Devon December 26, 2009 at 8:28 pm

I can understand why Target uses GET requests. If they use Akamai or a similar service, the GET allows them to cache the results.

Is it a 404?: “The server has not found anything matching the Request-URI”

Seems debatable to me. They are reaching the request URI which happens to be a search page which is informing people that the terms don’t CURRENTLY match any pages on the site. Maybe tomorrow those terms will match…

If the concern is Search Engines/Google indexing search pages aggressively, the solution is not a 404. It’s a meta tag: http://www.robotstxt.org/meta.html when there are no results.

In my opinion, search result pages shouldn’t appear in a search engine’s index at all. The resulting page should. Therefore there should be a robots.txt requesting search engines to not index ANY search pages… And the SE should reduce the weight of these results regardless…

IrishWonder January 3, 2010 at 8:59 am

I have seen lesser known brands do the same on their site – and seems like there is no way to fight it for as long as Google doesn’t do anything about it itself. Google’s webmaster guidelines, for example, clearly state that having search result pages of a site indexable is a no-no – yet if you see a site doing that and complain about it to G they don’t do anything (in my experience, they may even index yet more of such pages – not going to out anyone but tried this personally with one especially nasty site and got 0 result)

Shiju Alex January 8, 2010 at 2:24 am

That is an interesting discussion here. But I could follow it up only today.

I think that @Josh Driver @Jbaker @Jeff @KorbenDallas and @Justin have good points on how these pages get indexed.
Another way might be from the logs/reports of some server based analytics apps, which are displayed in html.

@James what I said was in the spirit of what @Sjan Evardsson mentioned. It is a page for something not found. Difference is that it was not found in the database (server). The found ones were also not on the web server but found in a db :)

I just made a fresh query on google for ‘Exercise Bike Clearance’ – http://www.google.com/search?q=Exercise+Bike+Clearance – and was not surprised. Blogs discussing this very issue have taken over the SERPs. Will a genuine product searcher prefer this scenario? Or would they have liked a ‘not found’ page from a shopping site?

Another interesting thing: the domain exercisebikeclearance.com was registered on 2009-12-23 and is 301ed to pedobearplush.com
Ever watchful webmasters!!!!

Cor January 25, 2010 at 8:34 pm

I can understand that larger retail organisations can receive a number of positions in the directory. It is all based on the turnover and interest surrounding the products.
