MSN’s auto-generated spam research

by Greg Niland on April 5, 2006

SEO By The Sea reported on this research paper (pdf), written by three Microsoft researchers, discussing their recent examination of similar pages. They found that about 22% of the pages were identical. I think this means that 22% of the MSN index, and not necessarily Google's index, is identical.

They also found that identical pages remain identical 10 weeks later, which makes sense, since you would expect a spammer to touch their auto-generated content only once.

The paper then states that these findings could help MSN reduce its crawl load by identifying identical pages and either not indexing them or lowering their priority. I would agree, but I am not sure MSN has enough research to correctly tell the real site from a fake copy.
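
To make the idea concrete, here is a minimal sketch of how a crawler might spot exact duplicates and push them down the queue. This is my own illustration, not the method from the paper or anything MSN is known to run: the function names, the whitespace normalization, and the priority values are all assumptions. Note that it simply treats the first URL it sees as the "real" one, which is exactly the part I doubt anyone can get right automatically.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash the page text after collapsing whitespace (catches exact duplicates only)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def crawl_priorities(pages: dict[str, str], duplicate_priority: float = 0.1) -> dict[str, float]:
    """Assign priority 1.0 to the first-seen URL of each duplicate cluster.

    Every other URL in the cluster gets duplicate_priority (a hypothetical value).
    """
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[content_fingerprint(text)].append(url)

    priorities = {}
    for urls in clusters.values():
        priorities[urls[0]] = 1.0  # assumption: first seen is treated as the original
        for url in urls[1:]:
            priorities[url] = duplicate_priority
    return priorities


# Made-up example pages; the first two differ only in whitespace.
pages = {
    "http://example.com/a": "Cheap widgets  for sale",
    "http://example.org/copy": "Cheap widgets for sale",
    "http://example.net/other": "Something else entirely",
}
priorities = crawl_priorities(pages)
```

A real crawler would need near-duplicate detection rather than an exact hash, plus some signal beyond crawl order to decide which copy is the original.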

The weirdest thing from my point of view is that they discuss the implications of this for PageRank. Last time I checked, PageRank was a Google thing and link popularity was the generic name.
