Archive for the 'splogs' Category

Crawling Blogs

Thursday, October 9th, 2008

During a period when my blog was updated only once, this is how Feedburner saw bot activity.

[Image: blogcrawler]

Note that crawling blogs is an interesting problem:

  • Recency is critical
  • Ping servers are available, albeit with incomplete coverage

Crawling blogs is also highly resource intensive:

  • Network latency
  • Disk access/write latency
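To make the recency and ping-server points concrete, here is a minimal sketch of how a crawler might order its fetches. It is an illustration under my own assumptions, not how any particular crawler works: the `CrawlScheduler` class, its method names, and the one-hour `poll_interval` are all hypothetical. A ping makes a feed due immediately, and every feed is still polled on a fallback interval because ping coverage is incomplete. It only decides ordering; it does nothing about network or disk latency.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class CrawlScheduler:
    """Orders feed fetches by due time; a ping makes a feed due immediately."""

    poll_interval: float = 3600.0  # hypothetical fallback interval, in seconds
    _heap: list = field(default_factory=list)  # entries are (due_time, feed_url)
    _due: dict = field(default_factory=dict)   # feed_url -> current due_time

    def add_feed(self, url, now):
        # New feeds wait one full interval before their first fallback poll.
        self._schedule(url, now + self.poll_interval)

    def ping(self, url, now):
        # A ping says the blog just updated, so fetch it as soon as possible.
        self._schedule(url, now)

    def _schedule(self, url, due):
        # Older heap entries for this feed become stale and are skipped later.
        self._due[url] = due
        heapq.heappush(self._heap, (due, url))

    def next_due(self, now):
        """Return one feed that is due at or before `now`, or None."""
        while self._heap and self._heap[0][0] <= now:
            due, url = heapq.heappop(self._heap)
            if self._due.get(url) != due:
                continue  # stale entry superseded by a later reschedule
            # Re-queue for the fallback poll, since pings can be missed.
            self._schedule(url, now + self.poll_interval)
            return url
        return None
```

With two feeds added at time 0, a ping for one of them at time 10 makes `next_due(10.0)` return that feed, while the other is not returned until its fallback poll comes due at time 3600.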

How do your numbers look?

40% of Japanese Blogs are Spam

Sunday, August 3rd, 2008

Adam Richards (MutantFrog) points to a report from CNET that 40% of blogs hosted on the popular platform "Nifty" are spam.

Japanese web portal Nifty has announced findings that a full 40% of Japanese blogs are set up as nothing but ad platforms to suck up clicks and affiliate bonuses.

.. A Nifty-affiliated research body randomly sampled 100,000 blog entries per month using the filter between October 2007 and February 2008. Over the five-month period it was determined that 40% of domestic blogs are spam blogs.

A translation of the article reveals that the same technique used to identify spam in their samples will be used by the Japanese blog analytics services BuzzPulse and BuzzSeeQer. Note that, last we checked, the English blogosphere has a comparable, if not larger, number of splogs. Our study, however, looked at newly created blogs, whereas the Nifty study sampled all currently hosted blogs.

Note: If the researchers behind the Nifty spam filter are reading this, I’d be interested in following up.