Archive for the 'socialmedia' Category

ICWSM 2009

Wednesday, September 24th, 2008

CFP now open. This is an excellent event, in its third year and hosted right here in San Jose.

The social and community driven aspects of our digital lives continue to rapidly increase, resulting in transformative behaviours and, significantly, publishing and distributing huge amounts of fascinating data. The International Conference on Weblogs and Social Media will meet once more in 2009 to discuss the latest research analyzing and leveraging this resource. As with previous meetings, we will bring together a wide range of researchers and industry practitioners from many disciplines providing a unique opportunity for sharing ideas and collaboration in this space.

Jon Kleinberg is one of the invited speakers. I wasn’t aware of the "Rebel King" anagram:

Prof. Kleinberg needs no introduction, but how apt that the above piece of fascinating data comes courtesy of *social media*.

The Numerati

Wednesday, September 17th, 2008

Stephen Baker’s Take on Life and Technology.

I recently came across Stephen Baker’s book via the Wall Street Review. This note on splogs got me interested:

A splog, though unreadable, is seeded with words that will attract Google ads. A computer-user may be annoyed at finding himself staring at a screen full of gibberish but click on an ad anyway, allowing the robot blogger to harvest revenue. This sleight of hand has the Numerati hard at work getting their software to distinguish between a blog and a splog. Mr. Baker gives a helpful sketch of the math involved, each blog reduced to a vector in a space of several dozen dimensions.
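The "blog reduced to a vector" idea in the quote can be illustrated with a toy example. This is only a minimal sketch, not the method described in the book or used by any real detector: the vocabulary, the sample texts, and the helper names `to_vector` and `cosine` are all hypothetical, and a real system would use far more dimensions and learned features.

```python
import math
from collections import Counter

# Hypothetical feature vocabulary (an assumption for illustration only).
VOCAB = ["cheap", "buy", "loan", "free", "today", "family", "weekend"]


def to_vector(text):
    """Map a text to a vector of counts, one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in VOCAB]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# A made-up "splog profile" and two made-up posts to compare against it.
splog_profile = to_vector("cheap loan buy cheap free loan buy today")
post_a = to_vector("buy cheap loan free loan")
post_b = to_vector("spent the weekend with family today")

print(cosine(post_a, splog_profile))
print(cosine(post_b, splog_profile))
```

In this toy setup the keyword-stuffed post sits much closer to the splog profile than the ordinary post does, which is the intuition behind treating each blog as a point in a feature space.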

The problem of splogs is one case study through which Baker shares the positive side of the "Numerati". So what, or who, is a Numerati anyway? According to Baker:

They’re members of a global elite, and are busy analyzing our every move. They’re rummaging through mountains of data, looking for patterns of our behavior so that they can predict what we might want to buy, who we’re likely to vote for, what job we’d do better than our colleagues. Some are even matching us with potential lovers…

Through his book, Baker uncovers the "Numerati cult": who they are, the positives, the negatives, and the unknown. Overall, he attempts to explain what the Numerati mean to, well, a non-Numerati. Yahoo!, Google, and IBM appear to feature prominently, as do many Numerati.

Elsewhere, both positives and negatives are highlighted:

The Corner Office:

The "Numerati" are an evolving class of quant-humping, algorithm experts who will be playing an enormous role in shaping our society, our economy and our lives. They are the types who founded Google and Yahoo but they are going beyond simple searching to manipulating and massaging the tremendous mass of data that we generate from Web clicks and cell phones.

Sentimine:

I have already ordered my copy…How could I resist when we’re mining the blogoisphere for sentiment and about to test our own home-grown splog detector?

Bacon Rebellion:

…“The Numerati,” a class of math experts who quietly orchestrate the massaging of the zillions of bits of data about us. We generate the stuff every time we use our cell phones or search Google, use a grocery loyalty card or whisk through a toll booth using a Smarttag.

ThinkOR:

I think it is great that operations research is getting some publicity with The Numerati. However, there can be such a thing as a bad publicity. Is it just me or does it seem to everybody (OR folks) that this book is casting us in a rather negative light?

Michael Trick:

This is a book primarily about what I would call data mining and clustering, so there are wide swathes of the “numerati” field that are not covered.  But for a popular look on how our mathematics is used to characterize and predict human behavior, The Numerati is an extremely interesting book.

I hope to see this book influence readers and promote the positives. The target audience is the non-Numerati, but it has still piqued my curiosity. Ordered.

LinkedIn Hacked? No, Just Down

Saturday, September 6th, 2008

Just noticed LinkedIn is down. Downtimes remind us how important these social sites have grown to be. Lloyd Taylor, LinkedIn’s VP of Technical Operations, clarifies.

Update: Site Now Carries this message:

LinkedIn is currently unavailable while we make upgrades to improve our service to you. We’ll return around 12:00am (PT) September 7th, 2008.

We apologize for the inconvenience and appreciate your patience. Thank you for using LinkedIn!

Wikimatix: Engagement around Buzzy Keywords

Saturday, August 16th, 2008

Wikimatix is an uber-cool app developed by Akshay Java, from UMBC. The tool mashes up buzzy keywords with Wikipedia to bootstrap conversations. Go check it out.

Btw, Akshay Java and UMBC sound very familiar. I wonder why.

40% Japanese Blogs are Spam

Sunday, August 3rd, 2008

Adam Richards (MutantFrog) points to a report from CNET that 40% of blogs hosted on the popular platform "Nifty" are spam.

Japanese web portal Nifty has announced findings that a full 40% of Japanese blogs are set up as nothing but ad platforms to suck up clicks and affiliate bonuses.

.. A Nifty-affiliated research body randomly sampled 100,000 blog entries per month using the filter between October 2007 and February 2008. Over the five-month period it was determined that 40% of domestic blogs are spam blogs.

A translation of the article reveals that the same technique used to identify spam in their samples will be used by the Japanese blog analytics services BuzzPulse and BuzzSeeQer. Note that, last we checked, the English blogosphere has a comparable, if not larger, number of splogs. Our study, however, looked at newly created blogs, as opposed to the Nifty study on a sample of all currently hosted blogs.

Note: If researchers behind the Nifty spam filter are reading this, I’d be interested in following up.

SIGIR 2008 Keynote: Delighting Chinese Users, The Google China Experience

Monday, July 28th, 2008

Kai-Fu Lee of Google China presented the keynote at SIGIR. The theme centered on the question “Can a multinational Internet company succeed in China?” Many of the presented ideas are applicable to the Indian market as well.

Internet in China, now:

  • Largest Internet market, with over 250 million users (200 million in the US)
  • Broadband penetration is 86%, mainly through ADSL
  • Internet cafes quite popular even in small townships

Google’s strategy over the last three years:

  • Starting 2006, learn about Chinese users
  • 2007 and 2008, build the best Chinese search engine
  • Launch products designed by Chinese engineers

Hopefully, market share will follow. To show that this isn’t that simple, Kai-Fu presented a detailed analysis, as summarized through this table:

Attribute | US | China | Implications
Alternatives to Internet | Plenty | None | High dependence and trust on Internet information
New Users | 3% | 35% | Embrace new users to gain market share
Internet-Cafe Users | <1% | 33% | First time user sees Internet at IM/gaming/music
Average Age | 45 | 25 | Entertainment & Communities
English skills | High | Low | Highest quality content is still in English; it's phenomenal in the medical domain
Internet Music | iTunes, iPod | Baidu, PC | Monetizing difficult
Piracy | 21% | 96% | Ready for high quality free-software as service
Credit Cards | 2/person | 0.02/person | Local e-commerce more applicable
Cell Phones | 190 million | 600 million | Mobile Internet users have never touched a PC before (India in a similar situation)

Using this background, Kai-Fu shared a few additional characteristics of Chinese users.

On new users:

  • Internet is more an entertainment source, less an informational source
  • Since users pay by the hour in Internet-Cafes, they love the concept of directories. In one user study, he mentioned a user questioning Google’s design with the question — “Did you forget to design rest of the page?”
  • More of an emphasis on Universal Search
  • Winning on metrics did not translate into an increase in market share

On users in general:

  • Time spent on a result page is 10 seconds in the US. Kai-Fu suggested it’s much higher in China, about 30-60 seconds. Most of these users are supposedly exploratory and curious, and click on more search results. Many of the queries are also informational, rather than navigational. To support this behavior, Google uses pop-up windows when a search result is clicked.
  • To support informational needs of users, Google has turned their zeitgeist into a directory product in China.
  • To alleviate the problem of misspellings, Google is branding “g.cn” as their official domain. Chinese users apparently have problems spelling even “Google”. Query suggestions are also turned on by default.
  • Much of the useful content on the Web is in English, mostly in the health verticals. To make such content available to Chinese users, Google is aggressively pursuing the content translation problem. Kai-Fu hinted that as the Web matures, this could be an important differentiation across search engines.
  • To be more relevant to Chinese users, Google has developed new applications. One such application is “SMS Greeting” search. Kai-Fu revealed that in Chinese culture SMS messages are replacing “greeting cards”. China supposedly has some of the highest SMS traffic around the New Year. Such niche applications can be useful for overall branding.

Kai-Fu ended by sharing that many of these directions helped Google increase its market share within China from 14% in 2006 to about 25% in 2008. We will have to wait to see if multinationals make further inroads in China. Time will tell.

Elsewhere on the keynote:

SIGIR 2008: Data Points from Organizers

Monday, July 21st, 2008

The organizers of SIGIR 2008 shared numbers at the start of the conference. Highlights follow.

Healthy registration numbers, similar to those seen in 2007.

Event | 2008 | 2007
Conference | 574 | 599
Tutorial | 464 | 363
Workshops | 292 | 287

The slightly lower conference numbers are understandable, given that Singapore is 10-20 hrs away from most of Europe and the Americas. Tutorial numbers are high, thanks to a “package” registration initiated this year, an idea that could be adopted by other conferences as well. A quarter of the attendees are from the US. 19 are from India, not bad, but not high enough either.

In total, 85 full papers were accepted for an overall acceptance rate of 17%. The conference also features 99 posters (acceptance rate 50%), and 11 demonstrations (acceptance rate 58%).

The Program Committee shared some interesting numbers on acceptance rate by geography.

Geo | Submitted | Accepted
North America | 159 | 41
EU | 104 | 15
Pacific Rim (except US) | 186 | 24
India | 31 | 1

North America has a very high hit-rate. The acceptance rate for submissions from India needs much improvement, but it's good to see 31 papers submitted, a good sign.
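To make the hit-rates concrete, here is a small sketch that computes accepted/submitted for each region from the numbers in the table above. The variable and function names are my own, purely for illustration.

```python
# Submitted and accepted counts by region, copied from the table above.
counts = {
    "North America": (159, 41),
    "EU": (104, 15),
    "Pacific Rim (except US)": (186, 24),
    "India": (31, 1),
}


def acceptance_rates(table):
    """Return {region: accepted / submitted} as a fraction between 0 and 1."""
    return {
        region: accepted / submitted
        for region, (submitted, accepted) in table.items()
    }


rates = acceptance_rates(counts)

# Print regions from highest to lowest acceptance rate.
for region, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {rate:.1%}")
```

This works out to roughly 26% for North America, 14% for the EU, 13% for the Pacific Rim, and 3% for India.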

Lastly, the organization committee shared that they convinced the Singapore Government to host the National food festival during SIGIR. Imagine something similar in other parts of the world!

SIGIR 2008

Sunday, July 20th, 2008

I am now at the SIGIR conference in Singapore. Haven’t seen much user generated content around it though. Here’s a short (to be updated) list:

If you are around, feel free to ping.

ICWSM 2008 Videos Online

Wednesday, May 28th, 2008

ICWSM 2008 promised to be a great event, and from what I hear, went well above expectations. Select videos are now available online, courtesy videolectures.

Closing remarks from Eytan:

  • 180 attendees
  • 103 submissions, 23% acceptance rate
  • 946 cups (43 gallons) of coffee ;) — in the coffee (consumption) capital of the world
  • Wikipedian Self-Governance in Action: Motivating the Policy Lens (video)
  • 2009, 2010 plans well underway
  • More data: spinn3r (Kevin Burton) , Boardtracker (Ron Kass), TREC
  • Eytan plans to be less active within the community next year. His contributions will be missed.

Within just two years of its inception, ICWSM has turned into the most prominent cross-disciplinary conference for social media.  Kudos to everyone behind this.

(via Matt)