Yahoo! Labs

June 27th, 2009

The Yahoo! Labs site is now online.

The Labs cover efforts across Yahoo!, spanning both core and applied research:

Yahoo! Labs is pioneering the new sciences underlying the Web. As the center of scientific excellence here at Yahoo!, we deliver both fundamental and applied scientific leadership, publish research and create new technologies that power Yahoo!’s products.

We’re responsible for big inventions—and our goals are nothing short of inventing the future of the Internet and creating the next generation of businesses for Yahoo!.

The Labs are headed by Prabhakar Raghavan, with leadership across verticals. The publications page is a good place to start.


WWW2009 Accepted Papers

February 11th, 2009

Abstracts of all accepted papers are now online [PDF only - why?].

From a cursory look, some interesting papers I am looking forward to:

  • Ossama Abdelhamid, Behshad Behzadi, Stefan Christoph and Monika Henzinger. Detecting The Origin Of Text Segments Efficiently
  • Yue Lu, ChengXiang Zhai and Neel Sundaresan. Rated Aspect Summarization of Short Comments
  • Ziv Bar-Yossef and Maxim Gurevich. Estimating the ImpressionRank of Web Pages
  • Jong Wook Kim, K. Selcuk Candan and Junichi Tatemura. Efficient Overlap and Content Reuse Detection in Blogs and Online News Articles
  • Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mishra and Alexandros Ntoulas. Releasing Search Queries and Clicks Privately
  • Fan Guo, Chao Liu, Tom Minka, Yi-Min Wang and Christos Faloutsos. Click Chain Model in Web Search
  • Xing Yi, Hema Raghavan and Chris Leggetter. Discover Users’ Specific Geo Intention in Web Search
  • Paul Bennett, Max Chickering and Anton Mityagin. Learning Consensus Opinion: Mining Data from a Labeling Game
  • Jérôme Kunegis, Andreas Lommatzsch and Christian Bauckhage. The Slashdot Zoo: Mining a Social Network with Negative Edges
  • Meeyoung Cha, Alan Mislove and Krishna Gummadi. A Measurement-driven Analysis of Information Propagation in the Flickr Social Network
  • Wen-Yen Chen, Jon-Chyuan Chu, Junyi Luan, Hongjie Bai and Edward Chang. Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior

Popular Photographers on Flickr

February 3rd, 2009

Check out my first version, based loosely on PageRank.
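The post does not say how the ranking works beyond being loosely based on PageRank. As a rough illustration only, here is a minimal power-iteration PageRank over a hypothetical graph in which an edge from one photographer to another stands for some kind of endorsement (say, adding a contact or favoriting a photo). The graph, the names, and the `pagerank` helper are assumptions for the sketch, not the method behind the actual ranking.

```python
"""Sketch: PageRank-style popularity over a hypothetical photographer graph.

Assumption: an edge u -> v means photographer u "endorses" v (for example,
a contact or a favorite). The real ranking's graph is not described.
"""


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph: dict mapping each node to the list of nodes it links to.
    Nodes with no out-links (dangling) spread their rank uniformly.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes is redistributed evenly to all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Each out-link passes on an equal share of this node's rank.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical endorsements between photographers.
    endorsements = {
        "alice": ["carol"],
        "bob": ["carol", "alice"],
        "carol": ["alice"],
        "dave": ["carol"],
    }
    scores = pagerank(endorsements)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

Because dangling rank is redistributed rather than dropped, the scores keep summing to one, which makes them easy to compare across runs.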


Yahoo! Key Scientific Challenges Program

December 20th, 2008

Yahoo! Labs, the organization I am part of, has just announced the Key Scientific Challenges Program.

This is your chance to get an inside look at the big challenges Yahoo! research scientists are working on while driving your research forward. Learn more about the real-world problems facing our industry, then focus on and solve these fundamental challenges alongside the top minds in the field.

As a part of the Yahoo! Key Scientific Challenges Program, you’ll receive $5,000 seed funding, exclusive access to Yahoo! research scientists and select datasets, and an invite to the Key Scientific Challenges Graduate Student Summit.

Having recently completed a Ph.D. myself, I can strongly attest to the value of this program, and to the practicality of the challenges listed under Search Challenges, among others. If you are a graduate student interested in actively contributing to important scientific solutions, definitely consider participating.


Twitter Venn: Search Visualization

December 20th, 2008

Apps like these are why information retrieval, search, and social media continue to fascinate researchers and users alike.

Jeff Clark has created an effective search perspective for Twitter, which he calls Twitter Venn. The tool uses Venn diagrams as the underlying visualization and:

..supports investigation into the relationship between how words are used within the messages of all the people using Twitter.

And, as Jeff humorously notes:

In the context of tweets that mention ‘Christmas’ the santa to jesus ratio is about 4:1.
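To make the ratio concrete, here is a minimal sketch of how Venn-region counts and a mention ratio might be computed over a batch of tweets. The naive tokenizer, the helper names `venn_regions` and `mention_ratio`, and the sample tweets are all hypothetical; the post does not describe how Twitter Venn queries Twitter or counts matches, and the toy data is not meant to reproduce the 4:1 figure.

```python
import re
from fractions import Fraction


def words(text):
    """Lowercased set of alphabetic words in a tweet (naive tokenizer, an assumption)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def venn_regions(tweets, term_a, term_b, context=None):
    """Count tweets falling in each region of a two-set Venn diagram.

    If `context` is given, only tweets containing that word are counted.
    Tweets mentioning neither term are ignored.
    """
    a, b = term_a.lower(), term_b.lower()
    regions = {"only_a": 0, "only_b": 0, "both": 0}
    for tweet in tweets:
        w = words(tweet)
        if context is not None and context.lower() not in w:
            continue
        in_a, in_b = a in w, b in w
        if in_a and in_b:
            regions["both"] += 1
        elif in_a:
            regions["only_a"] += 1
        elif in_b:
            regions["only_b"] += 1
    return regions


def mention_ratio(regions):
    """Tweets mentioning term A over tweets mentioning term B (None if B never appears)."""
    total_a = regions["only_a"] + regions["both"]
    total_b = regions["only_b"] + regions["both"]
    return Fraction(total_a, total_b) if total_b else None


# Made-up sample tweets, for illustration only.
sample = [
    "Santa came early this Christmas",
    "Christmas shopping, still waiting for Santa",
    "Santa hats for the Christmas party",
    "Christmas is about Jesus, and Santa for the kids",
    "Merry Christmas, remember Jesus",
    "Santa Claus parade downtown",  # no 'christmas', so filtered out
]
regions = venn_regions(sample, "santa", "jesus", context="christmas")
print(regions, mention_ratio(regions))
```

Counting the overlap region separately is what lets a Venn view show both the ratio and how often the two words actually co-occur.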

Before the tool is opened up to a wider audience, it would be nice to, at a minimum:

  • improve display and refresh performance
  • enable easy screen-capture

I definitely see this turning into a very effective trend mining tool. Impressive. (via Matt)


proto.in: Startups in India

November 25th, 2008

I have been looking around for startup communities within India for quite a while now, without much luck. Web searches did not lead me to what I wanted. This content vertical (within India) is rather young, and the community has yet to promote its best online sources. So finally, I relied on old-school word of mouth (thanks Girish!), which landed me at proto.in.

Proto.in is driven by this underlying philosophy:

Proto.in is the startup event that happens in India, inviting entrepreneurs from within and around the subcontinent to participate, share, discuss and draw strength from the growing entrepreneurial demand and knowledge base that is created, in an effort to create world-class product leaders from the region. Proto is about celebrating entrepreneurship, and encouraging it where it matters the most - at the startup level!

1. To Showcase Innovative technology products borne out of India
2. To Encourage, grow and create entrepreneurial awareness
3. To create a community of startup entrepreneurs, who can grow in strength and numbers, drawing wisdom from each other.
4. To act as a bridge between well-established companies, veteran entrepreneurs, venture capitalists, analysts, journalists, professionals and grass-root entrepreneurs.

...and is run by Vijay Anand, with the help of a large and growing volunteer base from all around India. The format is similar to BarCamp and DemoCamp (one of which I attended), and in some sense to the HackDays hosted here at Yahoo!

Navigating through the site, I found the two most useful sources of information to be their blog and the hosted event pages (2007-1, 2007-2, 2008-1). Much of this, however, gets hidden within the main site, including the highly valuable company profiles, which I have yet to dive into.

The next startup event is to be hosted in Bangalore (my hometown!) in January 2009, with nominations closing Dec 10, 2008.

The charter of the community is highly encouraging. Having made some key initial contributions, they promise to continue being an important online (and offline) hub for startups in India.


Siri

October 22nd, 2008

Siri might just turn out to be a perfectly timed AI startup. Via hchen1.

Siri is a new Silicon Valley start-up that attempts to change to the way people use the internet. I joined Siri in Sept. 2008, but I was unable to talk about it until this week. Siri (previously known as stealth-company.com) is an SRI spin-off company armed with $8.5M VC funding. The company inherits technology innovations resulted from many years of AI research (e.g., the DARPA-funded CALO project).

I am quite excited by a recent sneak preview.


HYPERTEXT 2009

October 12th, 2008

HYPERTEXT 2009 will be held in Torino, Italy, from June 29th to July 1st next year. The evolution of the HYPERTEXT conference perhaps reflects the increasing scope and influence of the Web over the past decade.

The Web, the Semantic Web, the Web 2.0, and Social Networks are all manifestations of the success of the link. The Hypertext Conference provides the forum for all research concerning links: their semantics, their presentation, the applications they have been put to, the knowledge that can be derived from their analysis, and their effect on society.

Main themes in HYPERTEXT 1996 included:

  • Spatial Hypertexts
  • Autonomous Hypertext Systems and Link Discovery
  • Hypertext Rhetoric and Criticism
  • Models of Hypermedia Design and Evaluation
  • Open Hypermedia
  • Navigation in the World-Wide Web
  • Systems and Infrastructure
  • Extending the World-Wide Web

With many of these questions now answered, researchers are moving towards the more "social aspects". HYPERTEXT 2008 themes included:

  • Information linking: new models and techniques for interacting with information, automating the "trailblazer"
  • Social linking: link inference, analysis and modeling, similarity and retrieval, applications
  • Hypertext, culture, and communication
  • Applications of hypertext

The submission deadline is in February 2009. If you are interested in this area, please consider participating.


Political Streams

October 10th, 2008

Political Streams, from Live Labs, is now open for business. From the FAQ:

Political Streams is an application which mines social media content in real time for political discussion. It surfaces the news articles and documents that are being discussed as well as the people and places that appear in those articles. In addition, it provides related information for any news article, weblog post, person or place. This related information gives a broader context, allowing the user to understand how both the mainstream and social media are discussing an issue, person or place.  

It is this last part that makes this tool interesting.

A very impressive and promising start. Clearly designed to "scale across verticals", it is the result of work by some very smart researchers at Live Labs. I look forward to seeing many of these "yet to be uncovered" verticals. A "200 OK" from social streams is keenly awaited.


Crawling Blogs

October 9th, 2008

Over a period during which my blog was updated only once, this is how Feedburner saw bot traffic:

[Image: Feedburner's view of crawler bots visiting this blog]

Note that crawling blogs is an interesting problem:

  • Recency is critical
  • Ping servers are available, albeit with incomplete coverage
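One way to act on both points is a recency-driven scheduler: poll each blog at an interval estimated from how often it has actually changed, and move a blog to the front of the queue whenever a ping server reports an update, while still polling the blogs that ping servers miss. The sketch below assumes that design; `BlogCrawlScheduler`, its parameters (default interval, smoothing weight `alpha`, a back-off factor of 1.5, a minimum interval) and the blog names are hypothetical, not how any particular crawler works.

```python
import heapq


class BlogCrawlScheduler:
    """Hypothetical recency-driven crawl scheduler for blogs.

    Each blog is due at last_crawl + estimated update interval (seconds).
    A ping makes a blog due immediately; since ping coverage is incomplete,
    every blog is still polled on its own schedule.
    """

    def __init__(self, default_interval=3600.0, alpha=0.5, min_interval=300.0):
        self.default_interval = default_interval
        self.alpha = alpha  # weight of the newest observed interval
        self.min_interval = min_interval
        self.interval = {}      # blog -> estimated update interval
        self.last_change = {}   # blog -> time a change was last observed
        self.heap = []          # (due_time, blog), may hold stale entries
        self.due = {}           # blog -> current due time

    def _schedule(self, blog, due_time):
        self.due[blog] = due_time
        heapq.heappush(self.heap, (due_time, blog))

    def add(self, blog, now):
        self.interval[blog] = self.default_interval
        self._schedule(blog, now + self.default_interval)

    def ping(self, blog, now):
        """A ping server reported an update: crawl as soon as possible."""
        self.interval.setdefault(blog, self.default_interval)
        self._schedule(blog, now)

    def next_due(self, now):
        """Return blogs whose due time has passed, most overdue first."""
        ready = []
        while self.heap and self.heap[0][0] <= now:
            due_time, blog = heapq.heappop(self.heap)
            if self.due.get(blog) == due_time:  # skip superseded entries
                ready.append(blog)
                del self.due[blog]
        return ready

    def record_crawl(self, blog, now, changed):
        """Update the interval estimate after a crawl and reschedule."""
        if changed:
            last = self.last_change.get(blog)
            if last is not None:
                observed = now - last
                self.interval[blog] = (self.alpha * observed
                                       + (1 - self.alpha) * self.interval[blog])
            self.last_change[blog] = now
        else:
            # Nothing new: back off so quiet blogs cost fewer fetches.
            self.interval[blog] *= 1.5
        self.interval[blog] = max(self.interval[blog], self.min_interval)
        self._schedule(blog, now + self.interval[blog])
```

Keeping a per-blog `due` map lets a ping supersede an already queued entry without searching the heap; stale entries are simply skipped when popped. Backing off quiet blogs is also one way to spend less of the network and disk budget on fetches that return nothing new.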

Crawling blogs is also highly resource intensive:

  • Network latency
  • Disk access/write latency

How do your numbers look?