CIKM 2009 Accepted Papers
September 7th, 2009
The CIKM organizers recently released the list of accepted papers for 2009. The full list is here. There appears to be an increase in the number of information retrieval papers, especially on search ranking, and hence on the underlying regression problem.
Some papers I am looking forward to include:
- A Unified Relevance Model for Opinion Retrieval — Xuanjing Huang (Fudan University), W. Bruce Croft (University of Massachusetts Amherst)
- Mining Data Streams with Periodically Changing Distributions — Yingying Tao (University of Waterloo), M. Tamer Ozsu (University of Waterloo)
- Fast Shortest Path Distance Estimation in Large Networks [PDF] — Michalis Potamias (Boston University), Francesco Bonchi (Yahoo! Research), Carlos Castillo (Yahoo! Research), Aristides Gionis (Yahoo! Research)
- Adaptive Relevance Feedback in Information Retrieval — Yuanhua Lv (University of Illinois at Urbana-Champaign), ChengXiang Zhai (University of Illinois at Urbana-Champaign)
- Detecting Topic Evolution in Scientific Literature: How Can Citations Help? — Qi He (The Pennsylvania State University), Bi Chen (The Pennsylvania State University), Jian Pei (Simon Fraser University), Baojun Qiu (The Pennsylvania State University), Prasenjit Mitra (The Pennsylvania State University), Lee Giles (The Pennsylvania State University)
- What Happens after an Ad Click? Quantifying the Impact of Landing Pages in Web Advertising — Hila Becker (Columbia University), Andrei Broder (Yahoo! Research), Evgeniy Gabrilovich (Yahoo! Research), Vanja Josifovski (Yahoo! Research), Bo Pang (Yahoo! Research)
- Personalized Social Search Based on the User’s Social Network — David Carmel (IBM Research Lab in Haifa), Naama Zwerdling (IBM Research Lab in Haifa), Ido Guy (IBM Research Lab in Haifa), Shila Ofek-Koifman (IBM Research Lab in Haifa), Nadav Har’el (IBM Research Lab in Haifa), Inbal Ronen (IBM Research Lab in Haifa), Erel Uziel (IBM Research Lab in Haifa), Sivan Yogev (IBM Research Lab in Haifa), Sergey Chernov (Leibniz University)
- Characterizing and Predicting Search Engine Switching Behavior — Ryen W White (Microsoft Research), Susan T Dumais (Microsoft Research)
- Improvements That Don’t Add Up: Ad-Hoc Retrieval Results Since 1998 — Timothy G. Armstrong (The University of Melbourne), Alistair Moffat (The University of Melbourne), William Webber (The University of Melbourne), Justin Zobel (The University of Melbourne)
- Joint Sentiment/Topic Model for Sentiment Analysis — Chenghua Lin (University of Exeter), Yulan He (The Open University)
- Graph-based Transfer Learning — Jingrui He (MLD SCS CMU), Yan Liu (IBM Research), Richard Lawrence (IBM Research)
- Terminology Mining in Social Media — Magnus Sahlgren (SICS), Jussi Karlgren (SICS)
- Generating Comparative Summaries of Contradictory Opinions in Text — Hyun Duk Kim (University of Illinois at Urbana-Champaign), ChengXiang Zhai (University of Illinois at Urbana-Champaign)
- Practical Lessons of Data Mining at Yahoo! — Ye Chen (eBay Inc.), Dmitry Pavlov (Yandex Labs), Pavel Berkhin (eBay Inc.), Aparna Seetharaman (Yahoo! Inc.), Albert Meltzer (Yahoo! Inc.)
If you are an author of one of these papers, please share a PDF when available.
One of our papers, “Ensembles in Adversarial Classification for Spam”, was accepted as a short paper. The work was primarily carried out by Deepak Chinavle, with supervision from Prof. Tim Oates, and grew out of some of my work during graduate studies.
This paper improves our understanding of adversarial classification and concept drift. We show that explicitly tracking mutual agreement among the base classifiers in an ensemble can help reduce retraining frequency. Though this can be gleaned to some extent from the concept drift literature, we further show that our method works well even in the absence of new labeled examples, which has important practical benefits.
From our abstract:
The standard method for combating spam, either in email or on the web, is to train a classifier on manually labeled instances. As spammers change their tactics, the performance of such classifiers tends to decrease over time. Gathering and labeling more data to periodically retrain the classifier is expensive. We present a method based on an ensemble of classifiers that can detect when its performance might be degrading and retrain itself, all without manual intervention. Experiments with a real-world dataset from the blog domain show that our methods can significantly reduce the number of times classifiers are retrained when compared to a fixed retraining schedule, and they maintain classification accuracy even in the absence of manually labeled examples.
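As a rough illustration of the agreement-tracking idea, here is a minimal Python sketch. It is not the method from the paper. The function names, the choice of mean pairwise agreement as the signal, the fixed tolerance of 0.1, and the toy predictions are all assumptions made for this example. The point is only that the trigger is computed from the base classifiers' outputs on unlabeled data, so no new labels are needed to decide when to retrain.

```python
from itertools import combinations
from typing import Sequence


def mean_pairwise_agreement(predictions: Sequence[Sequence[int]]) -> float:
    """Average, over all pairs of base classifiers, of the fraction of
    examples on which the two classifiers predict the same label.

    predictions[i][j] is the label classifier i assigns to unlabeled example j.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two base classifiers")
    n_examples = len(predictions[0])
    if n_examples == 0 or any(len(p) != n_examples for p in predictions):
        raise ValueError("each classifier must label the same non-empty batch")
    pair_scores = [
        sum(a == b for a, b in zip(p, q)) / n_examples
        for p, q in combinations(predictions, 2)
    ]
    return sum(pair_scores) / len(pair_scores)


def should_retrain(current: float, baseline: float, tolerance: float = 0.1) -> bool:
    """Hypothetical trigger: retrain when agreement drops more than
    `tolerance` below the agreement measured right after training."""
    return current < baseline - tolerance


# Toy data (assumed): three base classifiers labeling five unlabeled
# blog posts, with 1 = spam and 0 = not spam.
baseline_batch = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
]
drifted_batch = [
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
]

baseline = mean_pairwise_agreement(baseline_batch)
current = mean_pairwise_agreement(drifted_batch)
print(f"baseline agreement: {baseline:.2f}, current agreement: {current:.2f}")
print("retrain?", should_retrain(current, baseline))
```

In this toy run the base classifiers mostly agree on the first batch and largely disagree on the second, so the check flags a retrain. A real system would need to choose the agreement measure, the batch size, and the threshold empirically.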
Coverage of CIKM 2009 elsewhere: