IBM Israel Research Seminars

 

Say you are looking for information about a particular person. A search engine returns many pages for that person's name, but which pages refer to the person you care about, and which are about other people who happen to have the same name? Furthermore, if you are looking for multiple people who are related in some way, how can you best leverage this social network? In this talk I will present two unsupervised frameworks for solving this problem: one based on the link structure of the Web pages, the other using Agglomerative/Conglomerative Double Clustering, an application of a recently introduced multi-way distributional clustering method [Bekkerman, El-Yaniv & McCallum 2005].

To evaluate our methods, we collected and hand-labeled a dataset of over 1000 Web pages retrieved from Google queries on 12 personal names appearing together in someone's email folder. On this dataset our methods outperform traditional agglomerative clustering by more than 20%, achieving over 80% F-measure. Joint work with Andrew McCallum.

Speaker Bio
Ron Bekkerman is a doctoral student at the University of Massachusetts, working with Prof. Andrew McCallum on the CALO project. Ron's research interests are machine learning and web mining. Ron completed his B.Sc. and M.Sc. in CS at the Technion (Israel Institute of Technology). His Master's thesis (under the supervision of Prof. Ran El-Yaniv) dealt with feature induction for text categorization.