Home
Research
Publications
People
Data
Links
|
Research
User Preferences over Sets
Many interactions between humans and computers involve a search for information or items. For example, a World Wide Web search engine can produce a list of webpages that are ranked by their relevance to a specified search query. The underlying assumption is that the user is searching for a specific piece of information, and that the pages can be ranked in terms of their likelihood of containing the desired information.
In contrast, there are many applications in which the user instead wishes to obtain an optimal set of items. Examples include building a music playlist or selecting an incoming class of students from a pool of college applicants. The obvious approach, of ranking all items and then picking the top k, does not always yield the optimal subset. Items in a set can interact in ways that increase (via complementarity), or decrease (via redundancy or incompatibility), the overall valuation of a set.
For example, while a user may have a favorite music artist, he or she may not want a party music playlist composed solely of that artist's work (redundancy). This phenomenon has been referred to as the "portfolio effect" or "dependent relevance".
Our recent work has addressed three major aspects of this problem:
- How to represent set-based user preferences. We defined a new preference language, called DD-PREF, that allows users to encode preferences over different values for each feature as well as the relative importance of diversity (a good "spread" of values over the permitted range) versus depth (a good match of values to those that are highly valued).
- How to satisfy user preferences. We have adopted a method we refer to as "wrapper-greedy", which searches the space of possible subsets by starting with the first item in the set as a seed and then greedily adding items that improve the objective function. The "wrapper" part refers to the fact that we iterate over all possible starting seeds and then return the subset with the best objective function value.
- How to learn user preferences. It can be difficult for users to quantitatively specify their preferences, but often much easier to simply provide sample subsets that are highly valued. We have developed a method for inferring user preferences from these example sets (positive examples only).
Constrained Clustering
We are interested in the problem of incorporating background information (or user preferences) into clustering algorithms. Building on previous advances that have yielded a variety of semi-supervised clustering methods, we are examining issues such as:
- Active constraint selection. We have developed methods for actively selecting pairs of points for which we then request constraint information, particularly as applied to spectral clustering methods.
- Constrained problem difficulty. It is evident that some constrained clustering problems are more difficult than others. We have developed techniques for quantifying problem difficulty that are helpful both for analyzing behavioral differences in different algorithms as well as aiding in feature selection and active constraint selection.
- Explanations as feature annotations. If we annotate pairwise constraints with information about why they are constrained, in terms of which features were used to construct the constraint, it may be possible to infer more general properties of the constraint source, and then perhaps to infer additional constraints.
|