I have been working on relational data clustering. Data clustering traditionally assumes that data objects are independent of one another. This assumption simplifies the problem, but it also limits the ability of clustering algorithms to reveal natural groupings in data. I am investigating how data that contain relational dependencies between objects can be used to detect stronger patterns.
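To make the cost of the independence assumption concrete, here is a toy sketch. It is not the algorithm from my research. It contrasts a feature-only similarity with one that also compares the objects each item is related to. The object names, the `links` graph, the use of Jaccard overlap for neighbors, and the equal weighting `alpha=0.5` are all illustrative assumptions.

```python
import math

# Hypothetical toy data: each object has a feature vector and a set of related objects.
features = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.0],  # identical features to "a"
    "c": [0.0, 1.0],
    "d": [0.0, 1.0],
}
links = {
    "a": {"c", "d"},
    "b": set(),       # "b" has no relations at all
    "c": {"a", "d"},
    "d": {"a", "c"},
}

def cosine(u, v):
    """Feature-only similarity: treats objects as independent of one another."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def jaccard(s, t):
    """Overlap between two objects' sets of related objects."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def relational_similarity(x, y, alpha=0.5):
    """Assumed blend of feature similarity and neighbor overlap (alpha is arbitrary)."""
    return alpha * cosine(features[x], features[y]) + (1 - alpha) * jaccard(links[x], links[y])

print(cosine(features["a"], features["b"]))  # 1.0: indistinguishable by features alone
print(relational_similarity("a", "b"))       # 0.5: their relations differ
```

Objects "a" and "b" look identical when treated independently; only their relations reveal that they play different roles in the data.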
My underlying theory for why relational data clustering (or relational learning in general) is important involves a way of defining the known universe. In a single sentence, this theory is: All things, real or abstract, are defined by zero or more tangible features and one or more relations to another real or abstract concept [1].
Note that the theory stated above requires at least one relation but does not require any tangible features. Take, for example, the concept "happy." What tangible features does happy have? The first two definitions for happy in the American Heritage Dictionary are: "1. Characterized by good luck; fortunate. 2. Enjoying, showing, or marked by pleasure, satisfaction, or joy." Both definitions are given only in terms of relations to other concepts. The first shows a relation to possible causes of happiness; the second shows a relation to the effects of happiness. A scientist might further define happiness by its relation to a particular state of chemistry in the body. Happiness is also often measured in relation to past experiences of happiness, as in "I haven't been this happy since..." The point is that humans achieve a high level of thought and expression through their ability to create and understand interrelations between different pieces of information. The more powerful machines become at processing relational data, the more likely they are to learn as quickly and as diversely as humans.
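The theory can be read as a simple data-structure constraint. The sketch below is one hypothetical way to encode it: a `Concept` may have an empty feature set but must have at least one relation. The class name, the relation labels, and the target names paraphrasing the "happy" example are illustrative assumptions, not an established representation.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A concept per the stated theory: zero or more tangible features
    and one or more relations to other concepts."""
    name: str
    features: dict = field(default_factory=dict)   # tangible features; may be empty
    relations: list = field(default_factory=list)  # (relation label, target concept) pairs

    def validate(self):
        # The theory requires at least one relation but no features.
        if not self.relations:
            raise ValueError(f"{self.name!r} must have at least one relation")
        return self

# "happy" has no tangible features; it is defined only through relations
# (hypothetical labels paraphrasing the definitions and examples above).
happy = Concept(
    name="happy",
    relations=[
        ("caused_by", "good luck"),
        ("shown_by", "pleasure, satisfaction, or joy"),
        ("correlates_with", "a state of body chemistry"),
        ("compared_to", "past happiness"),
    ],
).validate()

print(len(happy.features), len(happy.relations))  # 0 4
```

Calling `validate()` on a concept with no relations raises an error, mirroring the theory's one requirement, while an empty `features` dictionary is accepted.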
The AI problem is the all-encompassing task of developing a machine that can learn and act as intelligently as a human being. For over fifty years, Artificial Intelligence researchers have made significant discoveries that contribute to solving this problem. All of these researchers would agree, however, that there is still a very long way to go before the AI problem is solved.
I've already argued that relational data is necessary for representing complex and abstract pieces of information. I will further argue that clustering is useful for general information processing. There are two reasons for this:
Relational data can express information in ways that non-relational data cannot. Adapting existing machine learning techniques, as well as creating new ones, will lead to more powerful learning machines. Relational data clustering is an important component of machine learning systems and of solving the AI problem. My research aims to develop relational data clustering algorithms that find groupings in data, which will ultimately enable faster retrieval of information and the adaptation of existing knowledge to learn new concepts that are similar to already-learned ones.
1000 Hilltop Circle
Baltimore, MD 21250
(410) 455-8894
Office: ITE 339
aanthon2 AT umbc DOT edu
[1] A.K. Agrawala, R. Larsen, and D. Szajda, "Information Dynamics: An Information-Centric Approach to System Design," Proceedings of the International Conference on Virtual Worlds and Simulation, January 2000.