Applying AI and Interactive Visualization Methods to Complex Models

Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County

This NSF-sponsored project developed new approaches to support human understanding of the uncertainty inherent in the structure and predictions of complex models. The research draws on, and contributes to, the fields of machine learning and visualization.

Specifically, the project focuses on understanding three types of uncertainty associated with model predictions. Sample uncertainty occurs when regions of the instance space are not well represented in the training data, so predictions there are based on sparse information. Model instability occurs when model predictions vary depending on the training data used to construct the model. Prediction variability occurs when a given observation has noisy attributes, and this input uncertainty leads to uncertainty in the model's predictions.
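To make the model-instability notion concrete, here is a minimal, hypothetical sketch (not the project's software): it retrains a trivial one-dimensional threshold classifier on bootstrap resamples of the training data and reports how often the prediction for a query point flips.

```python
import random
import statistics

def train_threshold(data):
    # Hypothetical one-dimensional "model": the midpoint between the
    # two class means; predict class 1 when x exceeds the threshold.
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (statistics.mean(xs0) + statistics.mean(xs1)) / 2

def instability(data, query, n_boot=200, seed=0):
    # Model instability: retrain on bootstrap resamples of the training
    # data and measure how often the prediction for `query` flips.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        if len({y for _, y in resample}) < 2:
            continue  # a resample must contain both classes to train
        preds.append(1 if query > train_threshold(resample) else 0)
    p = sum(preds) / len(preds)
    return 2 * min(p, 1 - p)  # 0 = perfectly stable, 1 = coin flip
```

A query point deep inside one class scores near 0 (retraining rarely changes its prediction), while a point near the decision boundary scores near 1.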

We developed novel analytical techniques to create meta-models that characterize these three forms of uncertainty. To facilitate user understanding of the nature and distribution of these multiple types of uncertainty across the model space, novel visualization methods represent these meta-models in a display space.

A description of the software that we developed for analyzing and visualizing model uncertainty is available on the software page. Some screenshots of the visualizations produced by the system are provided below as examples of the visualization methods:

This figure shows the predictions made by a naive Bayes classifier on the Adult (census) data set, where the predicted class is the education level of an individual, given input attributes including age, gender, race, and income. The display space is created by applying principal components analysis (PCA) to the input attributes. Each glyph (image) in the display represents a single instance in the data set. Each class is assigned a distinct color, and the interior of the glyph is "speckled" with these colors, with the frequency of each class's color proportional to its predicted probability. The exterior border of the glyph shows the true class. Therefore, instances whose interior is predominantly the same color as the border are true positives for that class; instances whose interior is predominantly a different color than the border represent false negatives for that class.
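One detail the speckled rendering needs is converting a vector of predicted class probabilities into an integer number of speckles per class color. A minimal sketch (a hypothetical helper, not the project's actual code) using largest-remainder rounding so the counts always sum exactly to the speckle budget:

```python
def speckle_counts(probs, n_speckles=100):
    # Allocate speckles to classes proportionally to their predicted
    # probabilities. Integer truncation can leave a shortfall, so the
    # leftover speckles go to the classes with the largest remainders.
    raw = [p * n_speckles for p in probs]
    counts = [int(r) for r in raw]
    by_remainder = sorted(range(len(probs)),
                          key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[: n_speckles - sum(counts)]:
        counts[i] += 1
    return counts
```

For example, a prediction of (0.5, 0.3, 0.2) over three classes with a budget of 10 speckles yields counts of (5, 3, 2), so the dominant predicted class visually dominates the glyph interior.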

This image shows predictions made by a naive Bayes classifier for the UCI Cardiotocography data set, in which the input attributes are measurements of fetal heart rate and uterine contractions, and the classification is into one of several possible heart rate patterns. Here, the glyph representation is the pie chart, and the display space was again generated using principal components analysis. Notice that most instances are correctly predicted (the border and the interior are the same color) with high confidence (if there is a second color in the glyph -- i.e., an alternative class predicted with non-zero probability -- it occupies only a very small "sliver" of the "pie"). Notice also that the first principal component (x axis) does a very effective job of sorting the data by class (color). Individual instances that are mispredicted (e.g., the half-blue, half-orange glyph near the vertical center, about 15% from the left side of the display) may be outliers or may represent instances with ambiguous or incorrect attribute values. Future versions of the software will provide interactive drill-down techniques that will permit a user to query the system about the details of these instances and the associated model predictions.
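Both displays project instances into a 2-D display space using PCA. As an illustrative, self-contained sketch of how the dominant axis of such a space can be computed (a real implementation would use a linear-algebra library), here is extraction of the first principal component by power iteration on the sample covariance matrix:

```python
def first_pc(rows, iters=100):
    # First principal component via power iteration on the sample
    # covariance matrix; this direction serves as the x axis of the
    # 2-D display space into which instances are projected.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Repeatedly multiply by the covariance matrix and renormalize;
        # v converges to the dominant eigenvector.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

Projecting each instance onto this unit vector gives its x coordinate in the display; the second component (found the same way after deflation) gives the y coordinate.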

As part of this project, we identified a number of data sets that contain interesting forms of uncertainty. The software page links to the source data that we have obtained. Interested researchers may contact us directly to inquire about cleaned-up/processed versions of these data sets.

Penny Rheingans, Marie desJardins, Wallace Brown, Alex Morrow, Doug Stull, and Kevin Winner (2014), "Visualizing Uncertainty in Predictive Models," in Scientific Visualization: Uncertainty, Multifield, Biomedical, and Scalable Visualization, Charles D. Hansen, Min Chen, Christopher R. Johnson, Arie Kaufman, and Hans Hagen (Eds.), Springer-Verlag, Mathematics and Visualization Series. ISBN 978-1-4471-6496-8.
Download PDF

The project is a joint effort between two laboratories: the MAPLE Lab and the Vangogh Lab at UMBC.

The project was sponsored by the National Science Foundation (NSF grant IIS-EAGER-1050168).


This material is based upon work supported by the National Science Foundation under award IIS-EAGER-1050168 and REU supplement 1129683.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s)
and do not necessarily reflect the views of the National Science Foundation.