This NSF-sponsored project developed new approaches to support human understanding of the uncertainty
that is inherent in the structure and predictions of complex models. This research draws on and
contributes to research in the fields of machine learning and visualization.
Specifically, the project focuses on
understanding several types of uncertainty associated with model predictions.
Sample uncertainty occurs when regions of the instance space are not well represented in the training
data, and predictions are therefore based on sparse information. Model instability occurs when model
predictions vary, depending on the training data that was used to construct the model. Prediction
variability occurs when a given observation may have noisy attributes, and this input uncertainty
leads to uncertainty in the model's predictions.
We developed novel analytical techniques to create
meta-models that characterize these three forms of uncertainty, along with novel
visualization methods that represent the meta-models in a display space, helping users
understand the nature and distribution of these multiple types of uncertainty across the model space.
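As a rough illustration of how a meta-model for one of these uncertainty types might be computed, the sketch below estimates model instability for a toy one-dimensional threshold classifier by retraining it on bootstrap resamples of the training data and measuring the variance of its predictions at a query point. The classifier, the data, and all names are illustrative assumptions, not the project's actual implementation.

```python
import random
import statistics

# Toy 1-D training data: a feature value and a binary label.
train = [(x / 10.0, 1 if x > 12 else 0) for x in range(25)]

def fit_threshold(data):
    """A toy classifier: predict 1 above the midpoint of the class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def instability(query, data, n_boot=200, seed=0):
    """Model instability at `query`: the variance of predictions across
    models trained on bootstrap resamples of the training data."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        # Skip degenerate resamples that are missing a class.
        if not any(y == 1 for _, y in sample) or not any(y == 0 for _, y in sample):
            continue
        threshold = fit_threshold(sample)
        preds.append(1 if query > threshold else 0)
    return statistics.pvariance(preds)
```

Instances near the decision boundary (around 1.2 in this toy data) tend to show nonzero variance across the bootstrap models, while instances far from the boundary are predicted identically by every resampled model, so their instability is zero.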
A description of the software that we
have developed for analyzing and visualizing model uncertainty is
available here. Some
screenshots of the visualizations produced by the system are
provided as examples of the visualization methods:
This figure shows the predictions made by a naive Bayes
classifier on the Adult (census) data set, where the class
predicted is the education level of an individual, given input
attributes including age, gender, race, and income. The display
space is created by applying principal components analysis (PCA)
to the input attributes. Each glyph (image) in the display
represents a single instance in the data set. Each class
is assigned a distinct color, and the interior of the glyph
is "speckled" with these colors, with the frequency of each
class's color being proportional to its predicted probability.
The exterior border of the glyph shows the true class.
Therefore, instances whose interior color is predominantly
the same color as the border are true positives for that class;
instances whose interior color is predominantly a different
color from the border represent false negatives for that class.
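One way to realize this "speckled" interior (a plausible sketch, not necessarily the project's rendering code) is to allocate a fixed budget of speckles among the class colors in proportion to the predicted probabilities, using largest-remainder rounding so that the counts sum exactly to the budget. The function name and the speckle budget are illustrative.

```python
def speckle_counts(probs, n_speckles=100):
    """Allocate n_speckles among classes in proportion to `probs`,
    using largest-remainder rounding so the counts sum to n_speckles."""
    raw = [p * n_speckles for p in probs]
    counts = [int(r) for r in raw]  # floor of each proportional share
    leftover = n_speckles - sum(counts)
    # Give the leftover speckles to the classes with the largest
    # fractional parts.
    order = sorted(range(len(probs)),
                   key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# A posterior of [0.62, 0.25, 0.13] over three classes:
print(speckle_counts([0.62, 0.25, 0.13]))  # [62, 25, 13]
```

Because the counts always sum to the budget, every glyph has the same number of speckles regardless of how the probability mass is split, which keeps glyphs visually comparable.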
This image shows predictions made by a naive Bayes classifier for the
UCI Cardiotocography data set, in which the input attributes are
measurements of fetal heart rate and uterine contractions, and the
classification is into one of several possible heart rate patterns.
Here, each glyph is a pie chart, and the display
space was again generated using principal components analysis. Notice that most
instances are correctly predicted (the border and the interior are
the same color) with high confidence (when a second color appears in
the glyph, indicating an alternative class predicted with nonzero
probability, it occupies only a very small "sliver" of the "pie").
Notice also that the first principal component (x axis) does a
very effective job of sorting the data by class (color).
Individual instances that are mispredicted (e.g., the half-blue
glyph near the vertical center, about 15% from the left side of
the display) may be outliers or may represent instances with
ambiguous or incorrect attribute values. Future versions of
the software will provide interactive drill-down techniques
that will permit a user to query the system about the details
of these instances and the associated model predictions.
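The display spaces in both figures come from projecting the input attributes onto their first two principal components. As a self-contained sketch of that projection step (the project presumably used a standard PCA routine; this power-iteration version is only illustrative):

```python
import random

def pca_2d(points, iters=200):
    """Project d-dimensional points onto their first two principal
    components, found by power iteration on the covariance matrix,
    with deflation for the second component."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]

    def cov_times(v):
        # Compute (X^T X / n) v without forming the covariance matrix.
        out = [0.0] * d
        for row in centered:
            dot = sum(r * vi for r, vi in zip(row, v))
            for j in range(d):
                out[j] += dot * row[j] / n
        return out

    def power_iter(deflate=None):
        rng = random.Random(0)
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            if deflate is not None:
                # Remove the first component's direction (deflation).
                dot = sum(a * b for a, b in zip(v, deflate))
                v = [a - dot * b for a, b in zip(v, deflate)]
            v = cov_times(v)
            norm = sum(a * a for a in v) ** 0.5
            v = [a / norm for a in v]
        return v

    pc1 = power_iter()
    pc2 = power_iter(deflate=pc1)
    return [(sum(a * b for a, b in zip(row, pc1)),
             sum(a * b for a, b in zip(row, pc2)))
            for row in centered]
```

The first coordinate of each projected point is the x axis of the displays above, which is why it tends to sort the instances along the direction of greatest variance in the attributes.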
As part of this project, we identified a number of data
sets that contain interesting forms of uncertainty. The software page
links to the source data that we have obtained. Interested researchers
may contact us directly to inquire about cleaned-up/processed versions
of these data sets.
Penny Rheingans, Marie desJardins, Wallace Brown, Alex Morrow, Doug Stull, and Kevin Winner (2014), "Visualizing Uncertainty in Predictive Models," in Scientific Visualization: Uncertainty, Multifield, Biomedical, and Scalable Visualization, Charles D. Hansen, Min Chen, Christopher R. Johnson, Arie Kaufman, and Hans Hagen (Eds.), Springer-Verlag, Mathematics and Visualization Series. ISBN 978-1-4471-6496-8.
The project is a joint effort between two laboratories:
This material is based upon work supported by the National Science Foundation under award IIS-EAGER-1050168 and REU supplement 1129683.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s)
and do not necessarily reflect the views of the National Science Foundation.