<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://www.w3.org/2005/Atom">
<title>Statistics</title>
<link href="http://hdl.handle.net/10379/1904" rel="alternate"/>
<subtitle/>
<id>http://hdl.handle.net/10379/1904</id>
<updated>2017-10-29T23:50:09Z</updated>
<dc:date>2017-10-29T23:50:09Z</dc:date>
<entry>
<title>Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses</title>
<link href="http://hdl.handle.net/10379/3835" rel="alternate"/>
<author>
<name>Yang, Haixuan</name>
</author>
<id>http://hdl.handle.net/10379/3835</id>
<updated>2015-10-15T12:32:30Z</updated>
<published>2012-08-07T00:00:00Z</published>
<summary type="text">Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses
Yang, Haixuan
The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments.
</summary>
<dc:date>2012-08-07T00:00:00Z</dc:date>
</entry>
<entry>
<title>Program quality with pair programming in CS1</title>
<link href="http://hdl.handle.net/10379/3805" rel="alternate"/>
<author>
<name>Krnjajic, Milovan</name>
</author>
<id>http://hdl.handle.net/10379/3805</id>
<updated>2015-10-15T12:43:00Z</updated>
<published>2005-01-01T00:00:00Z</published>
<summary type="text">Program quality with pair programming in CS1
Krnjajic, Milovan
In several regression applications, a different structural relationship might be anticipated for the higher or lower responses than for the average responses. In such cases, quantile regression analysis can uncover important features that would likely be overlooked by mean regression. We develop two distinct Bayesian approaches to fully nonparametric model-based quantile regression. The first approach utilizes an additive regression framework with Gaussian process priors for the quantile regression functions and a scale uniform Dirichlet process mixture prior for the error distribution, which yields flexible unimodal error density shapes. Under the second approach, the joint distribution of the response and the covariates is modeled with a Dirichlet process mixture of multivariate normals, with posterior inference for different quantile curves emerging through the conditional distribution of the response given the covariates. The proposed nonparametric prior probability models allow the data to uncover non-linearities in the quantile regression function and non-standard distributional features in the response distribution. Inference is implemented using a combination of posterior simulation methods for Dirichlet process mixtures. We illustrate the performance of the proposed models using simulated and real data sets.
</summary>
<dc:date>2005-01-01T00:00:00Z</dc:date>
</entry>
<entry>
<title>Bayesian Model Specification: Some problems related to model choice and calibration</title>
<link href="http://hdl.handle.net/10379/3804" rel="alternate"/>
<author>
<name>Krnjajic, Milovan</name>
</author>
<id>http://hdl.handle.net/10379/3804</id>
<updated>2015-10-15T12:42:59Z</updated>
<published>2011-01-01T00:00:00Z</published>
<summary type="text">Bayesian Model Specification: Some problems related to model choice and calibration
Krnjajic, Milovan
In the development of Bayesian model specification for inference and prediction we focus on the conditional distributions p(θ|β) and p(D|θ,β), with data D and background assumptions β, and consider calibration (an assessment of how often we get the right answers) as an important integral step of the model development. We compare several predictive model-choice criteria and present related calibration results. In particular, we have implemented a simulation study to compare the predictive model-choice criteria LS_CV, a log-score based on cross-validation, and LS_FS, a full-sample log score, with the deviance information criterion, DIC. We show that for several classes of models DIC and LS_CV are (strongly) negatively correlated, and that LS_FS has better small-sample model discrimination performance than either DIC or LS_CV; we further demonstrate that, when validating the model-choice results, a standard use of posterior predictive tail-areas for hypothesis testing can be poorly calibrated, and we present a method for its proper calibration.
</summary>
<dc:date>2011-01-01T00:00:00Z</dc:date>
</entry>
<entry>
<title>Quantifying the Price of Uncertainty in Bayesian Models</title>
<link href="http://hdl.handle.net/10379/3802" rel="alternate"/>
<author>
<name>Krnjajic, Milovan</name>
</author>
<id>http://hdl.handle.net/10379/3802</id>
<updated>2015-10-15T12:42:58Z</updated>
<published>2013-01-01T00:00:00Z</published>
<summary type="text">Quantifying the Price of Uncertainty in Bayesian Models
Krnjajic, Milovan
During the exploratory phase of a typical statistical analysis it is natural to look at the data in order to narrow down the scope of the subsequent steps, mainly by selecting a set of families of candidate models (parametric, for example). One needs to exercise caution when using the same data to assess the parameters of a specific model and to decide how to search the model space, in order not to underestimate the overall uncertainty, which usually occurs by failing to account for the second-order randomness involved in exploring the modelling space. To rank the models based on their fit or predictive performance we use practical tools such as Bayes factors, log-scores and the deviance information criterion. The price of model uncertainty can be paid automatically by using a Bayesian nonparametric (BNP) specification, adopting weak priors on the (functional) space of possible models, or through a version of cross-validation in which only part of the observed sample is used to fit and validate the model, while the assessment of the calibration of the overall modelling process is based on the as-yet unused part of the data set. It is interesting to see whether we can determine how much data needs to be set aside for calibration in order to obtain an assessment of uncertainty approximately equivalent to that of the BNP approach.
</summary>
<dc:date>2013-01-01T00:00:00Z</dc:date>
</entry>
</feed>
