Abstract: MAP inference in continuous probabilistic models has largely been restricted to convex density functions in order to guarantee tractability of the underlying model, since high-dimensional nonconvex optimization problems contain a combinatorial number of local minima, making them extremely challenging for convex optimization techniques. This choice has resulted in significant computational advantages but a loss in model expressivity. We present a novel approach to nonconvex optimization that overcomes this tradeoff by exploiting local structure in the objective function, greatly expanding the class of tractable, continuous probabilistic models. Our algorithm optimizes a subset of variables such that near the minimum the remaining variables decompose into approximately independent subsets, and recurses on these. Finding the global minimum in this way is exponentially faster than using convex optimization with restarts.
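The decomposition idea can be illustrated with a toy sketch, assuming a made-up separable objective f(x, y, z) = g(x, y) + h(x, z) (the paper's algorithm discovers such structure automatically; here it is hard-coded): once x is fixed, y and z decouple, so two 1-D searches replace one 2-D search at each candidate x.

```python
import numpy as np

# Toy illustration (not the paper's algorithm): for
#   f(x, y, z) = g(x, y) + h(x, z),
# fixing x decouples y from z, so each remaining variable can be
# minimized independently. The functions g and h are made-up examples.

def g(x, y):
    return (y - np.sin(3 * x)) ** 2

def h(x, z):
    return (z - x ** 2) ** 2 + 0.1 * np.cos(5 * z)

grid = np.linspace(-2, 2, 81)

best_val, best_pt = np.inf, None
for x in grid:
    y_best = grid[np.argmin(g(x, grid))]   # 1-D subproblem in y alone
    z_best = grid[np.argmin(h(x, grid))]   # 1-D subproblem in z alone
    val = g(x, y_best) + h(x, z_best)
    if val < best_val:
        best_val, best_pt = val, (x, y_best, z_best)
```

Because the subproblems are independent given x, this decomposed search finds exactly the same grid minimum as an exhaustive joint search over (y, z), at a fraction of the cost.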
Abstract: Causal polytrees are singly connected causal models that are frequently applied in practice. In many applications, however, some variables remain unobserved, and causal polytrees cannot be applied without explicitly including these unobserved variables. We therefore propose the ancestral polytree model, a novel combination of ancestral graphs and singly connected graphs. Ancestral graphs can model causal and non-causal dependencies, while singly connected models allow for efficient learning and inference. We discuss the basic properties of ancestral polytrees and propose an efficient structure learning algorithm. Experiments on synthetic and biological datasets show that our algorithm is efficient and that applications of ancestral polytrees are promising.
Abstract: Linear-chain conditional random fields (LC-CRFs) have been successfully applied in many structured prediction tasks. Many previous extensions, e.g. replacing local factors by neural networks, are computationally demanding. In this paper, we extend conventional LC-CRFs by replacing the local factors with sum-product networks, a promising new deep architecture that allows exact and efficient inference. The proposed local factors can be interpreted as an extension of Gaussian mixture models (GMMs); we thus provide a powerful alternative to LC-CRFs extended by GMMs. In extensive experiments, we achieve performance competitive with state-of-the-art methods on phone classification and optical character recognition tasks.
Abstract: Sum-product networks (SPNs; Poon & Domingos, 2011) are a recently-proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order probabilistic architecture. RSPNs generalize SPNs by modeling a set of instances jointly, allowing them to influence each other's probability distributions, as well as modeling the probabilities of relations between objects. We also present LearnRSPN, the first algorithm for learning tractable statistical relational models. LearnRSPN is a recursive top-down structure learning algorithm for RSPNs, similar to the LearnSPN (Gens & Domingos, 2013) algorithm for propositional SPN learning. We evaluate the algorithm on two datasets; the RSPN learning algorithm outperforms Markov Logic Networks (Richardson & Domingos, 2006) in both running time and predictive accuracy.
Abstract: Stochastic And-Or grammars extend traditional stochastic grammars of language to model other types of data such as images and events. In this short paper, we discuss our work in progress that aims to provide a unified framework and a probabilistic logic interpretation of stochastic And-Or grammars. We also present a general inference algorithm for stochastic And-Or grammars and discuss its tractability.
Abstract: We define two non-parametric models for Sum-Product Networks (SPNs). The first is a tree structure of Dirichlet Processes; the second is a DAG of hierarchical Dirichlet Processes. These generative models for data implicitly define a prior distribution over SPNs of tree and of DAG structure. They allow MCMC fitting of SPN models to data, and the learning of SPN structure from data.
Abstract: A sequence of random variables is exchangeable if its joint distribution is invariant under variable permutations. We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. We prove that a family of tractable EVMs is optimal under zero-one loss for a large class of functions, including parity and threshold functions, and strictly subsumes existing tractable independence-based model families. Extensive experiments show that EVMs outperform state-of-the-art classifiers such as SVMs and probabilistic models based solely on independence assumptions.
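The core property can be sketched in a few lines, assuming a toy finitely exchangeable model over binary sequences whose parameterization (`count_probs`, a distribution over the number of ones) is invented for this illustration and is not the paper's EVM construction:

```python
import numpy as np
from math import comb

# Toy finitely exchangeable model over binary sequences of length n:
# the joint probability depends only on the number of ones in the sequence.
# count_probs[k] (an assumed parameterization, not the paper's) is the
# probability that the sequence contains exactly k ones.

def exchangeable_logprob(x, count_probs):
    """Log-probability of binary sequence x under the exchangeable model."""
    x = np.asarray(x)
    n, k = len(x), int(x.sum())
    # All C(n, k) sequences with k ones share the mass count_probs[k] equally.
    return float(np.log(count_probs[k]) - np.log(comb(n, k)))

n = 4
count_probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # sums to 1 over k = 0..4

# Exchangeability: permuting a sequence leaves its probability unchanged.
p1 = exchangeable_logprob([1, 1, 0, 0], count_probs)
p2 = exchangeable_logprob([0, 1, 0, 1], count_probs)
```

The model needs only n + 1 parameters instead of 2^n, which is the kind of tractability the abstract exploits; partial exchangeability generalizes this by applying the idea within blocks of variables.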
Abstract: We propose the Probabilistic Sentential Decision Diagram (PSDD): A complete and canonical representation of probability distributions defined over the models of a given propositional theory. Each parameter of a PSDD can be viewed as the (conditional) probability of making a decision in a corresponding Sentential Decision Diagram (SDD). The SDD itself is a recently proposed complete and canonical representation of propositional theories. We explore a number of interesting properties of PSDDs, including the independencies that underlie them. We show that the PSDD is a tractable representation. We further show how the parameters of a PSDD can be efficiently estimated, in closed form, from complete data. We empirically evaluate the quality of PSDDs learned from data, when we have knowledge, a priori, of the domain logical constraints.
Abstract: We consider the selectivity constraint on the structure of sum-product networks (SPNs), which allows each sum node to have at most one child with non-zero output for each possible input.
This allows us to find globally optimal maximum likelihood parameters in closed form.
Although a constrained class of SPNs, these models still strictly generalize classical graphical models such as Bayesian networks.
Closed form parameter estimation opens the door for structure learning using a principled scoring function, trading off training likelihood and model complexity.
In experiments we show that these models are easy to learn and compete well with the state of the art.
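The closed-form estimate at a single sum node can be sketched as follows, under the simplifying assumption (made for this illustration only) that the node's children are indicators of a discrete variable's values; under selectivity, every data point activates exactly one child, so the maximum-likelihood weights are just normalized activation counts:

```python
import numpy as np

# Minimal sketch of closed-form ML estimation at one selective sum node.
# Selectivity means each sample activates exactly one child, so the ML
# mixture weights are normalized activation counts (a toy setup, not the
# paper's full structure-learning pipeline).

def ml_sum_weights(child_active):
    """child_active: (N, C) 0/1 matrix; row i marks which child fires on sample i."""
    counts = child_active.sum(axis=0)
    return counts / counts.sum()

# Toy data: the children are indicators X=0, X=1, X=2.
X = np.array([0, 1, 1, 2, 1, 0, 1, 1])
child_active = np.eye(3)[X]        # one-hot rows: exactly one child per sample
w = ml_sum_weights(child_active)   # empirical frequencies of X
```

Because no sum over latent assignments is needed, this estimate is exact rather than a local optimum, which is what makes likelihood-based structure scoring practical for this model class.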
Abstract: Markov logic networks (MLNs) are a popular statistical relational learning formalism that combines Markov networks with first-order logic. Unfortunately, inference and maximum-likelihood learning with MLNs are highly intractable. For inference, this problem is addressed by lifted algorithms, which speed up inference by exploiting symmetries. State-of-the-art lifted algorithms give tractability guarantees for broad classes of MLNs and inference tasks. For learning, we showed in recent work how to use lifted inference techniques for efficient maximum-likelihood parameter learning.
In this paper, we propose the first lifted structure learning algorithm that guarantees that the MLNs it learns are liftable, and thus tractable for certain queries. Our work is among the first to apply the tractable learning paradigm to statistical relational models. Moreover, it is the first structure learning algorithm that exactly optimizes the likelihood of the MLN.
An empirical evaluation on three real-world datasets shows that our algorithm learns accurate models, both in terms of likelihood and prediction quality. Furthermore, our tractable learner outperforms intractable models on prediction tasks, suggesting that liftable models form a powerful hypothesis space that may be sufficient for many standard learning problems.
Abstract: The mean field algorithm is a widely used approximate inference algorithm for graphical models whose exact inference is intractable. In each iteration of mean field, the approximate marginals for each variable are updated by getting information from the neighbors. This process can be equivalently converted into a feedforward network, with each layer representing one iteration of mean field and with tied weights on all layers. This conversion enables a few natural extensions, e.g. untying the weights in the network. In this paper, we study these mean field networks (MFNs), and use them as inference tools as well as discriminative models. Preliminary experimental results show that MFNs can learn to do inference very efficiently and perform significantly better than mean field as discriminative models.
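The conversion described above can be sketched for a pairwise binary MRF with unary potentials `theta` and symmetric couplings `W` (both invented for this example; the paper's models and tasks are more general). Each "layer" applies one parallel mean field update; tying `W` across layers recovers plain mean field, while untying it per layer would give the MFN extension:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal sketch: mean field for a pairwise binary MRF, unrolled as a
# feedforward network with tied weights on every layer. theta and W are
# toy parameters chosen for illustration.

def mean_field_network(theta, W, n_layers=50):
    q = np.full_like(theta, 0.5)            # uniform initial marginals
    for _ in range(n_layers):               # one layer == one MF iteration
        q = sigmoid(theta + W @ q)          # same (tied) weights each layer
    return q

theta = np.array([1.0, -0.5, 0.2])
W = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, -0.4],
              [0.0, -0.4, 0.0]])
q = mean_field_network(theta, W)            # approximate marginals P(x_i = 1)
```

With enough layers the output sits at a mean field fixed point; viewing the unrolled network as a differentiable model is what allows training it discriminatively.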
Abstract: Sum-product networks (SPNs) are a deep probabilistic representation that allows for efficient, exact inference. SPNs generalize many other tractable models, including thin junction trees, latent tree models, and many types of mixtures. Previous work on learning SPN structure has mainly focused on using top-down or bottom-up clustering to find mixtures, which capture variable interactions indirectly through implicit latent variables. In contrast, most work on learning graphical models, thin junction trees, and arithmetic circuits has focused on finding direct interactions among variables. In this paper, we present ID-SPN, a new algorithm for learning SPN structure that unifies the two approaches. In experiments on 20 benchmark datasets, we find that the combination of direct and indirect interactions leads to significantly better accuracy than several state-of-the-art algorithms for learning SPNs and other tractable models.
Abstract: In this paper, we present cutset networks, a new tractable probabilistic model for representing multi-dimensional discrete distributions. Cutset networks are rooted OR trees, in which each OR node represents conditioning of a variable in the model, with tree Bayesian networks (Chow-Liu trees) at the leaves. From an inference point of view, cutset networks model the mechanics of Pearl's cutset conditioning algorithm, a popular exact inference method for probabilistic graphical models. We present efficient algorithms, which leverage and adapt a vast amount of research on decision tree induction, for learning cutset networks from data. We also present an expectation-maximization (EM) algorithm for learning mixtures of cutset networks. Our experiments on a wide variety of benchmark datasets clearly demonstrate that compared to approaches for learning other tractable models such as thin junction trees, latent tree models, arithmetic circuits and sum-product networks, our approach is significantly more scalable, and provides similar or better accuracy.
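Evaluation in a cutset network can be sketched as follows; for brevity the leaves below are fully factorized (independent Bernoulli) distributions rather than the Chow-Liu trees the paper uses, and the nested-dict representation is an assumption made for this sketch:

```python
import numpy as np

# Minimal sketch of evaluating a cutset network on binary data. Internal OR
# nodes condition on one variable and route to a child per value; leaves are
# simplified here to independent Bernoulli models over their scope.

def cn_logprob(node, x):
    if node["type"] == "leaf":
        lp = 0.0
        for i, p in zip(node["scope"], node["probs"]):   # P(x_i = 1) = p
            lp += np.log(p if x[i] == 1 else 1.0 - p)
        return lp
    v = node["var"]                                      # conditioned variable
    return np.log(node["weights"][x[v]]) + cn_logprob(node["children"][x[v]], x)

# OR root conditions on x_0; each child models the remaining variables,
# mirroring one step of cutset conditioning.
net = {
    "type": "or", "var": 0, "weights": [0.6, 0.4],
    "children": [
        {"type": "leaf", "scope": [1, 2], "probs": [0.9, 0.2]},
        {"type": "leaf", "scope": [1, 2], "probs": [0.3, 0.7]},
    ],
}

lp = cn_logprob(net, (1, 0, 1))
```

Evaluation follows a single root-to-leaf path, so likelihood computation is linear in the depth of the OR tree plus the leaf cost, which is the source of the model's tractability.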
Abstract: The Neural Autoregressive Distribution Estimator (NADE) and its real-valued version RNADE are competitive density models of multidimensional data across a variety of domains. These models use a fixed, arbitrary ordering of the data dimensions. One can easily condition on variables at the beginning of the ordering and marginalize out variables at the end of it; other inference tasks, however, require approximate inference. In this work we introduce an efficient procedure to simultaneously train a NADE model for each possible ordering of the variables, by sharing parameters across all these models. We can thus use the most convenient model for each inference task at hand, and ensembles of such models with different orderings are immediately available. Moreover, unlike the original NADE, our training procedure scales to deep models. Empirically, ensembles of Deep NADE models obtain state-of-the-art density estimation performance.
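The property underlying this trick is that the chain rule factorizes a joint distribution under any variable ordering, so a model that handles all orderings can pick whichever makes a given conditioning/marginalization task easy. A toy check on an explicit joint table (a stand-in for a learned NADE, not the paper's training procedure):

```python
import numpy as np
from itertools import product

# Verify order-invariance of the chain rule on an explicit joint table:
# log p(x) = sum_d log p(x_{o(d)} | x_{o(<d)}) for any ordering o.

def chain_rule_logprob(joint, x, order):
    n = len(x)

    def marg(fixed):
        # Marginal probability that the variables in `fixed` take those values.
        return sum(p for z, p in joint.items()
                   if all(z[i] == v for i, v in fixed.items()))

    lp, given = 0.0, {}
    for i in order:
        num = marg({**given, i: x[i]})      # prefix plus current variable
        den = marg(given)                   # prefix alone
        lp += np.log(num / den)
        given[i] = x[i]
    return lp

rng = np.random.default_rng(0)
probs = rng.random(8)
probs /= probs.sum()
joint = {z: p for z, p in zip(product([0, 1], repeat=3), probs)}

x = (1, 0, 1)
lp_fwd = chain_rule_logprob(joint, x, order=[0, 1, 2])
lp_rev = chain_rule_logprob(joint, x, order=[2, 0, 1])
```

Both orderings telescope to the same joint probability; sharing one set of parameters across all such factorizations is what makes the single trained model usable for many inference tasks.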
Abstract: We present a new probabilistic model of compact commutative Lie groups that produces invariant-equivariant and disentangled representations of data. To define the notion of disentangling, we borrow a fundamental principle from physics that is used to derive the elementary particles of a system from its symmetries. Our model employs a newfound Bayesian conjugacy relation that enables fully tractable probabilistic inference over compact commutative Lie groups -- a class that includes the groups that describe the rotation and cyclic translation of images. We train the model on pairs of transformed image patches, and show that the learned invariant representation is highly effective for classification.
Abstract: This paper presents foundational theoretical results on distributed parameter estimation for undirected probabilistic graphical models. It introduces a general condition on composite likelihood decompositions of these models which guarantees the global consistency of distributed estimators, provided the local estimators are consistent.