Learning Bayesian Networks for Multi-Relational Data

Many organizations store their data in relational databases. Multi-relational databases contain information about entities, attributes of entities, links, and attributes of links. This talk presents methods for applying Bayesian network learning to multi-relational data. Generative graphical models like Bayesian networks support important applications such as information extraction, entity resolution, link-based clustering, link-based outlier detection, and query optimization. I describe a scalable parameter learning method, based on the Fast Moebius Transform, that integrates statistical information across multiple tables in the database. For structure learning, I describe a lattice search algorithm that efficiently searches for probabilistic associations along increasingly long relational pathways. These methods scale to millions of data records, such as those in the Internet Movie Database (IMDb). Both theoretical arguments and empirical evidence indicate that Bayesian network learning provides excellent estimates of statistical associations in a relational database.
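To illustrate the kind of computation a Fast Moebius Transform performs, here is a minimal sketch of the generic transform over the subset lattice of k binary relationship indicators. It is not the speaker's implementation. The function name `fast_moebius_transform`, the bitmask encoding, and the assumed setting are all illustrative. The assumed setting: counts where a set of relationships is true (others unconstrained) are easy to obtain from table joins, while counts that require some relationships to be false are not. The transform recovers the latter from the former in O(k * 2^k) steps instead of the O(3^k) of naive inclusion-exclusion.

```python
from typing import List


def fast_moebius_transform(superset_counts: List[int], k: int) -> List[int]:
    """Hypothetical helper: turn 'at least these true' counts into exact counts.

    Assumption: superset_counts[mask] is the number of groundings in which every
    relationship whose bit is set in `mask` is true, with the other relationships
    unconstrained (so mask 0 holds the total number of groundings).
    Returns exact[mask]: the number of groundings in which exactly the
    relationships in `mask` are true and all others are false.
    """
    if len(superset_counts) != 1 << k:
        raise ValueError("expected 2**k counts, one per subset of relationships")
    exact = list(superset_counts)
    for i in range(k):
        bit = 1 << i
        for mask in range(1 << k):
            if not mask & bit:
                # Remove groundings where relationship i is also true,
                # leaving those where relationship i is false.
                exact[mask] -= exact[mask | bit]
    return exact


# Two relationships: bit 0 = R1, bit 1 = R2 (hypothetical counts).
# Total = 14, R1 true = 7, R2 true = 6, both true = 4.
print(fast_moebius_transform([14, 7, 6, 4], 2))  # → [5, 3, 2, 4]
```

The output reads as: 5 groundings with neither relationship true, 3 with only R1, 2 with only R2, and 4 with both. Each pass over bit i handles one relationship at a time, which is what makes the transform "fast" compared with summing over all supersets of every subset.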

Oliver Schulte is a Professor in the School of Computing Science at Simon Fraser University, Vancouver, Canada. He received his Ph.D. from Carnegie Mellon University in 1997. His current research focuses on machine learning for structured data, such as relational databases and event data. He has published papers in leading AI and machine learning venues on a variety of topics, including learning Bayesian networks, learning theory, game theory, and scientific discovery. While he has won some nice awards, his biggest claim to fame may be a draw against chess world champion Garry Kasparov.

Oliver Schulte, Professor, School of Computing Science, Simon Fraser University, Vancouver, Canada