Bayesian Networks: Statistical introduction

Statistical introduction

Given data $x$ and a parameter $\theta$ , a simple Bayesian analysis starts with a prior probability (prior) $p(\theta )$ and a likelihood $p(x\mid \theta )$ , and combines them to compute the posterior probability $p(\theta \mid x)\propto p(x\mid \theta )p(\theta )$ .
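As a minimal sketch of this proportionality, the posterior can be approximated on a grid of $\theta$ values by multiplying prior and likelihood pointwise and normalizing. The normal likelihood, the normal prior, the helper name `grid_posterior`, and the data values are all assumptions chosen for illustration, not part of the text above.

```python
import numpy as np

def grid_posterior(x, theta_grid, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Approximate p(theta | x) on a grid as prior times likelihood, normalized."""
    # Log prior (assumed): theta ~ N(prior_mean, prior_sd^2), up to a constant.
    log_prior = -0.5 * ((theta_grid - prior_mean) / prior_sd) ** 2
    # Log likelihood (assumed): x_j ~ N(theta, sigma^2) independently, up to a constant.
    resid = np.asarray(x, dtype=float)[:, None] - theta_grid[None, :]
    log_lik = -0.5 * np.sum((resid / sigma) ** 2, axis=0)
    # Posterior is proportional to likelihood times prior (a sum on the log scale).
    log_post = log_lik + log_prior
    # Subtract the maximum before exponentiating for numerical stability.
    post = np.exp(log_post - log_post.max())
    # Normalize so the density sums to 1 over the grid (Riemann sum).
    step = theta_grid[1] - theta_grid[0]
    return post / (post.sum() * step)

theta_grid = np.linspace(-5.0, 5.0, 2001)
x = [1.2, 0.8, 1.0]  # illustrative data
post = grid_posterior(x, theta_grid)
post_mean = np.sum(theta_grid * post) * (theta_grid[1] - theta_grid[0])
```

Working on the log scale avoids underflow when many observations are multiplied together; the normalizing constant that the $\propto$ sign hides is recovered numerically in the last step of the function.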

Often the prior on $\theta$ depends in turn on other parameters $\varphi$ that are not mentioned in the likelihood. So, the prior $p(\theta )$ must be replaced by a likelihood $p(\theta \mid \varphi )$ , and a prior $p(\varphi )$ on the newly introduced parameters $\varphi$ is required, resulting in a posterior probability

$p(\theta ,\varphi \mid x)\propto p(x\mid \theta )p(\theta \mid \varphi )p(\varphi )$ .

This is the simplest example of a hierarchical Bayes model.
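The three-factor form of this posterior translates directly into a sum of three log terms. The sketch below evaluates the unnormalized log posterior for a scalar $\theta$ and $\varphi$; the normal distributions at every level, the function names, and the default scale values are illustrative assumptions rather than anything specified above.

```python
import math

def normal_logpdf(value, mean, sd):
    """Log density of N(mean, sd^2) evaluated at value."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((value - mean) / sd) ** 2

def log_joint_posterior(theta, phi, x, sigma=1.0, tau=1.0, phi_sd=10.0):
    """Unnormalized log p(theta, phi | x) = log p(x|theta) + log p(theta|phi) + log p(phi)."""
    # Likelihood (assumed normal): depends on theta only, not on phi.
    log_lik = sum(normal_logpdf(xj, theta, sigma) for xj in x)
    # Second level (assumed normal): distribution of theta given the hyperparameter phi.
    log_theta_given_phi = normal_logpdf(theta, phi, tau)
    # Top level (assumed normal): prior on the hyperparameter phi.
    log_phi = normal_logpdf(phi, 0.0, phi_sd)
    return log_lik + log_theta_given_phi + log_phi
```

A function of this shape is what a generic sampler or optimizer would be handed; adding a further level of the hierarchy amounts to adding one more log term.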

The process may be repeated; for example, the parameters $\varphi$ may depend in turn on additional parameters $\psi$ , which require their own prior. Eventually the process must terminate, with priors that do not depend on unmentioned parameters.

Introductory examples

Suppose we are given measured quantities $x_{1},\dots ,x_{n}$ , each with normally distributed errors of known standard deviation $\sigma$ :

$x_{i}\sim N(\theta _{i},\sigma ^{2})$ .

To estimate the $\theta _{i}$ , one approach is maximum likelihood: since the observations are independent, the likelihood factorizes and the maximum likelihood estimate is simply

${\hat {\theta }}_{i}=x_{i}$ .
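The factorization can be written out explicitly. Under the normal model stated above, and up to an additive constant that does not involve the $\theta _{i}$ , the log-likelihood is

$\log p(x_{1},\dots ,x_{n}\mid \theta _{1},\dots ,\theta _{n})=-{\frac {1}{2\sigma ^{2}}}\sum _{i=1}^{n}(x_{i}-\theta _{i})^{2}+{\text{const}}$ .

Each term of the sum involves a single $\theta _{i}$ and is non-positive, so the sum is maximized by setting every term to zero, which gives the maximizer $\theta _{i}=x_{i}$ for each $i$ separately.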

However, if the quantities are related, so that for example the individual $\theta _{i}$ have themselves been drawn from an underlying distribution, then this relationship destroys the independence and suggests a more complex model, e.g.,

$x_{i}\sim N(\theta _{i},\sigma ^{2})$ ,

$\theta _{i}\sim N(\varphi ,\tau ^{2})$ ,

with improper priors $\varphi \sim {\text{flat}}$ and $\tau \sim {\text{flat}}$ on $(0,\infty )$ . When $n\geq 3$ , this is an identified model (i.e., there exists a unique solution for the model's parameters), and the posterior distributions of the individual $\theta _{i}$ tend to move, or shrink, away from the maximum likelihood estimates towards their common mean. This shrinkage is typical behavior in hierarchical Bayes models.
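The shrinkage can be illustrated numerically under a simplifying assumption: $\tau$ is treated as a known constant rather than integrated over its flat prior. In that case, with a flat prior on $\varphi$ , the posterior mean of each $\theta _{i}$ is a weighted average of $x_{i}$ and the sample mean, with weight $\tau ^{2}/(\sigma ^{2}+\tau ^{2})$ on $x_{i}$ . The function name and the data values below are hypothetical.

```python
import numpy as np

def shrunken_means(x, sigma, tau):
    """Posterior means of theta_i given tau (assumed known) and a flat prior on phi."""
    x = np.asarray(x, dtype=float)
    # Weight on the individual observation; it approaches 1 as tau grows
    # relative to sigma (little pooling) and 0 as tau shrinks (full pooling).
    weight = tau ** 2 / (sigma ** 2 + tau ** 2)
    # Each estimate is pulled from x_i towards the common sample mean.
    return weight * x + (1.0 - weight) * x.mean()

x = np.array([2.0, 4.0, 9.0])  # illustrative measurements
est = shrunken_means(x, sigma=1.0, tau=1.0)
# With sigma == tau the weight is 0.5, so est is [3.5, 4.5, 7.0]:
# halfway between each x_i and the sample mean 5.0.
```

In the full model, where $\tau$ has its own prior, the amount of shrinkage is averaged over the posterior of $\tau$ rather than fixed, but the direction of the effect is the same.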

Restrictions on priors

Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy, such as the variable $\tau$ in the example. The usual priors, such as the Jeffreys prior, often do not work, because the posterior distribution will not be normalizable and estimates made by minimizing the expected loss will be inadmissible.
