Lecture 11: Expert Systems and Reasoning with Uncertainty

The Library of Dresan: Dr. Anthony G. Francis, Jr.'s Weblog

Introduction to Artificial Intelligence with Applications to Public Health

Rollins School of Public Health at Emory University
Instructor: Dr. Anthony G. Francis, Jr.


Expert systems automate or assist with expert human tasks by capturing knowledge held by experts and using AI reasoning techniques to draw conclusions in novel situations; examples include medical diagnosis and clinical decision support systems. Expert systems typically reason with logical rules, which in theory can draw potentially infinite sets of valid conclusions from the given facts. However, simple reasoning is not enough; expert systems typically have additional components for entering, maintaining, and explaining the application of the knowledge they store. Furthermore, we frequently cannot specify sound and complete rules for an expert domain; instead, expert systems use probability models to draw conclusions with varying degrees of belief, which become increasingly tenuous the further we stray from observed facts. These probability models, often based on the Bayesian statistical framework, have wide application across many areas of artificial intelligence, including expert systems, decision theory, and information retrieval.

Outline

Expert Systems
Architecture of Expert Systems
Successful Expert Systems
Reasoning with Uncertainty
Bayesian Networks
Applications of Bayesian Networks

Readings:

Artificial Intelligence: Chapters 17 (esp. 17.4) and 19
Machines Who Think: Chapter 12

Expert Systems

The rise of knowledge-based artificial intelligence

Early AI: Weak Methods
- Universal mechanisms for solving generic problems
- Computationally intractable for all but the smallest, simplest problems
- Examples: Logic Theorist, General Problem Solver, EPAM
  - Logic Theorist: mathematical proofs
  - General Problem Solver: planning
  - EPAM: memory retrieval
1970s: Strong Methods
- Add task-specific knowledge about particular problems
- Not universal, but may be efficient and effective for given problems
- Examples: DENDRAL, MACSYMA, HEARSAY
  - DENDRAL: spectroscopic analysis
  - MACSYMA: mathematical equation solver
  - HEARSAY: speech understanding

Expert Systems: Strong Methods applied to Expert Problems

Focus on tasks normally performed by human specialists
Solve "expert" rather than "mundane" problems
- Mundane tasks: basic tasks performed by every human every day
  - General cognitive, perceptual and motor faculties
  - Very hard problems for artificial intelligence
  - Involve signal processing, vast knowledge, or realtime control
  - Require vast amounts of memory and/or processing power
  - Examples: recognizing a face, driving a car, making a sandwich
- Expert problems: symbolic cognitive tasks requiring trained experts
  - Specialized tasks employing our general cognitive apparatus
  - "Easy" problems for AI because they are abstracted to the symbolic realm
  - Involve symbolic information, logical reasoning, and probability
  - Require good knowledge representation and inference algorithms
  - Examples: playing chess, diagnosing a disease, solving an equation
Because of their task-specific focus:
- Require support architecture around core AI components
- Require knowledge engineering to collect specialist knowledge

Architecture of Expert Systems

Major Components of Expert Systems

Knowledge Base
Working Memory
Inference Engine
Explanation System
Knowledge Base Editor
User Interface

Users of Expert Systems

Domain Expert
Knowledge Engineer
End User

Knowledge Base:

Updated through a process of knowledge engineering
- Knowledge engineers interview domain experts
- Initial system is iteratively tested
- Occasionally side-by-side comparisons used
Incorporates both facts and heuristics
- Facts: Generally public, shared, explicit, vetted knowledge
  - Representations: Logical Statements or Network Knowledge
    - Semantic Networks
    - Frames
    - Object-Attribute-Value (OAV) tuples
- Heuristics: Generally private, not shared, implicit, rules of thumb
  - Representations: If-Then Rules
    - Simple Rules
    - Variabilized Rules
    - Uncertain Rules
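
The interplay of knowledge base, working memory, and inference engine can be sketched in a few lines. This is a minimal illustration, assuming forward chaining as the inference strategy (the notes do not name one); the OAV facts, the rules, and the function name `forward_chain` are all made up for the example.

```python
# Working memory: facts as Object-Attribute-Value (OAV) tuples (illustrative only).
working_memory = {
    ("patient", "fever", "yes"),
    ("patient", "cough", "yes"),
}

# Knowledge base: simple if-then rules (hypothetical, not medical advice).
rules = [
    {"if": [("patient", "fever", "yes"), ("patient", "cough", "yes")],
     "then": ("patient", "suspect", "respiratory infection")},
    {"if": [("patient", "suspect", "respiratory infection")],
     "then": ("patient", "recommend", "chest exam")},
]

def forward_chain(facts, rules):
    """Inference engine: fire rules repeatedly until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            conditions_hold = all(cond in facts for cond in rule["if"])
            if conditions_hold and rule["then"] not in facts:
                facts.add(rule["then"])
                changed = True
    return facts

derived = forward_chain(working_memory, rules)
for fact in sorted(derived):
    print(fact)
```

The second rule fires only because the first rule added its conclusion to working memory, which is the kind of chained inference that an explanation system would later have to trace for the end user.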

Successful Expert Systems

All too many failures:
Made expert system problem solving a "stunt" that disrupted practice
- MYCIN - sessions too long, range too limited, too large and expensive to deploy
- XCON - hard to maintain as system grew to 10,000 rules
- In general: limited to
  "Any problem that can be and frequently is solved by your in-house expert in a 10 to 30 minute phone call," - Morris W. Firebaugh
Invisible successes:
Hidden systems, hidden knowledge
- MAXIMA: freely available symbolic mathematics package
- COLOSSUS: Australian insurance adjustor advisor
- F-16 Maintenance Skills Tutor: has expert system troubleshooter
Successes in Healthcare
Systems integrated into medical practice:
- PUFF - interprets pulmonary function tests
- PIERS - produces chemical pathology reports
- FocalPoint - scans 10% of all PAP screen slides

Reasoning with Uncertainty

Requirements for Logical Reasoning
- Consistent Axioms
- Correct Inference Rules
- Valid Initial Facts
When Logic Fails
- Unknown or unknowable "axioms"
- Uncertain inference rules or process models
- Incomplete knowledge of the world or its state
Probability Theory
- Probability: degree of belief in a proposition
  - Not statistical probability (likelihood that an event will occur)
  - Could be derived statistically if population sampling data available
  - Usually this data is not available or accurate
- Views of Probability
  - Frequentist - probabilities derived from experiment
  - Objectivist - probabilities are properties of the universe
  - Subjectivist - probabilities characterize agent's beliefs
  - Reference class problem - the more precisely we objectively fix a situation, the smaller the range of conditions over which it holds
- Kolmogorov's Axioms of Probability
  - 1. All probabilities are between 0 and 1
  - 2a. Necessarily true propositions have probability 1
  - 2b. Necessarily false propositions have probability 0
  - 3. The probability of a disjunction is the sum of the probabilities of its elements, minus the probability they both happen at the same time
  - Result: the probabilities of all mutually exclusive ways an event can happen must sum to 1
- Random variables
  - One result of Kolmogorov's axioms:
    The probabilities of all mutually exclusive ways an event can happen must sum to 1
  - Random variables enumerate these mutually exclusive outcomes
    - V = v1, v2, v3 ... vn
  - Types of random variables:
    - Propositional: True or False
    - Categorical: take on one of a set of values
    - Numerical: take on one of a continuous range of values
  - Probability of Random Variables:
    - P( v1 )
    - Short for P( V=v1 )
    - Joint probability of many variables
      - P( V1, V2, V3 )
      - can take on values P( V1=v1, V2=v2, V3=v3 ... VN=vn )
      - equivalent to P( v1 ^ v2 ^ v3 ... vn )
  - Probabilistic Reasoning
    - Prior Probability: likelihood of a proposition, all things being equal:
      - P( H ) short for P( Hypothesis ), summed over all values of the other variables in the joint probability distribution
    - Conditional Probability: likelihood of one event given another
      - P( H | E ) short for P( Hypothesis | Evidence )
      - P( H | E ) = P( H ^ E ) / P( E )
    - Chain Rule: reasoning across multiple hypotheses
      - P( V1, V2 ... Vn ) = Product(i=1..n) P( Vi | V1, ... Vi-1 )
    - Bayes' Rule: reasoning back from evidence to hypotheses
      - P( H | E ) = P( E | H ) * P( H ) / P( E )
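
Bayes' rule above can be worked through numerically. The sketch below is only an illustration: the function name `posterior` and the numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) are assumptions chosen for the example, not figures from the lecture. P( E ) is expanded with the law of total probability over H and not-H.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) using Bayes' rule.

    P(E) is expanded as:
    P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)
    """
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e

# Hypothetical screening test: H = "has the disease", E = "test is positive".
p = posterior(prior_h=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(round(p, 3))  # 0.154
```

Under these assumed numbers, a positive result raises the belief in the hypothesis from 1% to only about 15%, because false positives among the many healthy patients outnumber true positives.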
Problems with Reasoning with Uncertainty
- Ad-hoc methods do not scale
  - MYCIN certainty factors give good results for small problems
  - Not reliable for larger chains of rules
  - Not stable for larger problems
- Fully specified probability theory can be intractable
  - Joint probability distributions have combinatorial explosion
  - Cannot collect answers for every cell of the matrix
  - Could not tractably compute with them if you had them
- Bayesian reasoning often provides an effective method
  - Compute using Bayes' rule over Bayesian inference networks
  - Uses independence assumptions to streamline reasoning
  - Often works almost as well as more accurate approaches
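
The combinatorial explosion, and the savings from independence assumptions, can be made concrete with a little arithmetic. This is a rough sketch assuming Boolean variables and a network in which every node has at most three parents; both function names and the parent limit are hypothetical choices for the illustration.

```python
def full_joint_cells(n_variables):
    """Cells in the full joint probability table over n Boolean variables."""
    return 2 ** n_variables

def bounded_network_entries(n_variables, max_parents):
    """Upper bound on stored probabilities in a network of Boolean nodes
    where each node has at most max_parents parents: one
    P(node = True | parent values) entry per parent configuration."""
    return n_variables * 2 ** max_parents

for n in (10, 20, 30):
    print(n, full_joint_cells(n), bounded_network_entries(n, max_parents=3))
```

With 30 Boolean variables the full joint table has over a billion cells, while the bounded network needs at most 240 numbers, which is why it becomes feasible to elicit the probabilities from experts and to compute with them.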

Bayesian Networks

Represent conditional independence as a directed acyclic graph (DAG)
- Requires conditional independence:
  P( H | E1, E2 ) = P( H | E1 ) means H is conditionally independent of E2 given E1
- Enables us to eliminate many unneeded cells of the matrix
Structure of a Bayesian Network
- Prior probabilities are assigned to nodes without parents
- Conditional probability tables assigned for nodes with parents
- Nodes that are not connected are assumed to be independent
Reasoning with a Bayesian Network
- Use the chain rule over all conditional dependencies
- E.g., P( C1, C2, P1, P2 ) = P( C1 | P1, P2 )*P( C2 | P2 )*P( P1 )*P( P2 )
- Much simpler than the unconstrained case: for Boolean variables, the full joint table has 16 cells, while this network stores only 8 probabilities
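
The factorization in the example can be checked with a short script. This is a sketch under stated assumptions: all four variables are taken to be Boolean, every probability value is invented for illustration, and the query is answered by brute-force enumeration rather than by any specialized network algorithm.

```python
from itertools import product

# Hypothetical probabilities; all variables are Boolean.
p_p1 = 0.3                      # P(P1 = True), a prior for a parentless node
p_p2 = 0.6                      # P(P2 = True), a prior for a parentless node
p_c1 = {                        # P(C1 = True | P1, P2), a conditional table
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.4, (False, False): 0.1,
}
p_c2 = {True: 0.8, False: 0.2}  # P(C2 = True | P2), a conditional table

def bernoulli(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(c1, c2, p1, p2):
    """P(C1, C2, P1, P2) = P(C1 | P1, P2) * P(C2 | P2) * P(P1) * P(P2)."""
    return (bernoulli(p_c1[(p1, p2)], c1)
            * bernoulli(p_c2[p2], c2)
            * bernoulli(p_p1, p1)
            * bernoulli(p_p2, p2))

# All 16 joint cells are recovered from 4 + 2 + 1 + 1 = 8 stored numbers.
total = sum(joint(*values) for values in product([True, False], repeat=4))

# Reasoning from evidence to hypothesis: P(P1 = True | C1 = True).
num = sum(joint(True, c2, True, p2)
          for c2, p2 in product([True, False], repeat=2))
den = sum(joint(True, c2, p1, p2)
          for c2, p1, p2 in product([True, False], repeat=3))
p_p1_given_c1 = num / den
print(round(total, 6), round(p_p1_given_c1, 3))
```

With these made-up tables, observing C1 raises the belief in P1 from its prior of 0.3 to roughly 0.56. Enumeration touches every cell of the joint, so it only suits small networks like this one.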
Other uses of Bayesian Networks
- D-Separation: compute independence given some evidence
- Polytrees: Simpler networks which enable more efficient inference
- Evidence Above: compute probability of hypotheses given evidence
- Evidence Below: compute probability of evidence given hypotheses

Applications of Bayesian Networks

Expert systems - compute principled probability relations
Planning - compute the best possible action
Information retrieval - guess which documents are relevant

Resources

Expert Systems: ????
Logic: ????
Probability: ????
Utility Theory: ????
Decision Theory: ????
Bayesian Reasoning: ????
Bayesian Networks: ????
Clinical Decision Support Systems: ????
AI and Healthcare: ????
