Design document for learning rules for reactive planning

This is a design document for a machine learning program to identify simple rules for planning, based on previously made plans. First, let me give some definitions (under "Data structures"); then I will explain in more detail.

Tools

The language is C++; the compiler is g++; the system is Unix. We will use the Standard Template Library (STL).

Data structures:

Predicate: a list of strings in parentheses, e.g.: (clear a), or (on ?x ?y). Variables are denoted with leading ?'s. The type of a predicate is the first word in it; subsequent words are its arguments.

A predicate is simply an STL vector of strings

Most predicates relate to description of the world. A special kind of predicate is a target predicate: a predicate which tell us what to do, e.g. (unstack block-A block-B), meaning that we should unstack block-A from block-B. This is just another predicate, not a new data structure.

A binding of a variable to another string is an assertion that the variable equals the other string; e.g. binding ?x to A means "?x = A." It's implemented as a class in its own right.

A predicate matches another if a) they are both empty.
A predicate matches another if a) the first elements match, and b) given bindings from matching the first elements, the rest of the first predicate matches the rest of the second.

Two elements match, given a list of bindings (empty unless otherwise specified), a) they are identical, or b) after applying all bindings, at least one is still a variable. Bindings produced are the bindings given plus a binding of the two elements.

(So there is a boolean function match that takes two strings and a bindings list, and another that takes two predicates and a bindings list. Each updates its bindings list and returns whether the things match.)

Two special strings classifications:

Variable: a string with a leading ?. Usage: (clear ?x) means something is clear, but we haven't specified what.

Constant: any string that isn't a variable. Usage: (clear block-A) means that there is a block named block A, and it is clear.

To distinguish variables and constants, I'll have a small boolean function IsVariable to return true if the argument is a variable.

State description: a set of predicates describing the world. A complete state description describes it completely. This is implemented as an STL vector of predicates.

Rule: a state description (called an antecedent) paired with a target predicate (called a consequent). E.g, here's a rule that says if the hand has something in it, we should put that object on the table. Note that rules may contain variables, but need not.

if ( (holding ?x) ) <-- antecedent
then (putdown ?x) <-- consequent

Member functions:

void add_antecedent_predicate (const Predicate&);
bool match (const Example&, vector<vector <Binding>>&);
functions to read in and print out; constructors and destructors

Why a list of binding lists? because there may be more than one way for a rule to match an example.

Example: a complete state description (antecedent) paired with a target predicate (consequent). It means, "when the world state was thus-and-so, the action to take is thus-and-so."

Since rule and example are so similar, we link them by inheritance. Example is a subclass of rule. functions add_antecedent_predicate and match are disabled.

Purpose of program

(This really should be in the requirements document, but there wasn't one, so...)

The input to the program is a list of examples from standard input. The output is a list of rules, sent to standard output.

The rules should be as follows:

If the antecedent of any example matches the antecedent of any rule (thus producing bindings), the target predicates also match (given those bindings). If there's more than one possible set of bindings, this is true for all sets; and each variable in the rule's target predicate is bound in each possible set of bindings.
Each example is thus covered by a rule or rules.
There should be as few rules as possible to achieve the previous 2 conditions.
Given the previous conditions, each rule should have as few predicates in its antecedent as possible.

Global functions

main ( ) -- reads in examples from standard input; calls learn; prints results.

learn -- does the learning
Input: all examples
Output: all rules
Design: for each target predicate,
               divide examples into "positive" (having this target predicate type) and "negative"
                normalize examples so that arguments in target predicate match.
                    E.g. if one has consequent (pickup A B) the others all have consequent (pickup A B) -- never (pickup B C), say.
                    This way we're always using the same term for "thing to pick up" and "its base"
                while there are still positive examples
                    call learn_rule to get a rule to cover them
                    remove all positive examples now covered
                reset examples for next target predicate

learn_rule -- learns the best rule with consequent equal to a target predicate
Input: positive examples; negative examples; target predicate
Output: rule
Design: create a new rule with target predicate as consequent, and an empty antecedent
       this means "under all circumstances, do target predicate"

             while there are still negative examples the rule covers
                identify all predicates we might add to antecedent*
                find the one that covers the most positive examples
                    if there is a tie, pick the one that rules out most negative examples
                    if there is a tie, pick the one that binds the most not-yet-bound variables from the target predicate
                if there is none, complain and crash -- there will be
                else add this to antecedent of rule
                remove all negative examples the rule no longer covers

functions that I should really include but I haven't yet:

how to identify all predicates we might add to antecedent of rule

how to find the one that covers most positives, negatives, etc.

design for match (in all its incarnations).

Possible design changes

We may change the criteria for choosing what predicate to add to a rule antecedent, possibly using FOIL (described in Chapter 6 of Machine Learning).