This is a design document for a machine learning program to identify simple rules for planning, based on previously made plans. First, let me give some definitions (under "Data structures"); then I will explain in more detail.
The language is C++; the compiler is g++; the system is Unix. We will use the Standard Template Library (STL).
Predicate: a list of strings in parentheses, e.g.: (clear a), or (on ?x ?y). Variables are denoted with leading ?'s. The type of a predicate is the first word in it; subsequent words are its arguments.
A predicate is simply an STL vector of strings
Most predicates relate to description of the world. A special kind of predicate is a target predicate: a predicate which tell us what to do, e.g. (unstack block-A block-B), meaning that we should unstack block-A from block-B. This is just another predicate, not a new data structure.
A binding of a variable to another string is an assertion that the variable equals the other string; e.g. binding ?x to A means "?x = A." It's implemented as a class in its own right.
A predicate matches another if a) they are both empty.
A predicate matches another if a) the first elements match, and b) given bindings
from matching the first elements, the rest of the first predicate matches the rest of the
second.
Two elements match, given a list of bindings (empty unless otherwise specified), a) they are identical, or b) after applying all bindings, at least one is still a variable. Bindings produced are the bindings given plus a binding of the two elements.
(So there is a boolean function match that takes two strings and a bindings list, and another that takes two predicates and a bindings list. Each updates its bindings list and returns whether the things match.)
Two special strings classifications:
Variable: a string with a leading ?. Usage: (clear ?x) means something is clear, but we haven't specified what.
Constant: any string that isn't a variable. Usage: (clear block-A) means that there is a block named block A, and it is clear.
To distinguish variables and constants, I'll have a small boolean function IsVariable to return true if the argument is a variable.
State description: a set of predicates describing the world. A complete state description describes it completely. This is implemented as an STL vector of predicates.
Rule: a state description (called an antecedent) paired with a target predicate (called a consequent). E.g, here's a rule that says if the hand has something in it, we should put that object on the table. Note that rules may contain variables, but need not.
if ( (holding ?x) ) <-- antecedent
then (putdown ?x) <-- consequent
Member functions:
void add_antecedent_predicate (const Predicate&);
bool match (const Example&, vector<vector <Binding>>&);
functions to read in and print out; constructors and destructors
Why a list of binding lists? because there may be more than one way for a rule to match an example.
Example: a complete state description (antecedent) paired with a target predicate (consequent). It means, "when the world state was thus-and-so, the action to take is thus-and-so."
Since rule and example are so similar, we link them by inheritance. Example is a subclass of rule. functions add_antecedent_predicate and match are disabled.
(This really should be in the requirements document, but there wasn't one, so...)
The input to the program is a list of examples from standard input. The output is a list of rules, sent to standard output.
The rules should be as follows:
main ( ) -- reads in examples from standard input; calls learn; prints results.
learn -- does the learning
Input: all examples
Output: all rules
Design: for each target predicate,
divide
examples into "positive" (having this target predicate type) and
"negative"
normalize
examples so that arguments in target predicate match.
E.g.
if one has consequent (pickup A B) the others all have consequent (pickup A B) -- never
(pickup B C), say.
This
way we're always using the same term for "thing to pick up" and "its
base"
while
there are still positive examples
call
learn_rule to get a rule to cover them
remove
all positive examples now covered
reset
examples for next target predicate
learn_rule -- learns the best rule with consequent equal to a target predicate
Input: positive examples; negative examples; target predicate
Output: rule
Design: create a new rule with target predicate as consequent, and an empty
antecedent
this means "under all circumstances, do target
predicate"
while there are
still negative examples the rule covers
identify
all predicates we might add to antecedent*
find
the one that covers the most positive examples
if
there is a tie, pick the one that rules out most negative examples
if
there is a tie, pick the one that binds the most not-yet-bound variables from the target
predicate
if
there is none, complain and crash -- there will be
else
add this to antecedent of rule
remove
all negative examples the rule no longer covers
functions that I should really include but I haven't yet:
how to identify all predicates we might add to antecedent of rule
how to find the one that covers most positives, negatives, etc.
design for match (in all its incarnations).
We may change the criteria for choosing what predicate to add to a rule antecedent, possibly using FOIL (described in Chapter 6 of Machine Learning).