Inductive Logic Programming

### Introduction

Induction is reasoning from the specific to the general. For example, consider the following dataset on kinship that is similar to what we have considered in the earlier chapters.

    parent(a,b)    parent(a,c)    parent(d,b)
    father(a,b)    father(a,c)    mother(d,b)
    male(a)        female(c)      female(d)

Given the above dataset, we can use inductive reasoning to infer the following rules (or view definitions):

    father(X,Y) :- parent(X,Y) & male(X)
    mother(X,Y) :- parent(X,Y) & female(X)

In inductive logic programming, given a dataset, a set of starting view definitions, and a target predicate, we can infer the view definition of the target predicate. In the example above, we were given a dataset and no starting view definitions, and we inferred the view definitions of father and mother. In the context of inductive logic programming, the dataset is also referred to as a set of positive examples. Some inductive reasoning algorithms also take a set of negative examples. If negative examples are not provided, they can be computed as the set of ground atoms in the Herbrand base that are not present in the dataset. The set of positive and negative examples taken together is also known as the training data.
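As a concrete illustration of this closed-world construction of negative examples, here is a small Python sketch using the kinship dataset above (the variable names are illustrative):

```python
from itertools import product

# Positive examples of the target predicate father/2, from the dataset above.
positives = {("a", "b"), ("a", "c")}
constants = {"a", "b", "c", "d"}   # constants appearing in the dataset

# Closed-world assumption: every ground atom father(x, y) over the known
# constants that is not listed as a positive example is a negative example.
negatives = {pair for pair in product(constants, repeat=2)
             if pair not in positives}
print(len(negatives))   # 16 ground atoms minus 2 positives = 14

# The worked example later in this chapter additionally drops the reflexive
# atoms father(x, x), leaving the ten negatives listed there.
negatives = {(x, y) for (x, y) in negatives if x != y}
print(len(negatives))   # 10
```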

In this lesson, we will consider a simple inductive learning algorithm to infer rules from data. The description here is based on an algorithm known as the First Order Inductive Learner (FOIL).

### An Example Inductive Learner

The general approach used in an inductive learner is to start with the predicate whose definition is to be learned as the head of a rule whose body is initialized to be empty. At each step, we add a literal to the body of the rule so that it satisfies several positive examples and none of the negative examples. The literal to be added can be one of the predicates from the problem statement, the negation of such a predicate, or an equality or inequality between two bound variables. Recursive literals are allowed if they will not cause infinite recursion. For choosing between alternative literals, a heuristic measure is used. Example pseudocode of an inductive learner is shown below.

    foil(positive, negative, predicate)
        clauses ← ∅
        repeat
            clause ← learnNewClause(positive, negative, predicate)
            positive ← positive − all positive examples satisfied by the clause
            clauses ← clauses + clause
        until positive = ∅
        return clauses

    learnNewClause(positive, negative, predicate)
        clause ← ∅
        repeat
            literal ← choose_literal(clause, positive, negative)
            clause ← clause ∪ literal
            negative ← negative − negative examples excluded by the literal
        until negative = ∅
        return clause
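A minimal runnable sketch of these two loops in Python, with a crude coverage-based score standing in for FOIL's gain heuristic defined later in this section. The literal pool and the (name, test-function) representation are simplifications assumed for illustration, and the sketch assumes some candidate literal always excludes a remaining negative example:

```python
def foil(positive, negative, candidates, satisfies):
    """Outer loop: learn clauses until every positive example is covered.
    `candidates` is a fixed pool of (name, test) literals; `satisfies(body, e)`
    checks that example e satisfies every literal in body."""
    positive, clauses = set(positive), []
    while positive:
        clause = learn_new_clause(positive, set(negative), candidates, satisfies)
        positive -= {e for e in positive if satisfies(clause, e)}
        clauses.append(clause)
    return clauses

def learn_new_clause(positive, negative, candidates, satisfies):
    """Inner loop: add literals until no negative example is still covered.
    Score = (positives kept, negatives excluded), a stand-in for gain."""
    clause = []
    while negative:
        def score(lit):
            body = clause + [lit]
            return (sum(satisfies(body, e) for e in positive),
                    -sum(satisfies(body, e) for e in negative))
        clause.append(max(candidates, key=score))   # ties: first in pool wins
        negative = {e for e in negative if satisfies(clause, e)}
    return clause

# Demo on the kinship facts from the start of the chapter.
parent = {("a", "b"), ("a", "c"), ("d", "b")}
males, females = {"a"}, {"c", "d"}
candidates = [
    ("parent(X,Y)", lambda x, y: (x, y) in parent),
    ("male(X)",     lambda x, y: x in males),
    ("female(X)",   lambda x, y: x in females),
]
satisfies = lambda body, e: all(test(*e) for _, test in body)

positives = {("a", "b"), ("a", "c")}
negatives = {(x, y) for x in "abcd" for y in "abcd"
             if x != y and (x, y) not in positives}
clauses = foil(positives, negatives, candidates, satisfies)
print([[name for name, _ in clause] for clause in clauses])
```

With this candidate ordering the sketch recovers a single clause corresponding to father(X,Y) :- parent(X,Y) & male(X).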

To understand the working of this algorithm, let us consider how it learns the rule father(X,Y) :- parent(X,Y) & male(X). The two positive examples and the relevant background predicates are listed at the start of this section. The ten negative examples are listed below.

    father(a,d)    father(b,a)    father(b,c)    father(b,d)    father(c,a)
    father(c,b)    father(c,d)    father(d,a)    father(d,b)    father(d,c)

We start a new clause with father(X,Y) in the head and an empty body. Here are the candidate literals to be added to the body.

    male(X)         male(Y)         female(X)       female(Y)       parent(X,Y)
    parent(X,Z)     parent(Y,X)     parent(Y,Z)     parent(Z,X)     parent(Z,Y)

To ensure the safety of the rules, we have not considered any negated literals, or literals that check for equality or inequality of unbound variables. Furthermore, we only consider literals that share at least one variable with the predicate that we are defining.

To choose the most useful literal to be added to the body of the rule, we use the following gain heuristic that measures the contribution of the literal to the desired predicate definition.

    gain(literal) = T × ( log2(post_pos / (post_pos + post_neg))
                        − log2(pre_pos / (pre_pos + pre_neg)) )

In the above formula post_pos and post_neg respectively refer to the positive and negative examples that satisfy the rule after adding the literal, and pre_pos and pre_neg respectively refer to the positive and negative examples satisfied by the rule before adding the literal. T refers to the number of positive examples that satisfied the rule before adding the new literal, and are still covered after adding the literal.
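The gain computation can be sketched directly in Python. Treating the T = 0 case as zero gain is an assumption on my part, chosen to match the zero entries in the tables of this section:

```python
from math import log2

def gain(T, pre_pos, pre_neg, post_pos, post_neg):
    """FOIL gain for adding a literal to a clause. T is the number of positive
    examples covered before the addition that remain covered after it."""
    if T == 0 or post_pos == 0:
        return 0.0   # convention: a literal that keeps no positives has no gain
    return T * (log2(post_pos / (post_pos + post_neg))
                - log2(pre_pos / (pre_pos + pre_neg)))

print(round(gain(2, 2, 10, 2, 4), 2))   # male(X) in the first step: 2.0
print(round(gain(2, 2, 10, 2, 1), 2))   # parent(X,Y): 4.0
print(round(gain(2, 2, 10, 4, 5), 2))   # parent(X,Z): 2.83
```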

As an example, let us compute the gain for the literal male(X). For the starting clause with an empty body, the values for pre_pos and pre_neg are 2 and 10 respectively. Both positive examples satisfy father(X,Y) :- male(X), giving a post_pos value of 2. Four negative examples satisfy this clause, giving a post_neg value of 4. The value for T is 2. The overall gain works out to 2.0. In contrast, if we consider the literal female(X), none of the positive examples satisfy it and six negative examples satisfy it, giving an overall gain of 0. Therefore, male(X) will be preferred over female(X). All the literals considered in the first step and their gain values are summarized in the table below.

| Literal     | T | pre_pos | pre_neg | post_pos | post_neg | gain |
|-------------|---|---------|---------|----------|----------|------|
| male(X)     | 2 | 2       | 10      | 2        | 4        | 2.0  |
| male(Y)     | 1 | 2       | 10      | 1        | 5        | 0.0  |
| female(X)   | 0 | 2       | 10      | 0        | 6        | 0.0  |
| female(Y)   | 1 | 2       | 10      | 1        | 5        | 0.0  |
| parent(X,Y) | 2 | 2       | 10      | 2        | 1        | 4.0  |
| parent(X,Z) | 2 | 2       | 10      | 4        | 5        | 2.83 |
| parent(Y,X) | 0 | 2       | 10      | 0        | 3        | 0.0  |
| parent(Y,Z) | 0 | 2       | 10      | 0        | 9        | 0.0  |
| parent(Z,X) | 0 | 2       | 10      | 0        | 3        | 0.0  |
| parent(Z,Y) | 2 | 2       | 10      | 3        | 6        | 2.0  |

Let us understand why the value of post_pos for the literal parent(X,Z) is 4 when the number of positive examples is only 2. The literal parent(X,Z) introduces a new variable Z. Given the positive example father(a,b), the literal parent(a,Z) matches the facts parent(a,b) and parent(a,c), giving two different bindings (i.e., X=a, Y=b, Z=b and X=a, Y=b, Z=c). The same holds for father(a,c), so the total number of matches is 4. This computation is incorporated into the algorithm by extending the set of positive and negative examples to take the extra variable into account. For simplicity, we have omitted the example extension step from the pseudocode.
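This binding count can be checked with a few lines of Python (a sketch over the kinship facts; the variable names are illustrative):

```python
parent_facts = {("a", "b"), ("a", "c"), ("d", "b")}
positives = [("a", "b"), ("a", "c")]   # the two father(X,Y) examples

# Adding parent(X,Z) extends each positive example with every binding of the
# new variable Z for which parent(X,Z) is a fact; post_pos counts bindings.
post_pos = sum(1 for (x, _y) in positives
               for (px, _z) in parent_facts if px == x)
print(post_pos)   # each example yields Z=b and Z=c, so 2 + 2 = 4
```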

The highest gain in the above table is for the literal parent(X,Y). Therefore, at the end of the first iteration, we have the rule father(X,Y) :- parent(X,Y). As this rule is still satisfied by a negative example, we must keep looking for additional literals. The literals considered in the second iteration and their gain values are shown in the table below.

| Literal     | T | pre_pos | pre_neg | post_pos | post_neg | gain  |
|-------------|---|---------|---------|----------|----------|-------|
| female(X)   | 0 | 2       | 1       | 0        | 1        | 0.0   |
| female(Y)   | 1 | 2       | 1       | 1        | 0        | 0.59  |
| parent(X,Y) | 2 | 2       | 1       | 2        | 1        | 0.0   |
| parent(X,Z) | 2 | 2       | 1       | 4        | 1        | 0.53  |
| parent(Z,Y) | 2 | 2       | 1       | 3        | 2        | 0.0   |
| male(X)     | 2 | 2       | 1       | 2        | 0        | 1.17  |
| male(Y)     | 1 | 2       | 1       | 1        | 1        | -0.41 |

As we can see, the literal male(X) has the highest gain, and the resulting rule is satisfied by all the positive examples and none of the negative examples. Hence, we have successfully learned the rule father(X,Y) :- parent(X,Y) & male(X).
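As a sanity check, the learned rule can be evaluated against the training data. A small Python sketch over the kinship facts:

```python
parent = {("a", "b"), ("a", "c"), ("d", "b")}
males = {"a"}
positives = {("a", "b"), ("a", "c")}
negatives = {(x, y) for x in "abcd" for y in "abcd"
             if x != y and (x, y) not in positives}

def father(x, y):
    # Learned rule: father(X,Y) :- parent(X,Y) & male(X)
    return (x, y) in parent and x in males

assert all(father(x, y) for x, y in positives)       # covers both positives
assert not any(father(x, y) for x, y in negatives)   # excludes all ten negatives
print("rule verified")
```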

### Enhancements and Variations

The basic version of the inductive learner considered in the previous section has numerous enhancements. Some important enhancements include ensuring that the rules remain stratified and making sure that rules do not lead to infinite recursion.

In many real-world problems it is not possible to formulate rules that precisely characterize the domain. In such circumstances, many studies have found that simple, approximate concepts can be more valuable than overspecified rules that "fit the noise." One approach to address this issue is to implement a stopping criterion for deciding that a clause should not be extended further even if it is satisfied by some negative examples. These decisions are based on the principle that, for a sensible clause, the number of bits required to encode the clause should never exceed the number of bits needed to indicate explicitly the tuples covered by the clause.

Any greedy algorithm is prone to making locally optimal but globally undesirable choices. In this context, a literal chosen for the right-hand side of a clause may subsequently turn out to be unnecessary or even counterproductive. Once a clause has been developed, the literals of the right-hand side can be examined to see whether any could be omitted without compromising the accuracy of the clause as a whole. There have also been attempts to extend the basic algorithm with search strategies different from greedy search.
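One way to realize this post-pruning idea is to test, for each literal in a finished clause, whether dropping it lets any negative example back in. A hedged Python sketch follows; the redundant X != Y literal is an invented example for demonstration, not part of the chapter's worked run:

```python
def prune_clause(clause, negatives, satisfies):
    """Greedily drop literals whose removal still excludes every negative
    example. Removing a literal only broadens coverage, so the positive
    examples already covered by the clause remain covered."""
    pruned = list(clause)
    for literal in clause:
        shorter = [lit for lit in pruned if lit is not literal]
        if shorter and not any(satisfies(shorter, e) for e in negatives):
            pruned = shorter
    return pruned

# Demo: the clause below carries a redundant X != Y test.
parent = {("a", "b"), ("a", "c"), ("d", "b")}
males = {"a"}
satisfies = lambda body, e: all(test(*e) for _, test in body)
clause = [
    ("parent(X,Y)", lambda x, y: (x, y) in parent),
    ("male(X)",     lambda x, y: x in males),
    ("X != Y",      lambda x, y: x != y),   # redundant given the other two
]
positives = {("a", "b"), ("a", "c")}
negatives = {(x, y) for x in "abcd" for y in "abcd"
             if x != y and (x, y) not in positives}
print([name for name, _ in prune_clause(clause, negatives, satisfies)])
```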

An alternative approach to inductive learning is based on inverse deduction. The basic intuition is that we must be able to start from an example and infer the steps (i.e., the necessary rules) that will help derive that example. Just like the approach used in FOIL, the inverse deduction process is computationally expensive.

### Concluding Remarks

Inductive logic programming bears similarity to data mining where we are interested in looking for patterns hidden in the data. Inductive logic programming provides us a framework in which any background knowledge can be taken into account in the process of learning.

Inductive logic programming differs from the statistical approaches to machine learning in that it does not depend on the availability of a large number of examples. As the knowledge learned by inductive logic programming is in the form of rules, it is more directly available for human examination and understanding.

There have been some applications of inductive logic programming in biology for learning rules for protein folding, and in natural language processing for learning information extraction rules. Considerably more research is needed to explore the full potential of inductive logic programming for practical applications.

### Exercises

Exercise 1. Enhance the kinship example considered in this chapter to include examples of the ancestor relationship, and use a FOIL implementation to learn the rules that define it. A sample implementation of FOIL is available as part of the code accompanying the textbook Artificial Intelligence: A Modern Approach, in the source code file knowledge.py.