# Inductive Logic Programming

## Introduction

Induction is reasoning from the specific to the general. For example, consider the following dataset on kinship, similar to the datasets we have considered in earlier chapters.
Given the above dataset, we can use inductive reasoning to infer the following rules (or view definitions):
In inductive logic programming, given a dataset, a set of starting view definitions, and a target predicate, we can infer the view definition of the target predicate. In the example above, we were given a dataset and no starting view definitions, and we inferred the view definitions of father and mother. In the context of inductive logic programming, the dataset is also referred to as a set of positive examples. Some inductive reasoning algorithms also take a set of negative examples. If negative examples are not provided, they can be computed as the set of ground atoms in the Herbrand base that are not present in the dataset. The positive and negative examples taken together are known as the training data.

In this lesson, we will consider a simple inductive learning algorithm to infer rules from data. The description here is based on an algorithm known as the First Order Inductive Learner (FOIL).

## An Example Inductive Learner

The general approach used in an inductive learner is to start with the predicate whose definition is to be learned as the head of a rule whose body is initialized to be empty. At each step, we add a literal to the body of the rule so that the rule satisfies several positive examples and none of the negative examples. The literal to be added can be one of the predicates from the problem statement, the negation of such a predicate, an equality between two bound variables, or an inequality between two bound variables. Recursive literals are allowed as long as they do not cause infinite recursion. For choosing between alternative literals, a heuristic measure is used. An example pseudo code of an inductive learner is shown below.
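As a concrete illustration, the greedy learner just described can be sketched in Python. The kinship facts, the people, and the candidate literals below are illustrative assumptions (the chapter's dataset is not reproduced here), and for simplicity the sketch counts covered examples rather than variable bindings:

```python
import math
from itertools import product

# Illustrative kinship facts (assumed for this sketch).
people = ["tom", "mary", "bob", "ann"]
facts = {
    "parent": {("tom", "bob"), ("tom", "ann"), ("mary", "bob"), ("mary", "ann")},
    "male": {("tom",), ("bob",)},
    "female": {("mary",), ("ann",)},
}

# Positive examples for father(X,Y); negatives come from the closed world.
pos = {("tom", "bob"), ("tom", "ann")}
neg = set(product(people, repeat=2)) - pos

def covers(body, example):
    """True if the clause body holds with X, Y bound to the example.
    New variables such as Z are existentially quantified over the facts."""
    def holds(literals, binding):
        if not literals:
            return True
        (pred, args), rest = literals[0], literals[1:]
        for fact in facts[pred]:
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact)):
                if holds(rest, new):
                    return True
        return False
    return holds(body, {"X": example[0], "Y": example[1]})

def gain(body, literal, cur_pos, cur_neg):
    """Gain of adding `literal` to the current clause body."""
    post_pos = [e for e in cur_pos if covers(body + [literal], e)]
    post_neg = [e for e in cur_neg if covers(body + [literal], e)]
    if not post_pos:
        return 0.0
    t = len(post_pos)  # positives covered both before and after
    return t * (math.log2(len(post_pos) / (len(post_pos) + len(post_neg)))
                - math.log2(len(cur_pos) / (len(cur_pos) + len(cur_neg))))

# A fixed, hand-enumerated candidate set (no negation or (in)equality).
candidates = [
    ("parent", ("X", "Y")), ("parent", ("Y", "X")),
    ("parent", ("X", "Z")), ("parent", ("Z", "Y")),
    ("male", ("X",)), ("male", ("Y",)),
    ("female", ("X",)), ("female", ("Y",)),
]

# Greedily extend one clause until it covers no negative examples.
body, cur_pos, cur_neg = [], set(pos), set(neg)
while cur_neg:
    best = max(candidates, key=lambda lit: gain(body, lit, cur_pos, cur_neg))
    body.append(best)
    cur_pos = {e for e in cur_pos if covers(body, e)}
    cur_neg = {e for e in cur_neg if covers(body, e)}

print("father(X,Y) :-", " & ".join(f"{p}({','.join(a)})" for p, a in body))
```

With these assumed facts, the learner converges to father(X,Y) :- parent(X,Y) & male(X), the same kind of rule derived in the walkthrough that follows.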
To understand the working of this algorithm, let us consider how it learns the rule father(X,Y) :- parent(X,Y) & male(X). The two positive examples and the relevant background predicates are as listed at the start of this section. The ten negative examples are listed below.
We start a new clause with father(X,Y) in the head and an empty body. Here are the candidate literals that may be added to the body.
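Such a candidate set can be enumerated mechanically from the predicate signatures. In the following sketch, the signatures parent/2, male/1, and female/1 and the single fresh variable Z are assumptions; negated and (in)equality literals are omitted:

```python
from itertools import product

# Assumed background predicate signatures: name -> arity.
predicates = {"parent": 2, "male": 1, "female": 1}
head_vars = ("X", "Y")            # variables of the head father(X,Y)
variables = head_vars + ("Z",)    # allow one fresh variable

candidates = [
    (name, args)
    for name, arity in predicates.items()
    for args in product(variables, repeat=arity)
    if set(args) & set(head_vars)  # must share a variable with the head
]

for name, args in candidates:
    print(f"{name}({', '.join(args)})")
```

Note that the filter discards literals such as parent(Z,Z) and male(Z), which share no variable with the head.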
To ensure the safety of the rules, we do not consider any negated literals, or literals that check for equality or inequality of unbound variables. Furthermore, we only consider literals that share at least one variable with the predicate that we are defining. To choose the most useful literal to add to the body of the rule, we use the following gain heuristic, which measures the contribution of the literal to the desired predicate definition:

Gain = T * (log2(post_pos / (post_pos + post_neg)) - log2(pre_pos / (pre_pos + pre_neg)))
In the above formula, post_pos and post_neg respectively refer to the positive and negative examples that satisfy the rule after adding the literal, and pre_pos and pre_neg respectively refer to the positive and negative examples satisfied by the rule before adding the literal. T refers to the number of positive examples that satisfied the rule before adding the new literal and are still covered after adding it.

As an example, let us compute the gain for the literal male(X). For the starting clause with an empty body, the values of pre_pos and pre_neg are 2 and 10 respectively. Both positive examples satisfy father(X,Y) :- male(X), giving a post_pos value of 2. Four negative examples satisfy this clause, giving a post_neg value of 4. The value of T is 2, and the overall gain works out to 2.0. In contrast, if we consider the literal female(X), none of the positive examples satisfies it and three negative examples satisfy it, giving a gain of 0. Therefore, male(X) is preferred over female(X). All the literals considered in the first step, along with their gain values, are summarized in the table below.
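The arithmetic above can be replayed numerically. This short sketch plugs the stated counts into the gain heuristic (the function name foil_gain is our own):

```python
import math

def foil_gain(t, pre_pos, pre_neg, post_pos, post_neg):
    """Gain of adding a literal, given example counts before and after."""
    if post_pos == 0:
        return 0.0  # a literal covering no positives contributes nothing
    return t * (math.log2(post_pos / (post_pos + post_neg))
                - math.log2(pre_pos / (pre_pos + pre_neg)))

# male(X): 2 positives and 10 negatives before; 2 positives and
# 4 negatives satisfy father(X,Y) :- male(X); T = 2.
print(round(foil_gain(2, 2, 10, 2, 4), 6))   # 2.0

# female(X): no positive example satisfies it, so the gain is 0.
print(foil_gain(0, 2, 10, 0, 3))             # 0.0
```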
Let us understand why the value of post_pos for the literal parent(X,Z) is 4 when the number of positive examples is only 2. The literal parent(X,Z) introduces a new variable Z. Given the positive example father(a,b), the literal parent(a,Z) matches two facts, parent(a,b) and parent(a,c), giving us two different bindings (i.e., X=a, Z=b, Y=b and X=a, Z=c, Y=b). The same happens for the other positive example, so the total number of matches is 4. This computation is incorporated into the algorithm by extending the set of positive and negative examples to take the extra variable into account. For simplicity, we have omitted this example extension step from the pseudo code.

The highest gain value in the above table is for the literal parent(X,Y). Therefore, at the end of the first iteration, we have the rule father(X,Y) :- parent(X,Y). As this rule is still satisfied by a negative example, we must keep extending the clause with additional literals. The literals considered in the second step and their gain values are shown in the table below.
As we can see, the literal male(X) has the highest gain, and the resulting rule is satisfied by all the positive examples and by none of the negative examples. Hence, we have successfully learned the rule father(X,Y) :- parent(X,Y) & male(X).
## Enhancements and Variations

The basic version of the inductive learner considered in the previous section admits numerous enhancements. Some important enhancements include ensuring that the learned rules remain stratified and making sure that the rules do not lead to infinite recursion.

In many real-world problems it is not possible to formulate rules that precisely characterize the domain. In such circumstances, many studies have found that simple, approximate concepts can be more valuable than over-specified rules that "fit the noise." One approach to address this issue is to implement a stopping criterion for deciding that a clause should not be extended even if it is satisfied by some negative examples. Such decisions are based on the observation that, for a sensible clause, the number of bits required to encode the clause should never exceed the number of bits needed to explicitly indicate the tuples covered by the clause.

Any greedy algorithm is prone to making locally optimal but globally undesirable choices. In this context, a literal chosen for the right-hand side of a clause may subsequently turn out to be unnecessary or even counterproductive. Once a clause has been developed, the literals of its right-hand side can be examined to see whether any of them could be omitted without compromising the accuracy of the clause as a whole. There have also been attempts to extend the basic algorithm with search strategies other than greedy search.

An alternative approach to inductive learning is based on inverse deduction. The basic intuition is that we should be able to start from an example and infer the steps (i.e., the necessary rules) that would help derive that example. Just like the approach used in FOIL, the inverse deduction process is computationally expensive.

## Concluding Remarks

Inductive logic programming bears similarity to data mining, where we are interested in looking for patterns hidden in the data.
Inductive logic programming provides a framework in which arbitrary background knowledge can be taken into account in the process of learning. Inductive logic programming differs from statistical approaches to machine learning in that it does not depend on the availability of a large number of examples. As the knowledge learned by inductive logic programming is in the form of rules, it is more directly available for human examination and understanding. There have been some applications of inductive logic programming in biology, for learning rules for protein folding, and in natural language processing, for learning information extraction rules. Considerably more research is needed to explore the full potential of inductive logic programming in practical applications.

## Exercises

Exercise 1. Enhance the kinship example considered in this chapter to include examples of the