Todd is a programmer/analyst with the Institute of Geophysics and Planetary Physics at UCLA. He is also associated with the NASA/JPL Planetary Data Systems project. Todd can be reached at 1104 N. Orchard, Burbank, CA 91506.
Not long ago I needed to implement an expert system, so I started looking for tools I could use to build one. I was surprised to learn that there are over two dozen expert system tools for the PC. In order to select one (or more), I began examining expert systems in detail and, as I studied how they are implemented, terms like "rules" and "actions" began cropping up everywhere. These terms weren't totally foreign to me; I'd run across them before in tools such as YACC. Once I realized that you can build expert systems with it, I began looking at YACC differently.
In this article, I'll discuss just how you can build an expert system with YACC using Turbo C, Version 2.0 and MKS YACC from Mortice Kern Systems (MKS). MKS YACC (which together with MKS LEX make up the MKS LEX & YACC package) is identical in function to those commonly found on Unix systems, so the code presented here should be easily ported to Unix platforms. All development and testing in this article was done on a 12.5-MHz PC compatible running MS-DOS.
Expert systems can be classified into five categories: procedural, diagnostic, monitoring, configuration/design, and planning/scheduling. Procedural expert systems are step-by-step systems and typically take the form of a series of menus -- hypertext systems, desktop publishing systems, and application development environments, for example. Traditionally, these types of systems are not referred to as expert systems because the primary purpose of such systems is not to provide expert advice, conclusions, or actions, but rather to provide some other service. The expert elements of these systems exist simply to aid in the primary goal.
Diagnostic, monitoring, configuration/design, and planning/scheduling systems are commonly thought of as expert systems because their primary goal is to provide some expert guidance or advice. Diagnostic systems typically have a large number of rules and conclusions, and can provide estimates of the confidence of any conclusion -- a medical diagnostic system, for example.
Monitoring systems typically run continuously and sample various states of another system. Based on these states it can provide conclusions about the system as a whole.
Configuration/design systems consist of rules on how things may be linked together. You might specify to such a system a final product and it will inform you on how that product could be assembled or built. Closely related to Configuration/design systems are planning/scheduling systems. They differ in that with planning/scheduling systems time is a factor.
A common characteristic of all expert systems is that they have a "knowledge base" which usually consists of a set of rules that detail how data and information is interpreted. In some applications the knowledge base is maintained as a set of explicit rules, where as in others it's simply a matter of the structure and function flow of an application. In either case, an expert system operates by stepping through one rule after another in search of a conclusion. In general, expert systems use one of two methods to search for a conclusion. The first method is called "forward chaining" and is a goal-oriented, data-driven system in which input is supplied to the system and a conclusion is derived based upon these inputs. An example of a forward chaining system is a monitoring system that could accept a stream of data on the height of the terrain and provide an analysis of the particular landforms it "sees." This is precisely the type of system I'll implement in order to demonstrate how you can use YACC to build an expert system.
The second method is "backward chaining" and is a solution-oriented, fact-driven system in which an event is presented to the system and the cause of the event is derived. An example of a backward chaining system is a diagnostic system which could determine the cause of a computer failure. With such a system, you would supply a description of the current state of the computer. The expert system would then search its knowledge base for all possible causes. The system could then present a list of causes or it might ask for additional information in order to further refine the list.
Knowledge can be rather ambiguous at times. You can't always describe something with a mathematical formula -- sometimes you must deal with facts at a more abstract level. A landform recognizer is a good example of this. You might ask an expert on landforms (a geologist) "How do you classify a form as a 'peak'?" The expert might respond "Well, the land slopes upward and then it slopes downward." (Such a statement will lead to other questions, such as "How do you define a slope?" But for now let's not pursue this answer.)
Using YACC's rule syntax you could define a peak with the following text: peak: UP_SLOPE DOWN_SLOPE
In YACC, this is a rule and it would read as "a peak is an up slope followed by a down slope," which is how the expert described a peak. The extraction and codification of an expert's knowledge is called "knowledge engineering." The coding, in this case, is a rather simple operation.
As the conversation with the expert continues, you would probably find out that there are a variety of landforms that include peaks, valleys, plateaus, basins, and shelves. Listing One details all of the rules that describe each of these pieces of knowledge.
Interspersed with the rules in Listing One you'll find text, most of which is C code, and what remains are directives to YACC. I won't describe every detail of Listing One. What's important to note is that with YACC all actions associated with a rule are defined using C code. Once Listing One is processed by YACC, you will obtain a file that contains just C source code. This source code implements a state machine that consists of the rules translated into the proper C instructions and the actions you specified. This allows for the seamless integration of the output of YACC into any application.
Every state machine generated by YACC has three interfaces. The first is a function, called yyparse( ), which you can call within an application to start the state machine. The second interface is a function, called yylex( ), that you supply so that you can provide the fuel for the engine. For a state machine, this fuel is a series of tokens. The only allowed tokens yylex( ) can return are defined by the %token directive (there's one in Listing One). The third interface reports errors when the state machine encounters an impossible combination of tokens. Listing Two contains the source for a yyerror( ). Whenever yyerror( ) is called it prints a "?!" to indicate an impossible combination. It then clears the token stack so that the application can recover from the error.
Listing Three contains the source for a yylex( ) that will meet our immediate needs. In addition to yylex( ) there are the functions get_token( ), token_copy( ), location_copy( ), trend( ), pop_token( ), push_token( ), and init_source( ). The names of these functions reflect their purpose. Collectively, the functions in Listing Three read a source of data and tokenize the indivisible features (direction of slope), which exist in the data. It's in the function trend( ) that the definition of slope is codified. The slope is always calculated between adjacent data points and regions with the same slope are reduced to a single token.
One aspect of a landform expert system that complicates the source in Listing Three is that features overlap one another. That is, a peak and a valley which are adjacent to each other share a common line segment. Because the state engine produced by YACC consumes all tokens when a rule is satisfied, we must maintain a stack of tokens within yylex( ). Doing so allows us to accommodate overlapping features without fiddling with the simple form of the YACC rules specifications.
The source code for the expert system is designed to work either in a graphics or text mode. This is accomplished by using the preprocessor if/else/endif directive to selectively include code for each mode. If you wish to use a graphics mode simply define the token GRAPHICS. The simplest way to ensure that this token is defined for all source code is to place it in the include file lfclass.h. Listing Four contains the source for lfclass.h. In addition to the GRAPHICS token, there are several new data types that have been defined. These data types in class types are used throughout the source code.
All output produced by the landform expert system is rendered by the functions label( ), display_message( ), and draw_trace( ), with the help of set_screen_max( ), set_view_max( ), init_graphic( ), deinit_graphics( ), and scaled( ). These functions, which can be found in Listing Five serve as standard entry points for implementation specific graphics calls. The Turbo library is used in this implementation.
Now let's put all the source code just presented to work. Listing Six contains the source for a fully functional expert system and a makefile to aid in constructing the example provided in Figure 1. When the source in Listing Six is compiled and linked with the other files in this project, you will have an application that expects one command line argument. This argument should be the name of a file that contains terrain height data. The format of this data must be a pair of numbers separated by commas. The first number of the pair is the horizontal position of the sample and the second is the height of the terrain at the point.
Figure 1: A makefile for building the example application
#----------------------------------------------------------------------
# Makefile for the generation of a land form expert system -- Todd King
#----------------------------------------------------------------------
CFLAGS = -mc
LDFLAGS = $ (CFLAGS)
YFLAGS = -d
OBJ = example.obj lfx.obj lfclass.obj render.obj error.obj
example.exe : $ (OBJ)
$ (LD) $ (LDFLAGS) $ (OBJ) graphics.lib
lfx.c : lfx.y lfclass.h
lfclass.obj : lfclass.c lfclass.h ytab.h
render.obj : render.c lfclass.h
error.obj : error.c ytab.h
clean :
del *.obj
del lfx.c
A set of terrain data that contains one of each feature is provided in Figure 2. When you run the example (and if graphical output is selected) you will see a drawing of the representation of the terrain data with each landform properly labeled. If graphical output is not selected, then the type of landform and its location is printed to the screen. The final landform in every possible data set will be labeled as impossible. This is actually reasonable because the land ends abruptly. This might have been true before Columbus sailed to the new world, but is not true today.
0, 50
10, 60
20, 40
30, 30
40, 30
55, 50
60, 50
70, 80
80, 20
90, 80
100, 50
105, 50
110, 20
In most expert system shells (the typical tools used to build expert systems), the knowledge base is often constructed using a series of if/then statements called "production rules." There are several aspects of using YACC that sets it apart from the production rule approach, foremost being the compact YACC syntax. Also, the source produced by YACC contains efficient stack management code, which buffers tokens so that a match for the longest sequence of tokens is always found. Another benefit is that it is possible to have rules that are dependent on other rules. The expert system that results from using YACC is a C function which is compiled into machine code and executes very quickly, and the function can be included in any program. A limitation of the YACC approach, however, is that only forward-chaining systems can be constructed.
MKS LEX & YACC MKS Inc. 35 King Street North Waterloo, Ontario N2J 2W9 Canada 800-265-2797 $249 (DOS), $395 (OS/2)
_YACC FOR EXPERT SYSTEMS_ by Todd King[LISTING ONE]
/*-------------------------------------------------------------------------
FILE: lfx.y -- Rules and actions for a land form expert system -- Todd King
--------------------------------------------------------------------------*/
%{
#include <string.h>
#include "lfclass.h"
%}
%union {
LOCATION location;
}
%token <location> UP_SLOPE DOWN_SLOPE HORIZONTAL DONE
%start terrain
%%
terrain: land_form
{
YYACCEPT;
}
;
land_form : peak
| valley
| plateau
| basin
| shelf
| DONE
;
peak : UP_SLOPE DOWN_SLOPE
{
label("peak", $1.start_x, $2.end_x);
}
;
valley : DOWN_SLOPE UP_SLOPE
{
label("valley", $1.start_x, $2.end_x);
}
;
plateau : UP_SLOPE HORIZONTAL DOWN_SLOPE
{
label("plateau", $1.start_x, $3.end_x);
}
;
basin : DOWN_SLOPE HORIZONTAL UP_SLOPE
{
label("basin", $1.start_x, $3.end_x);
}
;
shelf : DOWN_SLOPE HORIZONTAL DOWN_SLOPE
{
label("shelf", $1.start_x, $3.end_x);
}
| UP_SLOPE HORIZONTAL UP_SLOPE
{
label("shelf", $1.start_x, $3.end_x);
}
;
%%
[LISTING TWO]
/*-------------------------------------------------------------------------
FILE: error.c -- Error reporting and recover function called by the YACC
state engine whenever an impossible combination of states is encountered.
Todd King
---------------------------------------------------------------------------*/
#include "lfclass.h" /* Must precede "ytab.h" */
#include "ytab.h"
yyerror(c)
{
extern SOURCE Map_source;
TOKEN *token;
label("?!", Map_source.last_token.location.start_x,
Map_source.last_token.location.start_x);
token = pop_token(&Map_source);
while(token->value != DONE) { /* Clear stack */
token = pop_token(&Map_source);
}
Map_source.last_token.value = UNDEF_TOKEN;
return(0);
}
[LISTING THREE]
/*--------------------------------------------------------------------------
FILE: lfclass.c -- All functions used to identify indivisable features of
the land form. This is just the slope of the terrain. -- Todd King
--------------------------------------------------------------------------*/
#include "lfclass.h" /* Must precede "ytab.h" */
#include "ytab.h"
yylex() {
extern SOURCE Map_source;
TOKEN last_token;
TOKEN *tptr;
int x;
char buffer[10];
token_copy(&last_token, get_token(&Map_source));
token_copy(&Map_source.last_token, &last_token);
if(last_token.value == DONE) return(last_token.value);
tptr = get_token(&Map_source);
while(tptr->value == last_token.value) {
tptr = get_token(&Map_source);
}
push_token(&Map_source, tptr);
location_copy(&yylval, &last_token.location);
return(last_token.value);
}
TOKEN *get_token(source)
SOURCE *source;
{
if(source->in_buffer > 0) {
return(pop_token(source));
} else {
return(trend(source));
}
}
token_copy(t1, t2)
TOKEN *t1;
TOKEN *t2;
{
t1->value = t2->value;
location_copy(&t1->location, &t2->location);
}
location_copy(l1, l2)
LOCATION *l1;
LOCATION *l2;
{
l1->start_x = l2->start_x;
l1->start_y = l2->start_y;
l1->end_x = l2->end_x;
l1->end_y = l2->end_y;
}
TOKEN *trend(source)
SOURCE *source;
{
static TOKEN _Trend_token;
int delta;
int x, y;
_Trend_token.value = DONE;
if(source->last_y == NOT_DEF) {
if(fscanf(source->fptr, "%d, %d", &source->last_x, &source->last_y) < 2) {
return(&_Trend_token);
}
}
if(fscanf(source->fptr, "%d, %d", &x, &y) < 1) {
return(&_Trend_token);
}
_Trend_token.location.start_x = source->last_x;
_Trend_token.location.start_y = source->last_y;
_Trend_token.location.end_x = x;
_Trend_token.location.end_y = y;
delta = y - source->last_y;
source->last_y = y;
source->last_x = x;
if(delta < 0) {
_Trend_token.value = DOWN_SLOPE;
} else if(delta > 0) {
_Trend_token.value = UP_SLOPE;
} else {
_Trend_token.value = HORIZONTAL;
}
return(&_Trend_token);
}
TOKEN *pop_token(source)
SOURCE *source;
{
static TOKEN _Pop_token;
_Pop_token.value = DONE;
if(source->in_buffer <= 0) return(&_Pop_token);
source->in_buffer--;
return(&source->token_buffer[source->in_buffer]);
}
push_token(source, token)
SOURCE *source;
TOKEN *token;
{
if(source->in_buffer >= MAX_TOKEN_BUFFER) {
return(-1);
}
if(token->value == UNDEF_TOKEN) {
return(-1);
}
token_copy(&source->token_buffer[source->in_buffer], token);
source->in_buffer++;
return(source->in_buffer);
}
init_source(source)
SOURCE *source;
{
source->in_buffer = 0;
source->fptr = NULL;
source->last_y = NOT_DEF;
source->last_token.value = UNDEF_TOKEN;
}
[LISTING FOUR]
/*----------------------------------------------------------------------
FILE: lfclass.h -- Type and misc. definitions for land form classifier.
Todd King
------------------------------------------------------------------------*/
#ifndef _LFCLASS.H_
#define _LFCLASS.H_
#define GRAPHICS 1 /* If defined graphics output is used */
#include <stdio.h>
/* Defines for use with the scaled() function */
#define X_AXIS 1
#define Y_AXIS 2
/* Definitions for token and token source items */
#define NOT_DEF -1
#define UNDEF_TOKEN 0 /* Must less than 255 for YACC's sake */
#define MAX_TOKEN_BUFFER 8
#define LABEL_AT_Y 12
typedef struct {
int start_x;
int start_y;
int end_x;
int end_y;
} LOCATION;
typedef struct {
LOCATION location;
int value;
} TOKEN;
typedef struct {
FILE *fptr;
int last_y;
int last_x;
TOKEN last_token;
int in_buffer;
TOKEN token_buffer[MAX_TOKEN_BUFFER];
} SOURCE;
/* Function Definitions */
TOKEN *pop_token();
TOKEN *get_token();
TOKEN *trend();
#endif _LFCLASS.H_
[LISTING FIVE]
/*---------------------------------------------------------
FILE: render.c -- All rendering functions -- Todd King
----------------------------------------------------------*/
#include <graphics.h>
#include "lfclass.h" /* Must Precede "ytab.h" */
#include "ytab.h"
/* Variables for graphics output */
int View_max_x;
int View_max_y;
int Screen_max_x;
int Screen_max_y;
set_screen_max(x, y)
int x;
int y;
{
Screen_max_x = x;
Screen_max_y = y;
}
set_view_max(fptr)
FILE *fptr;
{
int x, y;
View_max_x = 0;
View_max_y = 0;
while(fscanf(fptr, "%d, %d", &x, &y) > 0) {
if(x > View_max_x) View_max_x = x;
if(y > View_max_y) View_max_y = y;
}
rewind(fptr);
}
init_graphics()
{
#ifdef GRAPHICS
int gdriver;
int gmode;
/* Initialize graphics */
detectgraph(&gdriver, &gmode);
if(gdriver < 0) {
fprintf(stderr, "Unable to determine graphics device.\n");
return(0);
}
initgraph(&gdriver, &gmode, "c:\\turboc");
if(gdriver < 0) {
fprintf(stderr, "Unable to determine graphics device.\n");
return(0);
}
/* A few initializations. */
set_screen_max(getmaxx(), getmaxy());
#endif GRAPHICS
return(1); /* Success */
}
deinit_graphics()
{
closegraph();
}
display_message(msg)
char msg[];
{
#ifdef GRAPHICS
moveto(0, getmaxy() - 12);
outtext(msg);
getch();
#else
printf(msg);
getch();
#endif GRAPHICS
}
draw_trace(fptr)
FILE *fptr;
{
#ifdef GRAPHICS
int x, y;
int first = 1;
/* Draw surface */
while(fscanf(fptr, "%d, %d", &x, &y) > 0) {
if(first) {
moveto(scaled(x, X_AXIS), scaled(y, Y_AXIS));
first = 0;
}
lineto(scaled(x, X_AXIS), scaled(y, Y_AXIS));
}
rewind(fptr);
#endif GRAPHICS
}
label(text, start_x, end_x)
char text[];
int start_x;
int end_x;
{
#ifdef GRAPHICS
settextjustify(CENTER_TEXT, BOTTOM_TEXT);
moveto(scaled(start_x + (end_x - start_x) / 2, X_AXIS),
LABEL_AT_Y);
outtext(text);
settextjustify(LEFT_TEXT, BOTTOM_TEXT);
#else
printf("%s @ %d\n", text, start_x + (end_x - start_x) / 2);
#endif GRAPHICS
}
scaled(xy, axis)
int xy;
int axis;
{
int new_xy;
switch(axis) {
case X_AXIS:
new_xy = ((float)xy / View_max_x) * Screen_max_x;
break;
case Y_AXIS:
new_xy = Screen_max_y - ((float)xy / View_max_y) * Screen_max_y;
break;
}
return(new_xy);
}
[LISTING SIX]
/*----------------------------------------------------------------------
FILE: example.c -- An example integration of all the land form expert
system code -- Todd King
-----------------------------------------------------------------------*/
#include <stdio.h>
#include "mclass.h" /* Must precede "ytab.h" */
#include "ytab.h"
#define MAX_WIDTH 79
#define MAX_HEIGHT 10
SOURCE Map_source;
main(argc, argv)
int argc;
char *argv[];
{
FILE *fptr;
int temp[2];
/* Check arguments, if OK open file */
if(argc < 2) {
printf("Proper usage: example {terrain hieght file}\n");
exit(0);
}
init_source(&Map_source);
if((Map_source.fptr = fopen(argv[1], "r")) == NULL) {
perror(argv[1]);
exit(0);
}
init_graphics();
/* A few initializations. */
set_view_max(Map_source.fptr);
draw_trace(Map_source.fptr);
/* Now classify all features. */
while(Map_source.last_token.value != DONE) {
yyparse();
push_token(&Map_source, &Map_source.last_token);
}
/* All done, wait before cleaning up. */
display_message("Press any key to exit ...");
deinit_graphics();
exit(0);
}
[FIGURE 1]
#-------------------------------------------------------------------------
# Makefile for the generation of a land form expert system -- Todd King
#--------------------------------------------------------------------------
CFLAGS = -mc
LDFLAGS = $(CFLAGS)
YFLAGS = -d
OBJ = example.obj lfx.obj lfclass.obj render.obj error.obj
example.exe : $(OBJ)
$(LD) $(LDFLAGS) $(OBJ) graphics.lib
lfx.c : lfx.y lfclass.h
lfclass.obj : lfclass.c lfclass.h ytab.h
render.obj : render.c lfclass.h
error.obj : error.c ytab.h
clean :
del *.obj
del lfx.c
[FIGURE 2]
0, 50
10, 60
20, 40
30, 30
40, 30
55, 50
60, 50
70, 80
80, 20
90, 80
100, 50
105, 50
110, 20
120, 20
Copyright © 1991, Dr. Dobb's Journal