This month we begin the construction of a script interpreter for the SMALLCOM communications program. A script is a command language that allows a communications program to automatically interact with an online service. The language we'll use for this project is a subset of C, so the interpreter looks a lot like a traditional C language interpreter. For this phase of the project we will build an interpreter "engine," one that we can use to interpret C-like statements for general purposes. This month we'll concentrate on the engine. Later we'll integrate it into SMALLCOM as a script interpreter and try out some scripts.
SMALLCOM manages the connection with remote computers running online services, bulletin boards, other communications programs, or SMALLCOM itself. Most such remote programs involve sign-on sequences where the callers identify themselves, and the remote program grants or denies access. Once the caller gets in, a command language gives access to electronic mail, conferences, forums, files to download, and places to upload files. Virtually every such service is unique with respect to its sign-on sequence and command language. Most BBSs use one of the popular BBS programs (RBBS, FIDO, and so on), but have their own file and forum architecture. The online services are usually made of custom software and have their own proprietary command languages.
The command languages share one trait: They are designed to be understood by people in an interactive dialogue. This approach makes the services available to users with terminals and modems as well as to those with personal computers. The PC users have an advantage. They can use their PCs to automate routine parts of the dialogue. If you can execute the sign-on, move into the forum of your choice, send your messages, capture messages from other members, and all at the speed of the modem, then you have saved connect charges. A shareware program named TAPCIS automates this procedure for CompuServe. Other services are supported by scripts in shareware and commercial communications programs. A general-purpose communications program that uses a script must have a script processor.
Let's examine the requirements for a communications script. Once you have connected with a service, the script processor watches the input stream and waits for one or more character sequences. When one appears, the processor makes a decision. Perhaps it sends a character sequence to the service. It might tell the service to execute commands, such as to upload or download files. When the script processor decides that the online tasks are done, it disconnects. These actions are the same ones you would take if you signed on and did the typing yourself. You could hard-code the decisions, character sequences, and commands into a dedicated script processor such as the one in TAPCIS, or you could develop a general-purpose script processor that uses an external script file such as is done by ProComm. Our script processor will use an external interpreted script language that resembles C. We call the script language "S" and the script processor program the "S Interpreter," or simply SI.
SI's operation will be mostly transparent to the rest of SMALLCOM. It will be invoked when you dial a service that has a script named in the SMALLCOM phone directory. SI will use the waitforstring function in SMALLCOM to watch for character sequences, interceding when required. To emulate the user's keystrokes, SI will stuff the characters it wants to type into the BIOS keyboard buffer. These characters will then be returned to the program as if they had been keyed by the user.
S is a subset of C. If you know C you know S once you learn what the subset includes. I chose this approach for three reasons. First, the syntax of C is easily processed by an interpreter. Second, the decision-making properties of a communications script processor are well supported by C's structured programming features. Third, this is the "C Programming" column, and we all know C to one degree or another. Why wrestle with a new language?
The S syntax differs from C in these ways: S has two data types, the int and the char, and the two are usually synonymous. Either one can be declared as a pointer or used as a pointer by reference. A variable works the same whether you declare it as a pointer or not. If you reference it by name alone, you get its value. If you reference it with an asterisk in front of its name, then what the variable points to is what is returned. This is weak typing to the extreme and follows in the waning spirit of the earliest C compilers. All pointers are treated as character pointers. You can code a pointer-to-pointer reference but it will not work. All variables are treated as integers except in the case where the chartype is used in an array of char pointers like this:
This format is valid only in the global, external scope. Scope rules in S are the same as those in C.
S has no other arrays, structures, or unions. It supports decimal and hex constants and lets you code string literals as parameters to functions or identifiers to ints or chars (which then become implied character pointers). You build a script consisting of a main function and subordinate functions of your own. S provides for a library of intrinsic functions that support the environment the interpreter will support.
All global variables, except function parameters, are by default, static and extern. All local variables are auto. Because these scopes are defaults, S does not recognize the keywords. There are no shifts, bitwise logical operators, continue, do, goto, typedef, switch, casts, prototypes, or function declarations. The <expr>?<expr>:<expr> operation is not supported. The <op>= operators (+=, -=, etc.) are not supported. Every function is assumed to return an int unless it returns nothing. The unary + and - operators and the ones-complement operator are not supported. There is no preprocessor.
S includes for, while, break, return, if, else, brace surrounded statement blocks, recursion, the && and | | operators and the ! (not) and & (address of) operators. The relational operators = =, !=, <, >, <=, and >= are supported. Comments are the same as in C.
An implementation of SI must provide a library of intrinsic functions. These are not functions such as you would have in an object library or in interpreted source code, but ones that are built into the interpreter itself. Intrinsic functions are what give the generic interpreter its intended functionality. SI is built under a shell program that includes the intrinsic functions, provides an array of structures that describes the intrinsic functions, manages the source code file and errors, and executes the three phases of the interpreter.
The SI interpreter code is built to be reusable. We will separate those parts of SI that are script specific into a shell code module so that the interpreter can be used for other purposes. For example, we might add a macro processor to the editor. By isolating the interpreter from its original purpose, we leave the door open for its use in other areas.
Listing One is interp.h, a header file that an application shell will use to implement the interpreter. The first several items are configuration global macros that you set to the values that define limits for the interpreter. The first such global, TOK-BUFSIZE, defines the number of characters in the token buffer. Each S keyword and operator is translated into a token. Each identifier and string is kept in the token buffer. Each newline character in the source stream is recorded in the buffer with a newline token and a three character line number. If you get the error that says the token buffer has overflowed, you will need to increase the TOKBUFSIZE global or shorten the source file. The MAXSYM-BOLS global controls how many symbols can be in the symbol table at a given time. This maximum is equal to the total number of functions and external globals with any local automatics that are in scope. As locals go out of scope, they are removed from the symbol table and their entries are reused. MAXSTRINGS is the maximum number of strings that can appear in external character pointer arrays. MAXPARAMS is the maximum number of parameters a function can have.
The error codes are defined as an enumerated data type. SI is managed by a shell program, an example of which is described later. The shell provides all error processing and must react to these codes. The shell also provides the table of intrinsic functions. The format for entries in that table is described by the INTRINSIC structure.
Interp.h defines the format of the symbol table, which is a structure with four elements. The symbol table has an entry for each function and variable in the source file. The table entry includes a pointer to the symbol's name. If the symbol is a function, the entry has a pointer to the location in the token buffer of the start of the function's tokens. If the symbol is an array of character pointers, the table has a pointer to the array. If the symbol is an integer or pointer, the table has an entry that contains the symbol's value. Pointers in S are always equivalent to integers. Finally, interp.h contains function prototypes and external references to the critical data items produced by the lexical scan and linker. These references will allow a shell to record the pseudocompiled tokens for repeated interpreting later.
Listing Two is interp.c, the core of the interpreter. You would link this program with a shell that manages the source file and user interface and provides the intrinsic functions that give SI its purpose. These functions are described later.
Interpreting the S language involves these three steps. First, the source file is read and the statements are translated into tokens in the token buffer. In traditional language processing this step is called the lexical scan. The load-source function in interp.c starts the lexical scan. If the lexical scanner finds a character sequence in the source file that it cannot recognize, it declares an error and quits. During the lexical scan, source file line number tokens are inserted into the token buffer. These line number tokens identify the source file lines when errors occur. If anyone ever writes a debugger for the S language, these line numbers might be useful.
When loadsource has successfully loaded the source file and translated it into lexical tokens, loadsource calls the linker function. The linker scans the token buffer and builds the global symbol table, which contains the names and locations of functions and the names and values of variables. When the linker finds errors in the S grammar the scan terminates. Following this step the three data structures, the token buffer,string space, and symbol table could be written to a file for later interpretation. SI, as we build it here, will not take that intermediate step but will move directly into the interpreter. We do, however, provide for that feature in case we should want it later. An editor macro interpreter would probably use the compile-now-interpret-later strategy to speed up the process.
After the linker is finished, loadsource returns to its caller, the applications shell program that must start the interpreter by calling the interpreter macro defined in interp.h. This macro sets up a pseudotoken buffer with a tokenized call to the main function of the S language source program. The macro calls function in the interpreter and gets the main function underway. This procedure interprets the tokens, executes the language, and parses the S grammar for errors. Obviously, scripts should be tested before they are used on an expensive long-distance hookup.
Interpreting begins in function to execute the S main. All subsequent S function calls come through here, so the process is recursive. First, the interpreter looks for a parameter list in the function call. Each parameter is evaluated by the expression function, and its value is stored in an array of incoming parameters. Then the interpreter searches the table of intrinsic functions to see if the function is one of them. If so, the interpreter calls the intrinsic function through its pointer in the INTRINSIC structure array in the shell. If not, the interpreter searches the global symbol table for the called function and points the token pointer to the beginning of that function. The parameters named in the function declaration go into the local symbol table and the parameter values from the caller go into the value elements of the new local symbols. The interpreter now begins interpreting the function's procedure by calling the compound_statement function.
The compound_statement function processes brace surrounded statement blocks. It adds variables within the block to the local symbol table and then executes each statement in the block by calling the statements function. The statements function calls compound_statement recursively if it finds another left brace in the token stream. Otherwise it calls the statement function.
The statement function processes if, while, for, return, break, and simple expression statements. The first three operators involve expression evaluation and executing or skipping blocks of tokenized statements. The return and break set flags to be sensed by subsequent statements. Simple expressions are evaluated by the expression function.
SI evaluates expressions with a recursive descent parsing algorithm. The precedence of operators is controlled by the position of the operators' processing functions in the recursive descent function chain. You can trace the progress of this chain by beginning with the expression function. This algorithm expects the program token pointer variable, tptr, to point to the first token of an expression. When the expression function returns the evaluated result, the pointer will point to the first token past the expression. First the expression function calls the and function to get its result. Then, as long as there are logical or tokens in the expression, the result is ORed with subsequent results from the and function. Looking up at the and function, you can see flat it has a similar relationship with the eq function above it. All these operator functions trace up the chain of precedence to the primary function, which evaluates a primary element of an expression. A primary element is a constant, pointer, or variable. If, however, the primary function finds a left parenthesis as the next token, the primary function starts the evaluation all over again by first calling expression and then requiring a right parenthesis. The primary function handles the NOT operator, the pointer operator, the address-of operator, and auto-increment and decrement operators. Each function in the precedence chain returns its result to the function below it until control is back at the expression function. If you wanted to change the precedence of operators, you would change the sequence of these functions in the chain. When two or more operators have the same precedence, as do multiply and divide, they are processed together in the same function and have left-to-right precedence because the token scan is in that direction.
Listing Three is si.c, a throw-away demonstration shell that lets you test SI. It illustrates how an application would be integrated with the interpreter and has three intrinsic functions: printf, getchar, and putchar. An SI shell program has four responsibilities. First, the shell must provide the intrinsic functions and the array of INTRINSIC structures that describes them. Second, the shell must manage the S source code file by providing the functions named getsource and ungetsource, which the interpreter calls to get and push back characters from and to the source file. This way the interpreter does not care whether the code comes from a file or an edit buffer. Third, the shell must execute the interpreter by calling get-source and interpreter in that order. Finally, the shell must provide the sierrorfunction to manage the errors found by the interpreter. This strategy hides the details of error processing from the interpreter. The interpreter passes to sierroran error code, a string that might amplify the error such as the word being parsed when the error occurred, and the source code line number where the error was found.
For a look at a different approach to a subset C interpreter, watch for the DDJ annual C issue in August. Well-known C author Herbert Schildt has contributed an article that includes such a program. While our S interpreter is intended to be an engine for interpreting C-like script languages, Schildt's is oriented to the execution of C source programs and is offered as a study in interpreter design and implementation. You will see many similarities in our approaches, perhaps because interpreter theory is well-defined and understood. We selected different subsets of C, and some of our terminology differs, but the principles are similar.
After a year on the market, QuickC 1.0 is replaced by QuickC 2.0. QuickC 1.0 was a compiler with plenty of bugs and some design features that hindered the development environment. Its integrated environment limited you to the medium memory model. The .OBJ files generated by the environment and the command line compiler were incompatible. The .EXE files generated to be run by the environment would not run from the command line. There were known, but unacknowledged and unattended compiler bugs from the beginning.
QuickC 2.0 is another story. Microsoft has eliminated the dubious features and seems to have cleared up the bugs. QuickC 2.0 is faster than Microsoft C 5.1 but generates bigger programs. On a 20-MHz 386, QuickC built a 47,481 byte TWRP.EXE file in 1 minute, 14 seconds. MSC built a 35,507 byte file in 2 minutes, 59 seconds.
The programs in our "C Programming" column project now compile correctly with QuickC 2.0. You will recall from January that I had a problem with one module. The problem is gone with 2.0.
Microsoft changed the makefile syntax for their new NMAKE program, which works more like the traditional make programs that programmers are accustomed to. If you have the older MAKE program, you can use the make-files I published by changing the cl command to the qcl command to use the QuickC 2.0 command line compiler. With minor modifications, the makefiles can be adapted to NMAKE.
QuickC includes a book called C For Yourself that describes the C language at the introductory level -- at least as well as most books you will find at the store. The weakness in this book is its treatment (or neglect) of the run-time library. Microsoft expects you to use the Microsoft Advisor, their on-line help utility, for complete library reference information. This assumes that you use the QuickC editor, which many of us will not. QuickC has no equivalent to Turbo C's THELP program, which pops up the help data base from inside your own editor. You do get the HELPMAKE utility program that lets you modify and add to the help texts. Third-party library developers can provide help texts that users can integrate into their development environments. Microsoft does not, however, supply the format of the compressed help text database, which a TSR developer would need to write a THELP clone for QuickC.
In January I took Borland to task for conforming to what they said was a new ANSI rule that allowed the back-slash octal escape sequence to exceed three digits. The effect of the rule was that existing code would no longer compile the same way, and there was no warning message to let you know.
No sooner had the issue hit the stands than I found myself reading a letter from P.J. Plaugher, a member of the committee, and talking to Tom Plum, the vice chairman. Both members assured me in certain terms and with authority that the so-called rule was not the rule at all and that tradition prevails with respect to the octal escape sequence. Discussions with Borland reveal that they genuinely believed they were tracking the evolving ANSI standard with accuracy. The misunderstanding resulted from Borland's interpretation of the committee vote on the matter. They decided to conform to the rule and we had our original discussions (and I wrote my criticism) before the December draft was published. Now the draft is out, and Borland will fix the discrepancy in a future release of Turbo C.
The consequences of the issue will vary from user to user. Those of you using Turbo C 2.0 for new software are well-advised to avoid the octal escape sequence until the problem is fixed. The new ANSI hex sequence works as specified and will do the job. Those of you using Turbo C 2.0 to compile existing programs are counseled to watch for the offending octal sequences.
The upshot of this episode is that I am now the proud owner of a copy of the "Draft Proposed American National Standard for Information Systems -- Programming Language C" document dated December 7, 1988 and a card-carrying member of the X3J11 committee. Just in time -- their work is done.
All source code for articles in this issue is available on a single disk. To order, send $14.95 (Calif. residents add sales tax) to Dr. Dobb's Journal, 501 Galveston Dr., Redwood City, CA 94063, or call 800-356-2002 (from inside Calif.) or 800-533-4372 (from outside Calif.). Please specify the issue number and format (MS-DOS, Macintosh, Kaypro).
_C Programming Column_
by Al Stevens
[LISTING ONE]
Copyright © 1989, Dr. Dobb's JournalThe S Language
char*mylist[]={"password?","offline"};Intrinsic Functions
The S Interpreter
The SI Shell
Another C Interpreter
QuickC 2.0: Worth Another Look
ANSI Answers
Availability
/* ---------------- interp.h -------------------- */
#define TOKBUFSIZE 4096 /* token buffer size */
#define MAXSYMBOLS 100 /* maximum symbols in table */
#define MAXSTRINGS 50 /* maximum strings in arrays */
#define MAXPARAMS 10 /* maximum parameters in calls */
/* ----------- error codes ----------------- */
enum errs {EARLYEOF,UNRECOGNIZED,DUPL_DECLARE,TABLEOVERFLOW,
OMERR,UNDECLARED,SYNTAX,BRACERR,PARENERR,MISSING,
NOTFUNC,BREAKERR,OUTOFPLACE,TOOMANYSTRINGS,BUFFULL,
DIVIDEERR};
/* ------- intrinsic function table -------- */
typedef struct {
char *fname;
int (*fn)(int *);
} INTRINSIC;
/* -------- symbol table ------------ */
typedef struct {
char *symbol; /* points to symbol name */
char *location; /* points to function code (NULL if int) */
char **tblloc; /* points to table array (NULL if func) */
int ival; /* value of integer */
} SYMBOL;
/* ------- function prototypes -------- */
void loadsource(void);
int function(char *, SYMBOL *);
#define interpret() function("main\0();", symtop);
/* ------ functions provided by the shell -------- */
int getsource(void);
void ungetsource(int);
void sierror(enum errs, char *, int);
/* -------- the compiled (interpretable) S source -------- */
extern SYMBOL globals[];
extern char *tokenbf;
extern char *strings[];
extern SYMBOL *symtop;
[LISTING TWO]
/* ----------- interp.c ----------- */
#include <stdio.h>
#include <conio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <process.h>
#include "interp.h"
#define TRUE 1
#define FALSE 0
/* --------- the compiled (interpretable) S source ---------- */
SYMBOL globals[MAXSYMBOLS]; /* function/variable symbol table */
char *tokenbf = NULL; /* compiled token buffer */
char *strings[MAXSTRINGS]; /* char *[] string arrays */
SYMBOL *symtop; /* top of symbol table */
/* ---------- the external intrinsic functions ----------- */
extern INTRINSIC *infs; /* initialized by the SI shell */
/* ------ function macros ------------ */
#define bypass() tptr+=strlen(tptr)+1
#define isletter(c) (isalpha(c)||isdigit(c)||c =='_')
#define iswhite(c) (c==' '||c=='\t')
/* ---- function prototypes ----- */
static void linker(void);
static int gettoken(void);
static int getok(void);
static int iskeyword(void);
static int isident(void);
static int istoken(void);
static int getword(void);
static int getcx(void);
static void compound_statement(SYMBOL *);
static void statement(SYMBOL *);
static void statements(SYMBOL *);
static void skip_statements(SYMBOL *);
static void addsymbol(SYMBOL *, char *, char *, char **);
static SYMBOL *findsymbol(SYMBOL *, char *, SYMBOL *);
static SYMBOL *ifsymbol(SYMBOL *, char *, SYMBOL *);
static void freesymbols(SYMBOL *);
static void error(enum errs, char *);
static int iftoken(int);
static void skippair(int, int);
static void needtoken(int);
static int iftoken(int);
static int nexttoken(void);
static int expression(SYMBOL *);
static int escseq(void);
/* --------------- tokens ------------------ */
#define AUTOINC 'P'
#define AUTODEC 'D'
#define EQUALTO 'E'
#define NOTEQUAL 'N'
#define GE 'G'
#define LE 'L'
#define IF 'f'
#define ELSE 'e'
#define WHILE 'w'
#define FOR 'F'
#define CHAR 'c'
#define INT 'i'
#define STRING 's'
#define COMMENT1 '/'
#define COMMENT2 '*'
#define POINTER '*'
#define PLUS '+'
#define MINUS '-'
#define MULTIPLY '*'
#define DIVIDE '/'
#define EQUAL '='
#define LESS '<'
#define GREATER '>'
#define NOT '!'
#define LPAREN '('
#define RPAREN ')'
#define LBRACE '{'
#define RBRACE '}'
#define LBRACKET '['
#define RBRACKET ']'
#define COMMA ','
#define AND '&'
#define ADDRESS '@'
#define OR '|'
#define QUOTES '"'
#define QUOTE '\''
#define UNDERSCORE '_'
#define SEMICOLON ';'
#define IDENT 'I'
#define CONSTANT 'C'
#define LINENO 127
#define RETURN 'r'
#define BREAK 'b'
/* -------- table of key words and their tokens --------- */
static struct keywords {
char *kw;
int kwtoken;
} kwds[] = {
"\n", LINENO,
"for", FOR,
"while", WHILE,
"if", IF,
"else", ELSE,
"int", INT,
"char", CHAR,
"return",RETURN,
"break", BREAK,
NULL, 0
};
/* ------------ table of direct translate tokens -------- */
static int tokens[] = {
COMMA,LBRACE,RBRACE,LPAREN,RPAREN,EQUAL,NOT,POINTER,
LESS,GREATER,AND,OR,QUOTES,SEMICOLON,LBRACKET,RBRACKET,
MULTIPLY,DIVIDE,PLUS,MINUS,EOF,LINENO,0
};
/* --------------------- local data ----------------------- */
static char word[81]; /* word space for source parsing */
static int linenumber; /* current source file line number */
static int frtn; /* return value from a function */
static char *tptr; /* running token pointer */
static int stptr; /* running string allocation offset */
static int breaking, returning, skipping;
static SYMBOL *endglobals;
/* ----------- lexical scan and call linker ------------ */
void loadsource(void)
{
int tok = 0;
if (tokenbf == NULL)
if ((tokenbf = malloc(TOKBUFSIZE+81)) == NULL)
error(OMERR, "");
symtop = symtop ? symtop : globals;
freesymbols(globals);
memset(tokenbf, '\0', TOKBUFSIZE+81);
linenumber = 1;
tptr = tokenbf;
while (tok != EOF) {
if (tptr >= tokenbf + TOKBUFSIZE)
error(BUFFULL, "");
*tptr++ = tok = gettoken();
switch (tok) {
case LINENO:
sprintf(tptr, "%03d", linenumber);
tptr += 3;
break;
case CONSTANT:
case IDENT:
case STRING:
strcpy(tptr, word);
bypass();
break;
default:
break;
}
}
linker(); /* link the external variables and functions */
}
/* -- convert a script program to tokens for interpreter,
return the next token -------- */
static int gettoken(void)
{
int tok = getword();
if (tok == 0)
if ((tok = iskeyword()) == 0)
if ((tok = istoken()) == 0)
tok = isident();
if (tok == 0)
error(UNRECOGNIZED, word);
return tok;
}
/* ----- test to see if current word is a token ----- */
static int istoken(void)
{
int *t = tokens, t2;
while (*t && word[1] == '\0')
if (*word == *t++) {
switch (*word) {
case EOF:
break;
case AND:
if ((t2 = getcx()) != AND) {
*word = ADDRESS;
ungetsource(t2);
}
break;
case OR:
if (getcx() != OR)
error(MISSING, word);
break;
case PLUS:
case MINUS:
if ((t2 = getcx()) == *word)
*word = *word==PLUS ? AUTOINC : AUTODEC;
else
ungetsource(t2);
break;
default:
if ((t2 = getcx()) == EQUAL) {
switch (*word) {
case EQUAL: return EQUALTO;
case NOT: return NOTEQUAL;
case LESS: return LE;
case GREATER: return GE;
default: break;
}
}
ungetsource(t2);
break;
}
return *word;
}
return 0;
}
/* -------- test word for a keyword --------- */
static int iskeyword()
{
struct keywords *k = kwds;
while (k->kw)
if (strcmp(k->kw, word) == 0)
return k->kwtoken;
else
k++;
return 0;
}
/* ------ test for an ident -------- */
static int isident()
{
char *wd = word;
int n = 0;
if (isalpha(*wd) || *wd == UNDERSCORE)
return IDENT;
if (strlen(wd) <= 6) {
if (strncmp(wd, "0x", 2) == 0) {
wd += 2; /* 0x.... hex constant */
while (*wd) {
n = (n*16)+(isdigit(*wd) ? *wd-'0' :
tolower(*wd) - 'a' + 10);
wd++;
}
sprintf(word,"%d", n); /* converted hex constant */
}
else /* test for decimal constant */
while (*wd)
if (!isdigit(*wd++))
return 0;
return CONSTANT;
}
return 0;
}
/* -------- get the next word from the input stream ------- */
static int getword(void)
{
char *wd = word;
int c = ' ', tok = 0;
while (iswhite(c)) /* bypass white space */
c = getok();
if (c == QUOTE) {
tok = CONSTANT; /* quoted constant ('x') */
if ((c = getcx()) == '\\') /* escape sequence (\n) */
c = escseq();
sprintf(word, "%d", c); /* build the constant value */
wd += strlen(word);
if (getcx() != QUOTE) /* needs the other quote */
error(MISSING,"'");
}
else if (c == QUOTES) {
tok = STRING; /* quoted string "abc" */
while ((c = getcx()) != QUOTES)
*wd++ = c == '\\' ? escseq() : c;
}
else {
*wd++ = c; /* 1st char of word */
while (isletter(c)) { /* build an ident */
c = getok();
if (isletter(c))
*wd++ = c;
else
ungetsource(c);
}
}
*wd = '\0'; /* null terminate the word or token */
return tok;
}
/* ---- escape sequence in literal constant or string ---- */
static int escseq()
{
int c = getcx();
return (c == 'n' ? '\n' :
c == 't' ? '\t' :
c == 'f' ? '\f' :
c == 'a' ? '\a' :
c == 'b' ? '\b' :
c == 'r' ? '\r' :
c == '0' ? '\0' : c);
}
/* ------- get a character from the input stream -------- */
static int getok(void)
{
int c, c1;
while ((c = getsource()) == COMMENT1) {
if ((c1 = getcx()) != COMMENT2) { /* comment */
ungetsource(c1);
break;
}
while (TRUE) { /* found comment begin token pair */
while ((c1 = getcx()) != COMMENT2)
;
if ((c1 = getcx()) == COMMENT1)
break; /* found comment end token pair */
}
}
if (c == '\n') /* count source line numbers */
linenumber++;
return c;
}
/* ------- read a character from input, error if EOF ------ */
static int getcx(void)
{
int c;
if ((c = getsource()) == EOF)
error(EARLYEOF, "");
return c;
}
/* --------- build the global symbol table --------- */
static void linker(void)
{
int tok = 0;
char *svtptr;
INTRINSIC *ff = infs;
tptr = tokenbf;
/* --- add intrinsic functions to the symbol table --- */
while (ff->fname) {
addsymbol(globals,ff->fname,ff->fname,NULL);
ff++;
}
while (tok != EOF) {
switch (tok = nexttoken()) {
case CHAR:
svtptr = tptr;
if (iftoken(POINTER)) {
needtoken(IDENT);
bypass();
if (iftoken(LBRACKET)) {
tptr = svtptr;
nexttoken();
nexttoken();
addsymbol(globals,tptr,NULL,
strings+stptr);
bypass();
needtoken(LBRACKET);
needtoken(RBRACKET);
needtoken(EQUAL);
needtoken(LBRACE);
while (TRUE) {
if (!iftoken(STRING))
break;
if (stptr == MAXSTRINGS)
error(TOOMANYSTRINGS, "");
strings[stptr++] = tptr;
bypass();
if (!iftoken(COMMA))
break;
}
strings[stptr++] = NULL;
needtoken(RBRACE);
needtoken(SEMICOLON);
break;
}
}
tptr = svtptr;
case INT:
while (TRUE) {
if (iftoken(POINTER))
;
needtoken(IDENT);
addsymbol(globals,tptr,NULL,NULL);
bypass();
if (iftoken(EQUAL))
(symtop-1)->ival=expression(globals);
if (!iftoken(COMMA))
break;
}
needtoken(SEMICOLON);
break;
case IDENT:
addsymbol(globals, tptr, tptr, NULL);
bypass();
skippair(LPAREN, RPAREN);
skippair(LBRACE, RBRACE);
break;
case EOF:
break;
default:
error(OUTOFPLACE, (char *) &tok);
}
}
endglobals = symtop;
}
/* --------- a function is called ---------- */
int function(char *fnc, SYMBOL *sp)
{
int params[MAXPARAMS+1], p, i;
INTRINSIC *ff = infs;
char *savetptr = tptr;
frtn = 0;
tptr = fnc;
bypass();
needtoken(LPAREN);
for (p = 0; p < MAXPARAMS; ) { /* scan for parameters */
if (iftoken(RPAREN))
break;
params[p++]=expression(sp); /* build params */
if (!iftoken(COMMA)) { /* into parameter array */
needtoken(RPAREN);
break;
}
}
params[p] = 0;
while (ff->fname) { /* search the intrinsic table */
if (strcmp(fnc,ff->fname) == 0) {
frtn = (*ff->fn)(params); /* call intrinsic func */
tptr = savetptr;
return frtn;
}
ff++;
}
if ((tptr=findsymbol(globals,fnc,endglobals)->location)
== NULL)
error(NOTFUNC,fnc); /* function not declared */
bypass();
needtoken(LPAREN);
sp = symtop;
for (i = 0; i < p; i++) { /* params into local sym tbl */
needtoken(IDENT);
addsymbol(sp,tptr,NULL,NULL);
(symtop-1)->ival = params[i];
bypass();
if (i < p-1)
needtoken(COMMA);
}
needtoken(RPAREN);
compound_statement(sp); /* execute the function */
freesymbols(sp); /* release the local symbols */
tptr = savetptr;
breaking = returning = FALSE;
return frtn; /* the function's return value */
}
/* ------- execute one statement or a {} block -------- */
static void statements(SYMBOL *sp)
{
if (iftoken(LBRACE)) {
--tptr;
compound_statement(sp);
}
else
statement(sp);
}
/* -------- execute a {} statement block ----------- */
static void compound_statement(SYMBOL *sp)
{
char *svtptr = tptr;
SYMBOL *spp = symtop;
needtoken(LBRACE);
while (iftoken(CHAR) || iftoken(INT)) {
while (TRUE) { /* local variables in block */
if (iftoken(POINTER))
; /* bypass pointer token(s) */
needtoken(IDENT);
addsymbol(spp,tptr,NULL,NULL);
bypass();
if (iftoken(EQUAL)) /* handle assignments */
(symtop-1)->ival=expression(sp);
if (!iftoken(COMMA))
break;
}
needtoken(SEMICOLON);
}
while (!iftoken(RBRACE) && !breaking && !returning)
statements(sp);
tptr = svtptr; /* point to the opening { brace */
freesymbols(spp); /* free the local symbols */
skippair(LBRACE, RBRACE); /* skip to end of block */
}
/* --------- execute a single statement ---------- */
static void statement(SYMBOL *sp)
{
char *svtptr, *fortest, *forloop, *forblock;
int rtn, tok = nexttoken();
switch (tok) {
case IF:
needtoken(LPAREN);
rtn = expression(sp); /* condition being tested */
needtoken(RPAREN);
if (rtn)
statements(sp); /* condition is true */
else
skip_statements(sp); /* condition is false */
while (iftoken(ELSE))
if (rtn) /* do the reverse for else */
skip_statements(sp);
else
statements(sp);
break;
case WHILE:
rtn = TRUE;
breaking = returning = FALSE;
svtptr = tptr;
while (rtn && !breaking && !returning) {
tptr = svtptr;
needtoken(LPAREN);
rtn = expression(sp); /* the condition tested */
needtoken(RPAREN);
if (rtn)
statements(sp); /* true */
else
skip_statements(sp); /* false */
}
breaking = returning = FALSE;
break;
case FOR:
svtptr = tptr; /* svptr -> 1st ( after for */
needtoken(LPAREN);
if (!iftoken(SEMICOLON)) {
expression(sp); /* initial expression */
needtoken(SEMICOLON);
}
fortest = tptr; /* fortest -> terminating test */
tptr = svtptr;
skippair(LPAREN,RPAREN);
forblock = tptr; /* forblock -> block to run */
tptr = fortest;
breaking = returning = FALSE;
while (TRUE) {
if (!iftoken(SEMICOLON)) {
if (!expression(sp)) /* terminating test */
break;
needtoken(SEMICOLON);
}
forloop = tptr;
tptr = forblock;
statements(sp); /* the loop statement(s) */
if (breaking || returning)
break;
tptr = forloop;
if (!iftoken(RPAREN)) {
expression(sp); /* the end of loop expr */
needtoken(RPAREN);
}
tptr = fortest;
}
tptr = forblock;
skip_statements(sp); /* skip past the block */
breaking = returning = FALSE;
break;
case RETURN:
if (!iftoken(SEMICOLON)) {
frtn = expression(sp);
needtoken(SEMICOLON);
}
returning = !skipping;
break;
case BREAK:
needtoken(SEMICOLON);
breaking = !skipping;
break;
case IDENT:
--tptr;
expression(sp);
needtoken(SEMICOLON);
break;
default:
error(OUTOFPLACE, (char *) &tok);
}
}
/* -------- bypass statement(s) ------------ */
static void skip_statements(SYMBOL *sp)
{
skipping++; /* semaphore that suppresses assignments, */
statements(sp); /* breaks,returns,++,--,function calls */
--skipping; /* turn off semaphore */
}
/* -------- recursive descent expression analyzer -------- */
static int primary(SYMBOL *sp)
{
int tok, rtn = 0;
SYMBOL *spr;
switch (tok = nexttoken()) {
case LPAREN:
rtn = expression(sp);
needtoken(RPAREN);
break;
case NOT:
rtn = !primary(sp);
break;
case CONSTANT:
rtn = atoi(tptr);
bypass();
break;
case POINTER:
rtn = *(int *)primary(sp) & 255;
break;
case ADDRESS:
case AUTOINC:
case AUTODEC:
needtoken(IDENT);
case IDENT:
if ((spr = ifsymbol(sp,tptr,symtop)) == NULL)
spr = findsymbol(globals,tptr,endglobals);
if (spr->location) {
/* ---- this is a function call ---- */
if (tok != IDENT)
error(OUTOFPLACE, (char *) &tok);
rtn = skipping ? 0 : function(tptr,sp);
bypass();
skippair(LPAREN,RPAREN);
break;
}
bypass();
if (spr->tblloc) {
/* ---- this is a table ---- */
rtn = (tok == ADDRESS ?
(int) (&spr->tblloc) :
(int) ( spr->tblloc) );
break;
}
if (!skipping && tok == AUTOINC)
++(spr->ival);
else if (!skipping && tok == AUTODEC)
--(spr->ival);
if (tok != ADDRESS && iftoken(EQUAL)) {
rtn = expression(sp);
spr->ival = skipping ? spr->ival : rtn;
}
rtn = tok == ADDRESS ? (int)&spr->ival : spr->ival;
if (tok != ADDRESS)
if (iftoken(AUTOINC) && !skipping)
(spr->ival)++;
else if (iftoken(AUTODEC) && !skipping)
(spr->ival)--;
break;
case STRING:
rtn = (int) tptr;
bypass();
break;
default:
error(OUTOFPLACE, (char *) &tok);
}
return rtn;
}
static int mult(SYMBOL *sp)
{
int drtn, rtn = primary(sp);
while (TRUE)
if (iftoken(MULTIPLY))
rtn = (rtn * primary(sp));
else if (iftoken(DIVIDE)) {
if ((drtn = primary(sp)) == 0)
error(DIVIDEERR, "");
rtn /= drtn;
}
else
break;
return rtn;
}
static int plus(SYMBOL *sp)
{
int rtn = mult(sp);
while (TRUE)
if (iftoken(PLUS))
rtn = (rtn + mult(sp));
else if (iftoken(MINUS))
rtn = (rtn - mult(sp));
else
break;
return rtn;
}
static int le(SYMBOL *sp)
{
int rtn = plus(sp);
while (TRUE)
if (iftoken(LE))
rtn = (rtn <= plus(sp));
else if (iftoken(GE))
rtn = (rtn >= plus(sp));
else if (iftoken(LESS))
rtn = (rtn < plus(sp));
else if (iftoken(GREATER))
rtn = (rtn > plus(sp));
else
break;
return rtn;
}
static int eq(SYMBOL *sp)
{
int rtn = le(sp);
while (TRUE)
if (iftoken(EQUALTO))
rtn = (rtn == le(sp));
else if (iftoken(NOTEQUAL))
rtn = (rtn != le(sp));
else
break;
return rtn;
}
static int and(SYMBOL *sp)
{
int rtn = eq(sp);
while (iftoken(AND))
rtn = (eq(sp) && rtn);
return rtn;
}
static int expression(SYMBOL *sp)
{
int rtn = and(sp);
while (iftoken(OR))
rtn = (and(sp) || rtn);
return rtn;
}
/* ----- skip the tokens between a matched pair ----- */
static void skippair(int ltok, int rtok)
{
int pairct = 0, tok;
if ((tok = nexttoken()) != ltok)
error(ltok == LBRACE ? BRACERR : PARENERR, "");
while (TRUE) {
if (tok == ltok)
pairct++;
if (tok == rtok)
if (--pairct == 0)
break;
if ((tok = nexttoken()) == EOF)
error(ltok == LBRACE ? BRACERR : PARENERR, "");
}
}
/* ----- a specified token is required next ----- */
static void needtoken(int tk)
{
if (nexttoken() != tk)
error(MISSING, (char *) &tk);
}
/* ----- test for a specifed token next in line ----- */
static int iftoken(int tk)
{
if (nexttoken() == tk)
return TRUE;
--tptr;
return FALSE;
}
/* ----- get the next token from the buffer ----- */
static int nexttoken(void)
{
while (*tptr == LINENO)
tptr += 4;
return *tptr++;
}
/* ------ add a symbol to the symbol table ------------ */
static void
addsymbol(SYMBOL *s,char *sym,char *floc,char **tloc)
{
if (ifsymbol(s,sym,symtop) != NULL)
error(DUPL_DECLARE, sym);
if (symtop == globals + MAXSYMBOLS)
error(TABLEOVERFLOW, "");
if ((symtop->symbol = malloc(strlen(sym) + 1)) == NULL)
error(OMERR, "");
strcpy(symtop->symbol, sym);
symtop->location = floc;
symtop->tblloc = tloc;
symtop->ival = 0;
symtop++;
}
/* --------- find a symbol on the symbol table --------- */
static SYMBOL *findsymbol(SYMBOL *s, char *sym, SYMBOL *ends)
{
if ((s = ifsymbol(s, sym, ends)) == NULL)
error(UNDECLARED, sym);
return s;
}
/* -------- test for a symbol on the symbol table ------ */
static SYMBOL *ifsymbol(SYMBOL *s, char *sym, SYMBOL *sp)
{
while (s < sp--)
if (strcmp(sym, sp->symbol) == 0)
return sp;
return NULL;
}
/* ------- free the symbols from a symbol table ------- */
static void freesymbols(SYMBOL *s)
{
while (s < symtop)
free((--symtop)->symbol);
}
/* -------- post an error to the shell ------- */
static void error(enum errs erno, char *s)
{
while (*tptr != LINENO && tptr > tokenbf)
--tptr;
sierror(erno, s, (*tptr == LINENO) ? atoi(tptr+1) : 1);
}
[LISTING THREE]
/* ---------- si.c -------------- */
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include "interp.h"
/* ----------- intrinsic interpreter functions ---------- */
static int iprntf(int *p) /* printf */
{
printf((char*)p[0],p[1],p[2],p[3],p[4]);
return 0;
}
static int igtch(int *p) /* getchar */
{
return putch(getch());
}
static int iptch(int *c) /* putchar */
{
return putchar(*c);
}
INTRINSIC ffs[] = { "printf", iprntf,
"getchar", igtch,
"putchar", iptch,
NULL, NULL };
extern INTRINSIC *infs = ffs;
/* ---------- error messages ------------- */
char *erm[]={ "Unexpected end of file", "Unrecognized",
"Duplicate ident", "Symbol table full",
"Out of heap memory", "Undeclared ident",
"Syntax Error", "Unmatched {}",
"Unmatched ()", "Missing",
"Not a function", "Misplaced break",
"Out of place", "Too many strings",
"Token buffer overflow", "Divide by zero" };
static FILE *fp;
void main(int argc, char *argv[])
{
if (argc == 2)
if ((fp = fopen(argv[1], "r")) != NULL) {
loadsource();
fclose(fp);
interpret();
}
}
void sierror(enum errs erno, char *s, int line)
{
printf("\r\n%s %s on line %d\n",s,erm[erno],line);
exit(1);
}
int getsource(void) { return getc(fp); }
void ungetsource(int c) { ungetc(c, fp); }