This month I'll look at the lexical scan that Quincy uses to build a run-time interpretable token stream and at the symbol-table process. Quincy is operational: I finished the tutorial book it was designed to support and integrated its tutorial mode with the book's exercises. I'll devote the next several columns to the completion of the project.
I'm back at home after a two-week tour of the Midwest, playing the piano at jazz festivals and concerts. I always figure I'll get a lot of day-gig work done on the road by taking the laptop. I've got one of those converters that you plug into the car cigarette lighter to get household current. The idea is to pound out columns, books, and software while Judy drives. (Dream on, Al.) That converter would heat a small stadium. Luckily, the minivan has an adequate air conditioner. Judy has been working on her family tree, so between the two of us, we got a lot of computing mixed in with the commuting. She uses a program called "Family Tree Maker for Windows," and that is one slick program. Being a devoted DOS user, Judy reluctantly switched to Windows because she hit the wall on FTM's DOS version. It holds only about 1200 people, and there are a lot more Stauffers than that in Ringtown, Pennsylvania.
Upon our return, I hooked the laptop into the network to upload all our road work to the desk machines. I have one of those dongle-like Ethernet adaptors that plugs into the laptop printer port. Without it I'd have to move files around on diskettes. Since I was in a hurry, Murphy's law kicked in, and the network device's AC/DC adaptor fell apart when I plugged it into the wall. The tiny ends of the transformer winding were sheared off and too short to solder. A search of the workshop turned up no other adaptor that delivers 12 volts DC and 500 milliamps. The garage, however, provided the solution. I am now making high-speed file transfers across my high-tech network through a state-of-the-art network adaptor powered by a 30-year-old automobile battery charger/eliminator. Greasy, dented, bigger than a hatbox, with bug-eyed volt and amp meters, fat cables, and brass clamps, it squats on my desk next to all the fancy stuff and proudly does what it does best--delivers a well-regulated, flat, clean, 12 volts that you could weld with.
A language translator has several processes. When the language is C, the first process is the preprocessor. Not all languages have preprocessors; the operations served by C's preprocessor are often built-in operators in other languages. C's preprocessor converts C code and preprocessing directives into C code ready to be translated. I discussed Quincy's preprocessor in the June and July issues.
The first part of translation beyond preprocessing is the lexical scan, which reads the source code and translates it into tokens--single-character codes that represent discrete language elements. Subsequent translation operates on this stream of tokens. When the translator is an interpreter, the token stream is what the interpreter reads to execute the program.
Tokens represent identifiable language elements. The lexical scan parses the source code from beginning to end, extracting code fragments and translating them into tokens. The four discrete parts of C code are keywords, operators, constants, and identifiers. When the interpreter is integrated with a debugger, the token stream must also identify line numbers from the original source code.
The main purpose of the lexical scan is to reduce source code into smaller, more easily interpreted code. The code's grammatical correctness is the concern of a later process. Compilers do that in the code-generation process when they compile tokens into executable code. An interpreter such as Quincy involves some mix of compile-time and run-time grammatical-syntax checking. In general, the lexical scan determines only that each element of code is translatable into a language token without respect to its grammatical context, but there are exceptions. For example, the colon character has four uses in the C language. It terminates statement labels; terminates case expressions; serves as the second, else delimiter of the ?: conditional operator; and may appear in string and character constants. (Its potential appearance in comments is unimportant in this context because the preprocessor strips comments from the source code.) The same thing applies to the dot, which can be part of a floating-point constant, the structure-member dereferencing operator, or one-third of the ellipse token. The lexical scan recognizes and translates constants, ellipses, and statement labels, so it must do some contextual analysis of the source code.
Listing One is scanner.c, Quincy's lexical scanner, which consists of the tokenize function and some local functions. The tokenize function accepts two pointer arguments. The first points to a buffer to receive the token stream, and the second, to the preprocessed source code. The scanner reads the source code a character at a time and determines which token to build, based on the character's value. First the scanner calls the FindOperator function to see if the current character and the next one constitute a two-character operator, such as the != not-equal operator. That function returns the token when it finds such an operator and 0 when it does not. If FindOperator returns a token, the scanner copies it to the token stream. If the token is one of the shift operators, the scanner tests to see if it is followed by an equal sign, which signifies a shift-assignment operator. If so, the token is modified accordingly.
The scanner recognizes the newline character to identify the file and line number for the debugger. The preprocessor inserts a newline and comment for each nonblank source-code line. The comment has the format /*01:02*/, where the first value is the file number of the source-code file, and the second is the line number within that file. The preprocessor assigns a file number to each included file and puts the filename in a global table that the interpreter and debugger can use to report errors to the programmer. The token stream for a newline contains a T_LINENO token followed by two unsigned character integers containing the file and line numbers.
A double quote in the source code indicates the beginning of a string constant. The scanner copies the T_STRCONST token to the token stream, followed by the length of the string and the null-terminated string value itself. The scanner remembers that it just processed a string constant so that if another one immediately follows, it can properly concatenate adjacent strings. Translation of the string calls the uncesc function for each character, which returns the character value or its value as represented by a backslash-escape sequence.
Character constants are like string constants except that character constants occupy one character position. The scanner builds a character constant when it sees an apostrophe, inserting a T_CHRCONST token and the character-constant value into the token stream. The scanner gets the character value from the unesc function, which decodes escape sequences.
The scanner converts C operators into their token equivalents, which are the same as the operators themselves. If the operator is one of those that can combine with the equal sign to form an assignment operation (+=, --=, etc.), the scanner puts a true value into the op variable. Then, if the next character in the source-code stream is an equal sign, the scanner sets the operator's most significant bit, which identifies the operator as an assignment operation.
If a dot is followed by two more dots, they are converted into the T_ELLIPSE token. If the dot is followed by a digit, however, the scanner calls the fltnum function to build a floating-point constant.
The scanner counts and balances pairs of left and right braces and then copies them into the token stream as their own tokens. It uses the brace count when it builds statement labels, and it uses a brace count of 0 to know that it can assume an external function declaration or definition.
The question mark is the if part of a conditional expression. The scanner remembers that one has been seen so that it does not try to interpret a subsequent identifier/colon pair as a statement label.
When the scanner sees a digit in the source code, it assumes a numerical constant. It calls the intnum function to decide which kind of constant to build into the token stream.
The intnum function scans the characters from the point of the digit until it finds a nondigit character. If that character is a dot or an upper- or lowercase "E," the constant is a floating constant, and intnum calls fltnum to translate it. Otherwise, intnum looks at the first digit. If it is a 0 and the next character is a digit, the constant is an octal number. If the first two characters are 0x, the constant is a hexadecimal number. Otherwise, the constant is a decimal integer. Depending on the range of the number and whether or not it is followed by L, the constant is either short or long. Accordingly, the program builds either the T_INTCONST or T_LNGCONST token into the token stream followed by the constant's integer value of the appropriate length.
The fltnum function builds a floating constant, which follows the T_FLTCONST token in the token stream.
An alphabetic character or an underscore means that the next language element is either a keyword or an identifier. The scanner calls the FindKeyword function, which returns a keyword's token if the text is a keyword, and 0 if not.
If the text is not a keyword, it is either a variable, a function identifier, or a statement label. If it is followed by a colon (and a case or conditional expression is not being built), the identifier is a statement label, so the scanner builds and installs a Variable structure for the label. The label includes the current brace-nesting count, which helps the interpreter to find its way to the label when it processes a matching goto statement.
If the text is neither a keyword nor a statement label, the scanner calls AddSymbol to add the identifier to the symbol table and to get an integer value to represent the symbol. The AddSymbol function adds the symbol to the table and returns its integer value or, if the symbol is already in the table, returns the previously assigned integer value. The scanner is not interested in identifier scope or reuse. It needs only a unique integer offset for each unique identifier.
An identifier can be either a variable reference, a function declaration/definition, or a call to a function, for which the scanner assigns the tokens T_SYMBOL, T_FUNCTION, and T_FUNCTREF, respectively.
The interpreter does not deal with actual identifiers. It uses the integer offsets that the scanner assigns to represent unique identifier values. The interpreter keeps the symbol table throughout the run so that the user can name identifiers to watch, examine, and modify from within the debugger.
You might wonder why Listing One includes a preprocessing directive statement that undefines isxdigit, forcing the compiler to use the function version of isxdigit rather than the macro version. For some reason, the Borland C++ compilers (3.1 and 4.0) that I use to compile Quincy do not compile the macro version correctly all the time.
Quincy uses five symbol tables during translation. Four of them are static; they record library-function identifiers, keywords, multiple-character operators, and preprocessor directives. The fifth is the dynamic table of identifiers that the program declares.
Listing Two is symbols.c, the program that declares and manages symbol tables. The tables are arrays of SYMBOLTABLE objects. SYMBOLTABLE is a structure defined by the qnc.h header file. Listing Two declares and initializes the static symbol tables.
Each symbol table associates its symbols with an integer code. Library functions are associated with codes that tell the interpreter which functions to execute. Keywords and operators are associated with interpreter tokens. Preprocessor identifiers are associated with a value that the preprocessor uses to determine which directive to process. Declared identifiers are associated with unique integer values that the scanner assigns in the order in which the identifiers occur.
Symbol tables are maintained in identifier sequence to facilitate a binary search on an identifier argument. The SearchSymbols function implements the binary search and is followed by several specialized functions that call SearchSymbols to search the individual tables. (The FindOperator function mentioned earlier is one such function.) Each of these functions takes an identifier as an argument and returns the code that matches the identifier or 0 if the identifier is not in the table. The FindSymbolName function takes an identifier code and returns a pointer to the matching identifier. The debugger uses this function to display function lists and matched variable names. The AddSymbol function adds an identifier to the dynamic symbol table and returns the code that the function associates with the new entry.
Quincy's lexical scanner and linker--the process that follows the scanner--defers until run time some of the language translation and error checking. This approach reflects the compromises that I made in the interest of performance. Inasmuch as Quincy is an interactive interpreter, I want to get the program running as soon as possible after the user makes some changes and chooses the run command. Therefore, some of the operations normally done by a compiler are done during run time by the interpreter. As a result, the program does not run as efficiently as it would if things such as the recursive-descent parser were earlier done by the translator. The performance hit does not concern me; Quincy's purpose is to support an interactive C-language tutorial, not to be a comprehensive software-development environment. Sometimes I mull over design changes that would support object linking, build stand-alone EXE files, optimize the run-time token interpretation, or yield some other slick improvement. After pondering those subjects for a while, it occurs to me that I would just be rebuilding Turbo C 1.0, and nobody needs that.
It's supposed to be hip to name a book the "Zen" of something. The local library lists over a dozen such books on as many subjects. The classic standard-bearer of that title is Zen and the Art of Motorcycle Maintenance, by Robert M. Pirsig, which had a lot to do with Zen and something to do with motorcycle maintenance. Most of the others have very little to do with Zen and a lot to do with the other part of their titles. The Zen of Code Optimization, by Michael Abrash (Coriolis Group Books, 1994, ISBN 1-883577-03-9) falls into that category. It is one of a series of Zen books from the publisher who dubs his authors "Zen masters." This represents, I suppose, the other side of the world--far away from a population of Dummies who talk about computers and programming in spite of their doubts, lack of confidence, and low self-esteem. It also reflects a trend among book publishers to create "lines" of books with titles that are like one another, usually following an unexpected success. Thus, we might see Advanced C++ Class Design for Dummies, Teach Yourself Quantum Mechanics, and OLE2 Made Easy in 21 Days.
Zen is a school of Buddhism wherein enlightenment is attained through meditation, self-contemplation, and intuition rather than through scriptures. A Zen Buddhist would therefore conclude that such a line of contemporary scriptures could not lead the reader anywhere near enlightenment. So, do not buy this book expecting to learn about Zen or to attain perfect programming enlightenment.
But, by all means, buy this book.
The Zen of Code Optimization is about writing fast code for the PC. More to the point, it is about viewing code that you've written with an eye to optimizing its performance. To optimize is to make as good as possible. What's good? Fast? Cheap? Small? Portable? On time? Depends on who you ask. In this case, only fast is good. Use the fewest processor cycles to do the same job. Optimize for speed. The book takes the position that you, the programmer, are the best speed optimizer. Only you can find the bottlenecks and only you can widen them. When is speed important? Are all bottlenecks bad? Why worry about cutting down on the cycles used waiting for a keystroke? Where in your program is valuable time wasted? This book encourages you to ask and answer such questions and does so by example. It starts with a program that works but that is not as fast as it might be. Then it moves the program through successive performance enhancements, explaining each time, the implication of the change, why it speeds things up, and what there is about the earlier version that should call your attention to the change. That's the important lesson. Learn how to spot the performance bottlenecks. Can you remove them without compromising the program's readability and maintainability? If not, is it worth it?
To help spot the bottlenecks, Abrash provides a timer program written in assembly language that uses the PC's 8253 timer chip to measure the performance in C programs. The C program links with these functions and calls them to record the elapsed times of processes that you want to measure. A poor man's profiler, of sorts.
The book teaches about exploiting hardware, using assembly language where it makes sense, and not using assembly language where it doesn't matter; it also includes several chapters on the Pentium. There are examples that optimize the game of Life, the Boyer-Moore string-searching algorithm, and much more. The book includes a diskette. Perhaps you will use the code in a project, perhaps not. The important contribution is not so much Abrash's code, which is certainly good, but his influence on your view and attitude toward your own code. When a book keeps you thinking well after you are finished reading it, then it has served you well, and this is such a book.
Copyright © 1994, Dr. Dobb's JournalOn and Off the Road Again
Lexical Scanning
Syntax Checking
The tokenize Function
Newlines and Line-Number Information
String Constants
Character Constants
Operators
Dots, Braces, Colons, and Question Marks
Numerical Constants
Keywords and Identifiers
Symbol Tables
Run Time versus Compile Time
Book Report: The Zen of Code Optimization
Listing One
/* --- scanner.c - Quincy's lexical scanner --- */
#include <stdio.h>
#include <stdlib.h>
#include "qnc.h"
#undef isxdigit
static int uncesc(char **);
static void fltnum(char **, char **);
static void intnum(char **, char **);
/* --- Convert C in srcbuf to tokens in tknbuf --- */
int tokenize(char *tknbuf, char *srcbuf)
{
char *start, *laststring = NULL, *cp, c, c2, c3, op;
char buf[8];
int i;
int BraceCount = 0;
char *tknptr = tknbuf;
int sawCond = 0;
int sawCase = 0;
while (*srcbuf) {
/* --- search for 2-char C operators --- */
if ((i = FindOperator(srcbuf)) != 0) {
srcbuf+=2;
if ((i == T_SHL || i == T_SHR) && *srcbuf == '=') {
srcbuf++;
i |= 0x80; /* encode op= operator */
}
*tknptr++ = i;
continue;
}
c = *srcbuf++; /* next src code char */
c &= 0x7f;
op = 0;
c2 = *srcbuf; /* lookahead 1 */
c3 = *(srcbuf+1); /* lookahead 2 */
if (c != '"' && c != '\n')
laststring = NULL;
switch (c) {
case '\n': /* File/Line */
/* _____________
* | T_LINENO |
* |_____________|
* |fileno (byte)|
* |_____________|
* |lineno (word)|
* |_____________| */
handshake(); /* keep D-Flat clock ticking */
*tknptr++ = T_LINENO;
Ctx.CurrFileno = atoi(srcbuf+2);
*tknptr++ = (unsigned char) Ctx.CurrFileno;
srcbuf = strchr(srcbuf, ':');
Assert(srcbuf != NULL);
srcbuf++;
Ctx.CurrLineno = atoi(srcbuf);
*(int*)tknptr = Ctx.CurrLineno;
tknptr += sizeof(int);
srcbuf = strchr(srcbuf, '/');
Assert(srcbuf != NULL);
srcbuf++;
break;
case '"': /* string constant */
/* ___________
* | T_STRCONST|
* |___________|
* | length |
* |___________|
* | char(s) |
* |___________|
* | 0 |
* |___________| */
if (laststring != NULL)
/* ---- concatenated string ---- */
tknptr = laststring+strlen(laststring);
else {
*tknptr++ = T_STRCONST;
laststring = tknptr++;
}
while ((c = *srcbuf) != '"' && c)
*tknptr++ = uncesc(&srcbuf);
*tknptr++ = '\0';
*laststring = tknptr - laststring;
if (c)
++srcbuf;
break;
case '\'': /* character constant */
/* ___________
* | T_CHRCONST|
* |___________|
* | value |
* |___________| */
*tknptr++ = T_CHRCONST;
*tknptr++ = uncesc(&srcbuf);
/* --- Skip to delimiting apostrophe --- */
while ((c = *srcbuf++) != '\'' && c)
;
if (!c)
--srcbuf;
break;
/* --- operators --- */
/* ___________
* | op token |
* |___________| */
case '*':
case '^':
case '%':
case '&':
case |:
case '+':
case '-':
case '/':
op = c;
case '=':
case '!':
case '<':
case '>':
case '[':
case ']':
case '(':
case ')':
case ',':
case '~':
case ' ':
case ';':
/* --- single character operator --- */
*tknptr++ = c;
break;
case '?':
sawCond++;
*tknptr++ = c;
break;
case ':':
if (sawCond)
--sawCond;
sawCase = 0;
*tknptr++ = c;
break;
case '{':
BraceCount++;
*tknptr++ = c;
break;
case '}':
--BraceCount;
*tknptr++ = c;
break;
case '.':
if (c2 == '.' && c3 == '.') {
*tknptr++ = T_ELLIPSE;
srcbuf += 2;
}
else if (isdigit(c2)) {
/*
* floating pointer number.
*/
--srcbuf;
fltnum(&srcbuf, &tknptr);
}
else
*tknptr++ = c;
break;
default:
if (isdigit(c)) {
/* --- constant --- */
/* ___________
* | T_INTCONST| (or T_LNGCONST,
* |___________| T_FLTCONST, etc.)
* | value | <- binary value of the
* |___________| number. Number of
* | . | bytes depends on type
* |___________| */
--srcbuf;
intnum(&srcbuf, &tknptr);
}
else if (alphanum(c)) {
/* --- identifier --- */
start = cp = tknptr+2;
--srcbuf;
while (alphanum(*srcbuf))
*cp++ = *srcbuf++;
*cp++ = 0;
if ((i = FindKeyword(start)) != 0) {
/* --- keyword --- */
/* ___________
* | key token |
* |___________| */
*tknptr++ = i;
if (i == T_CASE)
sawCase = 1;
}
else if (!sawCond && !sawCase &&
*srcbuf == ':') {
/* --- label for gotos --- */
VARIABLE var, *lvar;
NullVariable(&var);
var.vkind = LABEL;
var.vsymbolid = AddSymbol(start);
var.vclass = BraceCount;
lvar = InstallVariable(&var,
&Ctx.Curfunction->locals, 0,0,1,0);
lvar->voffset = tknptr - tknbuf;
srcbuf++;
}
else {
/* symbol, function declaration,
prototype, or call? */
FUNCTION *funcp;
int fsymbol = AddSymbol(start);
if ((funcp =
FindFunction(fsymbol)) != NULL) {
/* decl, func call, or addr */
/* ____________
* | T_FUNCTREF |
* |____________|
* | Function |
* | Number |
* |____________| */
*tknptr++ = T_FUNCTREF;
*(unsigned *)tknptr =
(funcp - FunctionMemory);
tknptr += sizeof(unsigned);
}
else if (*srcbuf == '(' &&
BraceCount == 0) {
FUNCTION func;
NullFunction(&func);
/* declaration or prototype */
/* _____________
* | T_FUNCTION |
* |_____________|
* |symbol offset|
* |_____________| */
/* --- install the function --- */
func.symbol = fsymbol;
func.libcode = SearchLibrary(start);
func.ismain =
(strcmp(start, "main") == 0);
func.fileno = Ctx.CurrFileno;
func.lineno = Ctx.CurrLineno;
Ctx.Curfunction = NextFunction;
InstallFunction(&func);
*tknptr++ = T_FUNCTION;
*(int *)tknptr = func.symbol;
tknptr += sizeof(int);
}
else {
/* variable reference */
/* _____________
* | T_SYMBOL |
* |_____________|
* |symbol offset|
* |_____________| */
*tknptr++ = T_SYMBOL;
*(int *)tknptr = fsymbol;
tknptr += sizeof(int);
}
}
}
else
/* --- Bad character in input line --- */
error(LEXERR);
}
if (*srcbuf == '=' && op) {
tknptr[-1] |= 128;
++srcbuf;
}
}
*tknptr++ = T_EOF;
*tknptr = '\0';
return tknptr - tknbuf;
}
static int uncesc(char **bufp)
{
/* Unescape character escapes */
char *buf, c;
buf = *bufp;
if ((c = *buf++) == '\\') {
int i;
char n[4];
switch (c = *buf++) {
case 'a': c = '\a'; break;
case 'b': c = '\b'; break;
case 'f': c = '\f'; break;
case 'n': c = '\n'; break;
case 'r': c = '\r'; break;
case 't': c = '\t'; break;
case 'v': c = '\v'; break;
case '\\': c = '\\'; break;
case '\'': c = '\''; break;
case '"': c = '"'; break;
case 'x':
sscanf(buf, "%x", &i);
c = i;
while (isxdigit(*buf))
buf++;
break;
default:
if (isdigit(c)) {
--buf;
for (i=0; i<3 && isdigit(*buf); ++i)
n[i] = *buf++;
n[i] = 0;
sscanf(n, "%o", &i);
c = i;
}
break;
}
}
*bufp = buf;
return c;
}
static void fltnum(char **srcstr, char **tknstr)
{
/* Parse a floating point number */
char *srcp, *cp;
char numbuf[64];
char c, n, dot, e, sign;
double f;
n = dot = e = sign = 0;
srcp = *srcstr;
**tknstr = T_FLTCONST;
++(*tknstr);
while (*srcp) {
if ((c = *srcp++) == '.') {
if (dot) {
/* Already saw a dot */
--srcp;
break;
}
++dot;
}
else if (c=='e' || c=='E') {
if (!(dot || n) || e) {
/* 'E' does not immediately follow dot
or number */
--srcp;
break;
}
++e;
}
else if (c=='+' || c=='-') {
if (e!=1 || sign) {
/* Sign does not immediately follow an 'E' */
--srcp;
break;
}
++sign;
}
else if (isdigit(c)) {
++n;
if (e) {
/* number follows an 'E' - don't allow
the sign anymore */
++e;
}
}
else {
--srcp;
break;
}
}
/* copy number into local buffer and null terminate it */
n = 0;
cp = *srcstr;
while (cp < srcp)
numbuf[n++] = *cp++;
numbuf[n] = 0;
f = atof(numbuf);
*((double*)*tknstr) = f;
*srcstr = srcp;
*tknstr += sizeof(double);
}
/* --- Parse a decimal, octal or hexadecimal number --- */
static void intnum(char **srcstr, char **tknstr)
{
char *srcp, *cp, c;
int i;
long j;
int isDecimal = 1;
/* ---- test for float number ---- */
srcp = *srcstr;
while (isdigit(*srcp))
++srcp;
if (*srcp == '.' || *srcp == 'e' || *srcp == 'E') {
fltnum(srcstr, tknstr);
return;
}
/* ----- not a float ----- */
c = T_INTCONST;
srcp = *srcstr;
if (*srcp++ == '0') {
if (isdigit(*srcp)) {
/* --- octal constant --- */
sscanf(srcp, "%o", &i);
while (isdigit(*srcp))
++srcp;
isDecimal = 0;
}
else if (tolower(*srcp) == 'x') {
/* --- hexadecimal constant --- */
sscanf(++srcp, "%x", &i);
while (isxdigit(*srcp))
++srcp;
isDecimal = 0;
}
}
if (isDecimal) {
cp = --srcp;
while (isdigit(*cp))
++cp;
/* --- decimal integer number --- */
i = atoi(srcp);
j = atol(srcp);
if (*cp == 'U')
cp++;
if (*cp == 'l' || *cp == 'L') {
c = T_LNGCONST;
++cp;
}
else if (j != (long)i)
c = T_LNGCONST;
srcp = cp;
}
*srcstr = srcp;
**tknstr = c;
++(*tknstr);
if (c == T_LNGCONST) {
*((long *)*tknstr) = j;
*tknstr += sizeof(long);
}
else {
*((int *)*tknstr) = i;
*tknstr += sizeof(int);
}
}
Listing Two
/* --------- symbols.c --------- */
#include <string.h>
#include <stdlib.h>
#include "qnc.h"
#include "sys.h"
SYMBOLTABLE LibraryFunctions[] = {
/* --- These have to be maintained in alphabetic order --- */
{ "_Errno", SYSERRNO },
{ "_filename", SYSFILENAME },
{ "_lineno", SYSLINENO },
{ "abs", SYSABS },
{ "acos", SYSACOS },
{ "asctime", SYSASCTIME },
{ "asin", SYSASIN },
{ "atan", SYSATAN },
{ "atan2", SYSATAN },
{ "atof", SYSATOF },
{ "atoi", SYSATOI },
{ "atol", SYSATOL },
{ "ceil", SYSCEIL },
{ "clrscr", SYSCLRSCRN },
{ "cos", SYSCOS },
{ "cosh", SYSCOSH },
{ "cprintf", SYSCPRINTF },
{ "cursor", SYSCURSOR },
{ "exit", SYSEXIT },
{ "exp", SYSEXP },
{ "fabs", SYSFABS },
{ "fclose", SYSFCLOSE },
{ "fflush", SYSFFLUSH },
{ "fgetc", SYSFGETC },
{ "fgets", SYSFGETS },
{ "findfirst", SYSFINDFIRST },
{ "findnext", SYSFINDNEXT },
{ "floor", SYSFLOOR },
{ "fopen", SYSFOPEN },
{ "fprintf", SYSFPRINTF },
{ "fputc", SYSFPUTC },
{ "fputs", SYSFPUTS },
{ "fread", SYSFREAD },
{ "free", SYSFREE },
{ "fscanf", SYSFSCANF },
{ "fseek", SYSFSEEK },
{ "ftell", SYSFTELL },
{ "fwrite", SYSFWRITE },
{ "getch", SYSGETCH },
{ "getchar", SYSGETCHAR },
{ "gets", SYSGETS },
{ "gmtime", SYSGMTIME },
{ "localtime", SYSLOCALTIME },
{ "log", SYSLOG },
{ "log10", SYSLOG10 },
{ "longjmp", SYSLONGJMP },
{ "malloc", SYSMALLOC },
{ "mktime", SYSMKTIME },
{ "pow", SYSPOW },
{ "printf", SYSPRINTF },
{ "putch", SYSPUTCH },
{ "putchar", SYSPUTCHAR },
{ "puts", SYSPUTS },
{ "remove", SYSREMOVE },
{ "rename", SYSRENAME },
{ "rewind", SYSREWIND },
{ "scanf", SYSSCANF },
{ "setjmp", SYSSETJMP },
{ "sin", SYSSIN },
{ "sinh", SYSSINH },
{ "sprintf", SYSSPRINTF },
{ "sqrt", SYSSQRT },
{ "sscanf", SYSSSCANF },
{ "strcat", SYSSTRCAT },
{ "strcmp", SYSSTRCMP },
{ "strcpy", SYSSTRCPY },
{ "strlen", SYSSTRLEN },
{ "strncat", SYSSTRNCAT },
{ "strncmp", SYSSTRNCMP },
{ "strncpy", SYSSTRNCPY },
{ "system", SYSSYSTEM },
{ "tan", SYSTAN },
{ "tanh", SYSTANH },
{ "time", SYSTIME },
{ "tmpfile", SYSTMPFILE },
{ "tmpnam", SYSTMPNAM },
{ "ungetc", SYSUNGETC }
};
#define MAXLIBFUNCTIONS (sizeof(LibraryFunctions)/sizeof(SYMBOLTABLE))
/* --------- keyword lookup table ------------ */
static SYMBOLTABLE Keywords[] = {
/* --- These have to be maintained in alphabetic order --- */
{ "auto", T_AUTO },
{ "break", T_BREAK },
{ "case", T_CASE },
{ "char", T_CHAR },
{ "const", T_CONST },
{ "continue", T_CONTINUE },
{ "default", T_DEFAULT },
{ "do", T_DO },
{ "double", T_DOUBLE },
{ "else", T_ELSE },
{ "enum", T_ENUM },
{ "extern", T_EXTERN },
{ "float", T_FLOAT },
{ "for", T_FOR },
{ "goto", T_GOTO },
{ "if", T_IF },
{ "int", T_INT },
{ "long", T_LONG },
{ "register", T_REGISTER },
{ "return", T_RETURN },
{ "short", T_SHORT },
{ "sizeof", T_SIZEOF },
{ "static", T_STATIC },
{ "struct", T_STRUCT },
{ "switch", T_SWITCH },
{ "typedef", T_TYPEDEF },
{ "union", T_UNION },
{ "unsigned", T_UNSIGNED },
{ "void", T_VOID },
{ "volatile", T_VOLATILE },
{ "while", T_WHILE }
};
#define MAXKEYWORDS (sizeof(Keywords)/sizeof(SYMBOLTABLE))
/* -------- multi-character operator lookup tbl ------------ */
static SYMBOLTABLE Operators[] = {
/* --- These have to be maintained in collating order --- */
{ "!=", T_NE },
{ "&&", T_LAND },
{ "++", T_INCR },
{ "--", T_DECR },
{ "->", T_ARROW },
{ "<<", T_SHL },
{ "<=", T_LE },
{ "==", T_EQ },
{ ">=", T_GE },
{ ">>", T_SHR },
{ ||, T_LIOR }
};
#define MAXOPERATORS (sizeof(Operators)/sizeof(SYMBOLTABLE))
static SYMBOLTABLE PreProcessors[] = {
/* --- These have to be maintained in collating order --- */
{ "define", P_DEFINE },
{ "elif", P_ELIF },
{ "else", P_ELSE },
{ "endif", P_ENDIF },
{ "error", P_ERROR },
{ "if", P_IF },
{ "ifdef", P_IFDEF },
{ "ifndef", P_IFNDEF },
{ "include", P_INCLUDE },
{ "undef", P_UNDEF }
};
#define MAXPREPROCESSORS (sizeof(PreProcessors)/sizeof(SYMBOLTABLE))
/* --- search a symbol table for matching entry --- */
int SearchSymbols(char *arg, SYMBOLTABLE *tbl, int siz, int wd)
{
int i, mid, lo, hi;
lo = 0;
hi = siz-1;
while (lo <= hi) {
mid = (lo + hi) / 2;
i = wd ? strncmp(arg, tbl[mid].symbol, wd) :
strcmp(arg, tbl[mid].symbol);
if (i < 0)
hi = mid-1;
else if (i)
lo = mid + 1;
else
return tbl[mid].ident;
}
return 0;
}
/* --- search for library function identifier --- */
int SearchLibrary(char *fname)
{
return SearchSymbols(fname,LibraryFunctions,MAXLIBFUNCTIONS,0);
}
/* --- search for keyword --- */
int FindKeyword(char *keyword)
{
return SearchSymbols(keyword, Keywords, MAXKEYWORDS, 0);
}
/* --- search for two-character operator --- */
int FindOperator(char *oper)
{
return SearchSymbols(oper, Operators, MAXOPERATORS, 2);
}
/* --- search for preprocessing directive --- */
int FindPreProcessor(char *preproc)
{
return SearchSymbols(preproc,PreProcessors,MAXPREPROCESSORS,0);
}
/* --- search for user-declared identifier --- */
int FindSymbol(char *sym)
{
if (SymbolTable != NULL)
return SearchSymbols(sym, SymbolTable, SymbolCount, 0);
return 0;
}
/* --- find identifier given code --- */
char *FindSymbolName(int id)
{
int i;
for (i = 0; i < SymbolCount; i++)
if (SymbolTable[i].ident == id)
return SymbolTable[i].symbol;
return NULL;
}
/* --- add identifier to symbol table --- */
int AddSymbol(char *sym)
{
int symbolid = 0;
if (SymbolTable != NULL) {
symbolid = FindSymbol(sym);
if (symbolid == 0) {
if (SymbolCount < qCfg.MaxSymbolTable) {
int i, j;
int len = strlen(sym)+1;
char *s = getmem(len);
strcpy(s, sym);
for (i = 0; i < SymbolCount; i++)
if (strcmp(sym, SymbolTable[i].symbol) < 0)
break;
for (j = SymbolCount; j > i; --j)
SymbolTable[j] = SymbolTable[j-1];
SymbolTable[i].symbol = s;
SymbolTable[i].ident = ++SymbolCount;
symbolid = SymbolCount;
}
else
error(SYMBOLTABLERR);
}
}
return symbolid;
}
/* --- delete the symbol table entries --- */
void DeleteSymbols(void)
{
int i;
for (i = 0; i < SymbolCount; i++)
free(SymbolTable[i].symbol);
}