Randy is the designer of numerous hardware and software projects, including assemblers for a variety of systems. In addition to consulting, he is currently a part-time instructor in computer science at California Polytechnic University in Pomona and at UC Riverside. He can be contacted at 9570 Calle La Cuesta, Riverside, CA 92503.
One of the promises of the object-oriented paradigm is that it will reduce program complexity and implementation effort for many different types of programs. Object-oriented programming, however, is no panacea. It is a technique, like recursion, that you can apply in certain cases to reduce programming effort. While there are certain types of programs whose object-oriented implementation is better, examples abound where object-oriented programming systems (OOPS) buy you nothing. Nonetheless, object-oriented programming techniques are a valuable tool to have in one's war chest.
OOPS are nothing new. They have been around since the late 1960s. Yet the object-oriented paradigm was languishing until Object Pascal and C++ began generating mainstream interest. The success of these languages demonstrates that OOP is not the domain of a few esoteric programming languages. Rather, object-oriented programming is applicable to almost any programming language.
Still, assembly language may not seem like the place to apply the object-oriented programming paradigm. But keep in mind that people were saying the same thing about Pascal and C five years ago.
What does an object-oriented assembly language program look Like? A better question to ask is, "What is the essence of an object-oriented program, and how does one capture it within an assembly language program?" Once you strip away the gloss and notation convenience provided by languages such as C++, you'll find that the two main features of an object-oriented program are polymorphism and inheritance.
Polymorphism takes two basic forms in most programming languages: static and dynamic. The general idea, however, is the same. You call different subroutines by the same name. Static polymorphism provides notational convenience in the form of operator/ function overloading in languages such as C++. Static polymorphism uses the parameter list, along with the routine's name (together they form the routine's signature), to determine which routine to call. For example, consider the C routines:
CmplxAddCC(C1, C2, C3); /*C1=C2+C3;*/ CmplxAddCR(C1, C2, R1); /*C1=C2+ToCmplx(R1);*/ CmplxAddRC(C1, R1, C2); /*C1=ToCmplx(R1)+C2;*/
In C++ you could write:
CmplxAdd(C1, C2, C3); CmplxAdd(C1, C2, R1); CmplxAdd(C1, R1, C2);
and the C++ compiler would figure out whether to call CmplxAddCC, CmplxAddCR, or CmplxAddRC. (Actually, you could overload C++'s "+" operator and use the three forms C1=C2+C3;, C1=C2+R1;, or C1=R1+C2;, but the example above would be still valid).
Static overloading, while convenient, does not add any power to the language. The calls to CmplxAdd call three different routines. CmplxAdd-(C1,C2,C3) calls CmplxAddCC, CmplxAdd(C1,C2,R) calls CmplxAddCR, and CmplxAdd-(C1,R,C2) calls CmplxAddRC. The C++ compiler determines which routine this code will call at compile time. Static polymorphism is a mechanism that lets the compiler choose one of several different routines to call depending upon the calling signature.
Sometimes you may want to use the same signature to call different routines. For example, suppose you have a class shape in which there are three graphical objects: circles, rectangles, and triangles. If you have an arbitrary object of type shape, the compiler cannot determine which DRAW routine to call. The program determines this at run time. This allows a single call to draw circles, rectangles, triangles at run time with the same machine instructions. This is dynamic polymorphism -- determining at run time which routine to call. C++ uses virtual functions and Object Pascal uses override procedures and functions to implement dynamic polymorphism.
Inheritance lets you build up data structures as supersets of existing data structures. This provides a mechanism whereby you can generalize data types, allowing you to handle various objects regardless of their actual type. This lets you define such diverse shapes as circles, rectangles, and triangles and treat them as compatible structures.
Because structures and classes are closely related, it may be instructive to look at the implementation of structures before looking at classes. Consider S, a variable of the type in Example 1. Somewhere in memory the compiler needs to generate storage for the fields of S. Traditionally, compilers allocate these fields contiguously (see Figure 1). Indeed, Microsoft's assembler (MASM) allows you to declare structures in a similar fashion, as shown in Example 2. If S provides the base address of this structure, S+0 is the address of S.i, S+2 is the address of S.j, and S+4 is the address of S.c.
struct { int i; int j; char *c; } S;
SType struc i dw ? j dw ? c dd ? SType ends
Now consider the case of a pair of C++ classes (Sc and Tc) in Example 3. Pointers to objects (pS and pT) may point at an object of the prescribed type or to an object that is a descendant of the pointer's base class. For example, pS can point at an object of type Sc or at an object of type Tc. Remember, accessing *pS.j is equivalent to (int) *(pS+2), so if pS points at an object of type Tc, the j field must also appear at offset two within the structure. For inheritance to work properly, the common fields must appear at the same offset within the structure (see Figure 2).
class Sc { int i; int j; char *c; }; Sc *pS; Tc *pT; ------------------------ Tclass Tc:Sc { int k; char *d; };
Additional fields in the subclass often appear after the fields in the parent class, so most compilers implement class Tc as in Example 4. Note that the offsets to i, j, and c are the same for both Sc and Tc.
struct Tc Tc struc { i dw ? int i; j dw ? int j; c dw ? char *c; k dw ? int k; d dd ? char *d Tc ends ? };
When I first began exploring how to implement inheritance in assembly, I got the bright idea of using macros inside structure definitions to handle the problem of inheritance. Briefly, I wanted to implement Sc and Tc as in Example 5. Unfortunately, MASM doesn't allow you to expand macros or strucs inside a structure. Disappointed, I tried the brute force way to implement Sc and Tc, as illustrated in Example 6.
ScItems macro i dw ? j dw ? c dd ? endm ; TcItems macro ScItems k dw ? d dd ? endm ; Sc struc ScItems Sc ends ; Tc struc TcItems Tc ends
Sc struc i dw ? j dw ? c dd ? Sc ends Tc struc i dw ? j dw ? c dd ? k dw ? d dd ? Tc ends
Unfortunately, I'd forgotten that MASM doesn't treat these symbols as part of the structure. Names such as i, j, c, and so on must be unique in the program. As you can plainly see in Example 6, I declared i twice, and the assembler gave me a "redefinition of symbol" error. Almost ready to give up, I tried the method in Example 7.
Sc struc i dw ? j dw ? c dd ? Sc ends Tc struc dw ? dw ? dd ? k dw ? d dd ? Tc ends S Sc T Tc
MASM simply equates the field names to the offsets within the structure. So it equates i to zero, j to two, and so on. MASM does not associate i with labels of type Sc. You can use the symbols T.j and S.j in your program. Because the "." operator behaves like the "+" operator, T.j is just like T+2.
For Tc to inherit the fields of Sc, all we have to do is reserve enough space at the beginning of the Tc structure for each of the fields of Sc. Above, I stuck in the two DW and the DD pseudoopcodes to reserve space for the i, j, and c fields. This technique might get inconvenient if the number of inherited fields is large. The code in Example 8 solves this problem.
Sc struc i dw ? j dw ? c dd ? Sc ends Tc struc db (size Sc) dup (?) k dw ? d dd ? Tc ends Uc struc db (size Tc) dup (?) e dw ? Uc ends S Sc T Tc U Uc
The first DB pseudoopcode in Tc reserves the necessary space for the fields Tc inherits from Sc. Likewise, Uc (which is a subclass of Tc) reserves space at the beginning of the structure for the fields inherited from Tc and Sc. The code in Example 8 works great if you don't need to initialize any of the fields inherited from Sc; if you need to initialize some fields, you'll have to use the brute force method and redeclare space for each field.
The earlier paragraphs discuss how to implement objects whose fields are all variables. What happens when you introduce methods? If you're not overloading a method, you can treat it in the same manner as any other assembly language procedure and call it directly. If you are overloading a method, you must call it indirectly via a pointer within the object.
Consider the C++ class declaration in Example 9. The assembly code implementing this class is shown in Example 10. To call S.geti, you would use the 8086 instruction: CALL S.geti.
class Sc { int i, j; char *c; public: int geti() {return i}; /* Ignore the fact that C++ */ int getj() {return j}; /* would implement these */ void seti (x) int X; {i = x;}; /* methods in-line. */ void setj (x) int X; {j = x;}; };
Sc struc i dw ? j dw ? c dd ? geti dd Sc_geti getj dd Sc_getj seti dd Sc_seti setj dd Sc_setj Sc ends S Sc
Because S.geti is a double word memory variable, the CALL instruction will call the procedure S.geti, which points at Sc_geti. The fact that we're calling the methods indirectly will be useful when we look at overloading a little later.
Suppose we have three instances of class Sc, say S1, S2, and S3 declared in assembly language as follows:
S1 Sc S2 Sc S3 Sc
S1.geti, S2.geti, and S3.geti all call the same procedure (call it Sc_geti). How does Sc_geti differentiate between S1.i, S2.i, and S3.i? In object-oriented languages such as Object Pascal and C++, the compiler automatically passes a special parameter named this to the method. this always points at the object through which you've invoked the method. When you execute S1.geti, the compiler passes the address of S1 in this to geti. Likewise, the compiler passes the address of S2 in this when you call S2.geti.
You can pass this to a method just as any other parameter. Because the most efficient way of passing parameters is in the 8086's registers, I've adopted the convention of passing this in the ES:BX registers. The Sc_geti method would look something like Example 11 (assuming we're returning i in the AX register). This example demonstrates a major problem with object-oriented programming -- it is very inefficient. To load S1.i into AX, see Example 12. This requires six instructions where, logically, you should only need one (mov ax, S1.i). Welcome to the wonderful world of object-oriented programming! Yet circumventing all this overhead by loading S1.i directly into AX will eliminate the benefits of object-oriented programming.
Actually, this isn't as bad as it looks. A good part of the time ES:BX will already be pointing at the object you want to access. Nevertheless, the call and return are considerable overhead just to load the AX register with a word value. Stroustrup anticipated this problem when designing C++ and he solved it by providing inline functions (a.k.a. macros). We can use this same technique in assembly language to improve efficiency as Example 13 illustrates. This code snippet demonstrates another convention I adhere to: I make macros for all method calls, even those that are actual calls. This lets me use a consistent calling format for all methods, whether they are actual subroutines or are expanded in-line.
_THIS equ es:[bx] Sc_geti proc far mov ax, _THIS.i ret Sc_geti endp
mov bx, seg S1 mov es, bx mov bx, offset S1 call S1.geti ;Assuming S1 is in the data seg.
; ; Inline expansion of geti to improve efficiency: ; _Geti macro mov ax, _THIS.i endm ; ; Perform actual call to routines which are too big to ; expand in-line in our code: ; _Printi macro call _THIS.Printi endm * * * _Geti ;Get i into AX. * * * _Printi ;Call Printi routine.
There is one major drawback to expanding a procedure inline; you cannot overload procedures (C++'s inline functions suffer from this as well. You cannot have an inline virtual function). Therefore, you should only use this technique for those particular methods that you will never need to overload. Fortunately, the macro implementation makes it easy to switch to a call later if you need to overload the procedure. Just substitute a call for the inline code inside the macro.
Overloaded procedures allow the "same" method to perform different operations, depending upon the object passed to the method. Consider the class definitions in Example 14. Rect and Circle are types derived from Shape. If ES:BX points at a generic shape (that is, ES:BX points at an object of type Shape, Rect, or Circle) then CALL_THIS.Draw will call Shape_Draw, Rect_Draw, or Circle_Draw, depending upon where ES:BX points. This lets you write generic code that needn't know the particular details of the shape it's drawing. The object itself knows how to draw itself via the pointer to the specific draw routine.
Shape struc ulx dw ? ;Upper left X coordinate uly dw ? ;Upper left Y coordinate lrx dw ? ;Lower right X coordinate lry dw ? ;Lower right Y coordinate Draw dd Shape_Draw ;Default (overridden) DRAW routine Shape ends ; Rect struc dw 4 dup (?) ;Reserve space for coordinates dd Rect_Draw ;Draw a rectangle Rect ends ; Circle struc dw 4 dup (?) ;Reserve space for coordinates dd Circle_Draw ;Draw a circle Circle ends
High-level object-oriented languages such as Object Pascal and C++ tend to hide many of the allocation details from you. In assembly language, naturally, the programmer has to handle all of the allocation details. Although a complete discussion of dynamic allocation of objects is beyond the scope of this article, the subject is so pervasive that it warrants a brief mention.
Static allocation of an object in assembly language is quite simple. If you have the shape class definitions (shape, rect, and circle) mentioned earlier, you can easily declare variables of these types using declarations of the form:
MyRect rect MyCircle circle MyShape shape
This automatically fills in the DRAW field for these variables (the linker/loader fills in such addresses when it loads the program into memory). What happens if you are dynamically allocating storage for an object? Assume we have a routine, alloc, to which we pass a byte count in CX, and it returns a pointer to a block of memory that size in ES:BX. Now suppose we allocate a rectangle with the code in Example 15. Alloc will not be smart enough to fill in the pointer to the rect.DRAW routine. This is something we'll have to do ourselves. This requires the four instructions in Example 16.
mov cx, size rect call alloc mov word ptr MyRectPtr, bx mov word ptr MyRectPtr+2, es
mov ax, offset rectDRAW mov _this.DRAW, ax mov ax, seg rectDRAW mov _this.DRAW+2, ax
Eight instructions may not seem like a lot to create a simple object. Keep in mind, however, that our simple shape object only has one overridden method. If there were a dozen methods, you would need 52 instructions. Clearly, a CREATE procedure begins to make a lot of sense. Each subclass (shape, rect, and circle) will need its own CREATE method. CREATE is not a method you normally overload, because during the creation process you know exactly the type of object you're creating. By convention, the CREATE methods I write always allocate the appropriate amount of storage, initialize any important fields, and then return a pointer to the new object in ES:BX. The code in Example 17 provides an example, using the rect and circle types. To manipulate these objects, we need only load the appropriate pointer into ES:BX and access the appropriate fields or call the appropriate methods via this.
mov cx, size circle call CreateCircle mov word ptr CircVarPtr, bx mov word ptr CircVarPtr+2, es ; mov cx, size rect call CreateRect mov word ptr RectVarPtr, bx mov word ptr RectVarPtr+2, es
While writing object-oriented programs in assembly language, I've found certain guidelines helpful in the initial design phases (that is, before having to take efficiency into consideration). Most of these guidelines are widely accepted object-oriented practices; others pertain mainly to assembly language. Here are the major ones I'm using:
The example in Listing One (page 110) is a program that adds, subtracts, and compares signed binary integers, unsigned binary integers, and BCD values. While not a complete example (it's missing several important methods such as CREATE, PRINT, DISPOSE, and so on) it demonstrates the flavor of object-oriented programming in assembly language.
OBJECT-ORIENTED PROGRAMMING IN ASSEMBLY LANGUAGE
by Randall L. Hyde
[LISTING ONE]
page 62, 132
;
;***************************************************************************
; OBJECTS.ASM -- This program demonstrates object-oriented programming
; techniques in 8086 assembly language.
;
dseg segment byte public 'data'
;
; Unsigned Data Type:
Unsigned struc
Value dw 0
_Get_ dd ? ;AX = This
_Put_ dd ? ;This = AX
_Add_ dd ? ;AX = AX + This
_Sub_ dd ? ;AX = AX - This
_Eq_ dd ? ;Zero flag = AX == This
_Lt_ dd ? ;Zero flag = AX < This
Unsigned ends
;
; UVar lets you (easily) declare an unsigned variable.
UVar macro var
var Unsigned <,uGet,uPut,uAdd,uSub,uEq,uLt>
endm
;
; Signed Data Type:
Signed struc
dw 0
dd ? ;Get method
dd ? ;Put method
dd ? ;Add method
dd ? ;Sub method
dd ? ;Eq method
dd ? ;Lt method
Signed ends
;
; SVar lets you easily declare a signed variable.
SVar macro var
var Signed <,sGet, sPut, sAdd, sSub, sEq, sLt>
endm
;
; BCD Data Type:
BCD struc
dw 0 ;Value
dd ? ;Get method
dd ? ;Put method
dd ? ;Add method
dd ? ;Subtract method
dd ? ;Eq method
dd ? ;Lt method
BCD ends
;
; BCDVar lets you (easily) declare a BCD variable.
BCDVar macro var
var BCD <,bGet, bPut, bAdd, bSub, bEq, bLt>
endm
;
; Declare variables of the appropriate types (For the sample pgm below):
; Also declare a set of DWORD values which point at each of the variables.
; This provides a simple mechanism for obtaining the address of an object.
UVar u1
U1Adr dd U1 ;Provide convenient address for U1.
;
UVar u2
U2Adr dd U2 ;Ditto for other variables.
;
SVar s1
S1Adr dd s1
;
SVar s2
S2Adr dd s2
;
BCDVar b1
B1Adr dd b1
;
BCDVar b2
B2Adr dd b2
;
; Generic Pointer Variables:
Generic1 dd ?
Generic2 dd ?
;
dseg ends
;
cseg segment byte public 'CODE'
assume cs:cseg, ds:dseg, es:dseg, ss:sseg
;
_This equ es:[bx] ;Provide a mnemonic name for THIS.
;
; Macros to simplify calling the various methods
_Get macro
call _This._Get_
endm
;
_Put macro
call _This._Put_
endm
;
_Add macro
call _This._Add_
endm
;
_Sub macro
call _This._Sub_
endm
;
_Eq macro
call _This._Eq_
endm
;
_Lt macro
call _This._Lt_
endm
;
;***************************************************************************
; Methods for the unsigned data type:
uGet proc far
mov ax, _This
ret
uGet endp
;
uPut proc far
mov _This,ax
ret
uPut endp
;
uAdd proc far
add ax, _This
ret
uAdd endp
;
uSub proc far
sub ax, _This
ret
uSub endp
;
uEq proc far
cmp ax, _This
ret
uEq endp
;
uLt proc far
cmp ax, _This
jb uIsLt
cmp ax, 0 ;Force Z flag to zero.
jne uLtRtn
cmp ax, 1
uLtRtn: ret
;
uIsLt: cmp ax, ax ;Force Z flag to one.
ret
uLt endp
;
;***************************************************************************
; Methods for the unsigned data type.
sPut equ uPut ;Same code, why duplicate it?
sGet equ uGet
sAdd equ uAdd
sSub equ uSub
sEq equ uEq
;
sLt proc far
cmp ax, _This
jl sIsLt
cmp ax, 0 ;Force Z flag to zero.
jne sLtRtn
cmp ax, 1
sLtRtn: ret
;
sIsLt: cmp ax, ax ;Force Z flag to one.
ret
sLt endp
;
;***************************************************************************
; Methods for the BCD data type
bGet equ uGet ;Same code, don't duplicate it.
bPut equ uPut
bEq equ uEq
bLt equ uLt
;
bAdd proc far
add ax, _This
daa
ret
bAdd endp
;
bSub proc far
sub ax, _This
das
ret
bSub endp
;
;***************************************************************************
; Test code for this program:
TestSample proc near
push ax
push bx
push es
;
; Compute "Generic1 = Generic1 + Generic2;"
les bx, Generic1
_Get
les bx, Generic2
_Add
les bx, Generic1
_Put
;
pop es
pop bx
pop ax
ret
TestSample endp
;
; Main driver program
MainPgm proc far
mov ax, dseg
mov ds, ax
;
; Initialize the objects:
; u1 = 39876. Also initialize Generic1 to point at u1 for later use.
les bx, U1Adr
mov ax, 39876
_Put
mov word ptr Generic1, bx
mov word ptr Generic1+2, es
;
; u2 = 45677. Also point Generic2 at u2 for later use.
les bx, U2Adr
mov ax, 45677
_Put
mov word ptr Generic2, bx
mov word ptr Generic2+2, es
;
; s1 = -5.
les bx, S1Adr
mov ax, -5
_Put
;
; s2 = 12345.
les bx, S2Adr
mov ax, 12345
_Put
;
; b1 = 2899.
les bx, B1Adr
mov ax, 2899h
_Put
;
; b2 = 195.
les bx, B2Adr
mov ax, 195h
_Put
;
; Call TestSample to add u1 & u2.
call TestSample
;
; Call TestSample to add s1 & s2.
les bx, S1Adr
mov word ptr Generic1, bx
mov word ptr Generic1+2, es
les bx, S2Adr
mov word ptr Generic2, bx
mov word ptr Generic2+2, es
call TestSample
;
; Call TestSample to add b1 & b2.
les bx, B1Adr
mov word ptr Generic1, bx
mov word ptr Generic1+2, es
les bx, B2Adr
mov word ptr Generic2, bx
mov word ptr Generic2+2, es
call TestSample
;
mov ah, 4ch ;Terminate process DOS cmd.
int 21h
;
;
MainPgm endp
cseg ends
;
sseg segment byte stack 'stack'
stk dw 0f0h dup (?)
endstk dw ?
sseg ends
;
end MainPgm
[Example 1]
struct
{
int i;
int j;
char *c;
} S;
[Example 2]
SType struc
i dw ?
j dw ?
c dd ?
SType ends
[Example 3]
class Sc
{
int i;
int j;
char *c;
};
Sc *pS;
Tc *pT;
Tclass Tc:Sc
{
int k;
char *d;
};
[Example 4]
struct Tc Tc struc
{ i dw ?
int i; j dw ?
int j; c dd ?
char *c; k dw ?
int k; d dd ?
char *d Tc ends
};
[Example 5]
ScItems macro
i dw ?
j dw ?
c dd ?
endm
;
TcItems macro
ScItems
k dw ?
d dd ?
endm
;
Sc struc
ScItems
Sc ends
;
Tc struc
TcItems
Tc ends
[Example 6]
Sc struc
i dw ?
j dw ?
c dd ?
Sc ends
Tc struc
i dw ?
j dw ?
c dd ?
k dw ?
d dd ?
Tc ends
[Example 7]
Sc struc
i dw ?
j dw ?
c dd ?
Sc ends
Tc struc
dw ?
dw ?
dd ?
k dw ?
d dd ?
Tc ends
S Sc
T Tc
[Example 8]
Sc struc
i dw ?
j dw ?
c dd ?
Sc ends
Tc struc
db (size Sc) dup (?)
k dw ?
d dd ?
Tc ends
Uc struc
db (size Tc) dup (?)
e dw ?
Uc ends
S Sc
T Tc
U Uc
[Example 9]
class Sc
{
int i,j;
char *c;
public:
int geti() {return i}; /* Ignore the fact that C++ */
int getj() {return j}; /* would implement these */
void seti(x) int x; {i = x;}; /* methods in-line. */
void setj(x) int x; {j = x;};
};
[Example 10]
Sc struc
i dw ?
j dw ?
c dd ?
geti dd Sc_geti
getj dd Sc_getj
seti dd Sc_seti
setj dd Sc_setj
Sc ends
S Sc
[Example 11]
_THIS equ es:[bx]
Sc_geti proc far
mov ax, _THIS.i
ret
Sc_geti endp
[Example 12]
mov bx, seg S1
mov es, bx
mov bx, offset S1
call S1.geti ;Assuming S1 is in the data seg.
[Example 13]
;
; Inline expansion of geti to improve efficiency:
;
_Geti macro
mov ax, _THIS.i
endm
;
; Perform actual call to routines which are too big to
; expand in-line in our code:
;
_Printi macro
call _THIS.Printi
endm
.
.
.
_Geti ;Get i into AX.
.
.
.
_Printi ;Call Printi routine.
[Example 14]
Shape struc
ulx dw ? ;Upper left X coordinate
uly dw ? ;Upper left Y coordinate
lrx dw ? ;Lower right X coordinate
lry dw ? ;Lower right Y coordinate
Draw dd Shape_Draw ;Default (overridden) DRAW routine
Shape ends
;
Rect struc
dw 4 dup (?) ;Reserve space for coordinates
dd Rect_Draw ;Draw a rectangle
Rect ends
;
Circle struc
dw 4 dup (?) ;Reserve space for coordinates
dd Circle_Draw ;Draw a circle
Circle ends
[Example 15]
mov cx, size rect
call alloc
mov word ptr MyRectPtr, bx
mov word ptr MyRectPtr+2, es
[Example 16]
mov ax, offset rectDRAW
mov _this.DRAW, ax
mov ax, seg rectDRAW
mov _this.DRAW+2, ax
[Example 17]
mov cx, size circle
call CreateCircle
mov word ptr CircVarPtr, bx
mov word ptr CircVarPtr+2, es
;
mov cx, size rect
call CreateRect
mov word ptr RectVarPtr, bx
mov word ptr RectVarPtr+2, es