I really needed a better way of debugging my C programs. I am happy with my compiler but it doesn't include a source level debugger. Assembler language debuggers are difficult to use with compiled programs. After a little thought and some more work I developed the source debugger you see in this article. It:
Use of the debugger will vary somewhat from system to system. The description here applies to my computing environment. This consists of an SB-180 single board computer, a 386k floppy, and a Beehive terminal. I use the CP/M compatible ZRDOS, the ZAS assembler, the ZLINK linker and the Q/C C compiler.
Suppose I want to debug the program HELLO, which prints a simple message on the screen and terminates.
#include <stdio.h>
main()
{
    printf("Hello World\n");
}
Normally I link this program with the command:
ZLINK HELLO,CRUNLIB/
and execute with the following:
HELLO
When I want to analyze and control execution with my debugger, I link with the command:
ZLINK DEBUG,HELLO,CRUNLIB/ $S
and execute with:
DEBUG
On encountering $S in the link command line, ZLINK generates a symbol table named DEBUG.SYM for the resulting linked program. This symbol table is key to the debugging process. Before starting execution of the main() function, the symbol file is read, the debugger command interpreter takes control and prompts with *. Now I can type in debugger commands.
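To make the role of the symbol table concrete, here is a minimal sketch of loading one into memory. It is not the debugger's own code. The layout of ZLINK's .SYM file is not described in this article, so the sketch assumes whitespace-separated pairs of a hex value and a name. The names load_symbols(), struct symbol, MAXSYMS and NAMELEN are hypothetical, chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAXSYMS 64
#define NAMELEN 16

struct symbol {
    char name[NAMELEN];
    unsigned value;          /* address of a data item or function entry point */
};

/* Parse text made of "hexvalue name" pairs (an ASSUMED layout, not
   necessarily ZLINK's real format). Returns the number of symbols read. */
int load_symbols(const char *text, struct symbol *tab, int max)
{
    int n = 0, used = 0;
    unsigned value;
    char name[NAMELEN];

    while (n < max && sscanf(text, "%x %15s%n", &value, name, &used) == 2) {
        strcpy(tab[n].name, name);
        tab[n].value = value;
        n++;
        text += used;        /* continue after the pair just consumed */
    }
    return n;
}
```

Given the text "02C2 main" followed by "4548 stdin", this would produce two entries, one per symbol.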
Debugger commands are one letter optionally followed by one or more arguments. The command letter can be upper or lower case.
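The command syntax is simple enough that a few lines of C can split a console line into its parts. The following is a sketch only, assuming that blanks separate the command letter from its arguments; parse_command() is a hypothetical name, not a routine from the debugger.

```c
#include <ctype.h>

/* Split a console line into an upper-cased command letter and a pointer
   to the (possibly empty) argument text. Returns 0 for a blank line. */
int parse_command(const char *line, char *letter, const char **args)
{
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0')
        return 0;
    *letter = (char)toupper((unsigned char)*line++);   /* s and S are the same */
    while (isspace((unsigned char)*line))
        line++;
    *args = line;
    return 1;
}
```

With this, "s main" and "S main" both yield the letter S with the argument text "main".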
The S command sets or displays symbols. S without arguments displays the whole symbol table. S with one argument displays the current value of that symbol. The value of a symbol is usually the address of a data item or function entry point. If a hex number is also specified, the symbol is assigned a new value. If the symbol did not previously exist, it is created at this time. Examples:
*S main
02C2 main
If the symbol has been designated as a break point, the B character precedes the value. Normally this command is used to check which symbols exist and what their current state is. Symbols are loaded from the DEBUG.SYM file at the beginning of program execution, so it is rare that new symbols need be defined.
The D command is used to display contents of memory. If D is used with a symbol argument, the contents of that memory location are displayed. Optionally one may specify a format string to be used to display the memory contents (see below). If D is used without arguments, the contents of memory for each symbol in the data segment are displayed. Examples:
*D stdin
stdin=4548
*D stdin input file is %d
stdin=input file is 17736
The F command is used to specify the format string to be used to display the contents addressed by the symbol. This format string is used by a printf() statement whenever the contents of this memory location are displayed. One memory location will be displayed for each % character in the format string. If no format string has been associated with a symbol, a default format string of %x is assumed. The S command can be used to display a symbol's current format string.
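The two rules in this paragraph, the %x default and one memory location per % character, can be sketched as follows. The function names are hypothetical, and treating %% as a literal percent sign is an assumption borrowed from printf(); the article does not say how the debugger handles it.

```c
#include <stddef.h>

#define DEFAULT_FORMAT "%x"

/* Return the format to use for a symbol: its own string if one was
   assigned, otherwise the default. */
const char *effective_format(const char *assigned)
{
    return (assigned != NULL && assigned[0] != '\0') ? assigned : DEFAULT_FORMAT;
}

/* Count the memory locations a format string will display: one per '%'.
   ASSUMPTION: "%%" is a literal percent sign and displays nothing. */
int count_locations(const char *fmt)
{
    int n = 0;

    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%')
            continue;
        if (fmt[1] == '%') {     /* skip the escaped percent sign */
            fmt++;
            continue;
        }
        n++;
    }
    return n;
}
```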
Format strings are the same ones used by the printf() statement except that the *, (, and ) characters have a special meaning. They are used to permit the formatted display of the objects of pointers. The * character preceding a % character indicates that the object to be displayed is pointed to by the contents of memory. As an example, take the following program fragment:
int a[] = {1, 2, 3,}, /* array */
*b, /* pointer to an integer */
*c; /* pointer to array */
...
*b = 0;
c[0] = 4; c[1] = 5; c[2] = 6;
appropriate debugger commands might be:
*D a a[0]=%d, a[1]=%d, a[2]=%d
a a[0]=1, a[1]=2, a[2]=3
*D b &b=%x
b &b=5F6A
*D b b=*%d
b 0
*D c *(c[0]=%d, c[1]=%d, c[2]=%d)
c c[0]=4, c[1]=5, c[2]=6
The ( character pushes the next address onto a stack and loads the pointed-to address. The ) character recovers the saved address. Parentheses can be used when the data structures to be displayed are more complex. For example, we might store a series of words as an array of linked lists.
struct word {
struct word *next, *prev;
char *spelling;
};
struct word *chain[ARRAYSIZE];
The command to display the first three pointers would be:
*D chain %x %x %x
chain 4556 4578 45A9
To display the first structure in each of the first two chains:
*D chain *(nxt=%x, prv=%x, %s), *(nxt=%x, prv=%x, %s)
chain (nxt=5543, prv=554B, martha), (nxt=558B, prv=83BF, anne)
To display the first two structures in the first chain:
*D chain *(next=*(%x %x %s) prev=%x %s)
chain (next=(4384 340D george) prev=89E4 martha)
When a symbol corresponds to a function entry point, the format string is used to display the arguments when the function is entered and the return value when the function is exited. The first % or pair of parentheses is used as the format for the return value, while the rest of the format string is used for the arguments. For example, if the program contains:
fp = fopen("LST:","r");
and the following F debugger command is specified:
*F fopen file pointer=%x filename=%s mode=%s
the following will be displayed as the program executes:
>fopen filename=LST: mode=r
...
<fopen=file pointer=457D
One final tricky example on the format string:
a[] = "abc"; b = "abc";
a and b cannot be displayed with the same format string. The correct format strings are:
*D a %c%c%c
a abc
*D b %s
b abc
This fooled me when I first started using the debugger. Think about it.
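The reason becomes clearer when the two objects are examined in plain C. The sketch below assumes a is declared as a char array and b as a pointer to char (the types are not shown in the fragment above); as_chars() and as_string() are hypothetical helpers that mimic what the two format strings ask of the debugger.

```c
#include <string.h>

char a[] = "abc";          /* array: the characters themselves are stored at &a */
const char *b = "abc";     /* pointer: only the address of the characters is stored at &b */

/* What %c%c%c asks for: three consecutive bytes starting at the symbol's address. */
void as_chars(const void *addr, char out[4])
{
    memcpy(out, addr, 3);
    out[3] = '\0';
}

/* What %s asks for: the contents of the symbol's memory, taken as a
   pointer to a string. */
const char *as_string(const void *addr)
{
    const char *p;

    memcpy(&p, addr, sizeof p);
    return p;
}
```

Swapping the two would read the wrong thing: the bytes stored at a are text, not an address, and the bytes stored at b are an address, not text.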
The F, D, and B commands can be used to assign a format string to a symbol. In all cases the most recently specified format string becomes the default format string. The F command can be used to reinitialize the format string for a symbol to null.
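The *, ( and ) rules are easier to follow in code. What follows is a toy model, not the debugger's implementation: memory is a 256-byte array of 16-bit little-endian words, only %d, %x, %c and %s are handled, and all names (mem, put_word(), put_string(), display()) are hypothetical. It also assumes that %c consumes one byte and every other conversion consumes one word.

```c
#include <stdio.h>
#include <string.h>

#define MEMSIZE 256
static unsigned char mem[MEMSIZE];      /* toy memory, 16-bit little-endian words */

static unsigned byte_at(unsigned a) { return a < MEMSIZE ? mem[a] : 0; }
static unsigned word_at(unsigned a) { return byte_at(a) | (byte_at(a + 1) << 8); }

void put_word(unsigned a, unsigned v)
{
    if (a + 1 < MEMSIZE) { mem[a] = v & 0xFF; mem[a + 1] = (v >> 8) & 0xFF; }
}

void put_string(unsigned a, const char *s)
{
    if (a + strlen(s) < MEMSIZE) strcpy((char *)&mem[a], s);
}

/* Append piece to out without overflowing a buffer of the given size. */
static void add(char *out, size_t size, const char *piece)
{
    size_t used = strlen(out);
    if (used + 1 < size) strncat(out, piece, size - used - 1);
}

/* Expand fmt against toy memory starting at addr, writing text to out. */
void display(const char *fmt, unsigned addr, char *out, size_t size)
{
    unsigned saved[8], from, v;
    int depth = 0, deref = 0;
    char piece[MEMSIZE + 8], c, conv;

    if (size == 0) return;
    out[0] = '\0';
    while ((c = *fmt++) != '\0') {
        piece[0] = c; piece[1] = '\0';          /* default: copy the character */
        if (c == '*') {
            deref = 1;                          /* next item is reached via a pointer */
            continue;
        } else if (c == '(' && depth < 8) {
            saved[depth++] = addr + 2;          /* push the address after the pointer */
            addr = word_at(addr);               /* and move to the pointed-to object */
            deref = 0;
        } else if (c == ')' && depth > 0) {
            addr = saved[--depth];              /* recover the saved address */
        } else if (c == '%' && *fmt != '\0') {
            conv = *fmt++;
            from = addr;
            if (deref) { from = word_at(addr); addr += 2; deref = 0; }
            else addr += (conv == 'c') ? 1 : 2;
            v = (conv == 'c') ? byte_at(from) : word_at(from);
            if (conv == 'd') sprintf(piece, "%d", v >= 0x8000 ? (int)v - 0x10000 : (int)v);
            else if (conv == 'x') sprintf(piece, "%X", v);
            else if (conv == 'c') { piece[0] = (char)v; piece[1] = '\0'; }
            else if (conv == 's' && v < MEMSIZE)
                sprintf(piece, "%.*s", (int)(MEMSIZE - v), (const char *)&mem[v]);
            else sprintf(piece, "%%%c", conv);  /* not handled in this sketch: echo it */
        }
        add(out, size, piece);
    }
}
```

Laying out two linked word structures in the toy memory and calling display() with the nested format from the chain example walks the pointers in the order described: into the first structure, into its successor, back out for the remaining fields.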
Program flow is traced when trace mode is on. This means that as each function is entered and exited its name, arguments and return value are displayed. In order for arguments to be displayed a format string must have been previously supplied for the function.
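Because one format string serves both the return value and the arguments, it has to be split before either can be displayed. Here is a sketch of that split, following the rule that the first % or pair of parentheses belongs to the return value. split_format() is a hypothetical name. The sketch assumes one-letter conversions such as %x and %s, and it assumes that blanks between the two parts are dropped.

```c
#include <string.h>

/* Split fmt after its first '%' conversion, or after the first balanced
   pair of parentheses if that comes first. 'ret' receives the first part,
   'args' the rest. Both buffers must hold at least strlen(fmt) + 1 bytes. */
void split_format(const char *fmt, char *ret, char *args)
{
    const char *p = fmt;
    int depth = 0;

    while (*p != '\0') {
        if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            if (depth > 0 && --depth == 0) { p++; break; }
        } else if (*p == '%' && depth == 0 && p[1] != '\0') {
            p += 2;                 /* '%' plus a one-letter conversion (ASSUMED) */
            break;
        }
        p++;
    }
    memcpy(ret, fmt, (size_t)(p - fmt));
    ret[p - fmt] = '\0';
    while (*p == ' ')               /* ASSUMPTION: separating blanks are dropped */
        p++;
    strcpy(args, p);
}
```

For the fopen example, this separates "file pointer=%x" (shown on exit) from "filename=%s mode=%s" (shown on entry).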
The B command designates the indicated symbol as a break point. This means that when the function corresponding to the symbol is entered, normal execution will be suspended, the name of the function with its arguments will be displayed, and the debugger will accept commands from the console. The break point can be cleared by reissuing the B command. If the B command is issued without arguments, it is applied to every symbol in the code segment. Any number of symbols may be simultaneously designated as break points.
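Since reissuing B clears a break point, the command behaves as a toggle. A minimal sketch of that behavior is shown below. The structure layout and function names are hypothetical, and the code assumes that code-segment symbols are the ones whose addresses lie below the start of the data segment.

```c
#include <string.h>

struct symbol {
    const char *name;
    unsigned value;      /* address */
    int is_break;        /* nonzero when designated as a break point */
};

/* Toggle the break point on one named symbol. Returns the new state,
   or -1 if the symbol is unknown. */
int toggle_break(struct symbol *tab, int n, const char *name)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].is_break = !tab[i].is_break;
    return -1;
}

/* B with no argument: apply the toggle to every symbol in the code
   segment (ASSUMED to be every address below data_start). */
void toggle_all_code(struct symbol *tab, int n, unsigned data_start)
{
    int i;

    for (i = 0; i < n; i++)
        if (tab[i].value < data_start)
            tab[i].is_break = !tab[i].is_break;
}
```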
The G command continues execution from where it was last suspended. A simple G command will continue execution until a break point is encountered. If G is followed by a number, that number of break points will be ignored before execution is again suspended.
The W command displays the names and arguments of all the functions which have been entered but not yet exited. This provides a clear picture of how we got to where we are in the program. For example:
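The counted form of G amounts to a skip counter that is decremented each time a break point is reached. A sketch under that reading follows; struct run_state, go() and hit_break() are hypothetical names.

```c
struct run_state {
    int skip;   /* break points still to be ignored */
};

/* Handle "G" (n == 0) or "G n": n break points are ignored before stopping. */
void go(struct run_state *st, int n)
{
    st->skip = n > 0 ? n : 0;
}

/* Called when a break point function is entered. Returns 1 if execution
   should be suspended, 0 if this break point is to be ignored. */
int hit_break(struct run_state *st)
{
    if (st->skip > 0) {
        st->skip--;
        return 0;
    }
    return 1;
}
```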
*W
>main
>printf Hello World
>putc H
*
In order to have the function arguments displayed, format strings should be assigned to each symbol. Format strings can be reassigned and the W command reissued as many times as desired.
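A display like this needs only a record of which functions have been entered and not yet exited. The sketch below keeps that record as a simple stack of names and leaves out argument formatting. fn_enter(), fn_leave() and where() are hypothetical names, and the depth limit is arbitrary.

```c
#include <string.h>

#define MAXDEPTH 32

static const char *active[MAXDEPTH];   /* functions entered but not yet exited */
static int depth = 0;

void fn_enter(const char *name)
{
    if (depth < MAXDEPTH)
        active[depth] = name;
    depth++;                            /* keep counting even past the limit */
}

void fn_leave(void)
{
    if (depth > 0)
        depth--;
}

/* Write one ">name" line per active function, outermost first. */
void where(char *out, size_t size)
{
    int i;

    if (size == 0)
        return;
    out[0] = '\0';
    for (i = 0; i < depth && i < MAXDEPTH; i++) {
        size_t used = strlen(out);
        if (used + strlen(active[i]) + 3 > size)
            break;                      /* no room for ">name\n" plus terminator */
        strcat(out, ">");
        strcat(out, active[i]);
        strcat(out, "\n");
    }
}
```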
If the filename has no . character, .SYM is appended. The file is opened and the symbols are created. This command is used to load a symbol table file whose name might not be DEBUG.SYM.
If the filename has no . character, .DBG is appended. The file is opened and lines of text are processed as if they were read from the console. This can be useful for automating repetitive commands. I use it for loading format strings for the functions in my C library. Debug command files can be nested 3 deep.
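Both file commands apply the same default-extension rule, which might be written as below. default_ext() and may_nest() are hypothetical names. The nesting check assumes level 0 means commands are being read from the console.

```c
#include <string.h>

/* Copy name to out, appending ext (for example ".SYM") when name contains
   no '.' character. Returns 0 on success, -1 if out is too small. */
int default_ext(const char *name, const char *ext, char *out, size_t size)
{
    size_t need = strlen(name) + 1;
    int add = strchr(name, '.') == NULL;

    if (add)
        need += strlen(ext);
    if (need > size)
        return -1;
    strcpy(out, name);
    if (add)
        strcat(out, ext);
    return 0;
}

#define MAXNEST 3    /* command files may be nested 3 deep */

/* Return 1 if another command file may be opened at the given nesting
   level (ASSUMED: 0 = reading from the console). */
int may_nest(int level)
{
    return level < MAXNEST;
}
```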
Figure 1 contains a summary of the debugger commands. Figure 2 shows a very simple sample debugger session.
Figure 3 shows how the debugger operates:
When moving the debugger to another environment, the first things to review are the operation of the stack, function parameter passing, and storage of local variables. On my machine (HD64180) the stack grows from high addresses to low addresses. This affects the expressions used for retrieving and altering data on the stack in the trap() and resume() routines.
Q/C allocates storage for local variables on the stack such that the first local variable is stored in the lowest address and subsequent local variables are stored in higher addresses. This is very similar to storage of function parameters. farg() and ddata() are dependent on this arrangement. If your compiler and/or linker do not function in this way, these functions will have to be modified slightly.
Q/C allocates 1 byte for character variables in memory, but two bytes when a character variable is pushed on the stack as a function argument. Q/C floats and longs are twice as long as integers. Sizes of objects may vary among environments. farg() should be altered accordingly.
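These size rules can be collected in one place so that only a single routine changes when porting. The sketch below is illustrative and is not the article's farg(). The enum, the function names, and the int_size parameter are hypothetical; the size of an int is passed in because the article does not state it.

```c
enum kind { K_CHAR, K_INT, K_LONG, K_FLOAT };

/* Bytes an object of the given kind occupies in memory. */
int bytes_in_memory(enum kind k, int int_size)
{
    switch (k) {
    case K_CHAR: return 1;               /* one byte per character variable */
    case K_INT:  return int_size;
    default:     return 2 * int_size;    /* longs and floats: twice an int */
    }
}

/* Bytes the same object occupies when pushed as a function argument. */
int bytes_on_stack(enum kind k, int int_size)
{
    return k == K_CHAR ? 2 : bytes_in_memory(k, int_size);   /* chars are widened */
}
```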
Functions must be long enough so that "call trap()" followed by the symbol table entry address does not extend past the end of the function. In my environment, this means that functions must be at least 5 bytes long.
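The five bytes are a three-byte call instruction plus a two-byte address. A sketch of planting such a sequence in a buffer of code bytes is given here. It relies on the Z80 encoding of CALL nn (opcode CD followed by a little-endian address). Saving the original bytes so they can be restored is an assumption about how resumption might work, not something stated in this article, and plant_trap() and remove_trap() are hypothetical names.

```c
#include <string.h>

#define PATCH_LEN 5          /* 3-byte CALL + 2-byte symbol table entry address */
#define Z80_CALL  0xCD       /* opcode of the Z80 "CALL nn" instruction */

/* Overwrite the first PATCH_LEN bytes of a function with "call trap"
   followed by the symbol table entry address, saving the original bytes
   (saving them is an ASSUMPTION made for this sketch). */
void plant_trap(unsigned char *entry, unsigned trap_addr, unsigned sym_addr,
                unsigned char saved[PATCH_LEN])
{
    memcpy(saved, entry, PATCH_LEN);
    entry[0] = Z80_CALL;
    entry[1] = trap_addr & 0xFF;          /* 16-bit operands are little-endian */
    entry[2] = (trap_addr >> 8) & 0xFF;
    entry[3] = sym_addr & 0xFF;
    entry[4] = (sym_addr >> 8) & 0xFF;
}

/* Put the original bytes back. */
void remove_trap(unsigned char *entry, const unsigned char saved[PATCH_LEN])
{
    memcpy(entry, saved, PATCH_LEN);
}
```

A function shorter than five bytes would have the tail of this sequence spill into whatever follows it, which is why the minimum length matters.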
There must be a way to determine if an address is a function entry point or a data area. In my environment this is done by comparing the address with the start of the data segment. Since we always link in the DEBUG.REL module before any others, the first static data item is stored at the beginning of the data segment. By taking the address of this item with the & operator we have the start of the data segment.
The debugger calls some functions which might not be in your library yet. These include the symbol table functions symmk(), symadd(), symlkup() and symdat() described in a previous article. The function strlwr() converts a string to lower case and returns a pointer to the string. isatty() returns true if the file number argument corresponds to a terminal, and strchr() returns a pointer to the first occurrence of a given character in a string (this used to be called index() in the standard library). To implement this debugger these functions must be included.
Possibly the most bothersome aspect of installing DEBUG is working out a method for passing control to the setup routine before execution of the program itself starts. One easy way would be to modify the main() function to explicitly call setup() before it does anything else. I wanted to avoid having to recompile modules just to use the debugger, so I took a different approach. Q/C includes a function called _shell() which is used to open standard files and initialize arguments for the main() function. _shell() is executed just after the program is loaded. Since the Q/C package includes the code for the _shell() function, I modified it to call an external function setup() before execution is passed to main(). I also added a module to the C library named setup() which contains only a return instruction. When a program is linked without DEBUG, _shell() calls setup(), which returns immediately. When DEBUG is included on the link command line, _shell() calls setup(), which initializes the debugger and permits setting of break points even before main() is called. This scheme adds four bytes to every program I link (for the extraneous "call setup()" and return instruction), but means I don't have to recompile anything to use the debugger. This approach will be useful when I add an execution time profiler to my C compiler. You'll have to work out your own scheme for passing control to setup().
If your C compiler needs a source level debugger, I'd be flattered if you'd consider the one presented here.
Bibliography
O'Connell, Patrick, Relocating Macro Assembler and Linker for Z80 and HD64180. Echelon Inc. 101 First Street, Los Altos CA 94022
Colvin, Jim, Q/C Users Manual. The Code Works, Santa Barbara, California.
This article originally appeared in Computer Language, April '88