Creating an assembler

beyondsociety · May 19, 2002

Hi,

I am writing an assembler for the Intel 80x86 processors. I have the opcodes and am converting them to binary to show the relationships between them. I need to create an in-memory database in assembler. It has to have three fields:

text field - variable length
type field - one byte will do
numeric field - four bytes, one dword
very small size
search via text field

The database will contain the labels for the assembly file. The text field will contain the name of the label, the type field will contain the type, and the numeric field will contain the value.

I need help on how to create an in-memory database in assembler. If anybody has a memory database program I could look at, or knows of a website I could learn from, I would appreciate it.

AmkG · May 20, 2002

Depends on how clean/dirty you want your label database to be.

If I were to implement the label database, I would do:
byte 0: n length of label name
byte 1-n: label name
byte n+1: type of label
byte n+2-n+5: label value
byte n+6: start of next label (byte 0 of next label)

This is rather efficient in terms of memory. But timewise it is a bit of a dog since you can only search from beginning to end. However you can use pointers to the start of label entries, and set up those pointers into a hash table or whatnot.
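
A minimal sketch of that record layout, written in C rather than assembler so it is easy to try out. The names (`label_add`, `label_find`, `TABLE_MAX`) and the little-endian value order are assumptions for illustration, not something specified in the thread. The search walks from the first record to the last, which is the "bit of a dog" part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Packed label table: [len][name: len bytes][type][value: 4 bytes, little-endian].
   TABLE_MAX is an arbitrary illustrative size. */
enum { TABLE_MAX = 4096 };

static uint8_t table[TABLE_MAX];
static size_t  table_used = 0;

/* Append one label. Returns the record's offset, or -1 if it does not fit. */
long label_add(const char *name, uint8_t type, uint32_t value)
{
    size_t n = strlen(name);
    if (n == 0 || n > 255 || table_used + n + 6 > TABLE_MAX)
        return -1;
    uint8_t *p = table + table_used;
    p[0] = (uint8_t)n;                       /* byte 0: length of label name */
    memcpy(p + 1, name, n);                  /* bytes 1..n: label name */
    p[n + 1] = type;                         /* byte n+1: type of label */
    for (int i = 0; i < 4; i++)              /* bytes n+2..n+5: label value */
        p[n + 2 + i] = (uint8_t)(value >> (8 * i));
    long off = (long)table_used;
    table_used += n + 6;                     /* byte n+6 starts the next record */
    return off;
}

/* Sequential search from the first record. Returns the offset, or -1. */
long label_find(const char *name)
{
    size_t n = strlen(name);
    size_t off = 0;
    while (off < table_used) {
        const uint8_t *p = table + off;
        if (p[0] == n && memcmp(p + 1, name, n) == 0)
            return (long)off;
        off += (size_t)p[0] + 6;             /* skip to the next record */
    }
    return -1;
}

/* Read the type byte of the record at offset off. */
uint8_t label_type(long off)
{
    assert(off >= 0 && (size_t)off < table_used);
    return table[off + table[off] + 1];
}

/* Read the 4-byte value of the record at offset off. */
uint32_t label_value(long off)
{
    assert(off >= 0 && (size_t)off < table_used);
    const uint8_t *v = table + off + table[off] + 2;
    return (uint32_t)v[0] | (uint32_t)v[1] << 8 |
           (uint32_t)v[2] << 16 | (uint32_t)v[3] << 24;
}
```

Each record costs the name length plus six bytes, so a five-character label takes eleven bytes in total.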
"Information has a tendency to be free. Which means someone will always tell you something you don't want to know."

straiph · May 20, 2002

hmm... your database sounds a bit simple for such a complex processor. maybe you should consider some of the following if you want to be practical in creating a good assembler.

you ought to have several context databases because the opcode value and number of bytes can completely change depending on the context.

you ought to stick to fixed length records as this makes searching easier. always put the primary search field first in every record. if you need to search on a different data field, create an index where the first field holds the data you want to search on, kept in sorted order, and the second gives the record number in the database.

once you have deciphered the line context (ie: displacement, immediate, register etc), search its type database for acceptable mnemonics. if found, you then use the "opcode image" to format the information into the machine code.

the opcode image should be the same for each type of context so you only need to record the opcode bits.

bbbbbrrr
00001111 bbbbbbbb mmrrrnnn ssiiibbb dddddddd dddddddd

where b=opcode bits, m=mod, r=reg, n=r/m, s=scale, i=index, d=displacement etc. in the SIB byte (ssiiibbb) the trailing b=base.
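
As a rough illustration of filling in such an image, here is a small C sketch that packs the mod/reg/r/m and scale/index/base fields into their bytes. The function names are made up for this example. The bit positions follow the standard x86 ModR/M and SIB layout shown in the image above.

```c
#include <assert.h>
#include <stdint.h>

/* mmrrrnnn: mod (2 bits), reg (3 bits), r/m (3 bits). */
uint8_t make_modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* ssiiibbb: scale (2 bits, log2 of the multiplier), index (3 bits), base (3 bits). */
uint8_t make_sib(unsigned scale, unsigned index, unsigned base)
{
    assert(scale < 4 && index < 8 && base < 8);
    return (uint8_t)((scale << 6) | (index << 3) | base);
}

/* bbbbbrrr: one-byte opcode with the register number in the low three bits. */
uint8_t make_opcode_reg(uint8_t opcode_bits, unsigned reg)
{
    assert(reg < 8);
    return (uint8_t)((opcode_bits & 0xF8u) | reg);
}
```

For example, in 32-bit code `mov eax, [ebx+ecx*4+8]` is opcode 8B followed by `make_modrm(1, 0, 4)` = 44, `make_sib(2, 1, 3)` = 8B and the one-byte displacement 08, and `push ebx` is `make_opcode_reg(0x50, 3)` = 53.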

but you probably knew all that anyway.

it is better to write your own database routines so you don't get a load of unnecessary code that doesn't do exactly what you want. maybe you could write a program to create a binary database that your assembler will use, so it is easier to maintain and update.

to economise use a separate database for mnemonics. first search this to get the mnemonic index (a word) and store that index in your context databases. that way you won't have the same mnemonic repeated five or six times.
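
A small C sketch of that split, under some assumptions: the table contents, the two context names and all identifiers are chosen for illustration, and only four mnemonics are included. The mnemonic text is stored once, and each context table stores just the 16-bit index and an opcode byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mnemonic table, kept in alphabetical order so it can be binary searched.
   A mnemonic's position in this array is its 16-bit index. */
static const char *const mnemonics[] = { "add", "mov", "sub", "xor" };
enum { MNEMONIC_COUNT = sizeof mnemonics / sizeof mnemonics[0] };

/* One record of a context table: the mnemonic index and its opcode byte. */
struct ctx_entry { uint16_t mnemonic; uint8_t opcode; };

/* Illustrative context "r32, r/m32". */
static const struct ctx_entry ctx_reg_rm[] = {
    { 0, 0x03 }, { 1, 0x8B }, { 2, 0x2B }, { 3, 0x33 },
};

/* Illustrative context "r/m32, r32". */
static const struct ctx_entry ctx_rm_reg[] = {
    { 0, 0x01 }, { 1, 0x89 }, { 2, 0x29 }, { 3, 0x31 },
};

/* Binary search of the mnemonic table. Returns the index, or -1 if unknown. */
int mnemonic_index(const char *name)
{
    int lo = 0, hi = MNEMONIC_COUNT - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int c = strcmp(name, mnemonics[mid]);
        if (c == 0)
            return mid;
        if (c < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Look a mnemonic index up in one context table.
   Returns the opcode byte, or -1 if the mnemonic is not valid in that context. */
int ctx_lookup(const struct ctx_entry *ctx, size_t count, uint16_t mnemonic)
{
    assert(ctx != NULL);
    for (size_t i = 0; i < count; i++)
        if (ctx[i].mnemonic == mnemonic)
            return ctx[i].opcode;
    return -1;
}
```

With this arrangement "mov" appears once in the mnemonic table, and both context tables refer to it as index 1.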

the only database routines your assembler needs: load database, search for record, load found record (error if not found).

in the FAQ under technical information find and download manual 2. This manual contains all opcodes and usage, a complete encodings list with various context types for each operation and also a full opcode map giving you 1 byte opcodes, 2 byte opcodes, 16bit & 32bit mod&r/m with SIB codings and escape code groups. Also included are opcodes for MMX, SSE, SSE2, FPU and system management.

That should keep you busy for a while! LOL (it did me)

"People who have nothing to say, say it too loud and have little knowledge. It's the quiet ones you need to worry about!"

AmkG · May 20, 2002

Straiph... he said that database was for the *labels* "The database will contain the labels for the assembley (sic) file. "... having fixed-length records limits the label names to a certain size. Bad idea IMO, especially since I have a penchant for labels of up to 40 chars in length (especially for IF/CASE structures, so that I don't repeat labels on other IF/CASEs), and there may be other programmers like me. A variable length record allows us to have label names of varying sizes.

As for the mnemonics/pseudo-ops...

Parallel tables would probably be easier to implement... each record is in a separate array. Your ideas for this are good Straiph, definitely mnemonics should be separate. Mnemonics don't reach 8 chars so an 8-char mnemonic label field would be fine. And definitely mnemonic searching needs optimization... erp... label searching needs optimization too though maybe even need it more... so I suggest using a hash table with pointers to the label records as entries in that hash table.
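
One way the hash table of pointers could look, sketched in C. Everything here is an assumption made for the example: the FNV-1a hash, open addressing with linear probing, the 256-slot size and the function names. Each slot holds only a pointer to a packed record of the form [len][name][type][value], so the records themselves stay variable length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Slot count is a power of two and is assumed to exceed the number of labels. */
enum { SLOTS = 256 };

static const uint8_t *slots[SLOTS];          /* NULL marks an empty slot */

/* FNV-1a hash of the name bytes, reduced to a slot number. */
static unsigned hash_name(const uint8_t *s, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= s[i];
        h *= 16777619u;
    }
    return h & (SLOTS - 1);
}

/* Insert a pointer to a packed record. Returns 0 on success,
   -1 if the name is already present or the table is full. */
int hash_insert(const uint8_t *rec)
{
    assert(rec != NULL);
    unsigned i = hash_name(rec + 1, rec[0]);
    for (unsigned probes = 0; probes < SLOTS; probes++) {
        if (slots[i] == NULL) {
            slots[i] = rec;
            return 0;
        }
        if (slots[i][0] == rec[0] && memcmp(slots[i] + 1, rec + 1, rec[0]) == 0)
            return -1;                       /* duplicate label */
        i = (i + 1) & (SLOTS - 1);           /* linear probing */
    }
    return -1;                               /* table full */
}

/* Find a label by name. Returns a pointer to its record, or NULL. */
const uint8_t *hash_find(const char *name)
{
    size_t n = strlen(name);
    if (n > 255)
        return NULL;
    unsigned i = hash_name((const uint8_t *)name, n);
    for (unsigned probes = 0; probes < SLOTS; probes++) {
        if (slots[i] == NULL)
            return NULL;
        if (slots[i][0] == n && memcmp(slots[i] + 1, name, n) == 0)
            return slots[i];
        i = (i + 1) & (SLOTS - 1);
    }
    return NULL;
}
```

A lookup then touches only a few slots on average instead of walking every record, and the label records never have to move.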

BeyondSociety... have fun.

straiph · May 22, 2002

Optimising labels is a toughie, depending on how the assembler works. If your assembler performs more than one pass then you can prepare the labels database in an orderly manner. But if you are designing a single-pass assembler you have to consider how to add records on the fly while keeping searches fast. Just adding new records on the end and then performing a sequential search each time a label is referenced is very slow. The add record function needs to maintain a sorted database, but you need to think about how to implement this on a variable length record database. Also, will this be quicker than performing a sequential search? It will for large numbers of labels.

In any case, we have certainly given beyondsociety something to think about!


AmkG · May 22, 2002

Don't really need to sort the VARIABLE LENGTH database itself. Create a DIFFERENT, fixed-length database whose records have only one field: a pointer. This pointer points to the proper entry in the variable length database. So moving stuff around is faster because you're sorting small elements (pointers). This will add maybe 4 bytes (2 if Dos16Asm) to each label's memory usage, not a lot but not small either. However the memory savings (and sheer flexibility; I tend to use labels like DontLeaveMemoryMoveLoopInMoveImageProc) of a var-length database would help counteract the little bit you lose in the sorted, pointer database.
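
A rough C sketch of such a pointer database, kept sorted on insert so lookups can use a binary search. The names (`index_tab`, `index_insert`, `index_find`) and the fixed capacity are assumptions for illustration. The records pointed at use the packed layout [len][name][type][value] and are never moved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { INDEX_MAX = 1024 };                   /* arbitrary illustrative capacity */

static const uint8_t *index_tab[INDEX_MAX];  /* pointers, sorted by label name */
static size_t index_count = 0;

/* Compare a name of length n with the name stored in a packed record. */
static int cmp_name(const char *name, size_t n, const uint8_t *rec)
{
    size_t m = rec[0];
    int c = memcmp(name, rec + 1, n < m ? n : m);
    if (c != 0)
        return c;
    return (n > m) - (n < m);                /* shorter name sorts first */
}

/* Binary search. Returns the matching position and sets *found to 1,
   or returns the position where the name would be inserted and sets it to 0. */
static size_t index_search(const char *name, size_t n, int *found)
{
    size_t lo = 0, hi = index_count;
    *found = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp_name(name, n, index_tab[mid]);
        if (c == 0) {
            *found = 1;
            return mid;
        }
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Add a pointer to a record, keeping the index sorted.
   Only pointers are shifted. Returns 0, or -1 on duplicate or full index. */
int index_insert(const uint8_t *rec)
{
    assert(rec != NULL);
    int found;
    size_t pos = index_search((const char *)(rec + 1), rec[0], &found);
    if (found || index_count == INDEX_MAX)
        return -1;
    memmove(&index_tab[pos + 1], &index_tab[pos],
            (index_count - pos) * sizeof index_tab[0]);
    index_tab[pos] = rec;
    index_count++;
    return 0;
}

/* Find a label by name. Returns a pointer to its record, or NULL. */
const uint8_t *index_find(const char *name)
{
    int found;
    size_t pos = index_search(name, strlen(name), &found);
    return found ? index_tab[pos] : NULL;
}
```

Inserting still shifts up to one pointer per existing label, but each shift moves a few bytes rather than a whole record, and every lookup drops from a sequential scan to a binary search.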

BeyondSociety, hope you can get this to work! Have fun!


straiph · May 23, 2002

Exactly! Isn't that called an index?

Anyway, I think BeyondSociety got bored with us a long time ago! LOL

AmkG · May 23, 2002

Yeah, most likely.


beyondsociety · May 24, 2002

Thanks for all your help and if you happen to think of anything else, feel free to post it.

"If you don't stand for something, you'll fall for anything"
