String

So far we have only declared strings in the .data section and used them as they are. Now we will look at operations on strings. In 32-bit code, string instructions use the ESI and EDI registers to point to the source and destination operands, respectively.

There are five basic instructions for processing strings. The first three letters of the mnemonic say what the instruction does, the ‘S’ stands for String, and a final letter gives the size to operate on: ‘B’ (byte), ‘W’ (word), or ‘D’ (doubleword).

  • MOVS : This instruction copies 1 byte, word, or doubleword of data from one memory location to another.

  • LODS : This instruction loads data from memory into a register: AL for 1 byte, AX for 1 word, EAX for 1 doubleword.

  • STOS : This instruction stores data from a register (AL, AX, or EAX) to memory.

  • CMPS : This instruction compares two data items in memory.

  • SCAS : This instruction compares the contents of a register (AL, AX, or EAX) with the contents of memory.
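As a rough illustration of LODS and STOS working together, here is a minimal sketch that reads each byte with lodsb, uppercases it if it is a lowercase ASCII letter, and writes it back out with stosb. It assumes 32-bit Linux (int 0x80 system calls) and NASM syntax; the labels, the sample data, and the file name in the build comment are made up for this example.

```nasm
; Sketch: LODSB + STOSB in a manual loop (assumes 32-bit Linux, NASM syntax)
; Assumed build: nasm -f elf32 lods-stos.nasm -o lods-stos.o
;                ld -m elf_i386 lods-stos.o -o lods-stos
global _start

section .data
    src:    db "hello", 0x0A    ; hypothetical sample data
    len     equ $ - src

section .bss
    dst:    resb len

section .text
_start:
    cld                 ; DF = 0, so ESI and EDI move forward
    mov esi, src        ; LODSB reads from [ESI]
    mov edi, dst        ; STOSB writes to [EDI]
    mov ecx, len        ; loop counter

next_byte:
    lodsb               ; AL = [ESI], then ESI = ESI + 1
    cmp al, 'a'
    jb  store           ; below 'a': not a lowercase letter
    cmp al, 'z'
    ja  store           ; above 'z': not a lowercase letter
    sub al, 0x20        ; lowercase ASCII letter -> uppercase
store:
    stosb               ; [EDI] = AL, then EDI = EDI + 1
    loop next_byte      ; ECX = ECX - 1, jump while ECX != 0

    mov eax, 4          ; sys_write(1, dst, len)
    mov ebx, 1
    mov ecx, dst
    mov edx, len
    int 0x80

    mov eax, 1          ; sys_exit(0)
    xor ebx, ebx
    int 0x80
```

Run as built above, this should print HELLO. A plain loop is used here instead of a rep prefix because the byte is modified between the load and the store.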

Before moving forward, let's see the difference between string instructions and repetition instructions.

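Repetition (REP) instruction:

Repetition prefixes are generally used to manipulate data buffers, which are simply arrays of bytes. The common data-manipulation instructions are movsx, cmpsx, and scasx, where the trailing x is a placeholder for the size suffix (b, w, or d) and is not part of the mnemonic (the real MOVSX instruction is an unrelated sign-extending move). ESI is used for the source and EDI for the destination. On its own, each of these instructions operates on a single element (one byte, word, or doubleword), so a prefix is required to operate on data longer than that. The repeat (rep) prefixes, which take their count from ECX, are used for these multi-element operations.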

  • REP (repeat until ECX equals 0)

  • REPE, REPZ (repeat until ECX equals 0 or the zero flag is cleared, that is, as long as the zero flag is set. Both mnemonics mean the same)

  • REPNE, REPNZ (repeat until ECX equals 0 or the zero flag is set, that is, as long as the zero flag is clear. Both mnemonics mean the same)

Repetition instructions are generally seen with string instructions. They are used to repeat the instruction until certain conditions are met.
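To make the "repeat until a condition is met" idea concrete, here is a minimal sketch that uses repne scasb to find the terminating zero byte of a string and so compute its length. It assumes 32-bit Linux and NASM syntax; the label and the sample string are chosen only for illustration, and returning the length as the exit status is just a convenient way to observe it.

```nasm
; Sketch: string length with REPNE SCASB (assumes 32-bit Linux, NASM syntax)
global _start

section .data
    msg:    db "Hello World!", 0    ; NUL-terminated sample string (assumed)

section .text
_start:
    cld                     ; DF = 0, so EDI moves forward
    mov edi, msg            ; SCASB compares AL with [EDI]
    xor eax, eax            ; AL = 0, the byte we are looking for
    mov ecx, 0xFFFFFFFF     ; largest possible count, effectively "no limit"
    repne scasb             ; repeat while ECX != 0 and [EDI] != AL
    not ecx                 ; ECX = number of bytes scanned, including the NUL
    dec ecx                 ; drop the NUL: ECX = 12

    mov ebx, ecx            ; hand the length back as the exit status
    mov eax, 1              ; sys_exit
    int 0x80
```

After running the program, `echo $?` in the shell should show 12. ECX is decremented on every iteration, including the one that finds the zero byte, which is why the final count is recovered with not followed by dec.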

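Possible combinations of repetition and string instructions: REP is used with MOVS, LODS, and STOS, while REPE/REPZ and REPNE/REPNZ are used with CMPS and SCAS, the two instructions that set the zero flag.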

💡 MOVS and CMPS operate on two strings, whereas LODS, SCAS, and STOS operate on one string.

  • MOVS moves data from the source string to the destination string.

  • CMPS compares data between the source and destination strings.

  • LODS loads data from the string pointed to by ESI into AL, AX, or EAX, depending on the operand size.

  • STOS stores data from AL, AX, or EAX into the string pointed to by EDI.

  • SCAS scans the data in the string pointed to by EDI and compares it to AL, AX, or EAX.
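As an example of a one-string instruction combined with a repeat prefix, the sketch below uses rep stosb to fill a buffer with a single character, much like memset in C. It assumes 32-bit Linux and NASM syntax; the buffer name, its size, and the fill character are arbitrary choices for this example.

```nasm
; Sketch: fill a buffer with REP STOSB (assumes 32-bit Linux, NASM syntax)
global _start

section .bss
    buf:    resb 16             ; hypothetical 16-byte buffer

section .text
_start:
    cld                         ; DF = 0, so EDI moves forward
    mov edi, buf                ; STOSB writes to [EDI]
    mov al, '*'                 ; byte to store
    mov ecx, 15                 ; number of repetitions
    rep stosb                   ; store AL at [EDI] 15 times, EDI advances each time
    mov byte [edi], 0x0A        ; EDI now points at buf + 15: add a newline

    mov eax, 4                  ; sys_write(1, buf, 16)
    mov ebx, 1
    mov ecx, buf
    mov edx, 16
    int 0x80

    mov eax, 1                  ; sys_exit(0)
    xor ebx, ebx
    int 0x80
```

This should print a line of 15 asterisks. Only EDI is involved, since STOS takes its source from the register rather than from a second string.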

Let's look at a few in action.

Copying string (string-copy.nasm)

Here, ESI points to the source and EDI points to the destination where the string will be copied. The rep prefix repeats until ECX becomes zero, and we have set ECX to the length of the source. movsb copies the data byte by byte from source to dest.

Here we can see the value of ECX being decremented after each repetition.
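A minimal sketch of a program matching this description follows. It is not necessarily the original string-copy.nasm: the labels, the sample string, and the 32-bit Linux system calls used to print the result are all assumptions made for illustration.

```nasm
; Sketch: copy a string with REP MOVSB (assumes 32-bit Linux, NASM syntax)
; Assumed build: nasm -f elf32 string-copy.nasm -o string-copy.o
;                ld -m elf_i386 string-copy.o -o string-copy
global _start

section .data
    source: db "Hello World!", 0x0A     ; sample string (assumed)
    len     equ $ - source

section .bss
    dest:   resb len                    ; room for the copy

section .text
_start:
    cld                 ; DF = 0, so ESI and EDI move forward
    mov esi, source     ; ESI points to the source
    mov edi, dest       ; EDI points to the destination
    mov ecx, len        ; ECX = number of bytes to copy
    rep movsb           ; copy [ESI] to [EDI], ECX = ECX - 1, until ECX = 0

    mov eax, 4          ; sys_write(1, dest, len)
    mov ebx, 1
    mov ecx, dest
    mov edx, len
    int 0x80

    mov eax, 1          ; sys_exit(0)
    xor ebx, ebx
    int 0x80
```

Stepping over rep movsb one repetition at a time in a debugger such as gdb shows ECX counting down from len to 0 while ESI and EDI advance together.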

Comparison of strings (str-cmp.nasm)

Here the source is “Hello WOrld!” and the comparison string is “Hello World!”. The two are not equal, so the result should be “UnEqual Strings”.

Also, for the record, let's note that the strings first differ at the 8th character.

Now, when the comparison starts, the zero flag is set. Before that it was not set, as seen above.

We can see the zero flag staying set while the bytes match. When we reach the 8th comparison (ECX=6), the zero flag is cleared. As we read above, REPZ repeats only as long as the zero flag is set, so as soon as an unequal pair of bytes is compared, the zero flag is cleared and the repetition stops. That means the zero flag remains set down to ECX=7.

And as expected, we see ‘UnEqual Strings’ after ECX=7.
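The comparison can be sketched as follows. This is not the original str-cmp.nasm: it assumes 32-bit Linux and NASM syntax, the labels and the "Equal Strings" message are hypothetical, and the second string is chosen so that the first mismatch falls on the 8th byte. Because the buffers here are exactly 12 bytes long, the ECX values seen in a debugger will differ from the ones quoted in the walkthrough.

```nasm
; Sketch: compare two strings with REPE CMPSB (assumes 32-bit Linux, NASM syntax)
global _start

section .data
    str1:   db "Hello WOrld!"
    len     equ $ - str1
    str2:   db "Hello World!"           ; assumed: differs from str1 at the 8th byte

    msg_eq:     db "Equal Strings", 0x0A    ; hypothetical message
    msg_eq_len  equ $ - msg_eq
    msg_ne:     db "UnEqual Strings", 0x0A
    msg_ne_len  equ $ - msg_ne

section .text
_start:
    cld                 ; DF = 0, so ESI and EDI move forward
    mov esi, str1
    mov edi, str2
    mov ecx, len        ; compare at most len bytes
    repe cmpsb          ; repeat while ECX != 0 and ZF = 1 (repz is a synonym)
    jne not_equal       ; ZF = 0: the last pair of bytes compared differed

    mov ecx, msg_eq     ; every byte matched
    mov edx, msg_eq_len
    jmp print

not_equal:
    mov ecx, msg_ne
    mov edx, msg_ne_len

print:
    mov eax, 4          ; sys_write(1, message, length)
    mov ebx, 1
    int 0x80

    mov eax, 1          ; sys_exit(0)
    xor ebx, ebx
    int 0x80
```

With these two strings the program should print UnEqual Strings. The mov instructions between repe cmpsb and the conditional jump do not change the flags, so the zero flag still reflects the last comparison.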
