Lesson 9: Characters
In this lesson we're going to talk about text in more detail. As a quick refresher, review the instructions we've seen so far to work with text (POINT, APPEND, OUTPUT). Notice that all of these instructions only treat text as "opaque" blocks. But what if we want to inspect the data inside of a piece of text? Well, that's what using characters is all about.
A character is the atomic unit of text. Each individual letter, numerical digit, space, or punctuation mark is a character. Since characters are, like text, not integers, they are not stored directly in registers. Instead, we will work with 'character id's, which are in some ways similar to 'text id's, and in some ways different.
First let's go over how to get a 'character id'. Doing so requires us to introduce a new instruction, called FETCH.
POINT DX,"a1&"
FETCH AX,DX
OUTPUT CHARACTER
The first line here gives DX the 'text id' of "a1&". The second line, FETCH, is the interesting part. It removes the first character from the text that DX refers to, and puts the corresponding 'character id' into AX. Then the last line outputs the character that the 'character id' in AX refers to.
Reinforce now that FETCH is in fact modifying the text referred to by DX. This means that further uses of the 'text id' will have different results. Here's a good code segment to demonstrate this.
POINT DX,"a1&"
FETCH AX,DX
OUTPUT CHARACTER
MOV AX,DX
OUTPUT TEXT
FETCH AX,DX
OUTPUT CHARACTER
MOV AX,DX
OUTPUT TEXT
The output from this segment, when run, will be:
a
1&
1
&
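For anyone who wants to check this trace away from the machine, here is a rough model in Python. It is only a sketch: it assumes a piece of text behaves like a list of characters and that FETCH removes from the front, which is what the output above suggests. The names `fetch` and `run_trace` are made up for this illustration and are not part of the lesson's instruction set.

```python
# Rough model only: a piece of text is a mutable list of characters,
# and FETCH removes the first one. All names here are hypothetical.

def fetch(text):
    """Remove and return the first character of `text` (a list)."""
    return text.pop(0)

def run_trace():
    lines = []
    dx = list("a1&")            # POINT DX,"a1&"
    lines.append(fetch(dx))     # FETCH AX,DX then OUTPUT CHARACTER
    lines.append("".join(dx))   # MOV AX,DX then OUTPUT TEXT
    lines.append(fetch(dx))     # FETCH AX,DX then OUTPUT CHARACTER
    lines.append("".join(dx))   # MOV AX,DX then OUTPUT TEXT
    return lines

print("\n".join(run_trace()))
```

The point to draw out is that `dx` is never reassigned, yet what it refers to gets shorter after every `fetch`.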
Though it appears from the output that OUTPUT CHARACTER and OUTPUT TEXT are very similar in nature, rest assured, they are not. Make sure the students understand that 'character id's and 'text id's are not at all interchangeable. For example, FETCH requires a 'text id' for its second operand, but produces a 'character id'. Also, the data referred to by a 'character id' is exactly one character long, whereas pieces of text can be any length at all, even zero. Most importantly, 'text id's will almost always be different each time and place the program is run, whereas 'character id's are guaranteed to be the same for each character, and have very useful properties as well. We'll see what these properties are in just a moment.
But before we get to that, let's practice using FETCH and 'character id's.
Have the user input some text, then print each individual character on a separate line.
In order to write this code, we can take advantage of another useful property of FETCH. It will remove a character from its text, but only if there are any characters left to remove. If there are, then the FLAGS are set to =; otherwise they are set to !. This allows us to use FETCH/COND for conditionals, and more importantly, loops.
What we will want to do is use FETCH inside of a loop, testing immediately afterwards whether the operation succeeded or failed. In the case of failure, we are done with our original text, and should exit the loop. Well, let's get to it!
INPUT TEXT
MOV BX,AX
;loop
FETCH AX,BX
COND ! done
OUTPUT CHARACTER
JMP loop
;done
This accomplishes exactly what we need.
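The same loop can be sketched in Python to make the role of the flags explicit. This is a model under assumptions, not the machine itself: `fetch` here is a hypothetical function that returns the flag as the string "=" or "!" alongside the character, and the text is again treated as a list of characters.

```python
# Rough model of FETCH plus the FLAGS it sets. Names are hypothetical.

def fetch(text):
    """Return (flag, character).

    flag is "=" when a character was removed, and "!" when the text
    was already empty (the character is then None).
    """
    if text:
        return "=", text.pop(0)
    return "!", None

def characters_on_lines(user_text):
    bx = list(user_text)          # INPUT TEXT then MOV BX,AX
    out = []
    while True:                   # ;loop
        flag, ax = fetch(bx)      # FETCH AX,BX
        if flag == "!":           # COND ! done
            break
        out.append(ax)            # OUTPUT CHARACTER
    return out                    # ;done

print("\n".join(characters_on_lines("hi!")))
```

Note that an empty input never enters the body of the loop, because the very first `fetch` already reports "!".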
Now let's see what's so important about 'character id's always being the same. Well, for one thing, we can reliably compare them. Let's see how this works.
POINT AX,"sat"
POINT BX,"sip"
FETCH CX,AX
FETCH DX,BX
; now CX and DX each have the character "s"
MOV AX,CX
OUTPUT
MOV AX,DX
OUTPUT
; we output the character ids, they should be the same
CMP CX,DX
COND ! fails
;passes
POINT AX,"same"
OUTPUT TEXT
JMP done
;fails
POINT AX,"different"
OUTPUT TEXT
;done
As guaranteed, the two 'character id's are the same, and CMP can reliably test them for equality.
However, this isn't the case for text. You may remember hearing that CMP for 'text id's is only trustworthy when it yields =. If two 'text id's are compared and the result is !, then the answer is inconclusive: the two pieces of text may still be equal. Here's an example of a scenario where this happens.
POINT CX,"short"
POINT DX,"sh"
APPEND DX,"ort"
;text in CX and DX should be equal
MOV AX,CX
OUTPUT TEXT
MOV AX,DX
OUTPUT TEXT
;they look equal, let's compare
CMP CX,DX
COND ! fails
;passes
POINT AX,"same"
OUTPUT TEXT
JMP done
;fails
POINT AX,"different"
OUTPUT TEXT
;done
After running this code, it's clear that the text in CX and DX should be the same, but according to CMP they aren't. Let's examine this a little more closely.
POINT CX,"short"
POINT DX,"sh"
APPEND DX,"ort"
;text in CX and DX should be equal
MOV AX,CX
OUTPUT TEXT
MOV AX,DX
OUTPUT TEXT
;they look equal, what about the id's?
MOV AX,CX
OUTPUT
MOV AX,DX
OUTPUT
Here the problem should be clear: even though the text referred to by CX and the text referred to by DX are the same, the 'text id's themselves aren't. Different 'text id's may indeed refer to equal pieces of text. To be precise, any text that has been modified by either FETCH or APPEND cannot be reliably compared using CMP.
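Students who have seen Python may find an analogy useful here. It is only an analogy and says nothing about how this machine actually hands out 'text id's: in Python, two lists built in different ways can hold equal contents while still being two distinct objects.

```python
# Analogy only: equal contents, different identity.
cx = list("short")      # like POINT CX,"short"
dx = list("sh")         # like POINT DX,"sh"
dx.extend("ort")        # like APPEND DX,"ort"

same_content = (cx == dx)   # compares character by character
same_object = (cx is dx)    # compares identity, like comparing ids

print(same_content, same_object)
```

Comparing identities answers a different question from comparing contents, which is why the id comparison cannot be trusted when it reports a difference.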
That's okay though, because now we have the ability to look at each character one at a time using FETCH in a loop!
Given two text id's, one in AX and one in BX, compare the pieces of text, setting the flags to = if they are equal, and to ! if they are unequal.
In order to do this, we need to define exactly what equality means for text. The term's meaning is not quite as obvious as it is with integers.
There are two parts to the precise definition. First, both pieces of text must have the same length. Secondly, each character, when compared to the character in the same position of the other string, must be equal. We can combine these conditions into saying that when we FETCH from each text, either both must succeed with the same character, or both must fail. If after repeatedly matching the same character with successful FETCHes from each text, they then both fail at the same time, we know the text pieces are equal.
This sounds like all we need to make is a loop iterating over two pieces of text at once. Here's some code.
;move the text ids out of AX and BX, which will hold characters
MOV CX,AX
MOV DX,BX
;loop
FETCH AX,CX
COND ! firstIsDone
FETCH BX,DX
COND ! unequal
CMP AX,BX
COND ! unequal
JMP loop
;firstIsDone
;the first text is finished, so the texts are
;equal only if the second is also finished
FETCH BX,DX
COND ! equal
JMP unequal
TODO: finish me
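The comparison described above can also be modelled in Python as a cross-check. This is a sketch under the same assumptions as the lesson's definition (FETCH removes from the front and reports "=" on success or "!" when the text is empty); `fetch` and `texts_equal` are hypothetical names, and the function works on copies so the caller's text is left alone.

```python
# Rough model of comparing two texts with paired fetches.
# Names are hypothetical; this is not the machine's own code.

def fetch(text):
    """Return ("=", character) on success, ("!", None) if text is empty."""
    if text:
        return "=", text.pop(0)
    return "!", None

def texts_equal(first, second):
    cx, dx = list(first), list(second)   # work on copies
    while True:
        flag, ax = fetch(cx)
        if flag == "!":
            # First text is finished: equal only if the second
            # is finished as well.
            flag, _ = fetch(dx)
            return flag == "!"
        flag, bx = fetch(dx)
        if flag == "!" or ax != bx:
            # Second text ran out early, or the characters differ.
            return False

print(texts_equal("short", "short"), texts_equal("short", "sh"))
```

Good cases to walk through with students are two empty texts, one text that is a prefix of the other (in both orders), and two texts of the same length that differ in the last character.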
by dlong@progmatism.com. Plz don't copy kthx.