Metaphor: the Rosetta Stone
How can a computer run programs in HyperTalk, Pascal, COBOL, FORTRAN,
LISP, C, Smalltalk, and so on?
Binary Representation
A modern computer deals exclusively with zeros and ones. How shall we
represent the things we need, like numbers and characters?
Instead of using base-10 positional notation,
247 = 2 * 10^2 + 4 * 10^1 + 7 ¥ 10^0
we'll use base-2 instead, using powers of 2 rather than powers of 10:
247 = 1 * 2^7 + 1 * 2^6 + 1 * 2^5 + 1 * 2^4 + 0 * 2^3 +
+ 1 * 2^2 + 1 * 2^1 + 1 * 2^0
= 128 + 64 + 32 + 16 + 0 + 4 + 2 + 1
= 11110111, in binary notation
Converting a Number from Binary to Decimal
(1) Write the binary number and beneath each digit write the power of 2
to which it corresponds.
(2) Multiply each power of 2 by the digit and add the results.
Example: Finding the decimal equivalent of 10111
1 0 1 1 1 the binary number
16 8 4 2 1 powers of 2
1 * 16 + 0 * 8 + 1 * 4 + 1 * 2 + 1 * 1 = 23
So 10111 represents the number 23.
Converting a Number from Decimal to Binary
(1) Divide the number by 2 and write down the remainder.
(2) If the quotient is 0, you're done, otherwise replace the current
number by its quotient and go back to step (1).
(3) The remainders are the binary equivalent.
(This process works in any base, by the way.)
Example: Finding the Binary Equivalent of 23
ASCII
To represent characters, we decide on a numeric code, such as ASCII
(American Standard Code for Information Interchange).
We can extend this code to strings by either
(1) Using a digit at the front to tell how long the string is
(2) Putting a special code (such as 0) at the end
Using the first technique encodes the three-character string "CAB"
(ASCII codes 67, 65, 66) as
00000011 01000011 01000001 01000010
and using the second gives the representation
01000011 01000001 01000010 00000000
Encoding Instructions
A hypothetical computer in which each instruction consists of a 4-digit
operation code, along with 12 more bits (binary digits) for additional
information.
For our computer, we might have the following instruction codes, where
X represents a 12-bit binary number.
Code Information Action
0000 X Store accumulator value in
memory cell X
0001 X Load contents of cell X
into accumulator
0010 X Set accumulator to X
0111 Halt execution
1001 X Add contents of cell X
to value in accumulator
An accumulator is an additional memory cell used to store temporary
results of computations.
Sample program:
0001 000000001100 ; Load acc from cell 12
1001 000000001101 ; Add cell 13 value to acc
0000 000000001100 ; Store acc into cell 12
0111 ; Halt
Here's how it would work:
Assemblers
Clearly, programming in machine language is hard.
So assign a string code to each machine language instruction.
For example:
0000 --> STO
0001 --> LOD
0111 --> HLT
1001 --> ADD
Now, our sample program is much easier for us to understand:
LOD 12
ADD 13
STO 12
HLT
But the downside is that the computer can't read it.
An assembler program translates source code in assembly language into
object code in machine language:
In general, one line of assembler language translates into one
instruction code.
Languages for People
Anything that makes programs easy to write and understand is all to the good.
We can make a language of our own:
(1) Design the language by
(a) Specifying what forms the statements in our language will take
(b) Describing the action of each statement
(2) Design a translator from our language to the language of our target
computer.
That's what the designers of HyperTalk, Pascal, C, LISP, Forth, and
BASIC had to do.
It's not impossible, just difficult.
A Taxonomy of Languages
Programming languages may be grouped into broad classes, depending upon
their underlying design philosophies.
Sample: take a list of integers and form the sum of all the elements in
the list that are odd.
In HyperTalk, we have:
put 0 into sum
repeat with num = 1 to number of lines in field 1
if (line num of field 1) mod 2 0 then
add line num of field 1 to sum
end repeat
The four classes of languages we will look at are:
Imperative languages
Functional languages
Declarative languages
Object-oriented languages
Imperative Languages
The fundamental construct is the procedure. A program has a main
body which calls procedures.
Procedures are made up of statements, most of which "do something" to
data.
Examples: FORTRAN, Pascal, C, Modula II, Ada
Our sample in Pascal looks like this, in part:
type TermIndex = 1 .. 100;
TermArray = array [TermIndex] of integer;
var myTerms : TermArray;
procedure SumOdds(n : TermIndex ;
terms : TermArray ;
var sum : integer);
var i : TermIndex;
begin
sum := 0;
for i := 1 to n do
if Odd(terms[i]) then
sum := sum + terms[i]
end;
Functional Languages
A program is a function which transforms input to output. Functions
are defined in terms of other functions.
LISP is the most widely used functional language.
Our program in LISP:
(DEFUN SUMODDS
(LAMBDA (TERMS)
(COND
((NULL TERMS) 0)
((ODD (CAR TERMS)) ((PLUS (CAR TERMS)
(SUMODDS (CDR TERMS))))
(T SUMODDS (CDR TERMS))))))
Declarative Languages
The emphasis is more on describing the interrelations of the data
manipulated by the program and less on how the data are manipulated.
Many of the details about how data are manipulated are hidden from
the programmer.
Examples: COBOL, Prolog
Our sample program in Prolog:
sumodds([], 0).
sumodds([H|T],N) :- sumodds(H,N1), sumodds(T,N2),
sum(N1,N2,N), !.
sumodds(X,N) :- mod(X 2 1), eq(X, N), !.
sumodds(X, 0).
Object-Oriented Languages
The fundamental unit is the object. An object contains data, along
with methods for manipulating the data.
Objects communicate by passing messages.
True object-oriented languages support inheritance, in which an
object may inherit the data structures and methods of another.
Examples: Smalltalk, C++, Object Pascal, and (to a lesser extent)
HyperTalk
Our program in Smalltalk:
sumOdds
| total |
total ¨ 0.
self do: [ :each | each 2 = 0 ifTrue: [total ¨ each + total]].
total
Implementing a Language
Translators for high-level languages come in two flavors:
Interpreters, in which the source code is translated each time it is
encountered when running the program.
Example: BASIC, LISP, Smalltalk and HyperTalk are usually interpreted.
Compilers, in which the source code is first translated once and for all
into object code, and then run.
Example: FORTRAN, Pascal, and C are usually compiled.
Trade-offs:
Interpreters are easy to write.
It is easier to provide debugging support in interpreters.
Compiled programs generally run much faster.
Translating a HyperTalk Statement
put sum + term into sum
Phase 1: Scanning
Break the source code into tokens, the smallest meaningful units in
HyperTalk:
put is the start of a HyperTalk command
sum is a variable; assign it memory cell 12
+ is an operator
term is another variable; assign it cell 13
into is the required second part of a put
command
sum is a variable we've already assigned
Phase 2: Parsing
Combine the tokens into an internal structure that stores the syntactic
"sense" of the program, often done with a parse tree:
Phase 3: Code Generation
Use the parse tree to generate code.
LOD 12
ADD 13
STO 12