If you have listened to March 2nd's
Geeknights, you would have heard Scott discuss how a high level code is either complied or interpreted into Assembly or machine code. Well, if you would like to delve further into this subject matter, here is a small lecture which will go over the basics of computing (hastily I'll admit) and later on we will cover a simple interpreter for the Brainfuck language. The lecture covers a lot of material within a small amount of text, so I have glossed over a lot of information. If you wish, I can delve deeper into any topics you would like to know. This will be a multipost topic within this thread, hopefully I can bang it out in two posts.
I have also assumed that you have at least listened to the Geeknights shows on computers and programming which will provide a decent background in which to understand the material. With this little project, you will:
- Learn about the computing process in detail
- Learn Brainfuck ^__^
- Understand how assembly works with a processor (The LC-3)
- Get a basic overview of a simple computer.
- Impress your friends with awesome knowledge
Alright, let's begin. First off, let's go over the scope of the computer hierarchy. Here is a basic flow chart of computing system:
Problem->Algorithm->Language->Machine Architecture->Micro-Architecture->Circuits->DevicesProblem- This is what we wish to do with our computer. What is the task we wish to perform with the computer? Do we want to play a video game, write a paper, or perform calculations? Having a solid grasp of the problem allows us to find a solution to the problem in the quickest and most efficient manner. This stage is the furthest most computer users will ever utilize and will never delve deeper into the computing hierarchy.
Algorithm- The algorithm is the basic overview of the solution to our problem. It is what allows us to transform the solution to the problem from a natural description to some series of steps that progress towards termination and a final solution. So, do we need a program which can take in input from our keyboard and save it in a separate file? Do we need a three dimensional world which can simulate realistic physics? Or do we need a mathematical formula to solve a complex equation? With the correct algorithm we can solve our problem quickly and correctly every time.
Language- This is the level most programmers spend their time. The language consists of any high-level programming language in which one uses to implement their algorithm. There are hundreds of languages out there today, all of which allow you to solve different kinds of problems. Popular examples today are Java, C++, Python, Lisp, C, and the one we will be using today, Brainfuck. These languages are often complex, robust, and allow the programmer to implement their algorithm into a digital form or a series of steps that the computer can follow to create reproducible results.
Machine Architecture or ISA (Instruction Set Architecture)- This is the assembly language which takes the higher level language and transforms it into instructions which the hardware can interpret and operate with. The two ways that high level languages can be converted into Machine Architecture code are
compliers or
interpreters. One of the most famous ISA's is the x86, which most of the chips on the market today are based on. Both AMD and Intel use x86 in their processors today. The ISA is made up of various Opcodes, or instructions for the processor. Because the LC-3 is a 16-bit processor, the opcodes are 16 bits long. Let's say you have a 32 bit processor in your computer right now, it's opcodes would be 32 bits long. A 64 bit processor would have 64 bit opcodes.
Micro-Architecture- This is the level that most people think of when they envision a computer. You processor, motherboard, RAM, Video Card, Hard Drive, and input devices such as mice and keyboards fall in this category. This level transforms the Machine Architecture into an actual implementation.
Circuits- The circuit level describes physical construction of each of the components of the Micro-Architecture. The main component of this level are
logic gates, simple components of wires and devices which allow simple
boolean logic execution.
Devices-
Transistors. Only the most advanced electrical engineers will work at this level. Logic gates are constructed using wires and transistors. They are the main units of operation in your computer.
So, what is the LC-3? The LC-3 is a simple implementation of the
Von Neumann model computer, a simple but outdated architecture. The LC-3 is also an assembly language (Instruction Set Architecture or ISA) for the specific LC-3 processor (there is only one). The LC-3 processor is a 16-bit processor which means that it can handle bit strings of length 16. This bits are either 1's or 0's, your basic binary. However, there are multiple ways to represent numbers with binary. The most common today is
Two's Complement. I won't be going over the specifics as it is a little more complicated that what I was planning on going over.
Here is the Micro-Architecture diagram of the LC-3 based computer.
To most of you, this diagram is extremely confusing and is totally beyond all comprehension. I'll try to go over the basics and hopefully you will understand some of the aspects of a computer.
A: This is your Register File or
Processor Register, it is a small amount of memory on the processor which can be quickly accessed by other aspects of the processor. The LC-3 has eight registers within the Register File.
B: This is your memory, commonly known as RAM. This is where the majority of your operations and instructions are stored.
C: This large black line is your BUS, which is used to carry 16 bit binary numbers to several components within the computer. It is the main system of transit for your computer.
D: The PC or Program Control holds the MEMORY LOCATION of the next instruction to be carried out. This memory location is sent to the memory which then retrieves the instruction (opcode) and sends it to the processor for execution.
E: The ALU or Arithmetic Logic Unit is what performs operations to the bits in your computer. In the LC-3 it can only perform the operations ADD, bitwise AND, and bitwise NOT.
F*: I forgot to mark this component on the diagram, but it is the IR or Instruction Register. This keeps track of the current instruction being executed during the clock cycle and sends the instruction opcode to the Control and eventually the ALU.
Everything else is not necessary for this brief overview, but they all serve an important purpose. Components A and E are what make up the processor, B is your memory, and E and F make up your Control Unit. These are the basic components of a computer.
So how do we interact with the computer? We utilize the processors opcodes, or instructions which when inputed into the computers logic will cause changes in the computers state, which are then saved into the memory. Here is a list of the LC-3's opcodes:
Let's look at the first opcode on the list, ADD. The first four bits (15-12) in the opcode are the instruction code, it tells the computer which operation it wants to carry out. The next three bits represent the destination register, where we want to place the result of our operation. Bits 8-6 represent the first source register and bits 2-0 represent the second source register. Bits 5-3 are not used and thus are just zeros, the processor will ignore these bits. The registers are part of the register file, which the processor interacts with directly (the processor never directly interacts with memory). So, what would it look like when we want to add the value at register 1 and register 2 and place it into register 0?
In assembly:
ADD R0,R1,R2
In binary:
0001 000 001 000 010
000 is 0 in binary which represents R0, 001 is 1 in binary which represents R1 and 010 represents R2
So if we had the value 3 stored at R1 and 4 stored at R2, after executing this instruction the value 7 will be inputed into the register at R0.
So, utilizing these opcodes, we can write programs which can be executed directly by the computer hardware in order to perform computing tasks. This covers the basics on how one programs in assembly to perform basic computer executions. Hopefully with this background you will be able to marginally understand the interpreter.
If you have any questions, feel free to ask them and hopefully I can answer them within my next post. Until the next post, I recommend these PDF's for a more detailed explanation of the LC-3 ISA and the Micro Architecture of the LC-3.
ISAMicro Architecture
Comments
First off, let's go over all the files and programs you will need to run this interpreter.
This is the Simpl console. This is where we place our input for the program to read and where the program will print it's output.
Here we have our main simulator. Let's examine it more closely.
Red Box- These are our eight registers, R0-R7. The registers are where we are going to be placing values that we can manipulate with our opcodes.
Orange Box-This is our memory. The LC-3 has an address space of 216 or 65,536 memory locations. Not all of them are used for memory locations, several of them are used for various operations which help us execute our program.
Dark Blue- This is the memory location 3001 in hexadecimal. When we want to access this memory location we need to load this number into the PC (Program Control).
Light Blue- This is our 16 bit value stored at the memory location 3001. We can store integers, opcodes, and other memory locations in these values. This is where we will see our opcodes when we load our assembly.
Light Green- This box will show us a natural language explanation of the opcode stored at memory location 3001
After we File->Load(and reinit) our Brainfuckinterp.asm file, the simulator will show our assembly loaded into memory.
Red Box- This is our PC. If you look, you will notice that it stores the value x3000. This is the memory address which stores our first instruction of our code. If you step through the program one line at a time you will notice that it increments one integer at a time.
Orange box- The orange box shows the hexadecimal value of the binary number stored at the memory location.
Green Box- Another example of our natural language explanation of our opcode.
Dark Green Box- This is a label. The LC-3 language allows us to assign labels to specific memory locations to allow us easily call those memory locations in our code. I will discuss this later on.
On to the code! Here we have the beginning of our interpreter, which starts off with mah name and title of the code. You may also notice that I am using GVIM to write my code. You can use any text editor to create this code as long as you save the file as a .asm file.
Red Box-This is a commented block of how we are going to use our registers. It's always good practice to make sure you know what you are using each register for so you do not clear or load the wrong number in the wrong register. With assembly it is easy to forget whether your data pointer is in R4 or R5 and having them blocked out before hand can save you a lot of trouble.
Light Blue- This line of code tells the processor where we wish to place our first line of code in the memory. It is good practice to start at x3000 because there is usually nothing stored at that location. A lot of the smaller numbers are used for other processes and basic instructions.
Green Box- This is our label for this memory location, remember it from the previous example? We can use a opcode call BR or branch to return to this location and is a good way to implement loops.
Purple Box- This is the beginning of our main code. This specific location calls the TRAP opcode GETC to grab a character from the console and store it in R0 (Traps always store in R0). We will then compare the ASCII value of the character against saved integers to find out which character we have loaded.
So, if you have noticed, we have changed our location in memory to x5000. An astute observer might have noticed that this location is the first memory location of where we are going to store all of the instructions that we read in from the console.
Red Box-You can see that the explanation shows us the character stored in the memory location. I will go over valid Brainfuck commands later, but for now know that this is where the instructions are stored.
Orange Box- This is the ASCII value of the value stored in the memory location.
Light Green Box- If you look at R5 you will notice that it is pointing at memory location x506F. This is where it is currently pointing and were the next instruction would be placed if we were to ST a register. We must remember to manually increment it every time we store or else we will loose data.
Further down in the code I have initiated "variable" which store values in memory for use using the .FILL command.
Red Box- Here I have stored negative versions of the respective ASCII characters. I will add these to the currently inputed character to see if we have a match.
Light Green Box- These are the pointer values for our instruction memory and value memory. These are loaded at the beginning of the file. Can you find in any of the code pictures where this was done? (Hint: I use the LD Opcode)
This code segment is where all of the operations happen when we execute an operation. Try to see if you can follow the logic behind the code.
In this picture we can see a sample Brainfuck code in the console and the respective output in the console when it is run.
Now, let's go over the Brainfuck language itself.
Operators
> :Increment Data Pointer(R4+1)
< :Decrement Data Pointer(R4-1)
+ :Increment value at memory location stored in Data Pointer. (mem[R4]+1)
- :Decrement value at memory location stored in Data Pointer. (mem[R4]-1)
. :Print the value at memory location stored in Data Pointer. (Print mem[R4])
, :Accept one byte of input from console and store at Data pointer location. (Input byte in mem[R4])
[ :If value at Data pointer is zero, jump to instruction after next closed bracket, else increment Data Pointer
] :If value at Data pointer is not zero, jump to previous open bracket, else increment Data Pointer
X :End Program
So, now that you have a basics on how each operation affects the state of the machine and you have a little bit of knowledge of assembly, try messing around with various Brainfuck commands. The interpreter is written so that it ignores any character that is not an operator, so feel free to insert return carriages and other values if you want to comment your code. Here is your first code to try:
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.X
That just about covers the basics, feel free to ask me if you have any questions like usual. Enjoy.
Oh, and did you fix that bug in your prime number finder?
Also, AH FUCK BRAINFUCK! >.O
But it looks cool. I'll check it out eventually. I thoroughly enjoy teh blag, by the way.