Recently several people had shared their combinator-based computers, now it is my turn:)
The main goal was to create easily-extendeble CPU (probably with multithreading, but i am still working on this) with hig-performance and very flexible to adding/removing/changing its functionality.
Stage 0: "The beginning"
This setup contains only vital things of any CPU: IP (instruction pointer) with increment, several registers, ROM memory (2 cell) and 2 instructions, *RAM memory is not vital for computer.
On this simple CPU we've already can write simple program, lets do it.
Our program will be very small and simple because our CPU is almost useless.
Code: Select all
start:
r0 = r0 + 1
jmp start
At this moment I don't know all purposes for this CPU, but I am going to show how to use it by yourself.
Each peripheral devices requires its own system command to communicate with this device.
If you will be able to write your own command than you will be able to use any devices you want.
Code: Select all
start:
r0 = r0 + 1
print r0
jmp start
The main idea of this CPU is to implement only necessary instructions, and reuse them in other purposes when it is possible. New instruction appears from real applications, this is the reason why I am describing this CPU together with code. On the previous stage we've implemented simple counter from 1 to infinity (2148M). Now our task will be more complex: our counter will go from 1 upto 100 and from 100 downto 1, and again from the begining.
Code: Select all
r1 = -1
increase_init: r1 = r1 + 2
r2 = r2 + 100
increase: r0 = r0 + r1
print r0
if (r0 == r2) jmp decrease_init
jmp increase
decrease_init: r1 = r1 - 2
r2 = r2 - 100
decrease: r0 = r0 + r1
print r0
if (r0 == r2) jmp increase_init
jmp decrease
1: basic instruction: register + constant
2: print r0 to display
Not so much as you might think:) After that we need to identify instructions that we need to implement:
3: r0 = r0 + r1
4: if (r0 == r2) jmp
Only 2 new instruction - it will be easy. You may have noticed that some ROM cells are turned upward and some downward, I introduced this difference to show where we perform calculation (upward), and where we do some jumping (downward).
Stage 3: "Converting source code to instructions"
This stage can be fully automatized by some external applications, but it will be better if you will understand how it works inside.
Every implemented instruction has it own protocol, this protocol also defines input and output delay for the operation.
input delay means the amount of time that is needed to prepare output before it will be requested.
output delay means the amount of time that is needed to commit output after it was requested.
For example: you are introducing new instruction R0 = R0 * 5 + 10, and this instruction is implemented as two consecutive combinators, so input delay will be 2 ticks. Some operation can have bigger input delay (e.g. perfoming modulo operation) and some - output delay (writing to RAM).
If you want to implement an efficient program you need take into account this delay.
In our CPU that we are working on, instructions have next delays [input-output]:
1: basic instruction: register + constant [0-1]
2: print r0 to display [1-0]
3: r0 = r0 + r1 [1-1]
4: if (r0 == r2) jmp [1-1]
Further, I will add new instruction
8: r0 = r0 % r1 [2-1]
Why this delay is so important?
The answer is inside CPU architecture: I implemented instruction pipeline for this CPU. (https://en.wikipedia.org/wiki/Instruction_pipeline) And it is the source of our high performace, and also it the source of pain if you are transforming source code to instructions.
The problem occurs if you perform two instructions one by one and they share same variables, e.g.
r0 = r0 % r1
print r0
In this case r0 variable is not ready for writing to display, because it still inside dividing process. This happens if distance between first and second instruction is less than 1'st output delay + 2'nd input delay (distance = 1, output delay = 1, input delay = 1). To handle this problem you need to add an EMPTY command between them or to shuffle these commands with some others. Consider two examples:
instruction1: R0 = R0 * 5 +10 [2-1]
instruction2: R1 = R1 * 10 +5 [2-1]
Code: Select all
R0 = R0 * 5 +10
EMPTY
EMPTY // 2 commands are required because distance = 3, and output + input delay = 3
R0 = R0 * 5 +10
R1 = R0 * 10 +5
EMPTY
EMPTY
R1 = R0 * 10 +5
R0 = R0 * 5 +10
R1 = R0 * 10 +5
EMPTY
R0 = R0 * 5 +10
R1 = R0 * 10 +5
Now we are ready to write our own program.
Stage 4: "Optimization"
Here I want to show how to transform program into instructions manually.
Task: find the lowest divisor of an input number.
Several new instructions:
4: if (r0 == r2) jmp [1-1]
5: if (r0 != r2) jmp [1-1] // sibling instruction to #4
6: load r0 [1-1]
7: print r1 [1-0]
8: r0 = r0 % r1 [2-1]
Code: Select all
r1 = 1
start: r1 = r1 + 1
print r1
load r0
r0 = r0 % r1
if (r0 != r2) jmp start
Code: Select all
r1 = r1 + 1
start: r1 = r1 + 1
EMPTY//because r1=r1+1 is [0-1] and print r1 is [1-0], distance should be 2
print r1
load r0
EMPTY
EMPTY//because load r0 is [1-1] and r0=r0%r1 is [2-1], distance should be 3
r0 = r0 % r1
EMPTY//because r0=r0%r1 is [2-1] and (r0!=r2) is [1-1], distance should be 2
if (r0 != r2) jmp start
Code: Select all
r1 = r1 + 2
start: load r0
print r1
EMPTY//because load r0 is [1-1] and r0=r0%r1 is [2-1], distance should be 3
r0 = r0 % r1
r1 = r1 + 1
if (r0 != r2) jmp start
Calculating lowest divisor of 295927: Stage 5: "..."
I am still working on this CPU, but even now it is only 1.5 times slower than simple counter (I don't know exactly, maybe 40 instructions per second?). In the nearest time I am going to implement prime numbers calculation and RAM memory support - probably this will be quite easy.
And continue working on multithreading (my dream:)).
Usefull links:
Another CPU: https://forums.factorio.com/forum/vie ... =8&t=15113
Digit display: https://forums.factorio.com/forum/vie ... 62#p101362
Memory cell: https://forums.factorio.com/forum/vie ... 897#p97897
And my personal "Thanks" to @Hogdriver for inspiring me on this job.