A New Kind of Computing
Bernard A. Hodson
The industry today is plagued by a variety of problems, including insecure operating systems, viruses, worms, spam, identity theft, intrusion into personal systems, interception of wireless and satellite data, hackers, and so on. The costs to industry from spam alone are high, and viruses have wreaked havoc with business activity, even putting some companies out of business. Security threats to individuals, companies, and countries are increasing. It is high time that we examined potential solutions and acted upon those that offer the most promise.
This article describes one possible solution and outlines a programming paradigm that could be developed as a standard. The solution has already been used successfully on several classes of computer, from mainframes and microcomputers to 8-bit RISC chips for smart cards and embedded systems. The proposed paradigm is a standard that can apply to all levels of programming activity, with considerable flexibility for customization. It has the potential to eliminate most of the problems mentioned earlier and is small enough that the entire system could be encrypted for each computer and server. It would simplify or eliminate all operating systems.
The paradigm uses an expandable language, which can be converted to a byte string on any computer. The byte string is completely independent of the computer on which the application will run. Using the rules established for the paradigm, a simple compiler can process any application written in the expandable language. In fact, the rules are so simple that an application can be developed without a compiler, by generating the byte string for the run system directly. The end of the article explains how readers can obtain a copy of the simple compiler and a basic expandable language to try out that phase.
The run system processes the generated string of byte codes. To do this, it uses a double numeric system, which uniquely identifies every element needed within an application and every activity that the system will carry out. This technique makes for a fast-running application and a very small virtual processor run system: from three or four thousand bytes for 8-bit RISC chips running typical smart card and embedded systems applications, to seven or eight thousand bytes for a microcomputer with simple graphics, to somewhat more for the processing of video images and other more demanding applications. Because the numeric codes are unique, new capability can be added without affecting what has already been developed.
To Compile or Not to Compile: That is the Question
The author has had experience in developing compilers and a Java run system. FORTRAN and COBOL compilers (like those for C, C++, PL/I, and other languages) generate a machine language structure, which uses a library of subroutines. In Java, a string of byte codes is generated, which requires a library of methods and similar structures. The libraries for both approaches tend to be large. FORTRAN and COBOL are quite limited in their capabilities, while Java is verbose with a clumsy vocabulary. Even the very simple, ubiquitous "hello world" application in Java needs a large number of methods and resources.
The compiler for the paradigm described in this article is itself very small and can be placed, if desired, at the front of the run system, taking just a few hundred more bytes (as the compiler and run system share routines). In that situation, the language elements, rather than the byte codes, are presented to the system, which generates the byte codes first and then runs them. This mode is particularly useful for safety-critical applications (where the compiler and the application have to be tested whenever a change to either occurs). For the remainder of this article, let us assume that the compile is complete and the system has been presented with a string of byte codes. Figure 1 shows the compile operation.
Figure 1: The compiler converts language elements to byte codes.
All elements of the run system are static. The only variable part of the paradigm is the generated byte stream, which will vary from application to application.
Every application consists of a string of language elements that may be associated with parameters such as numbers or variable names. With the exception of numeric data and literals, all language elements and variables are converted to a single byte. Each language element is associated with a number currently running from 0 to 255, although at present only a fraction are used. It is unlikely that more than 256 will ever be needed but, if so, the number can simply be increased to just short of 65536, without in any way affecting what has been developed before such an extension takes place.
Some typical examples of language elements are:
looping 1 1 100 adr grt
screen ^hello world to^ name
arith alpha = beta + gamma / delta + 13
The language elements here are "looping," "screen," "bitmap," and "arith," each of which has an associated numeric code such as 3, 5, +, or *.
For the element "looping" the numbers 1 1 100 are the looping parameters: from 1 to 100 in steps of 1. The symbols adr and grt represent transfer points for the true or false result of the operation. Such a language statement may result in the byte code sequence 2(1(1(d25.
The (1 indicates that a number has been converted to its binary equivalent, in this case a 1. The final (d encodes 100 in the same way, since a byte with the value 100 prints as the character d. The 25 indicates that control transfers to the second or the fifth named language statement, depending on the result of the looping arithmetic (the compiler resolves these references automatically). Other language elements give alternate forms of loop control.
The element "screen" might result in the byte code sequence 511, indicating that the first literal is to be placed on the screen, followed by the first variable, which would likely contain the name of a person receiving the message. It has been ascertained that few applications contain more than 256 variable names. While this is the limit in the initial system, extension to just less than 65536 variable names can be accomplished without disturbing what has been developed previously.
The element "bitmap" would have the single byte code +, which would trigger a sequence of activity in the run system asking for the name of the bitmap image that should be produced.
The final language element "arith" might generate the byte codes
where the fourth variable receives the result of adding the seventh variable to the sixth divided by the third, and then adding 13.
In this case, the % indicates that what follows is a floating-point number of length 5, positive in sign, with the value 13.0. Again, the relative numbers used are a function of the compiler, and the programmer doesn't have to be aware of the coded sequences.
An initial reading may suggest that the structure is complicated, but it is simple: the very small compiler does the numeric conversions and the run system processes them. The programmer does not need to know the coding system. The run system, from the byte code stream, does exactly what is required.
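The "looping" and "screen" conversions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's compiler: the function names, the code table, and the treatment of "(" as a marker followed by a single binary byte are inferred from the sample sequences 2(1(1(d25 and 511, and the label positions for adr and grt are supplied by hand.

```python
# Hypothetical code table: the article gives sample codes, not a full list.
ELEMENT_CODES = {"looping": 2, "screen": 5}
NUMERIC_MARKER = ord("(")  # assumed: "(" introduces a one-byte binary number


def compile_looping(start, step, stop, true_label, false_label, labels):
    """Encode 'looping start step stop true_label false_label' as bytes."""
    out = [ELEMENT_CODES["looping"]]
    for value in (start, step, stop):
        if not 0 <= value <= 255:
            raise ValueError("this sketch only handles one-byte numbers")
        out += [NUMERIC_MARKER, value]
    # Transfer points become the position of the named statement.
    out += [labels[true_label], labels[false_label]]
    return bytes(out)


def compile_screen(literal_index, variable_index):
    """Encode 'screen <literal> <variable>' as element code plus two indexes."""
    return bytes([ELEMENT_CODES["screen"], literal_index, variable_index])


# Assumed positions of the named statements adr and grt.
labels = {"adr": 2, "grt": 5}
loop_code = compile_looping(1, 1, 100, "adr", "grt", labels)
screen_code = compile_screen(1, 1)

print(list(loop_code))    # [2, 40, 1, 40, 1, 40, 100, 2, 5]
print(list(screen_code))  # [5, 1, 1]
```

Here 40 is the byte value of the "(" character and 100 is the byte that prints as d, which is why the looping sequence is written 2(1(1(d25 in the text.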
One important observation is that a spurious byte code introduced nefariously would likely cause the application to abort. For applications that are more critical, we could add a checksum at the end of the byte codes, giving the total value of all bytes, and more or less guaranteeing security from hackers and virus activity. The run system would verify the checksum before starting the application.
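A minimal sketch of such a check, in Python, might look like the following. The function names and the two-byte, big-endian width of the stored total are assumptions made for illustration; the article specifies only that the total value of all bytes is appended and checked at start-up.

```python
def append_checksum(byte_codes: bytes, width: int = 2) -> bytes:
    """Append the sum of all bytes (assumed width: 2 bytes, big-endian)."""
    total = sum(byte_codes) % (1 << (8 * width))
    return byte_codes + total.to_bytes(width, "big")


def verify_checksum(stream: bytes, width: int = 2) -> bytes:
    """Return the byte codes if the stored total matches, else abort."""
    if len(stream) < width:
        raise ValueError("stream too short to hold a checksum")
    body = stream[:-width]
    stored = int.from_bytes(stream[-width:], "big")
    if sum(body) % (1 << (8 * width)) != stored:
        raise ValueError("checksum mismatch; refusing to run")
    return body


signed = append_checksum(bytes([5, 1, 1]))
print(list(signed))                   # [5, 1, 1, 0, 7]
print(list(verify_checksum(signed)))  # [5, 1, 1]
```

A plain sum detects a single altered byte but not a set of changes that leaves the total unchanged, so a stronger check could be substituted where that matters.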
The Virtual Machine Processor—The Run System
The run system consists of about 30 small modules in native code, the exact number depending on the functionality included. Most of the modules are independent of each other, so the size of the virtual processor (VP) can be reduced to suit the client's needs (for example, smart card applications may not need the graphics or the bitmap modules). Even with everything included, the VP is very small, ranging in size from about 4K to 10K bytes. Access to the VP is through a numeric code within the static part of the software.
Most of the modules require only a few bytes of machine code. The only exceptions are modules such as bitmap and the software floating-point routines that add, subtract, multiply, divide, and test floating-point numbers (in a format similar to, but more accurate than, the IEEE format). The technique of numeric coding for the static part of the system is what makes such a small VP size possible. The coding also enables the VP to go directly to both the module required and its associated parameters.
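The idea of going directly from a numeric code to a module and its parameters can be illustrated with a small dispatch table. The Python sketch below is hypothetical throughout: the table layout, the single "screen" module with code 5 and two parameter bytes, and the sample literal and variable values are assumptions, and the real VP is native code rather than Python.

```python
# Hypothetical literal and variable tables for one application.
LITERALS = {1: "hello world to "}
VARIABLES = {1: "Ada"}
output = []  # stands in for the screen


def screen(literal_index, variable_index):
    """Module for the 'screen' element: a literal followed by a variable."""
    output.append(LITERALS[literal_index] + VARIABLES[variable_index])


# Assumed layout: numeric code -> (routine, number of parameter bytes).
MODULES = {5: (screen, 2)}


def run(byte_codes):
    """Walk the byte stream, jumping straight to each module by its code."""
    pc = 0
    while pc < len(byte_codes):
        code = byte_codes[pc]
        if code not in MODULES:
            # A spurious byte has no module, so the application aborts.
            raise ValueError(f"unknown byte code {code} at offset {pc}")
        routine, n_params = MODULES[code]
        params = byte_codes[pc + 1 : pc + 1 + n_params]
        if len(params) != n_params:
            raise ValueError("byte stream ends inside a parameter list")
        routine(*params)
        pc += 1 + n_params


run(bytes([5, 1, 1]))
print(output)  # ['hello world to Ada']
```

Because each code indexes its module directly, leaving a module out of the table (as a smart card build might do for graphics) shrinks the system without touching the other entries.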
Most of the modules are concerned with data moves from direct or indirect addresses, and with binary arithmetic and logic routines. A review of compiler-generated code from many applications in a business environment showed that these were all that was necessary.