Course Handout - Short-Term Memory Subtypes in Computing and Artificial Intelligence
Part 4 - A Brief History of Computing Technology, 1951 to 1958
Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2010, High Tower Consultants Limited.
First published online 10:30 BST 17th July 2003, Copyright Derek J. Smith (Chartered Engineer). This version [HT.1 - transfer of copyright] dated 18:00 14th January 2010
This is the fourth part of a seven-part review of how successfully the psychological study of biological short-term memory (STM) has incorporated the full range of concepts and metaphors available to it from the computing industry. The seven parts are as follows:
Part 1: An optional introductory and reference resource on the history of computing technology to 1924. This introduced some of the vocabulary necessary for Parts 6 and 7. To go back to Part 1, click here.
Part 2: An optional introductory and reference resource on the history of computing technology from 1925 to 1942. This further introduced the vocabulary necessary for Parts 6 and 7. To go back to Part 2, click here.
Part 3: An optional introductory and reference resource on the history of computing technology from 1943 to 1950. This further introduced the vocabulary necessary for Parts 6 and 7. In so doing, it referred out to three large subfiles reviewing the history of codes and ciphers, and another subfile giving the detailed layout of a typical computer of 1950 vintage. To go back to Part 3, click here.
Part 4: An optional introductory and reference resource on the history of computing technology from 1951 to 1958. This material follows below and further introduces the vocabulary necessary for Parts 6 and 7. The main sections are:
Part 5: An optional introductory and reference resource on the history of computing technology from 1959 to date. This material will further introduce the vocabulary necessary for Parts 6 and 7. To go directly to Part 5, click here.
Part 6: A review of the memory subtypes used in computing. To go directly to Part 6, click here.
Part 7: A comparative review of the penetration (or lack thereof) of those memory subtypes into psychological theory. UNDER CONSTRUCTION.
To avoid needless duplication, the references for all seven parts are combined in the menu file. To return to the menu file, click here, and to see the author's homepage, click here. We also strongly recommend the computer history sites maintained corporately by the Charles Babbage Institute at the University of Minnesota, the Bletchley Park Museum, the Computer Conservation Society, and the National Archive for the History of Computing at Manchester University, as well as those maintained individually by the University of Essex's Simon Lavington, Clemson University's Mark Smotherman, and the freelance historians George Gray (editor of the Unisys Newsletter) and Ed Thelen.
1 - Computing 1951-1958 - First Generation Hardware (US)
After the top secret development effort of the 1940s [see Part 3], a number of civilian General Purpose Computers (GPCs) were delivered more or less simultaneously in the early 1950s, led by the Ferranti Mark I and the LEO I in February 1951, and followed closely by the US Census Bureau's UNIVAC in March 1951. The design principles were those of the 1950 Eckert-von Neumann machine [reminder], and the technology was basically chassis-mounted valve-based circuitry - hot, heavy, and unreliable. The stage was then set for a relentless expansion in commercial data processing capacity as large corporations started to install Eckert-von Neumann derivatives - known nowadays as "First Generation" Computers - alongside their existing data tabulating machinery. As the decade progressed, a number of other GPCs emerged from the established centres of excellence, and the final list includes the IBM 701, the Harvard Mark IV, the Bell Labs Models V and VI, the NPL's DEUCE, Cambridge's EDSAC II, Manchester's Marks II through V, and the LEOs II and III; each faster, more reliable, and offering more memory than its series predecessor. But because the centres of excellence were themselves still learning, some surprisingly fundamental innovations were still being made, and in due course these would usher in a newer generation of systems. The transistor was one such innovation, and, having invented the technology in the first place, Bell Labs were one of the first to experiment with transistorised digital circuitry, and their 1954 TRADIC contained 700 point-contact germanium transistors. Manchester University did likewise with two small research machines, which they had running in November 1953 and April 1955 respectively, and so, too, did the UK Atomic Energy Research Establishment, Harwell, with their 1955 CADET. Among the lesser innovations in this period was the development of magnetic tape storage .....
Key Innovation - Magnetic Tape: The first coated tape method of audio recording was developed in 1927 by Fritz Pfleumer in Germany [timeline for magnetic recording technology]. The commercial development rights to Pfleumer's ideas were then acquired by AEG in 1932, and with assistance in plastic film technology from BASF a modern format reel-to-reel tape recorder [picture] was ready for demonstration at the Berlin Radio Fair in 1935. This breakthrough for the broadcasting and home entertainment industries was not lost on the emerging data processing industry, and magnetic tape readers and writers were included on the Eckert-Mauchly Computer Corporation's BINAC development in around 1949, and then on the 1951 UNIVAC, proving faster and more reliable than the earlier paper tape systems. << AUTHOR'S NOTE: Given that the overall purpose of this seven part paper is to study the penetration of computing concepts into cognitive theory, it is worth noting at this juncture that magnetic tape storage has little explanatory or metaphoric value in this respect. The essence of a tape as a tape is that it is dismountable, which biological memory most certainly is not, and the essence of the data stored upon it is that it is stored "serially", that is to say, there is no overriding logical sequence to the individual items stored nor is there any quick method of retrieving a particular record from mid-file - you just have to start searching at the beginning and keep looking until you chance upon it, which takes time. Biological memory, on the other hand, remains permanently online, and seems to rely on two considerably more flexible storage techniques, namely (a) "sequential" storage, where the individual records either arise naturally in, or have been sorted into, some clear logical sequence (time-stamped order, say), and (b) "random access" storage, where either an "index" or a "hashing algorithm" [details] allows a target record to be retrieved from mid-file. We therefore see little mileage in the tape metaphor, although our final evaluation will not be available until 2004. >>
1.1 The MIT Systems
Two of the very biggest innovations - ferrite core memory and the concept of real-time computing for command and control - came from MIT's Whirlwind team. Whirlwind's success for the US Navy in the 1940s [reminder], plus deteriorating relations with the then Soviet Union, prompted the USAF to commission an experimental air early-warning system from them. Whirlwind showed that such a complex linking of radar and computing systems was feasible on 20th April 1951, when it simultaneously tracked three separate aircraft from radar and computed their interception parameters (Waldrop, 2001) [the hard part, apparently, lay in getting the interceptors round behind their approaching targets]. This demonstration resulted in massive government funding for air defence technology, and the result (although it was not fully operationally deployed until 1963) was SAGE (= Semi-Automatic Ground Environment), "a continent-spanning system of 23 Direction Centres [picture], each housing up to 50 human operators, plus two redundant real-time computers capable of tracking up to 400 airplanes at once" (ibid., p75). The machines themselves were massive Whirlwind clones, code-numbered AN/FSQ-7, and each contained more than 50,000 valves and ran "the largest computer program written up to that time" (source). [For a fuller introduction to US air defence technology, see Boyne (1999/2003 online).]
BIOGRAPHICAL ASIDE - J.C.R. LICKLIDER: J.C.R. ("Lick") Licklider (1915-1990) was a psychologist-in-computing who made his name on the SAGE project. He had previously been with the Harvard Psychoacoustics Laboratory, where he had worked on measuring and improving the intelligibility of the voice communication systems available to bomber crews. He was thus one of the founding fathers of the signal-to-noise-ratio approach to perception which burst upon the psychological scene in the 1950s. He joined MIT in 1950 to work on the human factors aspects of multiple mainframe computing, and soon became convinced that it was only a matter of time and good design before properly trained operators would be able more or less to hold a conversation with their CPU. This was the birth of the "on-line transaction processing" type of modern computing, as already profiled [reminder]. In 1960, Licklider summarised these ideas in what is now seen as a classic paper, entitled "Man-Computer Symbiosis" (Licklider, 1960/2003 online), in which he wrote "..... the resulting [man-computer] partnership will think as no brain has thought and process data in a way not approached by the information-handling machines we know today". In 1962, Licklider moved to the Pentagon's Advanced Research Projects Agency (ARPA), to continue his work on distributed command and control systems, where he put forward detailed proposals to raise the connectivity of the various systems then in use, so that they could exchange data more easily. He left ARPA in 1963, leaving these concepts to be fleshed out and made to work by his successor, Larry Roberts, under the name ARPANET [more on which in Part 5]. Licklider also contributed towards MIT's 1963 "Man and Computer" (or "MAC") project [more on which in Part 5], which eventually evolved into the world's first "online community" (Waldrop, 2001), and wrote a number of influential papers profiling the sort of knowledge interrogation software which Internet search engines have since made commonplace.
1.2 The IBM Systems
As we have already seen in Part 1, Part 2, and Part 3, IBM had been the data processing corporation half a century before binary electronics was invented. By 1945, its punched card business had corporate customers worldwide, with storerooms full of data and processing suites full of card sorters and tabulator-adders. Yet these punched card archives largely defied complex data analysis, because the tabulator-adders were basically little more than large desk calculators fitted with card readers - they were like fish out of water when given multiplications and divisions to carry out. IBM's strategy after the war was therefore to see to its customers first and to do pure R & D second. This, after all, had been the strategy which had made them number one in the first place, and their CEO, Thomas J. Watson, saw no reason why it should not keep them there. So they took the new-fangled digital circuitry and installed it in their tabulator-adders, thus turning them into tabulator-multipliers such as the IBM 603 (1946) and IBM 604 (1948).
As for the pure R & D, IBM had been there with its chequebook for the 1937-1943 Harvard Mark I development [reminder], and had pushed through the 1947 SSEC [reminder] with no little urgency once Watson had decided that digital computing skills were too strategically important not to have in house. This was an inspired decision, and the SSEC experience immediately became the basis for the 1952 IBM 701 "Defence Calculator". This was a typical first generation computer, initially designed for the US defence market, and around 25 times faster than the SSEC it replaced. What made this machine historically significant is the fact that it also span off to become IBM's first top-of-the-range production machine for the civilian market, selling to the US Weather Bureau, aerospace companies, and the like. The project was directed by Nathaniel Rochester and Jerrier Haddad [team bios], and when the machine was formally announced on 29th April 1952 it came with enough CRT Main Memory for 2048 36-bit words, plus drum and tape backing store. Nineteen machines had been sold by the time it was replaced by the IBM 704 in 1955. The 704 was essentially a 701 upgraded to take the new and far more cost-effective ferrite core memory, and 123 of these were sold by the time it was replaced in turn by the IBM 7090 in 1960.
The 701 was a large system, both physically and functionally, and with a price tag to match. However, thanks to the vision of one of its then junior engineers, Cuthbert C. Hurd (1911-1996), IBM management were persuaded to include a scaled down version - the IBM 650 - in their sales portfolio. This was a smaller mass-market machine, lacking the expensive CRT RAM, but deliberately intended to be compatible with their tabulating machine range. The 650s retailed at between $200,000 and $400,000 according to the precise configuration, and around 1800 of them were sold (Gray, 1999/2003 online).
The IBM 650 was also honoured to be the first production machine to offer a random (or "direct") access disk memory option. To understand the true significance of this particular innovation, we must remind ourselves that our previous mentions of random access have related to main memory (where the words had even stamped the initials "RA" into the acronym RAM). When it came to backing store, however, the situation remained that files of any size would have to be stored as a succession of contiguous records - perhaps key-sorted, perhaps not - on magnetic tape [see earlier inset]; main memory could not (and, fifty years later, still cannot) hold more than a fraction of the total data available to the system. And the problem with magnetic tape, of course, was that as often as not several thousand feet of very delicate data and several minutes of waiting would be between you and the record you were after.
Now we have already seen how magnetic drum storage had been successfully implemented on the Harvard Mark 3 and the Manchester Mark I systems, and designers had already recognised that a spinning disk would have advantages over a rolling drum, if only the problems of size, cost, capacity, and reliability could be overcome .....
Key Innovation - Magnetic Disk: Inspired by the shape and ease of manufacture of the gramophone record, the magnetic computer disk was a platter (or stack of platters) of non-ferrous material, thinly coated with a magnetic oxide emulsion and accessed by a read-write head (or heads) mounted on the radius line. When installed as a "disk drive" computer peripheral, this technology allowed data to be stored in concentric circular "tracks", one beneath each position of the head(s). If you now took your giant tape file and wrote the records n-at-a-time onto as many tracks as it took, you could - providing you knew (or could in some way compute) the platter-track address - get hold of any one of them again in only a few tens of milliseconds, even if it were situated right at the end of the file.
The search for workable disk technology began in earnest at IBM in 1952, under Arthur J. Critchlow [IBM fact sheet and pictures] and the story has recently been re-told by Rostky (1998/2004 online). IBM announced the new technology on May 1955, and named it RAMAC. Each 50-platter drive stored what would today fit onto three standard floppy disks. It remains only to note that the direct access facility revolutionised both operating system design and application programming. To systems programmers direct access technology offered the sort of "paging" which would soon be used to deliver "virtual memory" systems [see Part 5; Section 1.2], and to applications programmers it offered not just disk-based files permanently online, but also "indexed sequential" organisation thereof capable of converting record key into a disk address. It also, in 1961, assisted at the birth of the modern database [see separate story].
1.3 The Remington/Sperry Systems
IBM did not have things entirely their own way, however, for they were facing increasing commercial competition in the early 1950s from the Remington Corporation [reminder]. In February 1950 and December 1951, respectively, Remington acquired the controlling interests in the Eckert-Mauchly Computer Corporation (EMCC), and Electronic Research Associates (ERA). EMCC, it will be recalled, was the company which the Johns Eckert and Mauchly had set up in 1946-7 to build BINAC and UNIVAC [reminder], and ERA was the company set up in 1946 by Howard T. Engstrom (1902-1962) and William C. Norris (1911-) to build the US Navy's 1947 "Task 13" machine (the machine which evolved into the US National Security Agency's October 1950 Atlas 1101 computer) [reminder]. And despite the fact that Remington's own computing department, the EMCC people, and the ERA people were geographically separate units, and retained a "high degree of autonomy, indeed rivalry" (Gray, 2002/2003 online), Remington had a minor star in EMCC's UNIVAC 1, which they duly capitalised upon by rebranding a demilitarised variant of the Atlas 1101 [picture] as the UNIVAC 1101, complete with a parallel organised arithmetic unit. They also won a second National Security Agency contract for the Atlas II (1950-1953), an Air Force contract for the (three-off) UNIVAC 1102 (1952-1956), and got government permission to market a civilian version of the Atlas II, to be called UNIVAC 1103 (Gray, 2002/2003 online). [Harry Wise was on that team, and has some technical recollections and photographs online - click here.]
This was all cutting edge stuff, of course, and it was projects like these which attracted the likes of Seymour Cray (1925-1996) into computing. Cray joined ERA in 1951, and impressed management so much that in 1953 they appointed him Project Engineer for the 1103 (Breckenridge, 1996/2003 online). He rewarded their faith in him by turning the 1103A into one of the most innovative machines of its day. One development in particular - a "program interrupt" facility - was developed on an 1103A at the Lewis Research Centre, Cleveland (Gray, 2002/2003 online), an outstation of the US National Advisory Committee on Aeronautics (NACA) [more background].
Key Innovation - "Program Interrupts" (First Mention): According to Gray (2002/2003 online), the 1103A interrupt facility "permitted the 1103A to interrupt the processing of one program to handle another". This might be useful, for example, if an urgent job arose part-way through the execution of a much longer one, but where too much processing had already been done just to abandon it and start again later. To do this switchover safely, requires taking careful note of the position of the executing program, and then being able to reinstate all the registers and Main Memory settings to their pre-interrupt state afterwards, and this in turn requires sophisticated systems software and additional memory resources. We return to this topic in Part 5. << AUTHOR'S NOTE: The idea of one program interrupting another is quite popular in modern theories of high level motor control, especially those of the Norman-Shallice type. For specific examples, click here or here. The role of interrupts, however, is nevertheless poorly dealt with. Our full evaluation will not be available until 2004. >>
Further enhancements to the 1103A followed in the form of the 1958 1103AF, the first production machine to provide a reliable floating point arithmetic facility (Gray, 2002/2003 online) .....
Basic Terminology - Decimal Fraction: A decimal fraction is the sort of number we see all around us every day, that is to say, numbers such as 127.30 or 3.142. Each such number consists of two theoretically infinite strings of digits, either side of a "decimal point". The digits to the left of the decimal point are known as the "integer" part of the fraction, and the digits to the right are known as the "decimal". Taken together, these three elements are sometimes referred to as a "mantissa".
Key Innovation - Floating Point Arithmetic: Ever since the days of Schickard and Pascal [reminder], calculating machinery manufacturers have had to trade off complexity and cost against accuracy. If you decided to offer your customers ten-digit accuracy, for example, then someone wanting to multiply two such numbers together would suddenly need to display up to 20 digits, yet it would be wasteful to provide a register that big if it were only going to be used once in a blue moon. This is why Babbage's 1822 Difference Engine went for 31 digits and turned out to be an engineering nightmare, whilst the 1642 Pascaline made do with eight [reminder] and was well received. An added problem was that the number to be displayed might be all integer, all decimal, or part and part, so calculator designers also had to keep track of where the decimal point should go, which they did by "fixing" it, and by leaving it to their customers to do "fixed point arithmetic". Things were no less troublesome for the computer designers of the 1940s, especially those working in binary (because base 2 numbers are roughly four times as long as their base 10 counterparts), and the eventual solution was to convert all numbers into what is called "scientific notation", a numbering convention where only one significant digit is allowed in the integer part of the mantissa, and in which the true scale of the resulting number is then obtained by multiplying it by an associated power term. Each power term consists of a "base" - typically 2 in binary or 10 in base 10 - and an "exponent", the power to which the base must be raised. Thus .....
0.147 is the same thing as 1.47 multiplied by 0.1, and can be expressed as 1.47 x 10-1
147 is the same thing as 1.47 multiplied by 100, and can be expressed as 1.47 x 102
14700 is the same thing as 1.47 multiplied by 10000, and can be expressed as 1.47 x 104
..... and so on. The decimal point, in other words, is now free to "float" from wherever it started off until it arrives at its standardised new position, leaving the power term to remember where it had been. The "precision" of a given system is the number of significant digits catered for in the mantissa. Thus the number 14741 in a precision-3 system (as shown above) loses the fourth and fifth significant digits and will be treated as 14700, and this means that we risk introducing "rounding error" into our arithmetic if we throw lots of significant digits at a machine which has not been set up with the precision necessary to handle that number of digits safely. However, the advantage of scientific notation is that it greatly simplifies the task of computerising the multiplication process, because your two multiplicands will now always lie between 1 and 9 (and can therefore easily be fixed point multiplied), and the power terms can be multiplied by adding the two exponents. Thus a precision-2 multiplication of 0.0201 by 35192 will proceed as follows .....
(1) Convert each multiplicand to scientific notation, thus .....
(2.0 x 10-2) x (3.5 x 104)
(2) Do the multiplication as described above, giving .....
7.0 x 10-2 x 104
(3) Calculate the new exponent as described above, giving .....
7.0 x 102
(4) Convert the answer back into everyday form, giving .....
As it happens, the arithmetically correct answer is 707.3592, so we are about 1% out, the difference arising from the aforementioned rounding error. It would therefore be wise to increase our precision, and a precision-4 multiplication of the same two numbers duly gives a better approximation, namely 707.3 [you actually need a precision-7 multiplication to remove the error altogether]. Unfortunately, rounding error is endemic in floating point arithmetic, so for scientific applications the precision is nowadays set high at an IEEE-recommended 32 bits for "single precision" working, or 64 bits for "double precision" working [details]; and even then things can go wrong, as when an unexpected rounding error of this sort contributed to a failure in a Patriot missile launch during the First Gulf War, resulting in a Scud strike on a US barracks, killing 28 servicemen [details]. As Goldberg (1991/2003 online) puts it, "squeezing infinitely many real numbers into a finite number of bits requires an approximate representation", thus presenting the computing industry with the intriguing paradox that the faster you can do your calculations the more you need to worry whether one of these inherent inaccuracies has taken place. [For a fuller introduction to the intricacies of floating point arithmetic, complete with worked examples, click here.] << AUTHOR'S NOTE: The floating point concept - perhaps surprisingly - is not totally incompatible (mutatis mutandis) with what is known about modular mathematical cognition. As we have explained elsewhere (Smith, 1998), mathematical cognition consists of at least two basic "left hemisphere" processes and at least one "right hemisphere" process, all of which seem capable of simultaneously working on a given mathematical task. The left hemisphere processes are (a) to establish the conceptual meaning of the numerals and arithmetical symbols used, and (b) to activate and put into effect the rule sequence by which the solution might be derived. The right hemisphere process is to predict the result of the overall calculation by roughly estimating what the answer should be. In addition, the ability to use the "number line" may provide both conceptual (left) and visuo-spatial (right) support. Our full evaluation will not be available until 2004, but, in the meantime, interested readers may find no little fascination in the following neuropsychological case study data - click here. >>
In 1955, Remington merged with Sperry Gyroscopes to form the Sperry-Rand Corporation, and in 1956 the 1103 was supplemented by the markedly smaller UNIVAC "File Computer". This machine got its name from another important innovation, namely the long-term storage of large commercial data files on magnetic drum, rather than on punched card - thus mounting a direct assault on IBM's hitherto captive market. Not only could large bodies of commercial data now be rolled forward on a daily, weekly, monthly, or yearly basis, but it would henceforth no longer be necessary to trolley the files in and out of the machine each time they were needed.
Key Innovation - Periodic Batch Processing: Accountants and bankers have had to "carry forward" balances from the end of one accounting period to the beginning of the next ever since money was first invented, and the basic principles of book-keeping within such an accounting period are laid out in Luca Pacioli's "Summa de Arithmetica" (1493). [William Goetzmann, of the Yale School of Management, places the first recognisable bank in Uruk in Mesopotamia (modern Iraq) around 5,000 years ago; click for details; picture ; for a delightful and totally fascinating history of the profession of accountancy, click here.] The calculations, however, were done by hand until the advent of mechanical calculators towards the end of the 19th century [reminder], and - if you were lucky enough to be able to afford it - were then semi-automated by the Hollerith style punched card industry of the early 20th century. Now the point about punched card processing is that data flowed from storage point to storage point via machine after machine and through tray after tray, being merged, sorted, updated, copied, collated, resorted, etc. Finance Departments suddenly had to be totally precise (a) in how they allocated their records - the cards - into files, and (b) how the files then metamorphosed as they passed through the various processing stages. Fortunately, although the nature of the data varied widely according to the nature of the business involved, there were some clear basic patterns. By far the greatest volume of data was "transaction data", that is to say, individual records of individual happenings during the period in question - individual sales, individual deliveries, individual money transfers, etc. However, this type of data only really meant anything when seen in the context (a) of the situation at the end of the preceding period (as set out in the opening balances contained in some sort of permanently maintained "master file"), and (b) of supporting detail such as rate books ("reference data") and customer details (as set out in some sort of "customer file"). Thus the sequence of events in a typical 1920s stock system might be to open up a master file of opening stock levels (containing one record card per item per location), and then to "tally forward" each stock level by the receipts, issues, and adjustments of that item contained in one or more "transaction files". At period end, this tally forward would generate the theoretical carried forward stock levels, which could then be audited against a physical stockcheck, if necessary. The experience of the Lyons Company's office systems guru, John Simmons, is probably indicative. He had been appointed in 1923, remember, in order to bring scientific rigour into the design and operation of Lyons' corporate processes [to streamline, first and foremost; to automate if that helped streamline; and to computerise if that helped automate], and their 1949 LEO project [reminder] succeeded in automating the batch processing routine for payroll applications in December 1953 [more on this in Section 2.4]. << AUTHOR'S NOTE: The standard tool for the specification of these large batch processes was the dataflow diagram (DFD) [background and example]. DFDs are now the diagram of choice for tracking the movement of information (a) through an extended system, and (b) through the processing stages carried out at any one node within that system. As for the use of flow diagrams to help explain cognition, they are already standard practice (albeit they are not always very well done technically). We have written at length on this subject elsewhere, and the subject can safely be approached from any of three directions, for the material constantly cross-refers. To begin with a specific example of a modular cognitive system, click, or to begin with an introductory commentary, click here, or to begin with some back-to-basics practical diagramming, click here. >>
Lyons' and Sperry's pioneering work led to a growing confidence that large corporate files could be safely committed to a medium which was inherently unreadable by humans, and this was commercially highly significant because it enabled the new electronic systems to threaten the data tabulating industry in its own back yard; to offer an upgrade pathway for the literally thousands of tons of legacy data still held on punched cards. The design spotlight therefore fell on such factors as storage capacity, processing speed, and system reliability, and it soon became standard design practice to assess whether the file handling and the number crunching aspects of a given process presented broadly balanced demands. If they did, then all was well, but if either was slow compared to the other, then you had to suspect a bottleneck of some sort .....
Key Innovation - I/O vs CPU Bottlenecks: New terminology soon emerged to describe the various problem scenarios. If a process struggled when reading data in or when writing data out it was referred to as "peripheral bound", but if it struggled during the number-crunching phase it was referred to as "processor bound". When the present author joined British Telecom in 1980, he learned of "the Great ISOCC Tea Trolley Disaster" [English humour]. ISOCC was the BT computer system then responsible for charging for operator-connected calls. For a time, however, "telephone tickets" [picture] were being produced by the nation's telephone operators faster than the billing centres could process them; specifically, it took about 25 hours to process every day's input slips, so the processing backlog simply got longer by the day. We can no longer recall with certainty whether the problem lay in getting the data punched into the machine (in which case the process would have been peripheral bound), or in carrying out the necessary calculations once it had been (in which case it would have been processor bound), but it was probably the former. << AUTHOR'S NOTE: Neuroscience does not know enough about biological processing architectures to judge whether the unit of binary decision making - that is to say, the biological equivalent of a logic gate - is a single neuron, a small neuron cluster, a large neuron cluster, an entire gyrus or ganglion, or an entire cortical layer. It is conceivable, indeed, that structures traditionally ignored by neuropsychologists, specifically the brain's "white" as opposed to "grey" matter, or even the glial supporting cells, may also be able to act in some way to modulate what is going on in their electrically noisier neighbours. We raise this as an issue, because until neuroscience can identify the logic gates it will remain unable reliably to identify the larger logic or control units either, so it is difficult to agree on which brain areas count as peripherals and which as central systems, and therefore which is getting overloaded. Interested readers may obtain some idea of current thinking in this area from the following website, click here. We extend the argument when discussing serial and parallel computing in Section 2.2. >>
Hollingdale and Tootill (1965) explain the problem, and its conventional solution, this way .....
"Before 1958 most computers were engaged on scientific or technical calculations of one kind or another; nowadays most of them are employed on clerical tasks - on payroll or stock control calculations, for example. In this kind of work the essential role of the computer is to manipulate large quantities of digitally coded material - usually referred to as data - in a prescribed manner [.....] The amount of computation that has to be done on each unit of data is usually quite small [..... //] The organisation of the rapid input and output of streams of coded data is, in fact, one of the major problems of [commercial computing]. [.....] The basic idea is to carry out all the slower input and output processes away from the main computer, whose precious time must not be squandered on sluggish menial tasks. These are delegated to a subsidiary computer, the essential function of which is to transcribe information from one physical medium to another ....." (p239-240.)
The upshot of all these problems was that Lyons and Sperry perfected some historically important methods of coping more effectively with peripheral bound processes by strategically placing additional memory resources about the system .....
Key Innovation - "Buffers" and "Buffering": "Buffers" in general are simply areas of temporary storage somewhere in size between a register and a decent slice of Main Memory, and they are important because they allow "buffering", that is to say, "the temporary holding of information in transit" (Purser, 1987, p312). Like any other form of digital storage, a buffer consists ultimately of a contiguous series of flip-flops, and the underlying technology may vary from machine to machine. To understand how buffering works, we need to remind ourselves (a) that Eckert-von Neumann GPCs [reminder] habitually process their input and output files through reserved record areas in Main Memory, (b) that file reading and writing are comparatively slow processes, and (c) that Lyons' and Sperry's customers were trying to funnel enormous volumes of data through card or magnetic tape readers. Bit String Buffering: The Lyons and Sperry designers therefore added what was effectively a small free-standing register between the relatively fast data bus and the relatively slow peripheral device controllers. As data arrived in the form of a slow stream of bits from a peripheral device, it was loaded into an "input buffer", and only when that buffer was full was the accumulated stream released - either at a higher serial rate, or in parallel - onto the bus to complete its journey to its destination in CPU or Main Memory. This short delay left the CPU free to be getting on with something else. A similarly arranged "output buffer" could buffer data en route from the output record area to an output device. Pinkerton (1990/2003 online) profiles the buffering system on the LEO as follows: "The buffers in each of the input channels were all valve circuits and mercury delay lines. They had little short tubes which assembled individual numbers and put them into position, one after the other until they filled it up. The same thing on the output side; some of the buffers were doubled on the input though not on the output. The logic allowed four input channels and four output channels ....." Main Memory Buffering: It is also possible to use Main Memory for buffering. This can be done by doubling (etc.) the number of Main Memory record blocks allocated to each file, and by then routinely loading these one record ahead of the ongoing processing. This sort of "double buffering" introduced a degree of record-level overlap into the system, such that the input delay for record <n+1> coincided with the processing time for record <n>, which, in turn, coincided with the output delay for record <n-1>. For very fast processes, "treble buffering" (etc.) could be resorted to, thus creating even more overlap. It follows that a given input character was likely to have been buffered twice on its way to the CPU, once at the device stage, and then again at the Main Memory stage. Instruction Buffering: We deal with another type of buffer - the "instruction buffer" - in Section 2.3, under the heading Pipelining. "Store and Forward" Buffering: This is a form of buffering which occurs (in addition to all the others) at the subordinate nodes in a communications network. It is needed whenever three or more computers have been interconnected, and it allows transmissions to be stored temporarily at node #2 en route for node #3, should the line to node #3 be down or otherwise engaged, or should the CPU at node #3 be too busy to accept the input immediately. Modern telecommunications - including the technology currently bringing you this page - devotes much of its attention to such problems, and network designers need to carry out a complex traffic queuing analysis before deciding how much buffering store to provide at each intermediate node. << AUTHOR'S NOTE: The concept of output buffering is heavily used in modern theories of motor behaviour - for specific examples, click here, or here, or here. The concept is less well handled in theories of perception, although the phenomena (a) of perceptual memory [see Sperling (1960)] and (b) of saccade-based text processing [see Gough (1972)] are both prime candidates for it. It is also poorly handled in matters of human communications, where the "store and forward" version of buffering is not catered for in any of the common forms of the "sociogram". This, as we have repeatedly argued elsewhere [eg. Smith (1997; Chapter 1)], seriously erodes the explanatory value of most modern cognitive models [for our most recent offering on this topic, see our e-paper on "peer-to-peer" telecommunication]. >>
2 - Computing 1951-1958 - First Generation Hardware (UK)
The 1945 demobilisation took all but a select few of the British wartime engineers from their top secret work and returned them - tightly sworn to silence by the Official Secrets Act - to peacetime employment. We begin this phase of our story with one of these demobilised boffins, Andrew D. Booth, one of the pioneers of magnetic drum memory .....
2.1 The Birkbeck and "British Hollerith" Machines
ASIDE: The British Rubber Producers Research Association (BRPRA) was founded in 1938 to promote formal scientific knowledge of said product. It set up research centres in both Malaya and Welwyn Garden City, Hertfordshire [pictures], and one of its first researchers was Cardiff-born George Alan Jeffrey (1915-2000), later Professor of Crystallography at the University of Pittsburgh. Now the point about rubber is that it is a highly strategic commodity, perfectly capable of winning or losing wars. Without it, for example, you have no tyres for your vehicles or aeroplanes, no engine mountings, no oil seals, no hoses, no rubber dinghies, etc. Strategically, therefore, you either have to own the hot places in the world where the product can be grown naturally, or you have to have the raw materials and industrial base to manufacture it synthetically. Either way, you need a first class team of physicists and chemists trying to make the stuff longer lasting and constantly analysing specimens cut from wrecked enemy equipment to see if they had managed to do the job better than you had (much of Jeffrey's war was spent X-raying slices of rubber cut from wrecked German aircraft). It was this strategic war work which the BRPRA was there to organise, and, humble though the job might sound, the rubber boffins probably contributed every bit as much per capita to the war effort as did the codebreakers at Bletchley Park. It was Jeffrey who recruited Booth, and it was the fact that X-ray crystallography creates raw data like there is no tomorrow that led Booth to develop automatic calculating machinery as a laboratory aid to data analysis in much the same way that Wynn-Williams had scratched together his thyratron circuits at Cambridge in the early 1930s [reminder].
Booth moved to Birkbeck College, University of London, after the war, and in 1946 his automatic calculating work brought him to the attention of the US National Defence Research Committee's chief "talent scout", Warren Weaver [bio follows in Section 4.1]. Weaver was intrigued by what he saw, and put Booth forward for a study scholarship under John von Neumann at Princeton's Institute of Advanced Studies [a cynic might ask in retrospect who was studying whom]. Accompanied by his research assistant Katherine H.V. Britten (whom he subsequently married), Booth was at Princeton from March to September 1947, and upon his return to Britain proceeded to build a small relay computer named the Automatic Relay Computer (ARC) (Lavington 1980/2003 online). He demonstrated this machine to Weaver on 23rd May 1948 at BRPRA's Welwyn Garden City centre (Hutchins, 1986/2003 online), and we shall be finding out what Weaver did next in Section 4.1. But what made the ARC more than just a British version of, say, the Bell Mark II relay computer [reminder] was the fact that Booth had cleverly linked it to a simple magnetic drum memory .....
Key Innovation - Magnetic Drum Storage: The advantage of magnetic drum storage over tape lies in the fact that the storage surface rotates continuously against a linear array of read-write heads [picture]. This means that your data is re-available to you once per revolution, unlike tape-borne data, where there is a lengthy re-wind time between passes. The sequence of magnetically coded pulses recorded by any one head is known as a "track", and each stored bit offers itself back to the head for reading and/or overwriting every few milliseconds [this rotational delay being known as its "latency"]. The system has quite a history, in fact, for the idea of "rolling up" your data storage in order to save space is as old as the scrolls of ancient times, and the idea of winding data around a cylinder, so that it spirals gradually from one end to the other, is clearly seen in both the Greek cryptographical system known as the scytale [reminder] and Edison's famous cylinder phonograph [picture]. It is therefore not surprising that when, in 1898, the Danish inventor Valdemar Poulsen was looking for somewhere to keep his magnetised wire recordings, he chose a drum [picture]; nor that the computer pioneer John Atanasoff toyed with the drum concept in the late 1930s on the Atanasoff-Berry Computer [reminder], albeit he ended up using surface capacitors as his storage substrate rather than a magnetic coating [picture] (Gray, 1999/2003 online). What is not so clear is who first applied Pfleumer's ferrite emulsion as a magnetic lacquer coating to a drum, and the first clear suggestion to this effect seems to have been in an MIT master's thesis by one Perry Crawford.
BIOGRAPHICAL ASIDE [INCOMPLETE] - PERRY O. CRAWFORD (? - ?): According to Jay Forrester, the Whirlwind Project Director, it was Crawford who first called his attention to the possibility of digital computation (Forrester, 1988/2003 online), this having been the subject of his MIT master's thesis during the period 1939? to 1942?. Mindell (2002) mentions Crawford as having succeeding Claude Shannon as postgraduate student at MIT, sometime around 1940, and it was in the period 1940 to 1942 that Crawford developed his ideas on the use of digital methods to fire control problems. During 1942 to 1945 Crawford served (as an attached civilian) with the navy's Special Devices Centre, later at Sands Point, Long Island, building flight simulators and training devices and coming up with the name "Whirlwind" into the bargain. In October 1945 he went to work for the Special Services Division of the Office of Naval Research. Crawford left the navy in 1952, and moved to IBM, where he became involved in the SABRE project [see Part 5]. He was still around to be interviewed in 1995 (by Mindell), but seems to have published little.
Booth experimented with the drum concept in his Birkbeck College laboratory in 1946-1947, and passed on the details to the Manchester University team who added it to their Baby [reminder] (Booth, 1995/2003 online). On balance, however, it is probably fairer to credit the first large scale investment in drum technology neither to MIT nor Birkbeck, but to ERA's John Coombs, as part of the US Navy's Project Goldberg, also in around 1946. ERA used a 5" drum revolving at 3000 rpm, and set their packing density at 230 bits per inch [source]. The literature also mentions an 8" drum developed in 1948 by the University of California at Berkeley (Spice, 2001/2003 online) and recording at 800 bits per inch, and other units on the Harvard Mark III and the SEAC [source]. << AUTHOR'S NOTE: The notion of a constantly recycling memory is superficially similar (a) to the psychological concept of the "rehearsal loop", and (b) to the physiological concept of the self-sustaining neural "reverberatory circuit". Rehearsal has been suggested by many authors as a mechanism whereby a small number of auditory items can be maintained in memory by constant covert repetition. Our full evaluation of the literature here will not be available until 2004, but to see rehearsal loops in a cognitive model click here. >>
After making his name with the ARC, Booth moved up to thermionic valves, trying out his circuitry in a 1949 prototype machine called the Simple Electronic Computer (SEC). When he was happy that he knew what he was doing, he followed this up with a number of larger, but still drum-equipped, machines in his 415-valved APE*C Series, where the APEC stands for "All Purpose Electronic Computer" and the asterisk indicates the particular commercial sponsor at the time. Thus the APERC [picture] was delivered to the British Rayon Research Association, the APEHC was delivered to the British Tabulating Machine Company ("H" because BTM were the holders of the Hollerith licence in Britain), the APENC was delivered to a Norwegian sponsor, and the APEXC was his own Birkbeck machine. The APEHC was BTM's first excursion into electronic computation, and they quickly reworked it in house as the HEC [picture]. They then - goaded no doubt by IBM's success with its electronic multipliers [see Section 1.2] - used the HEC experience to build punched card processors of their own, such as the BTM 541 (1953) and BTM 555 (1957) [pictures]. The HEC was also the inspiration for the BTM 1200 (1954) and BTM 1201 (1957) computers, and thus indirectly for the ICT 1300-series and ICL 1900-series mainframes which followed them. [Francis Townsend was part of the team responsible, and has some technical recollections and photographs online - click here.]
2.2 The Last of the Boffins
So what of the boffins who had been kept on after the war in the secretive government and military research stations we heard so much about in Part 3. Well basically we spoke of four such sites - Bletchley Park, the National Physical Laboratory (NPL), Dollis Hill, and the Research Establishment at Malvern, Worcestershire, so there are only four stories to tell, and we shall take them in order of mention.
After the war, the GCCS at Bletchley Park continued to develop its Colossus machines, sold an unspecified number to friendly foreign governments [which is one of the reasons their existence was not acknowledged for such a long time], and remained active in the security business for many years. In the end, however, the unit was relocated and the site itself has recently become a museum [to see what Hut 4 is doing nowadays (31st May 2003), click here]. But GCCS made no further computers [or, if it did, they are still classified], and so it now moves outside the remit of this paper.
We left the NPL in 1949, struggling to get their cool but decidedly complicated Pilot ACE up and running [reminder]. In 1948, Turing had been transferred to Max Newman's laboratory at Manchester, and the Pilot ACE continued in development for two more years under James H. Wilkinson and E.A. ("Ted") Newman. It executed its first program on 10th May 1950 (making the NPL, they claim, the fourth group in the world to deliver a stored program electronic digital computer), and was publicly demonstrated in December that same year. The NPL then transferred the production rights in Pilot ACE to the English Electric Company [company history], the then owners of the Marconi interests, who tidied it up a bit before putting it to market as the Digital Electronic Universal Computing Engine (DEUCE), eventually delivering more than 30 machines at around £50,000 apiece [source]. A full (84-page!) technical specification of the DEUCE is available online. The NPL's (Full) ACE was ready in 1958, still with delay-line storage, whereupon it was decided that there was no longer any role for government owned mainframe development. Full ACE thus marked the end of the NPL's involvement in mainstream computer development, so it, too, now moves outside the remit of this paper.
ASIDE: Rumour - probably not totally without foundation - has it that the Royal Aircraft Establishment actually bought two DEUCEs, so that they could run their really important programs on both of them, only trusting the results if both machines came up with the same answer (Kershaw, 1994/2003 online)! For more on Pilot ACE, see Newman (1994/2003 online), and for DEUCE, see Fisher (2003 online).
As for the Post Office boffins at Dollis Hill - that is to say, Tommy Flowers, A.W.M. Coombs, and their respective teams [reminder] - their fate is easy to relate because as telecommunications engineers in the first place they simply reverted to their former existences. Nevertheless there were occasional opportunities for off-site adventure, as when Coombs was called in to assist at .....
..... the War Department's own high-tech installation at Malvern, home, as we mentioned briefly in Part 3, of two of Britain's early defence computers. The first of these was a derivative of Pilot ACE called MOSAIC, sponsored by the Ministry of Supply for missile tracking applications. This was a 6000-valve, 2000-transistor, serial processing machine built between 1947 and 1954 by the Post Office's A.W.M. Coombs and team (Lavington, 1980/2003 online), and its Main Memory comprised 1024 40-bit words in mercury delay lines ("nearly a ton" of the stuff, according to Lavington). The second British defence computer was a more technically adventurous parallel processing machine called TREAC. This was designed in 1947 by Albert M. Uttley, went operational in 1953, and used a 512-word Williams-Kilburn system for fast store. The Australian Computer Museum also remind us of the work of Trevor Pearcey (1919-1998), a TRE veteran who in 1946 emigrated to Australia, where he designed the 1949 Australian CSIRAC computer, since claimed as the fifth stored program computer to be completed.
Key Innovation - Serial vs Parallel Processing: To do things "in series" is to do them one after another. To do things "in parallel" is to do them simultaneously. Serial computers therefore execute their data moves and their arithmetical operations one after another, whilst parallel computers do them simultaneously. As originally conceived, parallel processing computers used one wire per bit when moving whole words of data at a time between their registers and their logic gates, whilst serial computers used one wire only, and moved one bit at a time. The parallel machines were accordingly much faster, but much more electrically complicated. EDSAC's Maurice Wilkes explained the differences this way .....
"The most obvious division of computing machines is between those which are serial in operation and those which are parallel [but] the subject is an endless one in that many different variants of each kind of machine are possible. I do not consider that the question of whether it is better to build a serial or a parallel machine will ever be finally answered; each type has its advantages, and the final decision will always be dependent on the designer's personal preferences, and to some extent on his terms of reference. Also, as time goes on, the balance of advantage and disadvantage will swing as improvements are made in this or that component, or as entirely new components become available. Certain fundamental comparisons between serial and parallel machines can, however, be drawn readily enough. [//] The arithmetic unit of a parallel machine tends to be much larger than the arithmetic unit of a serial machine, since a separate flip-flop or equivalent circuit element must be provided to accommodate each digit in the various registers. Moreover, since addition of the digits takes place simultaneously, or nearly so, a separate adder must be provided for each digit. The disparity [.....] increases as the fundamental number length is increased. [.....//] The control, on the other hand, is much simpler in a parallel machine. In the first place the necessity for the generation of a set of digit pulses does not arise, and secondly, the timing is much simpler, since a number can be moved [.....], or an addition performed, by the application of a single pulse to a set of gates ....." (Wilkes, 1956, p66)
What we are looking at, therefore, are alternative ways of organising a single CPU - serial for cheap but slow, or parallel for complicated and expensive but fast; it is just another design decision to be made in the early stages of a machine development project. << AUTHOR'S NOTE: The parallel processing concept used by TREAC and the US parallel machines [Whirlwind and the 1101] is NOT THE SAME as the "parallel distributed processing" talked about by the PDP school of cognitive scientists [see our e-paper on Connectionism]. A parallel wired CPU moves whole data words at a time WITHIN A SINGLE CPU, but PDP as an incarnation of modular cognition REQUIRES MORE THAN ONE CPU. Or to put it another way, the essence of "parallel" in a parallel CPU is that many bits are moved simultaneously under the control of a single Control Unit, whilst its essence in a PDP configuration is that more than one Control Unit is at work. Now when we look at the brain, there is certainly no doubt that data moves simultaneously within even the smallest aggregate of neurons, but as we do not yet know the unit of binary decision making it would be unsafe to classify this as parallel logic. There is also certainly no doubt that different brain areas are simultaneously active during different types of cognitive task, but until you know which modules count as Control Units [and that is a philosophical problem as much as an engineering one] you cannot be sure what these resources are each contributing, or, indeed, to what extent they are in fact co-operating. Moreover, because each component CPU can be serial or parallel within itself, you can theoretically have PDP in a network of simultaneously active serial CPUs, albeit this would be slower than PDP in a network of simultaneously active parallel machines. You could even have PDP in a part-serial, part parallel, processing network. Our full evaluation of the literature pertaining to this issue will not be available until 2004, but there is much to learn in the meantime from the literature on brain injury. See, for example, our timeline notes on the nineteenth century neurologists Dax and Lordat, or the debate between the localisers and the globalists [timeline]. Note especially that both the hierarchical block diagram of Kussmaul (1878) and the transcoding block diagram of Charcot (1883) both derive from this type of evidence, and therefore so, too, do modern derivatives thereof, namely Norman (1990) and Ellis and Young (1988) respectively. >>
2.3 The NRDC Machines
Another cluster of first generation computers evolved out of the tripartite collaboration of Manchester University, the Ferranti Company, and Elliott Brothers, and the coordinating force here was the Government's scientific funding office, the National Research and Development Corporation (NRDC) [not to be confused with the American NDRC mentioned in Section 2.1 above and seen again in Section 4.1]. This organisation was set up in 1949 by the then Trade Minister Harold Wilson (later Prime Minister between 1964 and 1970, and 1974 to 1976), under the direction of John Anthony Hardinge Giffard (1908-2000), Third Earl of Halsbury, in an entirely laudable attempt to stretch the British taxpayer's return on state-funded research. This meant co-ordinating the work of the remaining defence establishments with interested parties throughout industry and academia, and the newly born computing industry was high on the NRDC's list of strategic priorities.
ASIDE: In a blistering appraisal of the whole NRDC experience, Hendry (1989) explains how most of the companies the government helped had gone out of business by the mid-1960s [and the survivors - as we shall be seeing in Part 5 - have either gone since, or are pretty sick]. For a number of reasons, some screamingly obvious but others quite subtle, the NRDC tended to "help to no advantage". It transpired that neither government gifts, nor advantageous development loans (nor, indeed, roving facilitators) were particularly effective instruments of public policy. "At no time," Hendry concludes, "did the aid given or offered to any firm make any perceptible difference to that firm's business strategy" (p168). In the ten years to 1959, the NRDC distributed just short of £1 million in cash, and underwrote loans for a further £1.25 million. This compares with US military patronage for the US computer industry during the same period of "over $1 billion" (p161). For his part, Lord Halsbury described the whole experience as rather like "pushing mules uphill" (Halsbury, 1991, p272), and had little positive to say about the quality of the decision making in the companies he had been trying to assist. The Ferranti company was founded by Sebastian Ziani de Ferranti (1864-1930) in 1889 in London, to manufacture electrical generating equipment. During the First World War they produced munitions, and in the 1920s moved into radio and consumer electricals. The radio experience then stood them in good stead come the 1930s, when the RAF needed re-equipping, and again when the radar contracts started to come through shortly afterwards.
So let us start with the collaboration between Manchester University - whose Mark 1 we have already spoken about in detail [reminder] - and the local Ferranti works at Moston (later West Gorton), Manchester. The strategy here had already been hammered out at an NRDC meeting on 14th December 1949, and involved Ferranti preparing the Manchester Mark I for market as the Ferranti Mark I, thus freeing up the academics to push ahead with further technical developments. Manchester University therefore began the decade by getting on with the Manchester Mark II, alias "MEG". This incorporated a number of relatively minor developments, but when it first operated in May 1954 was nevertheless 20 times faster than the Mark I, thanks primarily to having changed from serial to parallel logic (Lavington, 1978/2003 online). This, too, was eventually marketed by Ferranti, as the "Mercury", and sold fairly successfully in competition with the IBM 704, because despite being a lot slower it was also a lot cheaper (ibid.). The university then moved on to the Manchester Mark III, an experimental transistorised machine, which contributed greatly to the 1956 (six-off) Metropolitan Vickers MV950, which has been claimed as the first operational transistorised computer. [We shall be dealing with Manchester's later machines, the second generation MUSE and Atlas, in Part 5].
Other work, meanwhile, had been going on in London and the Home Counties, the industrial player here being Elliott Brothers .....
HISTORICAL ASIDE: Elliott Brothers were founded by William Elliott in 1795 to supply Royal Navy officers with precision technical instruments, and had then simply grown with the technology to provide many of the Royal Navy's fire control systems during the analog computing era [see Part 2], not to mention supplying some of the parts for Babbage's original Difference Engine (Clarke, 1975). After World War Two, they were keen not to lose this marketing foothold, and in 1946 established a digital computing research unit of their own at Borehamwood, Hertfordshire. They selected John F. Coales (1907-1999), who had just spent the entire war on Elliott's gunnery radars, to run this unit, and their first major project was to develop a state-of-the-art computerised naval anti-aircraft fire control system, codenamed MRS. The broad specification here was for "ship-borne radar [Elliott's specialism, remember] [to] lock on to a target at about eight miles and then track it and supply aiming information to a ship's guns" (Lavington, 2000, pp8-9). A large number of R & D staff were duly taken on, and the computer which resulted was the Elliott 152 [picture], with logic circuits designed by Charles E. Owen (details in Clarke, 1975). The 152 "executed several instruction streams in parallel from an electrostatic read-only memory" (Lavington, 2000, p11). Government contracts being what they are, however, minds soon changed, and MRS was cancelled in 1949, threatening to put over half of Elliott's researchers out of a job (Hendry, 1989) .....
Now it so happened that the collapse of the MRS project coincided roughly with the inaugural meeting of the NRDC, so it made good political sense for the government to try to keep Coales' team together. The NRDC therefore chose Elliott to turn its now idle 152 into a medium-sized computer to complement the larger Manchester machines, issuing the first of a number of contracts in September 1950. The result, in 1953, was the Elliott 401, the prototype of a series of five increasingly sophisticated "400-Series" machines. The 401 was designed by Andrew St. Johnston as a valve-based computer with delay line and magnetic disc memory, and ran its first program in March 1953. It had a front-loading modular design [picture] for ease of maintenance, achieved high availability figures, and actually remained in use until 1965. [For more on the 401, see Lavington (2000). The machine itself is now preserved at the Science Museum, London.]
Key Innovation - Magnetic Disk Storage: One of the 401 innovations was the first magnetic disk. This was designed by Chris Phillips as a more compact and easier to engineer form of magnetic medium than the magnetic drum. However, the functional principles - track-based storage, access latency, etc - were the same as for the drum. << AUTHOR'S NOTE: Comments as for the magnetic drum [see Section 2.1]. >>
The 401 was followed in 1955 by the 402 and 403 production machines, the 403 being historically noteworthy because it fielded an early version of "pipelining" .....
Key Innovation - "Pipelining" and Related Concepts (First Mention): "Pipelining" is a technique "whereby multiple instructions are overlapped in execution" (Patterson and Hennessy, 1996, p125). We have already seen how the buffer concept allowed a temporary memory resource to be used to speed up I/O [see Section 1.3]. The Elliott 403 designers simply applied a similar concept to the instruction stream as well, arranging for the FETCH part of the basic FETCH-and-EXECUTE cycle [reminder] to be a step ahead of the EXECUTEs. This was accomplished by providing an "Instruction Buffer", to contain instruction <n + 1> in FETCH mode, while the Instruction Register [reminder] contained instruction <n> in EXECUTE mode. Needless to say, when instruction <n> had been successfully executed, an important decision needed to be made - had that instruction resulted in any form of program jump? If it had not, then the operands corresponding to the presumed next instruction, that is to say, instruction <n + 1> in the Instruction Buffer, could safely be used, and were duly copied to the Instruction Register in a ready-to-execute state. If, on the other hand, a program jump had been requested, then the presumption had been wrong, and the contents of the Instruction Buffer were unsafe, had to be discarded, and a new FETCH operation done to initiate the program branch. Since the safes were generally more frequent in practice than the unsafes, the technique delivered real savings in speed of processing. [We return to these concepts in Part 5, when dealing with the IBM 7030 and CDC 6600.] << AUTHOR'S NOTE: Neuroscience does not know enough about biological processing architectures to judge whether the brain has devised some sort of op code / operand system for itself, and accordingly to judge whether it does pipelining or not. We suspect that it does, allowed to by its own internal modularity and forced to by its very slow clock speed, but the full evaluation will not be available until 2004, and in the meantime interested readers may obtain some idea of current thinking in this area from the following website, click here. >>
The last of the Elliott 400-Series was the Elliott 405. This sold well both at home and internationally [possibly under the name NCR 304?], thanks to a licensing agreement with the American NCR Corporation. [For more on the early Elliott range, click here or see Gabriel (1997/2003 online), or Hoare (1981/2003 online). Jack Smith (2001/2003 online) provides a handy memoir of how an Elliott 403 contributed to the Australian weapons research programme. For the later "800-Series" machines, see Part 5.]
Key Innovation - Direct Memory Access: The architecture devised for the Elliott 405 allowed its programmers to maximise CPU performance by re-routing much of its non-essential traffic around it rather than through it. Using an electronic equivalent of a city orbital [US = beltway], data en route from peripheral to Main Memory was given a direct access route, without going via the CPU (hence the epithet "direct"). This was implemented by giving each peripheral its own controller, with its own instruction set. This way of doing things immediately became an industry standard, and is nowadays known as "Direct Memory Access" (DMA) [further details]. << AUTHOR'S NOTE: The DMA set-up is not in essence dissimilar to the psychological concept of "implicit learning", that is to say learning which gets stored without ever having been consciously attended to, however, our full evaluation of this possibility will not be available until 2004. >>
The success of the 400-series kept Elliott so busy during the early 1950s that they even had to give some of their good ideas away. We refer here to the 1952 "Nicholas" - another Charles Owen design, with system software by George Felton - a machine which Elliott put together in a hurry to carry out in-house data processing. So successful did this prove, that the NRDC arranged for Ferranti's London works to turn it into a marketable package, and the result - the "Pegasus" - ran its first program in October 1955 and entered commercial service six months later (Lavington, 2000). It proved a highly reliable general purpose serial computer, and despite having to make do with thermionic valves and magnetic drum storage, its designers had devised a "powerful, straightforward" (p16) 49-item three-operand instruction set which made for very concise programming, although in achieving this they had been assisted by one of the NRDC engineers, Christopher Strachey .....
BIOGRAPHICAL ASIDE - CHRISTOPHER STRACHEY: [This from Campbell-Kelly's (1985) technical biography.] Christopher Strachey (1916-1975) was the son of a First World War cryptologist. He graduated in mathematics and physics at Cambridge, and ended up as a wartime radar engineer. Employed in 1939 by the Standard Telephone and Cable Company on the development of thermionic valve technology, he had occasion to run complex calculations through a small differential analyser, and this aroused his interest in computing. Upon his demobilisation in 1945, he obtained a position as a science teacher, firstly at St. Edmund's School, Canterbury, and later at the prestigious Harrow School, where a chance encounter with Norbert Wiener's recently published "Cybernetics" re-awakened his interest in computing. He therefore approached the National Physical Laboratory's Pilot ACE development team a few miles away across Middlesex, and by January 1951 was attempting to write programs in his spare time. Inspired by an article on computerised board games by the NPL's Donald Davies (Davies, 1950), he chose to attack the problem of getting a computing machine to play draughts. He did not quite succeed, but his program was promising enough to earn him a second attempt, this time on the Ferranti Mark I at Manchester University, and over the ensuing months he improved the code so much that the university promised him a job as soon as finances allowed. In the event, however, he was poached by Lord Halsbury's newly created NRDC before this could happen, and joined them in June 1952. He presented his draughts program at the Second ACM Conference in Toronto in September 1952, and the consequent boost to his reputation then earned him a succession of roving placements in both North America and Britain. This included time on site in 1953 evaluating progress of the Elliott 401, in 1954 improving the Ferranti Pegasus, and in 1958 helping to develop the MUSE [more on which in Part 5]. He left the NRDC in 1959 to set up in private consultancy, and for six years toured the country, contributing to the development of third generation programming languages [see Part 6]. Finally, he was headhunted in 1965 by Oxford University to direct their recently established Programming Research Group, and worked there until his death in 1975.
One of the things Strachey did for Pegasus was to help design an enlarged set of CPU registers, in what is now known as a "general register" arrangement (Lavington, 2000). He began by looking at the inefficiencies and inadequacies of earlier instruction sets and found that the chief time stealers - the "red tape" instructions - were control branches, data transfers, and handling large data arrays. He therefore arranged a simultaneous upgrade of the logic circuitry, the register set, and the instruction set. A 39-bit word size was chosen, within which was a six-bit op code field. This would theoretically have allowed up to 64 separate operations, but only 49 were initially used. The operations were presented as eight blocks of eight, by sub-dividing the op code into two three-bit triplets, and transliterating each of these into octal. Thus 010011 was treated as 010 (octal/decimal = 2) followed by 011 (octal/decimal = 3), and shown as 23 (decimal 19).
TECHNICAL ASIDE - OCTAL: Octal is the base eight numbering system, which means that each left order integer represents a power of eight higher than the one to its right. Only the digits 0 to 7 inclusive are used, because the decimal 8 will always "carry one" - thus octal 7 plus octal 1 gives octal 10 (decimal 8), and octal 777 (decimal 511) plus octal 1 gives octal 1000 (decimal 512).
Programmers soon began to remember these numeric codes as "mnemonics" for the operations they controlled [more on which in Section 3.2]. Eight accumulator registers were built into the CPU, numbered X0 to X7, with X0 always set to binary zero. These could be used for any relevant purpose, specifically indexing large data arrays, loop counting, and holding intermediate values during complex calculations. This is how array handling worked .....
Key Innovation - Handling Data Arrays: When allocating Main Memory for data storage, it frequently happens that a data item naturally occurs a number of times. Suppose, for example, that you needed to store <GROSS-WEEKLY-PAY> for every one of a thousand employees. The trick here is to set up a contiguous "data array", that is to say, one thousand consecutive word addresses in Main Memory. You can then select the particular occurrence you are momentarily interested in by "indexing" that array with a bracketed suffix. Thus <GROSS-WEEKLY-PAY(INDEX)> will get you record #8 when INDEX reads 7 (all computer counts start from zero, remember), record #652 when it reads 651, and so on (it will also blue screen on you if you are careless with your indexing and try to access a value less than zero or greater than 999). << AUTHOR'S NOTE: Neuroscience does not know enough about biological processing architectures to judge whether the brain has devised some sort of array indexing system for itself. We suspect that it does, but the full evaluation will not be available until 2004. >>
About 40 Pegasus machines were sold before the product line was discontinued in 1962: the sixth off the production line helped Vickers Armstrong design the Concorde supersonic airliner, and the 25th has been restored to working order and is on display in the Science Museum, London. "On the international stage," writes Lavington (2000, p57), "the huge number of modern commodity computers with a general register-set architecture is testimony to the usefulness of Christopher Strachey's ideas for Pegasus". [For background detail on Pegasus, see Cooper (1990/2003 online) or Lavington (2000). Donald Kershaw was a user of this kit, and provides some recollections online (Kershaw, 2002 online).]
2.4 EDSAC and EDSAC Clones
We have already explained in Part 3 how Cambridge University's EDSAC had come from behind in the years 1946 to 1949 to put together the world first stored program computer [reminder]. The machine continued to be improved during the early 1950s, and brought forth many innovations, not least microprogramming [more on which in Part 5] and assembler programming [see Section 3.2 below]. For Cambridge University's official collection of reminiscences from this period, click here.
The EDSAC development which most concerns us at this juncture is Maurice Wilkes', David Wheeler's, and Stan Gill's work on "subroutines", ca 1947. Smotherman (2002/2003 online) reviews the early history of the subroutine concept, and suggests that Ada, Countess of Lovelace, deserves first mention in that in 1842 she proposed reusable routines for Babbage's Analytical Engine [reminder]. Alan Turing then proposed much the same for the Pilot ACE in 1946, followed by the EDSAC team in 1947. Konrad Zuse also deserves mention in that his Z4 was upgraded ca. 1950 to allow a single subroutine to be executed by mounting it on a second tape reader. Here are the some of the technicalities .....
Key Innovation - Subroutines: Subroutines are "self-contained sequences of orders each of which performs a distinct part of the calculation in hand" (Wilkes, 1956, p86). More importantly, they allow a discrete operation to be developed once and then re-used as often as wanted, and as such they represent probably the greatest single aid to programmer productivity ever devised. In practice, however, there are several ways of benefiting from pre-written code. The first is to copy a trusted pre-existing program in its entirety and then edit it where necessary, the second is to edit in a block of instructions from such a program, and the third is to set a block of instructions up as a program-in-miniature - the "subroutine" - and then formally transfer control to it. All three ways work, but only the subroutine approach creates genuinely reusable code. The transfer of control is known as making a "call", and the eventual transfer back is known as a "return". Wilkes explains that it is frequently an engineering choice whether to perform a given operation via a subroutine, or else to hardwire in some additional logic circuitry. As is often the case in such matters, there is a trade off between complexity (circuitry costs money to design and test) and speed (subroutines add execution time), and it is the relative frequency of the task which often swings the decision one way or the other. The EDSAC, for example, deliberately chose a slow subroutine to perform division on the grounds that divisions were comparatively infrequent. Other "obvious examples of subroutines suitable for including in a library are ones for calculating sines, cosines, exponentials, and similar functions" (ibid.). << AUTHOR'S NOTE: Biological cognition probably relies heavily on subroutine calls. Taking speech production as an example, and adopting the hierarchical view proposed by psycholinguists such as the University of Arizona's Merrill Garrett (see, for example, Garrett, 1990), we find that there are several points at which the primary justification for the existence of a high level process can only be to control the rapid concatenation of a sequence of lower level ones. Thus with the allocation of phonemes to words at the phonological build stage, or with the allocation of muscle contraction profiles to phonemes at the subsequent muscle activation stage. Another (very significant) property of subroutines is covered when we get to the work of John McCarthy in Section 4.7. >>
Key Innovation - "Return Codes": One of the complications introduced by subroutines is that the calling program must be sensitive to the possibility that the called subroutine might fail in some way. Such a failure will often "crash" or "loop" the entire machine, but if it does not, then it falls to the calling program NOT to carry on in what is then probably a corrupt state. The way around this is (a) for the calling program to precede the call with the setting of a "return code" data field to a non-zero value (to be interpreted as "failure"), (b) for the last instruction in the called program to set that same field to zero (to be interpreted as "success"), and (c) for the first instruction in the calling program after the return to test for zero or non-zero. If the subroutine has worked, then the return code will have been set to zero, and the calling program can proceed about its business. If the subroutine has not worked, then the calling program can branch to a pre-written soft abort control sequence. So successful did the return code idea work in practice, that designers soon introduced a whole range of non-zero return codes (alternatively, "error codes" or "diagnostics"), each enabling a different type of failure to be accurately identified. Internet users see these every day, of course - click here for a non-existent Internet link, and note the resulting error code. << AUTHOR'S NOTE: Neuroscience does not know enough about biological processing architectures to judge whether nature has invented for itself some sort of cognitive return code. For our part, we suspect not only that it has, but also that the mechanism works closely with the biological interrupt system [see Section 1.3] to support hierarchical control in general terms. The issue will be reviewed in greater detail in Part 7. >>
EDSAC was also the inspiration for the Lyons Company's LEO I [reminder], and here again the main player was a demobilised wartime boffin, an ex-TRE electronics engineer named John M.M. Pinkerton (1919-1997). Pinkerton took charge of the project in January 1949, and the resulting EDSAC clone ran its first test program on 5th September 1951 (Bird, 1994), and its first business application - a bakeries valuation - on 29th November 1951 (Land, 1999/2003 online). It then parallel ran its first live payroll on 24th December 1953, at which time its designers claimed that what took an experienced clerk eight minutes could be done by the machine in 1.5 seconds (Bird, 2002). After careful cross-checking of manual and computer output, the manual system was withdrawn, and LEO ran unsupported on 12th February 1954 (Caminer and Land, 2003). Apart from the immediate benefit to their Accounts Department, Caminer and Land (2003) describe that date as marking "a world first". A more powerful machine, the LEO II was begun in 1954 and entered service in May 1957. Eleven of these machines were built, but only the last four were equipped with the more expensive ferrite core memory (Ferry, 2003). A third generation machine, the LEO III is mentioned in Part 5.
3 - First Generation Software
Alongside the electronics and engineering developments, the early 1950s also found a growing army of computer programmers struggling to keep their CPUs gainfully employed. The problem here was essentially one of too good a press, for now that potential system sponsors knew how versatile the new digital machinery could be, they were raising requests for computer solutions faster than the necessary programs could be written. Unfortunately, the penny was also beginning to drop as to how much time these programs could take to write; and worse, how difficult they then were to amend; and worse still, how often customers would ask for such amendments. Thus .....
"In June 1949 I realised that a good part of the rest of my life would be spent finding errors in my own programs." (EDSAC's Maurice Wilkes, in Sabbagh, 1999.) [We believe (May 2003) that Wilkes is still alive, so it is clearly good to make lots of errors!]
3.1 Application vs Operating Software
To cope with this early "software backlog", the computing industry was forced to specialise on the programming side of things the way it had specialised during its formative years in the physics and the engineering, when valve men with soldering irons produced the hardware and mathematicians decided how it should be used. It began holding summer schools (for example at Cambridge in 1950) and providing training courses, and at first two, and then three, and then four, areas of specific expertise emerged. The first distinction to emerge was that between "computer hardware" - the physical machine - and "computer software" - the programs which had to be written to make the hardware productive [the word "software" itself was coined by John Tukey at Bell Labs]. The second distinction was then between "application software", programs which specifically addressed this or that real world problem, and "operating software" (sometimes "systems software"), programs which helped interface these applications with the hardware in some standardised and user-friendly way. A computer's "Operating System", or "OS", became the software which was actually "closest to" the hardware, and what was happening in essence was that one program - the OS - ended up (thanks to yet more call-and-return branching) executing another - the application .....
Key Innovation - Application Software: Applications are what you, the user, want to do with your computer. If you wanted to run an appointments system, for example, you would need a carefully designed and interlocking set of "applications", each addressing one aspect of the overall task of managing appointments. The distinct practical purpose of each program is known as its "functionality", and it is functionality which gives an application its business value. The sum total of individual applications is known as the "application system", and the programmers who specialise in coding applications are known as "applications programmers". Applications programmers are assisted in their work by the "programming languages" available on their machines [see next section], and also by a deep understanding of the real world they are trying to computerise. In the event, the best programmers often turned out to be the worst at understanding the real world, and so a separate breed of beings called "systems analysts" was recruited to do that part of the job for them, and hybrids of the two sorts of skills - those who can manage both real world and programming tasks - are called "analyst-programmers".
Key Innovation - Operating Software: The key to understanding operating software lies in grasping a single basic fact, namely that a high proportion of any one program re-uses code already written (often at great cost) elsewhere. It therefore makes sense not just to provide the applications programmers with a library of tried and tested subroutines, but also with a way of accessing these procedures from within a programmer-friendly keying environment. The operating system is the mechanism by which this is all achieved, and its standard functions include interfacing with the operator, managing the library procedures, managing the peripherals, displaying the contents of magnetic media, knowing what filenames have been allocated, knowing where on the available disks it has put each file (and the really clever operating systems are able to store a given file in a number of non-contiguous fragments, so that takes a lot of managing), managing disk I-O, issuing warnings, help, and error messages, keeping track of date and time, and marking same on the filestock so users know how old they are, and last (but by no means least) controlling the execution of end-user programs. Some operating systems are easier to get along with than others. The worst ones are those invented in the days before the general public were allowed to get their hands on computers. The easy to use ones are called "user friendly", and the trend of late has been for them to become more and more user friendly as time goes by. Operating systems are written by boffins and maintained by "systems programmers", who (because they are the only ones who understand what they are doing) get paid a lot more than applications programmers.
Figure 1 shows where all these (very expensive) people fit within a commercial computing installation .....
Figure 1 - Key Roles in Corporate Computing: Here we see the basic structures and personalities you would find in a typical computerised corporation. Two separate organisations are shown. On the left hand side of the diagram is a computer manufacturing organisation (Ferranti, say). On the right hand side of the diagram is a client organisation, a purchaser-user of computing power. We have subdivided the client organisation so as to show its Computing Division (large panel, left, drawn over scale) separate from its end-user divisions (stacked panels, right, drawn under scale), and both subordinate to the Corporate Executive (top centre). Within the Computing Division, we show Operations and Systems Development activities separately, albeit both share a common physical system (pink panel). Within the Systems Development Department, the principal activity is the writing and testing of application software for the future, whilst within the Operations Department it is running application software already formally authorised for live use. This separation of duties is vitally important. Indeed in most large organisations, the operations side of things will be located in a purpose-built "Computer Centre", whilst the development side of things remains a headquarters function.
Note the many different specialists (red captions) dotted around the various departments. In the manufacturing organisation, we find the boffins who build the hardware and develop the operating software. Hardware boffins improve circuitry and invent memory mechanisms, etc. Software boffins improve instruction sets, implement ever better pipelining, etc. "Maintenance engineers" provide an agreed level of after-sales service (green arrow) in return for a consideration (pink arrow). In the Operations Department, there are "operators" to tend the machine itself and "systems programmers" to keep it in tune. In the Systems Development Department, it is usual to find a number of "systems maintenance teams", each consisting of the analysts and programmers familiar with a particular application system. There will usually also be an overarching "Systems Unit" [names vary] carrying out more strategic investigation and design.
Given the many specialisms, the need also arises for help desks and liaison officers (green captions). These include "operations support" people to liaise with the systems maintenance people in order (a) to schedule work on a day-to-day basis, and (b) to report and prioritise system errors. The maintenance team also have to liaise with the Systems Unit over the commissioning of broader system enhancements. [For the sort of systems disasters which result when people do not communicate effectively, see our IT disasters database.]
Copyright © 2003, Derek J. Smith.
3.2 Interpreters, Assemblers, Loaders and Linkers, and Compilers
As far as software productivity is concerned, there has only ever been one problem and there has only ever been one solution to it. The problem is the unfriendliness of machine code, and the solution has been to distance programmers from its complexity as far as possible. Put simply, writing efficient machine code from scratch is onerous in the extreme, and trying to work out how you did what you did at a later date is nigh on impossible. As a result, generation zero development teams devoted a lot of effort to simplifying the process, and their answer was to write utility programs to stand between the programmers and the machine code. In their simplest form, such programs offered to translate from a relatively human-friendly vocabulary into the decidedly human-unfriendly vocabulary of the machine's instruction set. These utilities were the first "programming languages", and they brought with them some important new terminology .....
Key Innovation - Programming Languages, "Source Code", and "Object Code": A programming language is a computer program capable of recognising selected human-meaningful input commands, and generating as output the logically corresponding machine code instructions. The input commands are known as the "source code", and source code - of essence - is far easier to read for its logical intention than is machine code, both by the original programmer during development, and then by a maintenance programmer at a later date. The output is known as the "object code", and if the developers have done their job properly this will have been optimised for technical efficiency, as well as being standardised and bug-free. Special dummy source code instructions allow explanatory annotation to be added to the source program as aides memoires, but withheld from the machine code.
Ideally, you want to be able to tell the machine something you understand, like <MULTIPLY TAXABLE-PAY-THIS-WEEK BY TAX-RATE GIVING TAX-THIS-WEEK> and have it turn that instruction into something the machine's arithmetic unit will understand, like <00101111001011011000101101001110110111 etc.>. Unfortunately, the source instructions had to be kept short to save memory, and so were usually abbreviated to two- or three-character "mnemonics" such as M for MULTIPLY .....
Key Innovation - Mnemonics: In everyday English, a mnemonic is a memory aid of some sort (such as "ROY G BIV" for the colours of the rainbow, etc.). Because there is no point in having source code at all unless it is distinctly memorable, mnemonic systems usually consist of a set of handy verbal tags .....
Example: We have already seen how Pegasus programmers took to identifying their binary op codes by their octal couplets. Thus Pegasus op code mnemonic "four-one" [see Section 2.3] instructs the machine to add a particular given number to a particular accumulator. Not only is "four-one" easier to remember than the logically identical binary code "one-zero-zero-zero-zero-one", but it is also easier to convert back to binary than the decimal equivalent. Thus octal 4 gives you binary 100 and octal 1 gives you binary 001, so octal four-one gives you 100001.
The simplest form of translation utility works on a one-to-one basis, and would nowadays be termed an "Interpreter". However, this level of assistance was not a lot of use to generation zero machine code experts, and so was little used. Instead, the boffins concentrated on a more powerful class of utilities known as "assembly languages", or "assemblers" for short .....
Key Innovation - Assembly, Loading, and Linking: "An assembler is a translator that translates source instructions (in symbolic language) into target instructions (in machine language), on a one to one basis" (Salomon, 1993/2003 online). Jack Gilmore wrote such a program for the Whirlwind, using mnemonic codes, and allowing decimal input. The Assembly process then substituted one or more machine instructions for the mnemonic and turned the decimals into binary. This was followed by SHORT-CODE, developed by John Mauchley, and A-0 developed by Grace Hopper. Wilkes (1956) claims "an assembly subroutine" for the EDSAC in 1950. << AUTHOR'S NOTE: Neuroscience does not know enough about biological processing architectures to judge whether the brain processes some sort of machine code in some locations and some sort of assembler mnemonics in others. What can be said with certainty is that there are repeated major shifts in data coding as information flows end-to-end through the cognitive system. The most thorough illustration of this we have come across is Ellis's (1982) transcoding diagram, in which each separate information flow is carefully tagged with the nature of the coding method used within it. >>
One point about simple assemblers was that the object code they produced was actually incomplete. Specifically, it did not include the object code for library-held subroutines. An additional processing stage was therefore needed, called "loading and linking". This was/is an additional utility program which located the various object code modules and - after some careful address cross-calculations - concatenated them into a single executable object code program. When done properly, all subroutine calls will branch and return to valid addresses.
Further major weaknesses gradually became apparent. Firstly, because machine instructions are individually very simple actions, there often needs to be a one-to-many relationship between a mnemonic and its equivalent object code instructions. Secondly, the machine code from one source instruction often needs interleaving with the machine code from adjacent source instructions. For example, mnemonic-by-mnemonic execution may be ideal for interacting with a human operator, but it is totally unable to cope with intentional program loops. And thirdly, assemblers were hardware-specific. You could not write source code once, and have it assembled onto different computers. This is because the one-to-one relationship between source and object instruction tied the translation program inflexibly to the specific host's specific instruction set. This led to the development of more advanced, machine-independent, programming utilities called "compilers" .....
Key Innovation - Compilers: A "compiler" is a package of utility programs, fronted by a powerful conversion routine, supported by all the necessary loaders and linkers, and complete with copious error detection and optimisation routines. As such, they allow programmers to write neatly structured source code, relying heavily on subroutines, and yet leave all the technical details to the compilation process. Where both the amount and the complexity of the object code per source instruction is high, the language is known as a "high level language". In 1951, R.A. ("Tony") Brooker was put in charge of Manchester Mark 1 software and by 1954 had produced the first recognisable compiler, namely the "Mark 1 Autocode". Compilers can conveniently be divided into "scientific", where there the source approximates to formal mathematical representation, or "business-oriented", where the source approximates to everyday English. The standard example of the former is FORTRAN, and of the latter, COBOL. The way things timed out, many first and second generation business systems were written at great cost in Assembler, only to find themselves "up for a re-write" ten years later when COBOL came along. In investment terms, this is not good commercial practice, and British industry (at least) was so eager to squeeze every drop of return out of its capital, that commercial systems written in Assemblers during the early 1960s were still around (as "legacy systems") nearly 30 years later. The present author did a stint on a large Assembler maintenance project as late as 1988.
In fact, the word "compiler" should be treated with some caution, because its meaning drifted somewhat in the early years. Backus (1984) argues that many early "compilers" are better regarded as assemblers, for their most significant aspect is their mnemonic set, rather than any program structuring functionality, while Hopper (1959) emphasises their role in managing subroutines. We include a full (albeit short) Assembly program, and elements of a compiled program, in Part 6, where we shall also be looking at how programming languages give programmers hands-on control over their machine's short term memory resources.
4 - First Generation Artificial Intelligence
Thus far, we have limited our consideration of software to military applications such as codebreaking and ballistics, and to civilian applications such as payroll. However, a number of more adventurous applications had been bubbling under during the late 1940s, and finally hit the headlines in the early 1950s. These were (1) machine translation, (2) semantic networks, (3) mechanised game playing, (4) man-machine conversation, (5) "neurodal" networks, (6) robotics, (7) machine problem solving, and (8) artificial consciousness. Taken together, these make up a domain of enquiry we know today as "artificial intelligence" (AI), and in the remainder of this section we look at each of these AI traditions in turn .....
4.1 Machine Translation (MT)
Hutchins (1997/2003 online) dates the idea that computers might be used to automate the process of natural language translation to two conversations which took place 20th June 1946 and 6th March 1947 between the aforementioned Andrew D. Booth [see Section 2.1] and Warren Weaver (1894-1978), of the Rockefeller Foundation, and a letter dated 4th March 1947 from Weaver to the cyberneticist Norbert Wiener .....
BIOGRAPHICAL ASIDE - WARREN WEAVER: Weaver was a University of Wisconsin mathematics professor, turned career philanthropist, turned wartime government advisor, turned "wartime 'Mr Fix-It'". He joined the Rockefeller Foundation in 1932 as Director of the Division of Natural Sciences, did much of the facilitation during the early construction of the giant 200-inch Hale Telescope at the Mount Palomar Observatory [this project actually lasted 1928 to 1948, with a break during World War Two - click here for the official history], and by the start of the war had established enough of a reputation as a loyal scientist to be appointed Director of the Control Systems Division of the newly formed US National Defence Research Committee (NDRC) [not to be confused with the British NRDC mentioned in Section 2.2]. Much like Princeton's John von Neumann [reminder], his role within the NDRC made him a roving advisor-facilitator with responsibility for just about every mathematical aspect of the US war effort [mostly classified, of course], not least the transition from analog to digital computation (see Mindell, 2002). It also involved him in at least one top secret mission across the Atlantic. After the war, Weaver became interested in the concept of MT, and it was on one of his convenient Rockefeller Foundation visits that he met Booth, who, as we are about to see, shared his belief that MT was theoretically possible. He also went official second author on one of the major classics of modern information theory, namely Shannon and Weaver (1949).
Following Weaver's inspired and energetic string-pulling, one of the first computers off the blocks was Booth's own ARC [see Section 2.1], and the formative event here seems to have been a chance professional exchange of minds between Booth and the Cambridge geneticist Richard H. Richens. Richens dates their first meeting to 11th November 1947, and explains that it came about thanks to their shared interest in punched card methods of processing research data. Richens later explained that .....
"..... the idea of using punched cards for automatic translation arose as a spin-off, fuelled by my realisation as editor of [the journal Plant Breeding Abstracts] that linguists conversant with the grammar of a foreign language and ignorant of the subject matter provided much worse translations that scientists conversant with the subject matter but hazy about the grammar." (Richens, 1984; private correspondence cited in Hutchins 2000)
The unlikely pairing of the x-ray crystallographer with the drum storage system and the geneticist with punched card experience then set to work trying to mechanise translation using a stored dictionary. In a jointly authored paper published a few years afterwards, they explained what progress they had made so far, and made a number of predictions as to how they saw the new science developing .....
"The most obvious way of using a computing machine for translation is to code each letter of a proposed dictionary word in 5-binary-digit form and to store the translation in that memory location having the same digital value as the aggregate value of the dictionary word. Translation would then consist in coding the message word (a teletyper does this automatically) and then extracting the translation in the same or next lower (digitally valued) storage position, the first giving exact translation and the second the stem translation with a remainder. [.....] "
TECHNICAL ASIDE: This is our old friend the Baudot telegraph code again [reminder; details], in which the letters of the alphabet are given binary codes in the range 00000 (decimal 0) to 11111 (decimal 31). The letters C-O-W, for example, would be given the following codes .....
C = 01110 (decimal 14)
O = 11000 (decimal 24)
W = 10011 (decimal 19)
What the authors were suggesting was that they should concatenate the three separate five-bit binary codes to give a 15-bit binary code (in this case, 011101100010011 or decimal 15,123). This code would then be automatically generated by typing the word "COW" at any standard teletype keyboard, and all that the computer needed to do was to store the translation of that word at storage address 15,123 in the computer's memory. In practice, however, it was not that easy .....
"Despite its simplicity, this scheme is quite impracticable with any existing storage device because the naive process of placing each dictionary word in a storage position equivalent to its own binary numerical version is grossly redundant."
TECHNICAL ASIDE: The San Jose Scrabble Club lists 973 three-letter words in the English language, whilst there are 32,768 different 15-bit addresses between 000000000000000 and 111111111111111 inclusive. Storing one in the other would therefore give you an unused memory factor of approximately 97%! This would be an unforgivable waste of resource even in the modern world of cheap memory, but in the 1940s would have been a total non-starter.
"[..... so] to overcome this difficulty, a slightly more sophisticated approach has to be adopted. Dictionary words are stored in alphabetical order and in consecutive memory locations. Possibly these might be broken up into initial letter groups such that each group starts in a storage location indicated by the binary coded form of its initial [.....] The coded word to be translated is sent to an arithmetic register and is there compared, by subtraction, with the dictionary words. In this way, by means of the normal conditional control transfer order [ie. you only do it when the aforementioned subtraction gives zero, which will only happen when the two items are identical - Ed.], the dictionary stem corresponding to the word can be obtained and its translation printed. Next, the remainder of the message word, left after subtraction of the stem, is shifted to occupy the extreme left-hand register position, and the comparison process is repeated, this time with the words of the ending dictionary." (Richens and Booth, 1955, pp45-46)
TECHNICAL ASIDE: The authors are here addressing the problem of translating morphologically complex words such as "COWS", where we have a singular root noun, referred to here as the "stem", to which is suffixed the pluralising morpheme "-S". This is the problem of "inflection" .....
BASIC LINGUISTICS - INFLECTION: "Inflection is variation in the form of a word, typically by means of an affix, that expresses a grammatical contrast which is obligatory for the stem's word class in some given grammatical context" [source]. It is the mechanism by which many of the world's natural languages cope with the number (single or plural), gender (masculine, feminine, or neuter), and grammatical role of adjectives and nouns, and the number (single or plural), tense (past, present, or future), and person (first, second, or third) of verbs. Thus the suffix <s> will pluralise many English nouns (eg. "cats") or put many verbs into the third person singular (eg. "he eats").
Richens' and Booth's proposal was that such words should be matched in two stages, from two functionally separate dictionaries, namely a stem dictionary and an ending dictionary. Thus COWS would divide as <COW + S>, translate into German as <KUH + E> and be reconstituted (on this occasion correctly) as KUHE. In practice, however, this was not that easy either .....
Other universities quick to get involved were MIT and the Harvard Computation Laboratory. The first MT conference was organised by Yehoshua Bar-Hillel of the University of Jerusalem (fortuitously on a sabbatical at MIT), and took place 17-20th June 1952. Among the delegates were Whirlwind's Jay Forrester, Harry Huskey, Andrew Booth, and the linguist in charge of the simultaneous translation facility at the 1945-1946 Nuremberg Trials, Georgetown University's Leon Dostert. This conference was followed two years later by the "Georgetown-IBM" demonstration of automated translation. This was organised by Dostert, took place on 7th January 1954 at IBM Headquarters, New York, and used software developed by Paul L. Garvin with technical input from IBM's Peter Sheridan and management backing from Cuthbert Hurd [whom we have already met in Section 1, and of whom even more in Part 5]. The demonstration processed a 250-word Russian-English vocabulary, and made front page news the next morning [see the New York Times article; see the IBM Press Release].
BIOGRAPHICAL ASIDE - ANTHONY G. OETTINGER (1929-): Anthony G. Oettinger (1929-) studied for his 1954 doctorate at the Harvard Computation Laboratory on the requirements of an automated Russian-English dictionary (Oettinger, 1955). Given the abysmal state of US-Russian relations in the McCarthy era, there is little doubt that it was the promise of being able to translate intelligence material out of Russian more effectively which made MT a defence issue, and which therefore secured it much of its funding [see, for example, Bar-Hillel (1960)]. We meet Oettinger again in Section 4.5.
Unfortunately, the Richens-Booth approach only works for the relatively simple words in a language, and starts to descend into a morass of ever more complicated logic branches and special rules once it encounters "irregular" words and complex grammatical structures. Erwin Reifler, of the University of Washington, Seattle, was one of the first to try to design his way around this particular problem. In Reifler (1955), he proposed carefully allocating separate dictionaries to separate "operational form classes", that is to say, to each of the standard grammatical categories. Form classes with very large memberships include nouns (cup, table, man, etc.), adjectives (big, happy, etc.), and verbs (eat, walk, fly, etc.), and form classes with very small memberships include determiners (a, the, some, etc.), pronouns (I, she, it, etc.), auxiliary verbs (to be/have/get, etc.), prepositions (in, to, by, etc.), and conjunctions (and, but, because, etc.). Clearly recognising the modular nature of biological language processing, he then proposed separate drum memories - up to 15 of them! Thus .....
"If magnetic drums are used for memory storage, we may distinguish between large-drum memories and small-drum memories. A number of each will be needed. [.....] Operational form classes whose membership is large will require large drum memories. [//] Large-Drum System: [.....] For the large operational form classes, a total of four large-drum units will be needed, containing the following memories: (a) The capital memory for substantives and substantive constituents, (b) The attributive adjective memory, including the cardinal numerals, except the few included in the memory of determinative 'pro-adjectives', (c) The principal verb memory, excluding the few verbs included in the memory of verbs always or sometimes requiring a predicate complement, [and] (d) The predicate adjective memory, including all adverbs of adjectival and numerical origin. Small-Drum System: In the small-drum system, an individual memory will be assigned to each of the operational form classes with a comparatively small membership. The ten small operational form classes discussed above may be entered into ten small drums, but it may be desirable with German to subdivide the class of propositions into six sub form-classes, according to whether they require their complements in the following cases: genitive, dative, accusative, genitive and dative, genitive and accusative, or dative and accusative. This would increase the number of small drums to 15. With the smaller number of drums, a greater complexity of equipment and circuitry would be necessary to handle the prepositions. It is, thus, a matter of engineering economics whether 10 or 15 small-drum memories are used." (Reifler, 1955, pp159-160; italics original)
TECHNICAL ASIDE: In retrospect, Reifler's suggestion is nowhere near as silly as it might sound; it was just a couple of years ahead of its time. With the development of database technology over the ensuing decades, it gradually became possible to store fundamentally different datasets such as Reifler was proposing on a single physical resource. In other words, any Database Management System (DBMS) worth its salt could easily make a single magnetic disk look like 15 disks. Figure 2(b) shows the sort of data analysis which would have needed to be done in order to allow this to happen. We shall be saying a lot more about how DBMS do what they do in Part 5 (Section 3.1) and Part 6, and about why they are so important in Part 7.
Coping with noun plurals and (in certain languages) case variations, as well as with verb endings, was, however, just the beginning, for it soon became apparent how complicated were the real-life languages programmers were trying to translate. The problems posed by irregular verbs, or - worse - by the use of ironical or figurative language, or by the intricacies of pronoun resolution, or by the effects of context, or by "polysemy" (that is to say, the multiple meanings of words), have still not been overcome half a century later, and the 1950s efforts were often laughable. Here, from Masterman (1967, p202), is an example of just how hopeless simple word-for-word translation can be [compare (2) and (3) for intelligibility, and note that Latin is a heavily inflected language] .....
(1) ORIGINAL LATIN: Possibile est, at non expertum, omnes species eiusdem generis ab eadem specie ortum traxisse.
(2) WORD-FOR-WORD MACHINE TRANSLATION: Possible is however not prove/lacking all species/appearance same genus/son-in-law from same species/appearance arise draw.
NB: Note the underlined alternatives here. These are examples of the multiple meanings mentioned above, where a single L1 word has two or more valid L2 translations, and where the final choice relies on contextual cues.
(3) FINAL ENGLISH: It is possible, though not proved, that all species of the same genus have been derived from the same species.
Now Margaret Masterman (1910-1986) knew Richens thanks to their joint involvement with Cambridge University's Language Research Unit .....
"Research on MT at Cambridge began [.....] with the informal meetings of the Cambridge Language Research Group in 1954 [.....] In 1956 a grant was received from the National Science Foundation to pursue MT research, and the Cambridge Language Research Unit (CLRU) was formed, a research organisation independent of the University of Cambridge, with Margaret Masterman [wife of the philosopher R.B. Braithwaite] as director. In later years research grants were also received from the US Air Force Office of Scientific Research, the Office of Scientific and Technical Information (London), the Canadian National Research Council, and the Office of Naval Research (Washington, DC) [.....] By 1967, active research on MT at CLRU had declined; many of its members had moved elsewhere, mainly within the Artificial Intelligence field and often with a continued interest in MT [.....] There were four main themes in its MT research: the thesaurus approach, the concept of an interlingua, 'pidgin' translation, and lattice theory. The focus was primarily on semantic problems of MT, and syntactic questions were treated secondarily. Although procedures were intended to be suitable for computers, most of the proposals were tested only by manual or punched card simulations because access to a computer proved to be difficult for many years." Hutchins (1986/2003 online)
All four CLRU avenues of attack are directly relevant to our argument, including their investigations of pidgin. Indeed, one of the group's most fruitful observations was that there were some rather telling similarities between the halting and generally awkward word-for-word machine translations [sentence (2) above] and real life "pidgins" .....
BASIC LINGUISTICS - PIDGINS AND CREOLES: A "pidgin" is "a reduced language resulting from contact between groups with no common language", and a "creole" is "a pidgin or jargon that has become the native language of an entire speech community, often as a result of slavery or other population displacements" [source]. Thus Pidgin English is any rudimentary form of English, especially if spoken by aboriginal peoples for trading purposes, and the pidgin spoken in Papua New Guinea contains many quaint phonetic corruptions, such as lusim (= abandon), bigpela man (= adult), klostu (= almost), and narapela (= same again) [to see the full dictionary, click here].
The academic point was that since pidgins somehow contained all that was necessary to render a complex message meaningful across a language barrier, it might make sense to design some form of "mechanical pidgin" into your machine translation system. And if you did this carefully, you would then be able to record most of L1's inflections in coded form, tucked in as "pidgin markers" alongside the word stems, giving you what is known as a "continuous form" translation. With a continuous form version safely under your belt, you would have a much better chance of producing a contextually, as well as grammatically, sound L2 draft. According to Masterman (1967), the term "mechanical pidgin" came from Richens. Here, still from Masterman (1967, p202), is how sentence (2) above might look in continuous form, complete with inset markers .....
(4) CONTINUOUS FORM MACHINE TRANSLATION WITH INFLECTION MARKERS: Possible z is however not prove/lacking z all m species/appearance same g genus/son-in-law z from same species/appearance o arise z draw p.
Masterman selected her inflection markers from the following list from Richens and Booth (1955, p35) .....
a = accusative case; d = dative case; f = future tense; g = genitive case; i = indicative mood; l = locative case; m = multiple; n = nominative case; o = oblique; p = past tense; q = passive form; r = partitive; s = subjunctive mood; u = untranslatable; v = vacuous; z = unspecific
The technical name for an intermediate translation form of this sort is an "interlingua" .....
Key Innovation - Interlinguas: An "interlingua" is a language form intentionally divorced from both L1 and L2 in a translation, but from which both L1 and L2 (and, ideally, all other languages) can be generated. Strictly speaking, this definition incorporates international languages such as Esperanto, but we restrict ourselves here to any sequence of heavily inflected protosemantic codes produced by parsing. << AUTHOR'S NOTE: Neuroscience does not know enough about biological processing architectures to judge what is the basic central cognitive code, although there is a broad consensus that it is non-verbal. Nor, indeed, is the code open to introspection because our output sentences (written or spoken) come to us out of the inner darkness more or less ready to deliver [this point from Velmans (1991), although we also strongly recommend Jeffrey Gray on this subject]. We shall be returning to this point in Part 7. >>
Masterman is also important because she suggested a major improvement to the Richens-Booth continuous form interlingua. In Masterman (1957), she simply took her copy of Roget's Thesaurus from the shelf, and used its headings as a ready-made semantic interlingua, complete with associated numeric codes. We have ourselves discussed this approach in our paper, "The magical name Miller, plus or minus the umlaut" (Smith, 1997b), where we gave some of the credit for the idea to the 19th century professor of linguistics, Max Müller .....
"[Müller] was one of the first to recognize that word meanings are constantly evolving, and spent a lifetime studying the etymologies of words in a variety of languages with a view to tracking down their derivations (Müller, 1887). Working in this way, he reduced all languages to much smaller pools of word roots and fundamental concepts. To give but one example, the Greek words for lift, compare, tribute, spread, delay, bury, madness, endure, recover, reproach, help, and excel (to name but a few) all derive in various ways from f e r , 'to bear'. Müller's central argument goes as follows: 'Give us about 800 roots, and we can explain the largest dictionary; give us about 121 concepts, and we can account for the 800 roots. Even these 121 concepts might be reduced to a much smaller number, if we cared to do so' (op. cit., p551.). But not all authorities agree on where to draw the line. Whereas Müller chose a pool of 121 'original concepts', Saban (1993) chooses 31 basic 'semantic fields', and in the original 1852 edition of his renowned thesaurus Peter Roget went for 1,000 'related ideas'. So why the disagreement? Well insofar as it was responsible for importing the concept of the bit from engineering into psychology, the answer lies in George Miller's paper [ie. the Miller (1956) "Magical Number Seven" paper - Ed.]. This is because the logarithmic nature of the bit renders 31, 121, and 1,000 not so different after all: to be precise, they might just represent three points on the powers of two dimension. That is to say, we need five bits of information to specify a single one of Saban's 31 semantic fields, seven bits of information to specify a single one of Müller's 121 original concepts, and ten bits of information to specify a single one of Roget's 1000 related ideas [raising two to the powers five, seven, and ten gives 32, 128, and 1024 respectively - Ed.]. Could we, therefore, be looking at a semantic memory which somehow operates according to a binary chop principle?" (Smith, 1997b, pp204-205)
NB (14th July 2003): Although ignorant of her work at the time, we owe it to Masterman to point out that she was there 40 years before we were. This from Masterman (1957, p41): "..... in any kind of writing which builds up into an argument, thesaurus-heads tend to be introduced in powers of two, each topic being introduced concurrently with that to which it primarily contrasts". It is simply a very efficient way to think about things - as Hamlet found out with all his to being and not to being. Nor, it turns out, was Müller really the first, either, for he was simply following suggestions from the 17th century "universal languages" of Cave Beck and Athanasius Kircher. We might also add that Richens' 80 or so "naked ideas" and Parker-Rhodes' (1978) 150 or so "primitive items" are not that far from the line.
So to put things back into perspective, what the early MT researchers had done was to re-open a very old (and more than normally wriggly) can of philosophical worms, and in retrospect it is not really surprising that Booth and Richens did not keep their MT lead (such as it was) for long. The ARC was little more than a hobby kit computer, and the big machines soon overtook it, and - the pessimists apart - there was still a common presumption that things would be better when newer, faster, machines with more powerful programming languages came along. Nevertheless, what the Birkbeck (Booth), MIT (Yngve, Bar-Hillel), Seattle (Reifler), and Cambridge (Richens) groups had been doing was laying the foundations for the modern science of "computational linguistics".
4.2 Semantic Networks
Sadly, MT's problems did not go away, even with interlinguas and inflection markers and more powerful computers. Irony, ellipsis, context, and idiom, still had to be overcome. Reifler himself gives the example of trying to translate "he is an ass" into Chinese, assuring us that you cannot simply substitute the Chinese word for ass because Chinese folklore does not regard the ass as inherently stupid. There are other traditionally difficult and highly illustrative problem phrases, perhaps the most famous of which is "out of sight, out of mind", which in the absence of some extremely nifty programming invites translation as "invisible lunatic" - perfectly logical in its own way, but also perfectly stupid. Another commonly cited howler comes from the English-to-Russian MT software which reputedly rendered "The spirit is willing, but the flesh is weak" as "The steak's off, but the vodka's OK". Warren Weaver himself summarised all the criticisms in a privately (but widely) circulated memorandum on the subject (Weaver, 1949), and it was this which drew broader academic attention to the underlying issues. Thus it was that linguists and philosophers such as Bar-Hillel grew deeply pessimistic about MT's long term prospects [wait and see what he says about John McCarthy in Section 4.7!]. You could not do MT, in Bar-Hillel's view, until you had acquired a far better understanding of linguistics, and those who already understood linguistics were already wagering that you would not be able to do it then either (and, as we shall be seeing in Part 5, nobody half a century later has come close to relieving them of their money).
Yet the CLRU still had the fourth of its four main themes - lattice theory - in reserve. The main worker here was a colleague of Masterman, Arthur Frederick Parker-Rhodes, and his basic proposal was that the mathematics of networks (lattice theory arose a branch of Boolean algebra) was also the mathematics of language [example] .....
BIOGRAPHICAL ASIDE - ARTHUR FREDERICK PARKER-RHODES: text to follow
Drawing on Parker-Rhodes' ideas, Masterman re-analysed the linear array of meanings laid out in Roget's Thesaurus as the nodes of a large knowledge network.
Of course, the idea that the mind might organise its available knowledge as a vast interlocking network of concepts is not new. One of today's leading experts on the technical structuring of knowledge, ex-IBM man and knowledge consultant John F. Sowa, for example, dates the idea to the Greek philosopher Aristotle. Aristotle's early thoughts on the hierarchical nature of definition survived in the network diagram drawn up by Porphyry in the third century CE, a device now known as the "Tree of Porphyry" [picture]. The idea was then reborn in the "Associationist" tradition of Western philosophy (led by David Hartley in the mid-18th century), and was also used by Jung as the basis of his "association of ideas" projective test [the famous "tell me the first word that comes into your head" test]. We shall be returning to this issue in Part 5, but mention it here because Richens - geneticist, self-taught database designer, and now linguistic philosopher - has recently been identified as the man who coined the term "semantic net" [source], and semantic networks are a very important topic .....
Key Innovation - Semantic Networks: A semantic network is "a graphic notation for representing knowledge in patterns of interconnected nodes and arcs [..... and] what is common to all semantic networks is a declarative graphic representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge" Sowa (2003 online). As we shall be seeing in Part 6, semantic networks exist within the computer world in the form of the network-structured (aka "codasyl") database. << AUTHOR'S NOTE: It is worth noting at this juncture (a) that the concept of the semantic network has made some impact on the psycholinguistic side of cognitive science, but (b) that research has been fragmented between linguistics, psycholinguistics, and mainstream cognitive theory, and (c) that one major perspective has been strangely overlooked, namely the portfolio of diagramming and analysis tools used to build data networks for use in computer databases. The most ambitious attempt so far had been by Cycorp's Douglas B. Lenat, whose Cyc project has been under development since 1984, but more on that in Part 5. Our own preference is for the "Bachman Diagram" [specimen] approach, but there is nevertheless considerable explanatory value in both Anderson's and Lenat's approach. For his part, Sowa explicitly equates his Definitional Network with Tulving's (1972) "semantic" type of LTM, and his Associational Network with the "episodic" type. However, to the best of our knowledge, there has not been a single attempt to use codasyl-type database in AI simulations. The full evaluation will not be available until 2004. >>
The main psychological input to the semantic network debate has been from Anderson (1983) and his "propositional network", as reproduced at Figure 2(a). However, very similar diagrams are heavily used within the IT industry to formalise the layout of large computer databases, and a specimen of one of these is reproduced at Figure 2(b). Curiously enough, attempts to import database concepts into psychological explanation are few and far between.
Figure 2 - Propositional Networks, (a) Biological and (b) Computer: In diagram (a) we see how a simple network diagram can be used to analyse something as philosophically complex as a unit of truth. As is common with this sort of diagram, the individual concepts are shown as concept nodes in the network, whilst the real knowledge exists in the form of propositional nodes (the oval ones). The propositional nodes are vital because they "encode" facts (ie. propositional truths) about the concepts they link. The oval symbol on this occasion serves to link the otherwise separate conceptual nodes <LAWYER> and <PARK>, and in so doing it identifies a number of essential linguistic properties as shown in italics on the line linkages. Thus <LAWYER> is the subject of the proposition, <PARK> is the location, and <IN> specifies the relation between the two. In diagram (b), we see similar principles at work in a computer data model. This particular modelling convention uses four types of symbol. The rectangles and diamonds denote different types of concept, or entity, nodes, the ovals denote the attributes, or properties, of those entities, and the lines denote the relationships between entities and other entities or between entities and attributes. Again, the essence of the relationships between entities is marked on the line linkages. Properly implemented by a Database Management System, every occurrence of every entity type, attribute, and relationship can be safely stored in rapid access computer filestore, and retrieved in short order. This is precisely the sort of technology which would have allowed Reifler [see Section 4.1] to squeeze his 15 drum dictionaries onto a single storage resource.
Copyright © 2003, Derek J. Smith. Previously in Smith (1996). Redrawn from black-and-white originals (a) in Anderson (1983, p26), and (b) in Oxborrow (1989, p36).
(1) Definitional Networks: These are networks which define a concept in terms of its similarity to other concepts. Sowa calls this a "subtype", or an "is-a" relation, and places its value in the fact that the properties of the supertype are automatically "inherited" by the subtype. [Our Example: <ROBIN> is a <BIRD, SMALL, GARDEN, NON-MIGRATORY>.]
(2) Assertional Networks: These are networks which assert propositional truths. [Our Example: <DAVID> <IS EATING> <CAKE> or <WHEN HE WAS 7> <DAVID> <VISITED> <SEATTLE>.]
(3) Implicational Networks: These are networks which "use implication as the primary relation for connecting nodes". [Our Example: <THE ROCK IS AT THE TOP OF A SLOPE> which implies ,..... AND SO IS LIKELY TO ROLL IF PUSHED>.]
(4) Executable Networks: These are networks which have mechanisms for actively locating content according to a prescribed search mask. [Our Example: A network which could respond appropriately to an instruction in the form LIST ALL CONTENT ASSOCIATED WITH THE TARGET <SLOPE>.]
(5) Learning Networks: These are networks which can extend themselves, either by adding new nodes or new arcs between existing nodes, or by modifying the association strengths, or "weights", of arcs. [Our Example: A face recognition network which can learn to recognise new faces (new nodes), or attach new attributes to existing ones (new arcs).]
(6) Hybrid Networks: These are networks which combine two or more of the preceding types.
4.3 Mechanised Game Playing
In 1769, the inventor-showman Wolfgang von Kempelen built a chess-playing automaton which he demonstrated at theatres and fairs where it proved capable of beating most contestants. Unfortunately, von Kempelen was pulling a fast one, and the functional CPU turned out to be the world's first minicomputer - a human dwarf secreted within the device [picture and fuller story]. The idea remained appealing, however, and one of the pastimes of the first computer programmers was to investigate whether computers could be programmed to play games such as draughts [US = checkers], noughts and crosses [US = tic-tac-toe], or chess. The pioneer thinker in this area seems to have been Germany's Konrad Zuse [reminder], inventor of Plankalkül, the world's first higher-level programming language. In the period 1942-1945, Zuse sketched out designs for a chess playing program for his Z4 computer, but it never got written and the design was only published in 1972 [source]. The formal glory goes instead to another of Warren Weaver's protegés, Claude Shannon of the Bell Telephone Company. In a manuscript dated October 1948, Shannon analysed the problems which were likely to be encountered when getting a computer to play chess (subsequently published as Shannon, 1950/1993). He saw this as one of many ways in which computers could move in the direction of processing "general principles, something of the nature of judgement, and considerable trial and error, rather than a strict, unalterable computing process" (Shannon, 1950/1993, p637). He saw "the chess machine" as an ideal problem with which to start the quest for what he called "mechanised thinking", and discussed methods of codifying and then evaluating proposed moves. Not surprisingly, Shannon's papers inspired some ground-breaking computer programming in the early 1950s. The following bullets come from a number of sources, including Copeland (2000 online) .....
Key Innovation - Algorithms vs Heuristics: The word "algorithm" derives from the Arabic "al Khowarasmi", the popular name of the ninth century Arab mathematician Abu Jafar Mohammed Ben Musa, and refers to a sequence of formal actions by which a given problem can be systematically resolved. It is "a systematic list of instructions for accomplishing some task" (Wikipedia, 2003). Thus computer programs are algorithms. The word "heuristic" derives from the Greek root euriskein - "to find" (thus being related to "Eureka" = "found it!"), and, strictly speaking, refers to anything which promotes an act of discovery. However, where algorithms consist of specific actions, heuristics consist of rule-action combinations, so with an heuristic you do not know in advance what you are going to do - you have to wait and see what circumstances trigger what rules. The beauty of the if-then approach is that it can save you a lot of time. When playing chess, for example, it is rarely a good idea to exchange a queen for a pawn, and so an entire branch of the exhaustive search tree can be discounted at a stroke. Insofar as the word is used in game theory, the point about heuristics is that they attempt rapid penetration of a problem rather than exhaustive analysis, by virtue of which they are often referred to as "rules of thumb". Ultimately, of course, both algorithms and heuristics are only as good as their designers. If either is not particularly powerful, then the quality of the penetration will be limited.
4.4 Man-Machine Conversation
The fourth line of enquiry was a more philosophical one, and looked at what needed to be done to turn a cleverly programmed calculator into something qualitatively more than a cleverly programmed calculator. What, for example, might a cash register have to do to be accepted as containing a mind; something more than just a thing which can be switched on and off on demand? What might your laptop have to do to be accepted as possessed of some kind of vital spark? This question was first put forward in a paper from Mr GPC himself, Alan Turing, who, when not codebreaking, was "really quite obsessed with knowing how the human brain worked and the possible correspondence with what he was doing on computers" (Newman, 1994/2003 online, p12). Taking time out from his work on the Manchester Mark 1, Turing wrote a paper entitled "Computing machinery and intelligence" in which he suggested that the question "can machines think?" was philosophically unsafe, due to problems agreeing the meaning of the word think (Turing, 1950/2003 online).
Of course, Turing had long believed that all but the most obscure of problems could be solved by a sequence of simple actions, and already for a decade programmers had been proving him right. He now recommended that this sequence of actions be thought of as having an existence and meaning in itself: a program, once devised, was a solution in prospect, as it were, and simply "awaited" a machine to execute it. Turing then brought these two propositions together by arguing that the mind would not only prove one day to be "programmable", but that the eventual program would be "implementable" on a machine. And once you could do that, he said, you would have a machine which would be indistinguishable from a human; that is to say, a machine which could think. As to how you would test this indistinguishability, he proposed objectively establishing whether the machine, so programmed, could perform as successfully as a human in fooling an interrogator in an "imitation game", in which a man (A) and a woman (B) have to fool (C) as to which is the man and which is the woman. The knowledge pertaining to (A) and (B) is accumulated in the mind of (C) by asking questions, the only restriction being that the answers to those questions should be typewritten so as to prevent vocal clues being given.
Turing's imitation game evolved somewhat over the years, and in its later form became popularly known as the "Turing Test". This runs as follows: if a human in room A were to communicate via keyboard and screen with an entity in room B which might be a human but which might also be a computer trying to appear human, then the definition of "humanness" would rest on whether the real human could tell the difference or not after five minutes of questioning. Serious academic attempts to "pass the Turing Test" began in the 1960s, and we shall be looking at some of these in Part 5. Reader (1969) even sketches out the processing modules which would be necessary to build such a machine [check it out], but the layout he proposes turns out to be yet another instance of the cognitive control hierarchy, indistinguishable in essence from Lichtheim's (1885) "House" diagram, and surpassed in supporting detail by Frank's (1963) "Organogramm".
4.5 "Neurodal" Networks and Machine Learning
The fifth line of enquiry was pitched more at the hardware side of things, and can be traced to a number of seminal papers from the 1940s, led by the American neuropsychologist Warren S. McCulloch (1898-1968). The first of these was McCulloch and Pitts (1943), an attempt at mathematically modelling the workings of biological neurons. The authors called their artificial neurons "neurodes", and each was a simple arrangement of electronic components designed to do two main things, namely (a) to output a signal similar to the bursts of action potentials output by biological neurons, and (b) to ensure that this output was modulatable - sensitive to what other neurodes in the vicinity were doing. In this way, quite simple combinations of neurodes were capable of performing the logical operations AND, OR, and NOT to much the same end effect as the electronic circuitry then being developed for use in computers. Pitts and McCulloch (1947) then went an important step further, by discussing what would happen if you took many such neurodes and wired them together. Specifically, they proposed that it would be possible to end up with some sort of artificial pattern recognition system. And McCulloch and Pfeiffer (1949) considered how binary principles might exist within biology. They portrayed the nervous system "as if it were a wiring diagram and the physiology of the neuron as if it were a component relay of a computing machine" (p369). Their aim was then to "set up working hypotheses as to the circuit action of the central nervous system as a guide to further investigation of the function of its parts, and as a scheme for understanding how we know, think and behave" [ibid.]. This is part of their observations .....
"Judging by the speed of response, neurons fall into the middle range of man-made relays. They are about a thousandth as fast as vacuum tubes, about as fast as thyrotrons, but faster than electromechanical and mechanical devices. Because they are much smaller than electronic valves, though the voltage gradients are about the same, they take lower voltages and less energy. A computer with as many vacuum tubes as a man has neurons in his head would need the Pentagon to house it, Niagara's power to run it, and Niagara's water to cool it. The largest existing calculating machine has more relays than an ant but fewer than a flatworm. Neurons, cheap and plentiful, are also multigridded, the largest having upon them thousands of terminations, so that they resemble transistors in which the configuration of electrodes determines the gating of signals." (McCulloch and Pfeiffer, 1949, p370)
HISTORICAL ASIDE: In fact, the idea of nerve net computation is far older than the attempts to construct it. One of the first to suggest that learning was accompanied by the formation of new synaptic connections was the Aberdeen logician, Alexander Bain (1818-1903). Inspired by the work of Lionel S. Beale (1828-1906), pioneer in medical microscopy, and specifically by Beale (1864), Bain tried to find a likely neuroanatomical substrate for the sort of semantic networks popular with psychologists of the "Associationist" persuasion. The process relied on the "contiguity" - that is to say, closeness together in time or space - of the elements to be associated. "Contiguity," he wrote, "joins together things that occur together, or that are, by any circumstance, presented to the mind at the same time ....." (Bain, 1879, p127). He set out his technical proposals in his 1873 book "Mind and Body", from which the following clearly indicates his line of argument: "For every act of memory [.....] there is a specific grouping, or co-ordination, of sensations and movements, by virtue of specific growths in cell junctions" (Bain, 1873, p91; cited in Wilkes and Wade, 1997). Bain then analysed differently interconnected clusters of neurons using simple abstract circuit diagrams (reviewed by Wilkes and Wade (1997). The next major historical figure was Santiago Ramon y Cajal (1852-1934) (eg. Ramon y Cajal, 1911), whose general idea was that plasticity in neuron-to-neuron connections - what are now known as "synapses" - made it possible for neural pathways to connect themselves up on demand - that is to say, as learning took place. Pathways could thus appear where previously just disconnected neurons had existed. Moreover, the biological act of creating those pathways underpinned the psychological act of memorising the experience in question. Some years later, Rafael Lorente de No (1902-1990) extended the debate by microscopically analysing the layout of the synaptic "buttons" on neural cell bodies (Lorente de No, 1938). Working at the limits of magnification of the microscopes then available, he found many hundreds of synaptic buttons dotted about the neural membrane, thus reinforcing suspicions as to their role in neural circuitry. Subsequent studies have confirmed this view, and with today's very powerful microscopes veritable forests of synaptic buttons can be seen, and not just on the neural soma but out on the dendritic trees as well. Another of Lorente de No's contributions was to show how interneurons could be used to feed excitation back into a given neural circuit, thus keeping that circuit active long after the original source of excitation had been removed. This arrangement is commonly referred to as a reverberating circuit. However, the most influential exposition of neuronal net theory came from Donald O. Hebb (1904-1985). In his 1949 book, "The Organisation of Behaviour", Hebb described the interlinking of neurons as creating what he called a cell assembly, which he described thus: "..... a diffuse structure comprising cells in the cortex and diencephalon (and also, perhaps, in the basal ganglia of the cerebrum), capable of acting briefly as a closed system." (Hebb, 1949, p. xix). Hebb's ideas of how cell assemblies form and function can be gathered from the following: "The general idea is an old one, that any two cells or systems of cells that are repeatedly active at the same time will tend to become 'associated', so that activity in one facilitates activity in the other" (Ibid, p70). The idea of cells repeatedly assisting each other's firing is particularly well brought out in what has since come to be known as "Hebb's Rule".
The next step was to go ahead and wire a few neurodes together, and in 1951 Marvin Minsky did just that (Minsky, 1994). As part of his postgraduate researches in the mathematics department at Princeton, and in conjunction with Dean Edmonds, he built SNARC (= "Stochastic Neural-Analog Reinforcement Computer"), the first machine to display learning behaviour, thanks to an inbuilt reinforcement-sensitive 40-neuron "neural net". This was followed a year later by William Ross Ashby (1903 - 1972)'s "Design for a Brain" (Ashby, 1952). Ashby was the cyberneticist who had popularised the concept of "homeostasis" in biological systems. He therefore took a more macroscopic view of neural architecture than had McCulloch or Minsky, that is to say, he was more concerned with what the nervous system was trying to achieve as an organic whole with interrelating modules, rather than with what individual neurons were up to. His work with biological homeostasis had convinced him that many biological control phenomena were simple negative feedback systems, save that they had biological sensors and comparators, and during the 1940s he had experimented with a simple machine called "the homeostat", which was sensitive to environmental events and could initiate corrective behaviurs to neutralise any "perturbation" in their stable settings. Angyan (1959) reports on a more advanced machine called Machina Reproducatrix, fitted with enough sensors and motors to give it a light-seeking "tropism", however we need to talk about this in Part 7, and so will leave the details until then.
ASIDE: The term "tropism"was coined by Loeb (1888, 1890) to describe the wholly involuntary behaviour of organisms lacking a nervous system. Just as plants can turn towards the sun using non-nervous mechanisms, so, too, is much simple animal behaviour unwilled. To be classified as a tropism, the relationship between the stimulus and the response must be rigid and resulting directly from the action of some external energy source. Tropisms can be subcategorised by energy type and direction of response as follows: Geotropisms: positive - turn towards the pull of gravity; negative - turn away from the pull of gravity, Heliotropisms (or phototropisms): positive: turn towards the light; negative - turn away from the light, Thermotropisms: positive: turn towards heat; negative - turn away from heat, and Rheotropisms: positive: turn upstream (in air or water); negative - turn downstream. In animals with nervous systems, this type of behaviour is provided by reflexes. For a longer introduction to the various types of innate and learned animal behaviour, see Section 1 of our e-paper "Communication and the Naked Ape".
In fact, 1952 was a busy year, because it also saw Claude Shannon following up his chess machine with the blueprints for a maze-solving mechanical mouse (Shannon, 1952/1993), and Anthony G. Oettinger writing a program called "Shopper", the first program (as opposed to learning hardware) capable of learning from experience (Oettinger, 1952). Copeland (2000/2003 online) describes Shopper as operating in a virtual world containing a mall of eight shops stocking a number of items. Shopper would be sent off to purchase a particular item, and would have to visit shops at random until successful. The learning lay in the fact that Shopper would memorise some of the items stocked in the unsuccessful visits. Subsequent shopping expeditions, either (a) for repeat purchases of the first item, or (b) for first time purchases of one of the memorised items, would be right first time, thus showing the value of short term memory resources in supporting adaptive behaviour.
We must also mention Albert M. Uttley, whom we first met in Section 2.2 as the TRE boffin who designed TREAC, the first British parallel processing computer. Over the years, Uttley gradually extended his interests to include biological as well as workbench electronics. Like McCulloch, he was particularly fascinated by the logic by which neurons were interconnected, and was one of the first to describe the sort of statistical connection rules which might control the biological learning process. In Uttley (1955), he derived mathematical formulae to describe such things as dendrite distribution and axonal connections.
However, the most famous of the early machines was built by Frank Rosenblatt in the mid-1950s, and called a "perceptron". The essence of Rosenblatt's design was that the neurodes were arranged into two banks, or "layers". One of these was connected to an artificial eye and called an input layer, and the other was connected to the chosen output mechanism (usually an array of flashing lights) and called an output layer. Each point in the input layer was wired to each point in the output layer, and the effective strength of these connections was specifically designed to be varied. Learning was then a process of changing the strength of the connections - a process known as weighting. As we shall be seeing in Part 5, McCulloch's neurodes and Rosenblatt's perceptron formed the basis for one of modern science's most active and exciting research areas, namely "Connectionism".
The point about perceptrons is that the weighting process delivered learning by experience. There was no pre-written program telling the hardware what to do - rather the machine worked it all out for itself. Perceptrons took the academic world by storm, therefore, especially where there were problems which were proving beyond the programmers in the first place - such as most of higher cognition. Thus .....
"It is not easy to say precisely what is meant by learning or what corresponds to it in a machine. [.....] The machine is able to learn up to a point, but once that point has been passed it is helpless. Here it differs from a human being, who is able to learn and to go on learning. [//] A program which would enable a machine not only to learn but also to go on learning, and to adapt itself to each new situation as it arose, I will call a 'generalised' learning program. If such a program could be constructed it would presumably consist of a relatively small initial nucleus of orders, and would adapt and extend itself as circumstances required, replacing its subroutines and switching sequences by others of a more general kind, and creating new subroutines to perform fresh functions. There does not appear to be any reason for believing that the construction of such a program is inherently impossible [.....] Although limitations of speed and storage capacity would undoubtedly soon make themselves felt, it is not these so much as shortage of ideas which is at present holding up progress." (Wilkes, 1956, pp291-293; emphasis added)
The defining essence of a robot is that it carries out work on behalf of its master. Strictly speaking, therefore, a robot needs bear no physical resemblance to any biological form (although, for reasons which are still being looked into, we seem to prefer that they do). As such, the broad concept can be traced back several thousand years [see our e-resource on the history of automata]. Even in 1950, the name "robot" was already at least 30 years old, being usually credited to Karel Capek's 1921 play "Rossum's Universal Robots", and Isaac Asimov's "Laws of Robotics" were 10 years old. As for the technology itself, it was at least 40 years old, and had been forged, as always, on a war anvil .....
HISTORICAL ASIDE - MILITARY ROBOTICS: The military have always proved generous sponsors for companies specialising in technological innovation, and we have already described elsewhere [reminder] how the Sperry Gyroscope Company grew into a major international corporation on the back of mechanical and electromechanical naval control systems. Sperry were also quick to see the same opportunity for fully automated aerial weapon systems, and as early as 1913 were using gyroscopic stabilisers as part of a radio-controlled aircraft (Pearson, 2003 online). The demands of the First World War extended the science of military robotics into the control of aerial torpedoes and the design of better bombsights and perhaps the best known products of this early research were the World War Two German V-1 and V-2 pilotless rocket systems, which both had autopilots and onboard guidance computers capable of steering them several hundred miles. Much less spectacular, but deadly nonetheless, were the Ruhrstahl SD1400 "Fritz X" and the Henschel Hs293A "stand-off" bombs [picture; pictures and discussion]. These were "Lenk-", that is to say guided, "bomben", and were both small winged glider bombs designed for anti-shipping use. The SD1400 was heavy and free-falling, with an armour-piercing capability, whilst the Hs293 was smaller and rocket-assisted, but lacked the armour-piercing capability. They were released from their mother aircraft while still a number of miles from the target, and steered on their short one-way flight by (initially) radio signals generated by a small joystick [picture]. The Lenkbomben were developed in the period 1940-1943, and when ready for field deployment were issued to specially equipped Luftwaffe squadrons codenamed KG40 and KG100. History's first successful guided missile attack on a surface ship then took place on 27th August 1943 in the Bay of Biscay, when the corvette HMS Egret was sunk and the Canadian destroyer Athabascan damaged by Hs293s launched from KG100 aircraft. The following month, the cruisers HMS Uganda and USS Savannah, together with the battleship HMS Warspite, were seriously damaged by Fritz Xs while supporting the Salerno landings, and the Italian battleship Roma was sunk by one or two hits [accounts differ] from Fritz Xs while attempting to defect to the Allies under the terms of the Italian surrender. The battleship Italia was also hit on this occasion, and her escort, the battleship HMS Valiant, damaged by a near miss, but both managed to make their way to Malta [source]. In November 1943, the British troopship Rohna was also sunk en route from Oran to Bombay carrying American troops, and this incident, in terms of lives lost (1138 dead), was actually a worse disaster than the attack on Pearl Harbour (Cassanello, 2003 online) [fuller story; in memoriam]. Still in the Mediterranean, the cruiser HMS Spartan was then sunk off the Anzio beachhead in January 1944 [fuller account]. The Allies responded very quickly to this new age threat, analysing the German radio control frequencies, and adding radio countermeasure systems (="RCM", or "jamming" equipment) to their front-line naval assets as fast as the equipment could be built and installed. This upgrade extended to a number of landing ships being fitted out as Fighter Direction Tenders (FDTs) in readiness for the 1944 invasion. These vessels were selected because they would be stationed off the landing beaches in a command and coordination role alongside existing headquarters ships such as HMS Largs, Hilary, and Bulolo. As a result, when the invasion went ahead in June 1944 the Allied fleet emerged relatively unscathed. For example, we find the following entry in the log of the battleship USS Texas: "the day [7th June] started with a detection of several anti-ship missile radio signals and subsequent jamming" [source]. KG100 was active for a few days off the Normandy beaches, and scored at least one sinking, that of HMS Lawford on 8th June [divers examined the wreck in 2002, and the nature of the damage confirms that a glider bomb was the most likely culprit]. Off Arromanches, they also hit, but did not sink, HMS Bulolo [picture], the Force G (Gold Beach) headquarters ship, striking her - fortunately for the ship as a whole, if not those actually up there - high on the superstructure [the specialist German bomb-aimers were trained to aim at their target's waterline for maximum effect, but even under ideal conditions were lucky to hit within a five metre radius]. The Germans were apparently slow to respond to Allied RCM with "counter-countermeasures", but eventually replaced the radio controlled Hs293A with the wire-guided, and therefore RCM-immune, Hs293B. However, as the war moved progressively inland the opportunity to field this improved system was lost, and the specialist squadrons were disbanded. In all, 319 Fritz X and Hs293 missiles are reported to have been launched in action, of which roughly 100 scored hits. 40 warships and 30 merchant ships were damaged or sunk [source]. In addition to the ships named above, German war historians also claim the destroyers HMS Jervis, Inglefield, Boadicea, and Intrepid as sunk, and the cruiser USS Philadelphia damaged [confirmation]. In addition, one of the aforementioned FDTs, FDT 216, is on official record as having been sunk by an air-launched "torpedo" on 7th July 1944, but it is unclear from the sources available to us whether this was of the guided or free-running type. The next major advance with tactical rocketry was the "homing" missile. These were developed typically for use against either aircraft or submarines, and the functional principle is that the missile "locks on" to an external energy source (typically the engine exhaust heat of an aircraft or the propeller noise of a submarine) and then twists and turns as much as is necessary to keep that source in its sights (allowing for deflection as necessary). This type of system reached a state of some sophistication in air-to-air missiles such as Sidewinder. Laser guided bombs were developed during the 1980s and saw action during the 1991 Iraq War. They have since been partly replaced by JDAM smart bombs [News24
], where the projectile homes on coordinates programmed into a highly flexible tactical command system, and conveyed to the weapon after its release by signals from a satellite. JDAMs are a few metres less accurate than laser-guided munitions, but are cheaper, can be used in larger numbers at a time, are "fire and forget", and are not affected by cloud, darkness, or smoke. [For a fuller introduction to this type of weapon, click here, for more on the technicalities see Blackwelder (1992/2003 online), for an overview of modern military robotics, click here, and for an introduction to how modern "smart" bombs work, click here.]
So we now have something of a pattern emerging. With conventional munitions, the projectiles are totally "dumb" - mere lumps of iron, devoid of any form of intelligence; no more robots than spears or musket balls are robots. With the operator-guided weapons, the projectiles are responsive, but still dumb - they make no decisions of their own, but instead receive input (from the remote joystick) and generate output (to their steering mechanism). And with the early pilotless rocket systems, all the control systems were pre-set before launch, and the missile followed the programmed trajectory as closely as it was physically able. This mixture of technologies means that in the typical air-ground combat engagement of modern warfare we often pitch significantly different control systems against each other. We might, for example, have a tactical dual between a ground unit armed with a fire and forget shoulder-launched surface-to-air missile such as a Stinger, and a helicopter armed with a laser-spotted air-to-ground round such as Hellfire [pictures and details], or between a city's SAM batteries (radar-guided, passive or active), and a B-2 loaded with JDAMs. The common denominator, of course, is that all forms are ultimately placed on target by a human, even if the act of aiming took place at a keyboard many thousands of miles away and many weeks beforehand. The intelligence (and we use the word cautiously) is in the mind of the operator whose eye is on the target (literally or figuratively) and whose hand is on the joystick. So the machines are not really intelligent at all. Even the JDAM has no "flexible control over the sequence of its operations" (Berkeley, 1949, p5), nor is it likely to acquire any for a good few years yet. In fact, Carnegie Mellon University's Hans Moravec [more on whom in Part 5] doubts that such "hazardous duty" systems should be called robots at all - he prefers the title "autonomous machines", because "they have the bodies of robots, but not the brains" [source].
ASIDE: We use the word "intelligence" cautiously, because it is one of those words which can mean whatever the speaker at the time wants it to mean. It can mean advanced philosophy to some, contemplative planning to others, or just the ability to hold a target in the crosshairs, and it is precisely this vagueness which has kept the Turing Test [see Section 4.4] so relevant all these years. The focus of our attention must therefore turn away from robotic mechanisms and design architectures, and look instead directly at the intelligence we are trying to inject into our robotic brain .....
4.7 Machine Problem Solving
We turn, therefore, to the highest of all forms of guidance, namely reasoned action, and the nearest AI got to this in the 1950s was in its experiments with machine problem solving. The most successful of these was the collaboration between the Rand Corporation's Allen Newell and J. Clifford Shaw, and Carnegie Mellon University's Herbert A. Simon (1916-2001) to investigate how machines might profitably mimic (and potentially help explain) human mathematical problem solving. One of their programs is still regarded as defining the science .....
The "Logic Theorist": This 1956 program was designed to test the truth of specific mathematical theorems. It did this by allowing the programmer to input five basic propositions (called "axioms"), against which were applied a number of "rules of inference". This enabled further logical propositions to be derived, by a process based upon the human process of formal mathematical argument, and as each derived sentence is generated it can be checked to see if it matches the theorem under test. If it does, then the audit trail of successive inferences by which it was derived automatically constitutes the mathematical proof required. If it does not, then the program advances to the next permutation. Unfortunately, the problem with this approach is basically one of time, because ideally you would have to explore every combination of axiom and rule through every possible iteration (a brute force computational approach sometimes known as the "British Museum algorithm") [formal definition]. Remembering the distinction we raised in Section 4.3, this is therefore an excellent example of problem solving by algorithm rather than by heuristic.
Logic Theorist was the central exhibit in 1956 at a conference on AI organised at Dartmouth College by John McCarthy (1927-), one of their mathematics lecturers .....
BIOGRAPHICAL ASIDE - JOHN McCARTHY: McCarthy had been following developments in machine cognition very closely, and had already coined the general descriptor "artificial intelligence" in 1955. He would go on in 1958 to develop the LISP programming language, and in 1959 wrote an influential paper entitled "Programs with Common Sense" in which he defined machine common sense as follows: "..... a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows." (McCarthy, 1959/2003 online). As to how this might be achieved, reference should be made to the five system features proposed in the original paper, if interested. However, the fifth of these features in particular has turned out to be remarkably prescient .....
"5. The system must be able to create subroutines which can be included in procedures as units. The learning of subroutines is complicated by the fact that the effect of a subroutine is not usually good or bad in itself. Therefore, the mechanism that selects subroutines should have concepts of interesting or powerful subroutine whose application may be good under suitable conditions." (Ibid., p3.)
McCarthy's 1959 paper is also noteworthy as an early example of the ill feeling and interprofessional disdain which can sometimes be seen between linguistics professors and the engineers who gravitate so frequently into AI. "Dr McCarthy's paper," wrote Bar-Hillel in the peer review section of the cited paper, "belongs in the Journal of Half-Baked Ideas [and] before he goes on to bake his ideas fully, it might be well to give him some advice ...." (Ibid., p11). Be that as it may, McCarthy had pointed unerringly at the central problem of any hierarchical control system, namely where to place (and how to build) the bit that makes the all-important judgements of interesting, powerful, etc. McCarthy is currently professor of computer science at Stanford University, and is well on the way to making his lifetime's work available on the Internet [index].
Another program, the "Geometry Theorem Prover" was reportedly inspired by a tentative program structure drawn up in 1956 by MIT's Marvin Minsky. This was a sequence of instructions capable of proving a particular geometrical problem. Minsky passed his notes to a young IBM researcher named Herbert Gelernter, who then implemented it on an IBM 704. The resulting software was capable of analysing a geometrical diagram for its constituent propositions, and of then attempting to derive further proofs in much the same way as Newell, Shaw, and Simons' Logic Theorist.
Newell, Shaw, and Simon struck back in 1957 with the "General Problem Solver", a program expressly designed to simulate the processes of human problem solving. Here is Copeland's (2002/2003 online) summary .....
"..... work continued on the project for about a decade. General Problem Solver could solve an impressive variety of puzzles, for example the 'missionaries and cannivals' problem: How are a party of three missionaries and three cannibals to cross a river in a small boat that will take no more than two at a time, without the missionaries on either bank becoming outnumbered by cannibals? GPS would search for a solution in a trial-and-error fashion, under the guidance of heuristics supplied by the programmers. One criticism of GPS, and other programs that lack learning, is that the program's 'intelligence' is entirely second-hand ....."
The period under consideration ended with a paper by Newell, Shaw, and Simon (1958) on what they called "the problem of complexity". In it, they surveyed the attempts to date (including their own) and the heuristics used, and concluded that the complexity of heuristic programs was beyond the power of the programming languages then available, and appealed for more powerful languages. In Part 5, we shall be looking at what came along.
4.8 Artificial Consciousness
Finally, there has been a line of enquiry which has gone one important step past machine intelligence, looking for the secret of biological consciousness itself. Unfortunately, this remains an area where there is no hard data, little simulation, and a lot of speculation. As with intelligence, we simply do not know what consciousness is in philosophical terms, and so we need to be ready to come across it by accident in any of the various sub-areas of AI. It might be in the interlinguas of the verbal mind (Section 4.1), or in the data networks of the knowing mind (Section 4.2), or in the game playing heuristics of the game-playing mind (Section 4.3), or in the ability to pass the Turing Test (Section 4.4), or in some lucky artifact of the neural network (Section 4.5), or in the onboard intelligence of a third generation robot (Section 4.6), or even in the reflection which characterises good problem solving (Section 4.7). In short, we do not know what we are looking for, and that always makes life difficult for programmers [time, in fact, to call in the systems analysts ;-)]. We therefore close with quotable quotes on this very point by two of generation zero's greatest .....
"We know the basic active organs of the nervous system (the nerve cells). There is every reason to believe that a very large-capacity memory is associated with this system. We do most emphatically not know what type of physical entities are the basic components of the memory in question." (von Neumann, 1958, p68; emphasis original)
"There is no difficulty in programming a universal machine to perform any set of operations when once the rules for performing them have been stated explicitly; this is true however complex the rules may be and however many alternative procedures of a conditional type are specified" (Wilkes, 1956, p291; emphasis added.).
5 - Still to Come
That concludes our review of the history of first generation computing technology during the period 1951 to 1958. In Part 5, we bring the review up to the present day, and identify several more key technical innovations, not least Database Management Systems, in Part 6 we look at computer memory in greater detail, especially as used in the COBOL programming language, the IDMS DBMS, and the TPMS TP monitor, and in Part 7 we undertake a comparative review of how well the full richness of computer memory subtypes had been incorporated into psychological theory.
6 - References
See Main Menu File.