Lecturer's Précis - Norris (1991)

"The Constraints on Connectionism"

Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2004-2018, Derek J. Smith.

 

First published online 14:40 GMT 17th March 2004, Copyright Derek J. Smith (Chartered Engineer). This version [2.0 - copyright] 09:00 9th July 2018.

Readers unfamiliar with the concepts of "connectionism" and "neural network" should pre-read our e-handout on "Connectionism".

1 - Introduction

Norris begins by summarising the value of connectionism to psychologists as follows .....

"The recent revival of connectionism has probably created more excitement than any other development in the history of cognitive psychology. Part of this excitement is due to the fact that connectionist models naturally express a number of characteristics which seem to typify human cognition. For example, graceful degradation of behaviour following damage, content addressable memory, and pattern completion are all cited as natural properties of connectionist networks [.....]. The other source of the appeal of connectionism is that, like the brain, connectionist networks are built from large numbers of highly interconnected simple processing units. Therefore connectionism looks like a good starting point for building brain-like models of cognition." (p293.)

However, connectionism's allure can sometimes be misleading, because connectionist nets cannot always cope. Some problems leave them floundering, and the idea that "you simply train your network" (p293) and then inspect the end result to learn how biological brains have been doing the task all along is flawed to the extent that "connectionist learning algorithms aren't really that smart" (ibid.). The most serious weakness emerges whenever the problem at hand needs to be solved in discrete steps, as now discussed.

The specific problem Norris gave his network was to take any date within a given century and reply with the day of the week on which it fell. It is worth familiarising oneself with what this task involves before proceeding .....

Exercise - What Day of the Week?

(1) Familiarise yourself with this task. What day of the week was/will be .....

seven days ago

ten days' time

49 days ago

48 days ago

7000 days' time

6999 days' time

11th September 2001

18th March 2042

29th February 2080
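The arithmetic behind the exercise can be sketched in a few lines of Python. This is our own illustration, not anything taken from Norris's paper: the function names `shift_weekday` and `weekday_name` are hypothetical, and the Monday = 0 numbering simply follows Python's standard `datetime` module. The point it tries to bring out is that only the remainder after dividing by seven matters, which is why 49 days and 7000 days are "easy" offsets while 48 and 6999 are each just one day away from an easy one.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def shift_weekday(start: int, offset_days: int) -> int:
    """Return the weekday index (Monday = 0) reached by moving offset_days
    from a day whose index is start. Negative offsets mean "days ago"."""
    # Only the remainder after dividing by seven matters, because every
    # whole week brings us back to the same day of the week.
    return (start + offset_days) % 7


def weekday_name(d: date) -> str:
    """Name the day of the week for a calendar date, using the standard library."""
    return DAY_NAMES[d.weekday()]


# Starting from a Monday (index 0), a fortnight ahead is still a Monday,
# and thirteen days ahead is therefore one day short of it, a Sunday.
print(DAY_NAMES[shift_weekday(0, 14)])  # Monday
print(DAY_NAMES[shift_weekday(0, 13)])  # Sunday
```

Readers who want to check their answers to the three calendar dates can call, for example, `weekday_name(date(2001, 9, 11))`; we leave the results for the reader to discover.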

Norris began by taking the classical connectionist approach. This involved trial-and-error training of a single network with a series of actual date-day pairs, thus .....

"We started by training a network with a single layer of hidden units on 20 per cent of the dates in a 50-year period. All it learned was the dates we trained it on! The network's ability to generalise beyond the dates it was trained on was negligible." (p295.)

So Norris then turned to the problem itself and, to cut a long story short, found that the best solution was to break the conversion into three separate steps, as follows .....

Preparatory: Select a single "base month", in which all date-day conversions are known.

Step #1: Given the target date, ignore the month and year digits, and look up the day of the week on which the same day digits fall in the base month.

Step #2: Now compute and apply an "offset" to account for the difference (if any) between the target month and the base month.

Step #3: Now compute and apply another "offset" to account for the difference (if any) between the target year and the base year.
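The three steps above can be sketched as ordinary (non-connectionist) code, which may help to show why the decomposition works. This is a minimal illustration under our own assumptions, not a reconstruction of Norris's networks: we take January 2000 as the base month (the précis does not say which month Norris used), we only handle years from 2000 onwards, and we let the month offset consult the year so that leap years are handled. All function and constant names are hypothetical.

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Assumption: the base month is January 2000, whose 1st fell on a Saturday.
BASE_YEAR = 2000
BASE_FIRST_WEEKDAY = 5  # Saturday, counting Monday as 0

# Preparatory: every date-day conversion in the base month is known.
BASE_MONTH_TABLE = {day: (BASE_FIRST_WEEKDAY + day - 1) % 7
                    for day in range(1, 32)}

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def step1_day_lookup(day: int) -> int:
    """Step #1: ignore month and year; look the day digits up in the base month."""
    return BASE_MONTH_TABLE[day]


def step2_month_offset(month: int, year: int) -> int:
    """Step #2: offset (mod 7) between January and the target month."""
    days = sum(DAYS_IN_MONTH[:month - 1])
    if month > 2 and is_leap(year):
        days += 1  # 29th February pushes later months on by one day
    return days % 7


def step3_year_offset(year: int) -> int:
    """Step #3: offset (mod 7) between the base year and the target year."""
    if year < BASE_YEAR:
        raise ValueError("this sketch only covers years from BASE_YEAR onwards")
    days = sum(366 if is_leap(y) else 365 for y in range(BASE_YEAR, year))
    return days % 7


def day_of_week(day: int, month: int, year: int) -> str:
    """Combine the three separately computed results into one answer."""
    total = (step1_day_lookup(day)
             + step2_month_offset(month, year)
             + step3_year_offset(year))
    return DAY_NAMES[total % 7]


print(day_of_week(11, 9, 2001))  # Tuesday
```

Each function solves one small problem and knows nothing about the others; only the final line of `day_of_week` brings the three partial answers together.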

Norris then took three separate neural networks, allocating one to each element of the superordinate task. Each network was trained separately on what it alone had to learn [thus the Step #1 network was trained on the Step #1 conversion, and so on]. Only when this had been done were the networks allowed to communicate with one another, whereupon "the whole net performs at about 90 per cent correct, about as good as the best date calculators in the literature" (p295). The secret, in other words, is to break the problem down into its logical components. "Instead of having to solve one large problem," Norris writes, the connectionist network performed far better when it "simply had to solve three far smaller problems" (ibid.). Here is Norris' critical observation .....

"We should not be surprised to find that connectionism offers no easy solutions. [because] the only way we can build interesting connectionist models is by first understanding the structure of complex tasks like language understanding or face recognition. Connectionism will not furnish that understanding for us." (p296; bold emphasis added.)

2 - Evaluation

Norris was one of the first to demonstrate that there exist tasks for which you need separate neural networks, which, moreover, you need to connect up in a very precise fashion and train in a very precise sequence. What this gives you, of course, is a network of neural networks, in which the connections within AND BETWEEN the modules are both vitally important. For all the technology at its disposal, this suddenly placed connectionism conceptually back with the nineteenth-century diagram makers. As Norris himself concludes, the modularity of the cognitive system is going to have to be deciphered in the first instance by humans, not by machines .....

"If we knew how to knit together a language processing network in the way I have done for the date-calculation task we would already know most of the answers to the really difficult theoretical problems in psycholinguistics. We would have understood the algorithms and we would know how they fitted together. Connectionism will just provide the theoretical tools for building the model and testing it out." (p296.)