Course Handout - Database Navigation and
the IDMS Semantic Net
Copyright Notice: This material was written and published in Wales by
Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource,
and subject only to acknowledging Derek J. Smith's rights under international
copyright law to be identified as author may be freely downloaded and printed
off in single complete copies solely for the purposes of private study and/or
review. Commercial exploitation rights are reserved. The remote hyperlinks have
been selected for the academic appropriacy of their contents; they were free of
offensive and litigious content when selected, and will be periodically checked
to have remained so. Copyright © 2003-2018,
Derek J. Smith.
|
First published online 09:30 BST 2nd October 2003,
Copyright Derek J. Smith (Chartered Engineer). This
version [2.0 - copyright] dated 09:00 BST 3rd July 2018
Although this paper is reasonably self-contained, it is best read as a subordinate file to our seven-part review of how successfully the psychological study of biological short-term memory (STM) has incorporated the full range of concepts and metaphors available to it from the computing industry. To go directly to the superordinate content file, click here, to go to the superordinate menu file, click here, and to see the author's homepage, click here. |
1 - The IDMS
Database Management System
We introduced network databases in general and the IDMS DBMS in particular in Part 5 (Section 3.1), we refreshed our memories about database currencies in Part 6 (Section 3.8), and we studied the process of data modelling in our e-paper on "Data Modelling". In Section 1.1, we now look at the use of STM during database navigation, or "traversal". This is the all-important process of activating/retrieving the referentially coherent data fragments needed to complete an application at hand.
ASIDE - DATABASE INTEGRITY:
It is in the nature of network databases such as IDMS that they deliberately
fragment and distribute their data. What appears in the real world as a single
tangible thing - your telephone bill, for example - has been broken down and
stored away as a network of interrelated records. This makes for excellent
versatility and speed of access, but puts the onus squarely on systems and
applications programmers alike not to get the fragments accidentally mixed up.
For example, you would not take kindly to your billing address record
mistakenly getting associated with the billing records rightfully belonging to
the busy call centre down the road!
In Section 1.2, we then work through a specimen database traversal in detail, using some of the pseudocode introduced in Part 6 (Section 2).
1.1 Database Entry Points and Search Options
Newcomers to database concepts should refamiliarise
themselves with the topics of Random Access, the Hashing Algorithm, and the
Database Key, all from Part 5 (Section 3.1), before proceeding.
IDMS databases are organised into "realms" of sequentially numbered pages of known and identical capacity (in bytes). Each page contains a small header index, followed by up to 256 separate records, or "lines", each with its own unique page-and-line database key. These records are typically organised into sets by one or more "chain pointers" [actually more database keys] added into their record length. Each page can be transferred on a random access basis as a data block by a single disk read or write. Database realms are typically very large, having many pages, large pages, or both, and so the art of traversing them efficiently is (a) to establish what is known as an "entry point", that is to say, a suitable starting page, and (b) to access no more pages than is absolutely necessary. Unfortunately, there is no absolute definition of what makes a particular entry point "suitable", and so the broader nature of the traversal needs to be considered. Here are some of the retrieval options available, enhanced from Bachman (1973) .....
(1)
A search can be started at the beginning of a database realm, and then proceed
line by line within page by page until there are no more records to examine.
This will retrieve records in strict database key sequence with no reference to
record type. It therefore retrieves all possible records, and the order in
which it retrieves them will to all intents and purposes be random. The COBOL pseudocode for this type of operation in OBTAIN NEXT
IN REALM.
(2)
A search can be conducted by database key, the known permanent address of the
record in question. This will retrieve the record at the specified line and
page, again without regard to its record type. This option can be used in
conjunction with option (1) to begin a realm sweep from part way through [this
facility might be useful, for example, if restarting a full sweep after an
interruption]. The COBOL pseudocode for this type of operation
in OBTAIN DB-KEY IS {database-key}.
(3)
It is also possible to retrieve the record at a specified line and page by
using the database currency mechanisms described in Part 6 (Section 3.8). The currency tables maintained by the DBMS allow the last accessed
record of a particular type or in a particular set to be re-accessed by its
database key without the programmer needing to know that database key
explicitly [that is to say, the currency tables are a systems programming
facility, and not an applications programming one]. The use of currency
tables in this way therefore represents a major application of an STM resource
to LTM retrieval. The COBOL pseudocode for this type of operation in OBTAIN
CURRENT OF {record-type} or OBTAIN CURRENT OF {set-name}, and
examples of this access method are given in Section 1.2 below, at Steps 5 and
7.
(4)
A search can be conducted by key field hashing algorithm. This is the method
known as "CALC" access, because of the calculations carried out by
the hashing algorithm. The algorithm takes the contents of the specified key
field, say the name "WILLIAMSON", performs a mathematical conversion
of the component letters "W", "I", "L", etc.,
(and any trailing spaces), and comes up with a number between 1 and the number
of pages in the realm. Since exactly the same algorithm was used when the
record was originally stored, we now know exactly where to go to get it back.
The DBMS software can then retrieve the specified page, and rapidly scan down
it for the matching line. In case of "overflow", that is to say,
where the true page has filled up, the system maintains internal pointers to
suitable overflow pages. The COBOL pseudocode for this type of operation in OBTAIN
CALC {record-type}, and an example of this access method is given in
Section 1.2 below, at Step 1.
(5)
Using one of the earlier methods as initial entry point, a search can then
proceed via a pre-established set relationship (if there is one). This may go
in either direction around the set according to which NEXT or PRIOR pointers
have been designed in. The COBOL pseudocode for this type of operation in OBTAIN
NEXT/PRIOR [{record-type}] IN SET {set-name}, and examples of this access
method are given in Section 1.2 below, at Steps 2 and 3.
(6)
Using one of the earlier methods as initial entry point, a search can also proceed using NEXT, PRIOR, or OWNER pointers to the target
OWNER record. The COBOL pseudocode for this type of operation
in OBTAIN OWNER IN SET {set-name}. OWNER pointers are a system
and storage overhead, and so would usually only be provided for very long sets
distributed across many pages, when there is accordingly a response time payoff
to be had [OWNER pointers take you straight to the OWNER record, rather than
leaving you to get there by walking the set]. An example of this access method
is given in Section 1.2 below, at Step 4.
1.2 - Database
Traversal by Key, Set, and Currency
The material in this subsection has been taken more or
less verbatim from Smith (1997) "The IDMS
Set Currency and Biological Memory".
Scenario: An industrial training centre wants
to know the names and addresses of the employers of the students due to attend
a coming month's training courses.
Notes and
Assumptions: Databases designed for
use in educational establishments naturally have to contain many
<student> records. In this example, we shall assume that each of these is
simultaneously a member of two important sets, namely
<employer-student-set> and <course-student-set>. This allows the
students on any one course or employed by any one employer to be selectively
(and therefore rapidly) retrieved. We shall also assume that the designers have
arranged for each <course> record to be owned by a series of
<month> records, thus allowing the courses in any one month to be
selectively retrieved.
Traversal
Structure: The logic, or
"structure", of the necessary database enquiry is therefore to locate
the target <month> record and read every <course> it owns, and for
every <course> to read every <student>, and for every
<student> to locate the appropriate <employer> record so as to
extract the required name and address information. The pseudocode for this
processing structure is as follows:
Step |
Pseudocode Instruction |
Note |
1 |
OBTAIN CALC <month> |
Using the hashing algorithm, the "mm-yyyy" record key (input by the user) is converted to its equivalent database page number. The page is retrieved from disk, and rapidly scanned for the target <month> record. Amongst other things, this record will contain the chain pointer database key of the first <course> record in its <month-course-set>. |
2 |
OBTAIN NEXT IN <month-course-set> |
Using the chain pointer from step 1, the corresponding <course> record is obtained. (Note that the OBTAIN NEXT instruction is used for every step along the chain of chain pointers, including the first and the last.) Amongst other things, this record will contain the chain pointer database key of the first <student> record in its <course-student-set>. Also, and without being asked to, the DBMS will note the database key of this record as "current of set" for the <month-course-set>. When the point comes that there are no more records in <month-course-set> then the END-OF-SET condition has been reached and processing jumps to step 9 below. |
3 |
OBTAIN NEXT IN <course-student-set> |
Using the chain pointer from step 2, the corresponding <student> record is obtained. (Again, the same OBTAIN NEXT instruction is used for every step along the chain of chain pointers.) The <student-name> is then copied across to the output display. Again without being asked to, the DBMS will note the database key of this record as "current of set" for the <course-student-set>. When the point comes that there are no more records in <course-student-set> then the END-OF-SET condition has been reached and processing jumps to step 7 below. |
4 |
OBTAIN OWNER IN <employer-student-set> |
Using the owner pointer database key on the <student> record, the <employer> who will be paying for the course is obtained. The relevant name and address details are then copied across to the output display alongside <student name>. |
5 |
OBTAIN CURRENT OF <course-student-set> |
Using the set currency database key from step 3, the <student> record which has just been processed is re-accessed. |
6 |
Repeat steps 3 - 5 |
Using the chain pointer from the current <student> record, the next <student> booked on the course is obtained. Again <student-name> is noted, and again the owning <employer> is found via the <employer-student-set>. Every <student> on the course in question in processed in this way, until the END-OF-SET condition described in step 3 is reached. At END-OF-SET <course-student-set> ..... |
7 |
OBTAIN CURRENT OF <month-course-set> |
Using the set currency from step 2, the <course> record which has just been processed is re-accessed. |
8 |
Repeat steps 2 - 6 |
Using the chain pointer from the current <course> record, the next <course> in the month is obtained. Every <student> in the new <course-student-set> is then processed as described in steps 3 - 6. Every <course> in the month in question in processed in this way, until the END-OF-SET condition described in step 2 is reached. At END-OF-SET <month-course-set> ..... |
9 |
Display the output display in its final form and close everything down tidily. |
|
In this example, we clearly see the traversal philosophy hard at work. The initial access key provides our "entry point" (on this occasion, the act of identifying the target month using its <mm-yyyy> record key), and once this initial entry has been made all additional data can be obtained using a progressive sweep of the sets into which the data has been organised. The process of traversing the network in this targeted and highly structured way is entirely typical of IDMS programming, and the process of setting up the network in the first place is entirely typical of IDMS database design.
We also see the set currency mechanism hard at work. Step 2 of the pseudocode, for example, is the instruction which obtains the various <course> records belonging to the <month> under investigation. When it is issued for the first time, it is following a NEXT pointer from the owning <month> record to the first <course> record in the <month-course-set>. On the second and subsequent occasions, however, it is following a NEXT pointer on the preceding <course> record, and access to this will have been lost during the execution of steps 3 to 6 for that preceding record. What the programmer has to do, therefore, is issue an OBTAIN CURRENT OF SET instruction against <month-course-set> as shown in step 7. The DBMS then uses the appropriate currency indicator to relocate the <course> record in readiness for looping back to step 2 to find the next one. What this slightly counter-intuitive action does is to ensure that all the necessary database pointers are refreshed and confirmed before being relied upon by the continuing search.
References
Bachman, C.W.
(1973). The programmer as navigator. Communications of the ACM. 16(11):653-658.
Smith, D.J.
(1997). The IDMS Set Currency and Biological Memory. Cardiff: UWIC.
[ISBN: 1900666057]