IDMS-traversal-example

Course Handout - Database Navigation and the IDMS Semantic Net

Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2003-2018, Derek J. Smith.

First published online 09:30 BST 2nd October 2003, Copyright Derek J. Smith (Chartered Engineer). This version [2.0 - copyright] dated 09:00 BST 3rd July 2018

Although this paper is reasonably self-contained, it is best read as a subordinate file to our seven-part review of how successfully the psychological study of biological short-term memory (STM) has incorporated the full range of concepts and metaphors available to it from the computing industry. To go directly to the superordinate content file, click here, to go to the superordinate menu file, click here, and to see the author's homepage, click here.

1 - The IDMS Database Management System

We introduced network databases in general and the IDMS DBMS in particular in Part 5 (Section 3.1), we refreshed our memories about database currencies in Part 6 (Section 3.8), and we studied the process of data modelling in our e-paper on "Data Modelling". In Section 1.1, we now look at the use of STM during database navigation, or "traversal". This is the all-important process of activating/retrieving the referentially coherent data fragments needed to complete an application at hand.

ASIDE - DATABASE INTEGRITY: It is in the nature of network databases such as IDMS that they deliberately fragment and distribute their data. What appears in the real world as a single tangible thing - your telephone bill, for example - has been broken down and stored away as a network of interrelated records. This makes for excellent versatility and speed of access, but puts the onus squarely on systems and applications programmers alike not to get the fragments accidentally mixed up. For example, you would not take kindly to your billing address record mistakenly getting associated with the billing records rightfully belonging to the busy call centre down the road!

In Section 1.2, we then work through a specimen database traversal in detail, using some of the pseudocode introduced in Part 6 (Section 2).

1.1 Database Entry Points and Search Options

Newcomers to database concepts should refamiliarise themselves with the topics of Random Access, the Hashing Algorithm, and the Database Key, all from Part 5 (Section 3.1), before proceeding.

IDMS databases are organised into "realms" of sequentially numbered pages of known and identical capacity (in bytes). Each page contains a small header index, followed by up to 256 separate records, or "lines", each with its own unique page-and-line database key. These records are typically organised into sets by one or more "chain pointers" [actually more database keys] added into their record length. Each page can be transferred on a random access basis as a data block by a single disk read or write. Database realms are typically very large, having many pages, large pages, or both, and so the art of traversing them efficiently is (a) to establish what is known as an "entry point", that is to say, a suitable starting page, and (b) to access no more pages than is absolutely necessary. Unfortunately, there is no absolute definition of what makes a particular entry point "suitable", and so the broader nature of the traversal needs to be considered. Here are some of the retrieval options available, enhanced from Bachman (1973) .....

(1) A search can be started at the beginning of a database realm, and then proceed line by line within page by page until there are no more records to examine. This will retrieve records in strict database key sequence with no reference to record type. It therefore retrieves all possible records, and the order in which it retrieves them will to all intents and purposes be random. The COBOL pseudocode for this type of operation in OBTAIN NEXT IN REALM.

(2) A search can be conducted by database key, the known permanent address of the record in question. This will retrieve the record at the specified line and page, again without regard to its record type. This option can be used in conjunction with option (1) to begin a realm sweep from part way through [this facility might be useful, for example, if restarting a full sweep after an interruption]. The COBOL pseudocode for this type of operation in OBTAIN DB-KEY IS {database-key}.

(3) It is also possible to retrieve the record at a specified line and page by using the database currency mechanisms described in Part 6 (Section 3.8). The currency tables maintained by the DBMS allow the last accessed record of a particular type or in a particular set to be re-accessed by its database key without the programmer needing to know that database key explicitly [that is to say, the currency tables are a systems programming facility, and not an applications programming one]. The use of currency tables in this way therefore represents a major application of an STM resource to LTM retrieval. The COBOL pseudocode for this type of operation in OBTAIN CURRENT OF {record-type} or OBTAIN CURRENT OF {set-name}, and examples of this access method are given in Section 1.2 below, at Steps 5 and 7.

(4) A search can be conducted by key field hashing algorithm. This is the method known as "CALC" access, because of the calculations carried out by the hashing algorithm. The algorithm takes the contents of the specified key field, say the name "WILLIAMSON", performs a mathematical conversion of the component letters "W", "I", "L", etc., (and any trailing spaces), and comes up with a number between 1 and the number of pages in the realm. Since exactly the same algorithm was used when the record was originally stored, we now know exactly where to go to get it back. The DBMS software can then retrieve the specified page, and rapidly scan down it for the matching line. In case of "overflow", that is to say, where the true page has filled up, the system maintains internal pointers to suitable overflow pages. The COBOL pseudocode for this type of operation in OBTAIN CALC {record-type}, and an example of this access method is given in Section 1.2 below, at Step 1.

(5) Using one of the earlier methods as initial entry point, a search can then proceed via a pre-established set relationship (if there is one). This may go in either direction around the set according to which NEXT or PRIOR pointers have been designed in. The COBOL pseudocode for this type of operation in OBTAIN NEXT/PRIOR [{record-type}] IN SET {set-name}, and examples of this access method are given in Section 1.2 below, at Steps 2 and 3.

(6) Using one of the earlier methods as initial entry point, a search can also proceed using NEXT, PRIOR, or OWNER pointers to the target OWNER record. The COBOL pseudocode for this type of operation in OBTAIN OWNER IN SET {set-name}. OWNER pointers are a system and storage overhead, and so would usually only be provided for very long sets distributed across many pages, when there is accordingly a response time payoff to be had [OWNER pointers take you straight to the OWNER record, rather than leaving you to get there by walking the set]. An example of this access method is given in Section 1.2 below, at Step 4.

1.2 - Database Traversal by Key, Set, and Currency

The material in this subsection has been taken more or less verbatim from Smith (1997) "The IDMS Set Currency and Biological Memory".

Scenario: An industrial training centre wants to know the names and addresses of the employers of the students due to attend a coming month's training courses.

Notes and Assumptions: Databases designed for use in educational establishments naturally have to contain many <student> records. In this example, we shall assume that each of these is simultaneously a member of two important sets, namely <employer-student-set> and <course-student-set>. This allows the students on any one course or employed by any one employer to be selectively (and therefore rapidly) retrieved. We shall also assume that the designers have arranged for each <course> record to be owned by a series of <month> records, thus allowing the courses in any one month to be selectively retrieved.

Traversal Structure: The logic, or "structure", of the necessary database enquiry is therefore to locate the target <month> record and read every <course> it owns, and for every <course> to read every <student>, and for every <student> to locate the appropriate <employer> record so as to extract the required name and address information. The pseudocode for this processing structure is as follows:

Step	Pseudocode Instruction	Note
1	OBTAIN CALC <month>	Using the hashing algorithm, the "mm-yyyy" record key (input by the user) is converted to its equivalent database page number. The page is retrieved from disk, and rapidly scanned for the target <month> record. Amongst other things, this record will contain the chain pointer database key of the first <course> record in its <month-course-set>.
2	OBTAIN NEXT IN <month-course-set>	Using the chain pointer from step 1, the corresponding <course> record is obtained. (Note that the OBTAIN NEXT instruction is used for every step along the chain of chain pointers, including the first and the last.) Amongst other things, this record will contain the chain pointer database key of the first <student> record in its <course-student-set>. Also, and without being asked to, the DBMS will note the database key of this record as "current of set" for the <month-course-set>. When the point comes that there are no more records in <month-course-set> then the END-OF-SET condition has been reached and processing jumps to step 9 below.
3	OBTAIN NEXT IN <course-student-set>	Using the chain pointer from step 2, the corresponding <student> record is obtained. (Again, the same OBTAIN NEXT instruction is used for every step along the chain of chain pointers.) The <student-name> is then copied across to the output display. Again without being asked to, the DBMS will note the database key of this record as "current of set" for the <course-student-set>. When the point comes that there are no more records in <course-student-set> then the END-OF-SET condition has been reached and processing jumps to step 7 below.
4	OBTAIN OWNER IN <employer-student-set>	Using the owner pointer database key on the <student> record, the <employer> who will be paying for the course is obtained. The relevant name and address details are then copied across to the output display alongside <student name>.
5	OBTAIN CURRENT OF <course-student-set>	Using the set currency database key from step 3, the <student> record which has just been processed is re-accessed.
6	Repeat steps 3 - 5	Using the chain pointer from the current <student> record, the next <student> booked on the course is obtained. Again <student-name> is noted, and again the owning <employer> is found via the <employer-student-set>. Every <student> on the course in question in processed in this way, until the END-OF-SET condition described in step 3 is reached. At END-OF-SET <course-student-set> .....
7	OBTAIN CURRENT OF <month-course-set>	Using the set currency from step 2, the <course> record which has just been processed is re-accessed.
8	Repeat steps 2 - 6	Using the chain pointer from the current <course> record, the next <course> in the month is obtained. Every <student> in the new <course-student-set> is then processed as described in steps 3 - 6. Every <course> in the month in question in processed in this way, until the END-OF-SET condition described in step 2 is reached. At END-OF-SET <month-course-set> .....
9	Display the output display in its final form and close everything down tidily.

In this example, we clearly see the traversal philosophy hard at work. The initial access key provides our "entry point" (on this occasion, the act of identifying the target month using its <mm-yyyy> record key), and once this initial entry has been made all additional data can be obtained using a progressive sweep of the sets into which the data has been organised. The process of traversing the network in this targeted and highly structured way is entirely typical of IDMS programming, and the process of setting up the network in the first place is entirely typical of IDMS database design.

We also see the set currency mechanism hard at work. Step 2 of the pseudocode, for example, is the instruction which obtains the various <course> records belonging to the <month> under investigation. When it is issued for the first time, it is following a NEXT pointer from the owning <month> record to the first <course> record in the <month-course-set>. On the second and subsequent occasions, however, it is following a NEXT pointer on the preceding <course> record, and access to this will have been lost during the execution of steps 3 to 6 for that preceding record. What the programmer has to do, therefore, is issue an OBTAIN CURRENT OF SET instruction against <month-course-set> as shown in step 7. The DBMS then uses the appropriate currency indicator to relocate the <course> record in readiness for looping back to step 2 to find the next one. What this slightly counter-intuitive action does is to ensure that all the necessary database pointers are refreshed and confirmed before being relied upon by the continuing search.

References

Bachman, C.W. (1973). The programmer as navigator. Communications of the ACM. 16(11):653-658.

Smith, D.J. (1997). The IDMS Set Currency and Biological Memory. Cardiff: UWIC. [ISBN: 1900666057]