Support for "Local Values" in the OpenBIOS Tokenizer

Table Of Contents

  1. Support for "Local Values" in the OpenBIOS Tokenizer
    1. Overview
    2. Scope of this Document
    3. Design Objectives
    4. Functional Requirements
    5. Interface Specification
      1. Syntax:
        1. Declaration: (Curly-braces:   {   and   }   )
        2. Separation Character between Initialized and Uninitialized Local Values
        3. Value Extraction: Invocation
        4. Value Assignment: The   ->   Operator.
        5. Scope of Local Values:
      2. Semantics:
        1. Order of Initialization:
        2. Note two significant departures from the "Local Variables" discussed in the ANSI FORTH Standard document:
        3. Value Extraction
        4. Value Assignment
    6. Implementation
      1. Compilation -- the Parser:
        1. Run-Time Support Function names:
          1. Example:
      2. Run-Time Support -- the Local Values Support "Library":
        1. Operational Run-Time Support Functions:
        2. Important note regarding floading of the Local Values Support Functions file
        3. Adjusting the size of Local Values Storage:
        4. Additional Support Function(s):
          1. CATCH / THROW

Overview

The goal of this project is to implement support for "Local Values" in the OpenBIOS Tokenizer.

"Local Values" are an IBM-specific extension of FORTH syntax, currently used both by the FCode Tokenizer and Platform Firmware. They might be considered a variant that meets the spirit, if not the letter, of the suggestions for a "Locals word set" discussed -- but not specified -- in the ANSI FORTH Standard, Section 13 and Appendix A.13.

(Please note that the ANSI document does not really specify this feature, because the Committee could not reach an agreement. Appendix A.13 records the somewhat lively discussions that accompanied this topic.)

We will refer to this feature with the nomenclature "Local Values" in preference to "Local Variables" or "Locals" in order to (a) more accurately characterize the behavior of these objects, and (b) further emphasize the differences between the IBM-specific extension and those discussed in the ANSI document.

Scope of this Document

The sections labeled Syntax and Semantics describe the user's view of this feature.

The section labeled Implementation is a description of the underlying parsing and support mechanisms that meet the Design Objectives.

Design Objectives

The "Local Values" extension is intended to relieve programmers maintaining FCode device drivers from the complexity involved in keeping track of the positions of the various items on the stack. The programmer can, instead, refer to these items by symbolic names, in a manner similar to "C" syntax.

Functional Requirements

Tokenizing source-code that makes use of the "Local Values" syntax shall still result in Industry Standard FCode that can be interpreted by the FCode interpreter of any Open Firmware-compliant Host Platform, without imposing any IBM-specific requirements.

Also, the implementation shall support a means whereby to remain compatible with IBM's existing code-base.

Interface Specification

Syntax:

Declaration: (Curly-braces:   {   and   }   )

Local Values may only be declared in connection with a colon-definition (a "word" in FORTH parlance).

Declaration of Local Values is triggered by an open-curly-brace (   {   ) and ends with a close-curly-brace (   }   ).

A further distinction is made between Initialized Local Values and Uninitialized Local Values:  Initialized Local Values are declared first, and are separated by a special character from Uninitialized Local Values.

Declaration of Local Values may only occur once within the body of the colon-definition.

Declaration of Local Values after code has been compiled into the body of the word is not recommended, but is permitted.  A Local Values Declaration that occurs inside a Flow-Control Structure will be reported as an Error.

A Local Values Declaration may include comments and may continue across multiple lines.  See the example in the Implementation section.

Separation Character between Initialized and Uninitialized Local Values

Two symbols are accepted as the separator between Initialized and Uninitialized Local Values: the Semicolon (   ;   ) and the Vertical-Bar (   |   ).

Since, in FORTH, Semicolon is heavily fraught with a very important meaning, it is preferable to use a different symbol -- one that isn't used for anything else -- as the separator between Initialized and Uninitialized Local Values. Better still would be a symbol that's given at least passing mention in the discussion about the (failed) attempt to establish an ANSI standard for Locals (see the ANSI Forth Spec., section 13.6.2.1795).

The Vertical-Bar symbol (   |   ) fills that bill nicely.

Local Values Declarations will accept Semicolon as an alternative ("Legacy") separator between Initialized and Uninitialized Local Values, and issue a Warning message to the effect that the use of Semicolons in that context is deprecated in favor of the Vertical-Bar.

The User may suppress this message by means of a Command-line switch, known as the Special-Feature Flag named NoLV-Legacy-Message , which is described in the Tokenizer User's Guide.

Conversely, the User who wishes to disallow the use of Semicolon as an alternative separator may do so by means of the Special-Feature Flag named NoLV-Legacy-Separator .  When the Legacy Local Values Separator is thus disallowed, occurrences will be treated as an Error.
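
As a minimal sketch of the two accepted separators, the same declaration might be written either way. The word names and Local Value names below are hypothetical, chosen only for illustration; _scratch is declared solely to show where the separator goes.

```forth
\ Sketch only; area, area-legacy and the Local Value names are hypothetical.

\ Preferred form: Vertical-Bar separates Initialized from Uninitialized.
: area ( width height -- area )
   { _width _height | _scratch }
   _width _height *
;

\ Legacy form: Semicolon as the separator. By default this is accepted
\ with a deprecation Warning; under NoLV-Legacy-Separator it is an Error.
: area-legacy ( width height -- area )
   { _width _height ; _scratch }
   _width _height *
;
```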

Value Extraction: Invocation

A name declared as a Local Value may be invoked within the body of a word in connection with which it was declared, simply by name, in a manner similar to a name defined by the Standard FORTH defining-word, VALUE , to mean that its associated value is to be extracted and placed onto the Stack.

Value Assignment: The   ->   Operator.

The symbol   ->   (dash angle-bracket, pronounced "dash-arrow") may precede the name of a Local Value, and it may not precede anything else.

No comments are permitted between the   ->   and the Local-Value name to which it applies.

The   ->   and the Local-Value name to which it applies must be on the same line.

The   ->   operator relates to the Local-Value name to which it is applied in a manner similar to the way the  TO  operator relates, when it is applied, to a name defined by  VALUE  ; it causes the numeric value on top of the Parameter Stack to be popped and stored into -- associated with -- the named Local Value.
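
The parallel between   ->   and  TO  can be sketched side by side. This is an illustration only: total, add-to-total, sum-of-squares and the Local Value names are hypothetical.

```forth
\ Sketch only; all names here are hypothetical.

0 value total                        \ an ordinary VALUE, for comparison

: add-to-total ( n -- )
   total +  to total                 \ TO pops the stack into a VALUE
;

: sum-of-squares ( a b -- n )
   { _a _b | _result }
   _a _a *  _b _b *  +  -> _result   \ -> pops the stack into a Local Value
   _result                           \ the bare name pushes its value
;
```

Note that   ->   and the Local Value name sit on the same line with nothing between them, as required above.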

Scope of Local Values:

Upon completion of the definition of the word in connection with which a set of Local Values was declared, i.e., at the semicolon, the names of the Local Values cease to be recognized. If the same names are declared in connection with a subsequent definition, they are only applicable to that subsequent definition, as if they were newly created. No warning is issued, nor do rules concerning "overloading" apply.
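
A brief sketch of the scoping rule, using hypothetical word and Local Value names: the second definition reuses a name from the first without any connection between the two.

```forth
\ Sketch only; first-word, second-word and the Local Value names are hypothetical.

: first-word ( n -- n' )
   { _count }
   _count 1+
;                          \ _count ceases to be recognized here

: second-word ( addr len -- len )
   { _addr _count }        \ a fresh _count, unrelated to the one above
   _count
;

\ Outside both definitions, _count is simply an unrecognized name.
```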

Semantics:

Order of Initialization:

Uninitialized Local Values do not have a value until one is assigned within the definition, by the use of the   ->  ("dash-arrow") operator.

Initialized Local Values are initialized from the stack at the start of execution of the defined word, in the same order as the convention for a stack-diagram, i.e., the first-named Local Value is initialized from the stack-item whose depth corresponds to the total number of initialized Local Values, the last-named Local Value is initialized from the top-of-stack item, and so on in between.

The following will serve to illustrate:

: <word-name> ( P_x ... P_y   P_0 P_1 ... P_n-2 P_n-1 -- ??? )
      { IL_0 IL_1 ... IL_n-2 IL_n-1 | UL_0 UL_1 }

      \ At the start of the word, IL_0 through IL_n-1 are initialized
      \ with P_0 through P_n-1, respectively, and the stack contains
      \ ( P_x ... P_y )

Note two significant departures from the "Locals" discussed in the ANSI FORTH Standard document:

(1) The ANSI FORTH Committee discussions make no provision for Uninitialized Locals,

and

(2) The order of initialization is reversed. In the ANSI document, Locals are initialized in the order they are declared, so that the first-declared will take the topmost value on the stack, and the last-declared will take the deepest value.

The general consensus within IBM is that this scheme is confusing at best, and does not serve the intent of the Design Objectives.
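
To make the second departure concrete, the following sketch uses the LOCALS| word from the optional ANS Forth Locals word set, so it can be tried in a desktop ANS Forth system that provides that word set. The word name ansi-order is hypothetical. The closing comment shows the corresponding Local Values declaration for comparison; it assumes that a declaration with no Uninitialized Local Values needs no separator.

```forth
\ ANS Forth: LOCALS| gives the top-of-stack item to the first name.
: ansi-order ( a b c -- )
   locals| x y z |      \ x takes c, y takes b, z takes a
   x . y . z . ;

1 2 3 ansi-order        \ prints 3 2 1

\ Local Values counterpart (tokenizer source, shown as a comment):
\    : lv-order ( a b c -- )  { _x _y _z }  _x . _y . _z . ;
\ Here _x takes a, _y takes b and _z takes c, matching the stack diagram.
```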

Value Extraction

When the name of a Local Value is invoked, its associated value is extracted and pushed onto the Parameter Stack.

Value Assignment

If the   ->   ("dash-arrow") symbol precedes the name of a Local Value, then the numeric value on top of the Parameter Stack is popped and stored into -- associated with -- the named Local-Value.

Following the   ->   ("dash-arrow") symbol with anything other than the name of a Local Value is an Error.

Implementation

Compilation -- the Parser:

A separate area (a "vocabulary", in Forth parlance) must be reserved where temporary compile-time definitions of the new Local Value names can be created, and whence the new Local Value names can be removed after the definition of the word in connection with which they were declared is completed. Variables must also be set aside to keep count of the Initialized and Uninitialized Local Value names declared inside the curly-braces.

Each new Local Value name has an integer assigned to it. The Parser assigns successive integers, starting with 0, to the Local Value names, in the order that they are declared, and enters the name of each new Local Value, together with its assigned integer, into the separate reserved temporary area.

After all the Local Value names have been declared, i.e., after the close-curly-brace has been read, the Parser compiles-in the number of Initialized Local Values, followed by the number of Uninitialized Local Values, where they will act as arguments to the appropriate function, which the Parser compiles-in immediately after. The function will be the special one that allocates space for, and initializes, the Local Values at the time they are about to be used.

While the definition under construction is being compiled, the area where the temporary compile-time definitions of the new Local Value names have been created must be available to the scanning process, so that the new names will be recognized when invoked. Also, it should be scanned first, ahead of any other word-lists, so that the Local Value names will supersede any similarly-named words, in case of a naming-overlap.

When a Local Value's name is invoked, the Parser compiles-in its assigned integer as an argument to the appropriate function, which is compiled-in immediately after. The function will be a common one that will push onto the stack the address at which the numbered Local Value can be accessed. The Parser will then compile-in either the "fetch" function (   @   ) or the "store" function (   !   ), depending on whether the Local Value name was invoked by itself or in conjunction with the   ->   operator. This way the User/Programmer's view of Local Values' VALUE-style behavior is preserved.

The FORTH functions exit and ; (semicolon) have to be overloaded. (Section 13.3.3 of the ANSI document also mentions   ;CODE   and   DOES>   but these are not recognized by the Tokenizer, so we will not discuss them here.) The overloaded definitions must take special action at compile-time (note that   ;   -- semicolon -- does that normally, anyway, but exit does not) to: compile-in the total number of Local Values as an argument to the appropriate function, which is compiled-in immediately afterwards, before completing their normal behavior. The function in this case will be the special one that releases the space that had been allocated for the Local Values, and restores the state of Local Values storage to the way the calling routine left it. Semicolon must also clear the area where the temporary compile-time definitions of the new local-names were created, rendering them inaccessible.
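
The effect of the overloaded exit and   ;   can be sketched as follows. The word clamp-low is hypothetical, and the expansions in the comments are worked out from the rules above rather than copied from actual Tokenizer output.

```forth
\ Sketch only; clamp-low is a hypothetical word.

: clamp-low ( n -- n' )
   { _n }                      \ compiles as:  1 0 {push-locals}
   _n 0< if                    \ _n compiles as:  0 _{local} @
      0 exit                   \ exit compiles as:  1 {pop-locals} exit
   then
   _n
;                              \ ; compiles as:  1 {pop-locals}  then the unnest
```

Every path out of the word releases the same single cell that was allocated on entry, which is why both exit and   ;   must carry the   {pop-locals}   call.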

Run-Time Support Function names:

The names of the three functions that the Parser compiles-in must be well-known and documented, so that they can be implemented and exported correctly in the Run-Time Support "Library" file.

The three functions' names are:

          {push-locals} ( #ilocals #ulocals -- )

          {pop-locals} ( total#locals -- )

          _{local} ( local-var# -- addr )

Example:

The following is an example of a word definition that declares and makes use of Local Values; it also shows inclusion of comments and continuation of the Declaration across multiple lines:

: faber ( m4 m3 n2 n1 n0 -- m4 m3 )   \ Does nothing useful. Just an example.
   {                    \ BEGIN the declaration of Local Values.
      \ These are initialized values:
      _otter            \ _otter is initialized with the value of n2
      _weasel           \ _weasel is initialized with the value of n1
      _skunk            \ _skunk is initialized with the value of n0
                        \ and will be used to determine an amount of memory to allocate.
      |                 \ Vertical bar ends the group of Initialized Local Values.
                        \ NOTE: m4 and m3 stay on the stack.
      \ These are uninitialized:
      _muskrat          \ _muskrat will take the final size of the allocation.
      _mole             \ _mole will hold the address of the allocated memory
   }                    \ END the declaration of Local Values.

   _skunk 40 *        -> _muskrat
   _muskrat alloc-mem -> _mole
   base @
   hex      _weasel (.) _mole place
   decimal   _otter (.) _mole $cat
   base !
   _mole count type
   _mole _muskrat free-mem
;

The compilation of faber starts with   3 2 {push-locals}   . The first invocation of _skunk (by itself) compiles as   2 _{local} @   and the sequence   -> _muskrat   compiles as   3 _{local} !   .
Finally,  faber ends with   5 {pop-locals}   before the unnest . After that, the local-names are no longer accessible.

Run-Time Support -- the Local Values Support "Library":

An FCode program that makes use of the Local Values extension will need to incorporate an implementation of the three compiled-in functions named above, together with a collection of functions that support them. These support functions must be defined in a way such that they can be tokenized into Industry Standard FCode, conformant to the Open Firmware specification, without imposing any non-Standard requirements on the FCode interpreter of the Host Platform.

The obvious way to deliver this package of support functions would be to incorporate, into the FCode source being Tokenized, a Prologue or "Library" file that contains the definitions of the three above-named compiled-in functions, along with all their required support.

A file defining the Local Values Support Functions has been written and will be delivered as part of the implementation of this Project.  The user/programmer will be responsible for floading it into the FCode source program to be Tokenized.

The user/programmer has the option of specifying the placement of the Local Values Support Functions file within the body of the FCode source program, and even of making alterations to it, if needed.

Error handling: If the Local Values Support Functions file is not floaded, then the Parser, when it completes the processing of a Local Values declaration, i.e., when it encounters the close-curly-brace, or, similarly, when it encounters an invocation of a Local Value's name, will proceed as normal to compile-in the call to the appropriate function. That function's name will not be recognized, and the Tokenizer will exhibit the normal error-behavior for an invocation of an unrecognized name.

Operational Run-Time Support Functions:

To make Local Values operate, we will, of course, need to reserve an area of Backing Storage. The size of that area will be adjustable by the programmer, and we have chosen a suitable default.

We define a   locals-base   pointer that will point to the base -- within the reserved Local Values Storage Area -- of the set of Local Values currently in use; it will be initialized to point just past the end of the locals-storage area.

The address that the   <n>   _{local}   routine returns is calculated as the given number of cells above the   locals-base  pointer.

The   ( #I-Ls #U-Ls -- )  {push-locals}   routine works in two stages: for the Uninitialized Local Values, it simply decrements the   locals-base   pointer by the number of cells given in the top argument. The Initialized Local Values are then handled one at a time: the   locals-base   pointer is decremented by a single cell, and the data-item on top of the parameter stack is popped and stored into the cell at which the   locals-base   pointer now points. The result is that the topmost stack-item is placed in the last-declared Initialized Local, and so on down the line until the lowest stack-item is placed in the first-declared Initialized Local Value. Neat, sweet, and petite.

The   ( #-Ls -- )   {pop-locals}   routine simply increments the   locals-base   pointer by the given number of cells, which is the total number of Local Values used by the function in which it occurs.

Because functions that use Local Values can call each other (i.e., the use of Local Values can be nested), the depth of the nesting might be unpredictable. Therefore, the   {push-locals}   routine must perform error-checking: Before decrementing the  locals-base  pointer, it must test whether doing so would put the pointer below the start of the area reserved for Local Values Storage. Such an error is inevitably fatal, and can only be handled by an   ABORT   occurring in conjunction with a warning message advising the programmer to increase the size of the Local Values Storage (and, by implication, re-Tokenize).

It will be the developer's responsibility to catch all such errors during early testing. To prevent generating hidden errors of this sort, the programmer is advised to use Local Values judiciously, and particularly to avoid using them in functions that may be called re-entrantly or recursively to an uncontrolled depth. Fortunately, such routines are rare and easily identified.
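
The three routines described in this section can be modeled in a few lines. What follows is a sketch in plain ANS Forth, so that it can be tried in a desktop Forth system; it is not the delivered Local Values Support Functions file. The names lv-storage, #lv-cells and demo, the size of 64 cells, and the wording of the abort message are all assumptions made for illustration, and the storage here is an ordinary buffer rather than instance data.

```forth
\ Sketch only: a plain ANS Forth model of the run-time support routines.
decimal
64 constant #lv-cells                        \ hypothetical storage size, in cells
create lv-storage  #lv-cells cells allot     \ hypothetical backing storage
variable locals-base
lv-storage #lv-cells cells +  locals-base !  \ start just past the end of storage

: _{local}  ( local-var# -- addr )
   cells  locals-base @  + ;                 \ n cells above locals-base

: {push-locals}  ( P_0 ... P_n-1  #ilocals #ulocals -- )
   \ Check first: would the new base fall below the start of storage?
   2dup + cells  locals-base @ swap -  lv-storage u<
   abort" Local Values storage exhausted; increase its size"
   cells negate locals-base +!               \ uninitialized: just make room
   0 ?do                                     \ initialized: one at a time
      -1 cells locals-base +!
      locals-base @ !                        \ top item lands in the last-declared
   loop ;

: {pop-locals}  ( total#locals -- )
   cells locals-base +! ;

\ What the Tokenizer would compile for a word declared { _a _b _c | _x _y }
: demo  ( n2 n1 n0 -- )
   3 2 {push-locals}
   0 _{local} @ .  1 _{local} @ .  2 _{local} @ .
   5 {pop-locals} ;

10 20 30 demo                                \ prints 10 20 30
```

Because the Initialized Local Values are popped from the top of the stack downward while the pointer also moves downward, the first-declared name ends up at offset zero without any explicit reordering.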

Additional help can be provided in the form of a second floadable Local Values Support Function source file -- to be used during development only -- that would overload the   {push-locals}   and   {pop-locals}   routines with the additional action of keeping track of  -- and, of course, displaying at will --  the maximum depth used in the course of a test run. Such overloading of functions is very simple and straightforward in FORTH.
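
A development-time overlay of this kind might look like the sketch below, written in plain ANS Forth so that it can be tried in a desktop Forth system. It is not the delivered development file. The stand-in definitions at the top exist only to make the sketch self-contained, and the names lv-storage, lv-top, lv-depth, lv-max-depth, .lv-max-depth, inner and outer are hypothetical. Only   {push-locals}   is overloaded here, since the depth can only reach a new maximum on a push.

```forth
\ Sketch only. Stand-ins for the ordinary support routines (no error check):
decimal
create lv-storage  64 cells allot
lv-storage 64 cells +  constant lv-top
variable locals-base   lv-top locals-base !
: {push-locals}  ( P_0 ... P_n-1  #ilocals #ulocals -- )
   cells negate locals-base +!
   0 ?do  -1 cells locals-base +!  locals-base @ !  loop ;
: {pop-locals}  ( total#locals -- )  cells locals-base +! ;

\ The overlay: redefine {push-locals} in terms of the earlier definition.
variable lv-max-depth   0 lv-max-depth !
: lv-depth  ( -- #cells )  lv-top locals-base @ -  1 cells / ;
: {push-locals}  ( P_0 ... P_n-1  #ilocals #ulocals -- )
   {push-locals}                             \ the earlier definition
   lv-depth  lv-max-depth @ max  lv-max-depth ! ;
: .lv-max-depth  ( -- )  lv-max-depth @ . ;

\ Two nested users of Local Values, as the Tokenizer might compile them:
: inner  ( a -- )    1 1 {push-locals}  2 {pop-locals} ;
: outer  ( a b -- )  2 0 {push-locals}  7 inner  2 {pop-locals} ;

1 2 outer  .lv-max-depth                     \ prints 4
```

Inside the second definition of   {push-locals}   the name still refers to the earlier one, because a word under construction is not yet findable; words compiled afterwards pick up the tracking version.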

Important note regarding floading of the Local Values Support Functions file

In order to simplify management of the allocation and de-allocation of the area of Backing Storage, and to assure independence among instances of a device-package, both the reserved Local Values Storage Area and the  locals-base pointer are created as part of the device-node's instance data.

The consequence of this is that, in device-drivers that are configured with multiple device-nodes, the Local Values Support Functions file must be re-floaded for each device-node that uses Local Values.  That is to say, every invocation of the new-device command creates a new device node; if that new device-node will be making use of  Local Values, then the Local Values Support Functions file must be floaded again.

The Tokenizer is sophisticated enough to keep a separate vocabulary for each device-node, and will flag an Error if Local Values are used in a device-node for which the Local Values Support Functions file has not been floaded.

However, should the user so choose, a means is available whereby a single floading of the Local Values Support Functions can become accessible to all Device Nodes in a driver, trading off economy of System-memory for convenience of programming.
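
The default arrangement, with the support file floaded once per device-node, might be laid out as in the sketch below. The file name LocalValuesSupport.fth is an assumption used for illustration, as are the word names; substitute the name of the file actually delivered.

```forth
\ Sketch only; the file name and the word names are hypothetical.
fcode-version2

   fload LocalValuesSupport.fth      \ support for the top-level device-node

   : parent-word ( n -- n' )
      { _n }
      _n 1+
   ;

   new-device                        \ a new device-node begins here
      fload LocalValuesSupport.fth   \ floaded again for this node

      : child-word ( n -- n' )
         { _n }
         _n 2*
      ;
   finish-device

fcode-end
```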

Adjusting the size of Local Values Storage:

The user is responsible for declaring the maximum depth of the run-time Local Values stack, in storage units (Cells).

This may be accomplished in either of two ways, one of which is a Command-Line User-Symbol definition.

The form of the Command-Line User-Symbol definition resembles:
   -d '_local-storage-size_=d# 42'
(Be sure to enclose it within quotes so that the Shell treats it as a single string, and, of course, replace the  42  with the actual number you need.)

If   _local-storage-size_  is defined both ways, the Command-Line User-Symbol will prevail.

If the   _local-storage-size_  definition is omitted, the Local Values Support Functions file will supply a default.

Additional Support Function(s):

CATCH / THROW

Another way that a function might exit prematurely is via a call to throw .

An FCode program that utilizes Local Values, that calls throw , and that has a corresponding catch to guard it, will need to keep its Local Values properly synchronized.

A throw done by an FCode program that does not have a corresponding catch to guard it will be caught outside the scope of that FCode program, and the question of synchronizing Local Values will be rendered irrelevant.

An overloaded catch in the Local Values Support Functions file does the job.

Constructing it was quite simple: It needs to (a) save the   locals-base   pointer onto the return stack, (b) do a system (generic) CATCH, and (c) restore the   locals-base   pointer. Counterintuitive though this might be, it does not even need to examine the result of the system (generic) CATCH ; it can restore the   locals-base   pointer in either case. If the result was zero (i.e., no throw occurred), the Local Values Pointer will be the same as it was when saved and restoring it will be harmless...
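
Steps (a) through (c) can be sketched in plain ANS Forth as follows. This is an illustration, not the definition from the delivered support file: the   locals-base   variable is a stand-in initialized to an arbitrary number, and leaky and tidy are hypothetical test words.

```forth
\ Sketch only: an overloaded catch that keeps locals-base synchronized.
decimal
variable locals-base   1000 locals-base !    \ stand-in value for illustration

: catch  ( i*x xt -- j*x 0 | i*x n )
   locals-base @ >r                          \ (a) save the pointer
   catch                                     \ (b) the system (generic) CATCH
   r> locals-base ! ;                        \ (c) restore it, whatever the result

\ A word that "allocates" three cells of Local Values, then throws
\ without ever reaching its {pop-locals}:
: leaky  ( -- )  -3 cells locals-base +!  1 throw ;
: tidy   ( -- ) ;                            \ a word that does not throw

' leaky catch .  locals-base @ .             \ prints 1 1000
' tidy  catch .  locals-base @ .             \ prints 0 1000
```

The saved pointer travels on the return stack, which the system CATCH leaves as it found it whether or not a throw occurred, so the restore in step (c) is always reached.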

End Of Document