OpenBIOS DeTokenizer detok

(A User's Guide)

Table of Contents

  1. Overview
  2. Output Formats
    1. Sample source file:
    2. DeTokenizer output with no options selected:
    3. DeTokenizer output with the "verbose" option selected:
    4. DeTokenizer output with the "offsets" option selected:
    5. DeTokenizer output with both the "verbose" and "offsets" options selected:
  3. Command-Line Format
    1. Command-Line Options
      1. Switches
      2. The "Additional FCodes" file
        1. Special Functions
  4. End Of Document

Overview

The DeTokenizer is a companion tool to the Tokenizer and performs the reverse function, after a fashion:  it converts binary FCode into a form that can be read for purposes of verification.  This implementation is not, however, a complete DeTokenizer in the sense of one whose output can be fed back through the Tokenizer to regenerate the same binary.  Such programs might exist, but this is not one of them.

Output Formats

The output of this DeTokenizer consists of one token (sometimes two) per line, with optional additional information depending on which Command-Line Options have been specified.

Sample source file:

The easiest way to describe the optional output formats is by example:  a source file is Tokenized, and the DeTokenizer is then applied to the resulting FCode binary with each of the various options.

Our example source file looks like this:

\  Demo program for DeTokenizer output format display.

tokenizer[
h# 17d5 \ Vendor ID: 0x17d5
h# 5417 \ Device ID: 0x5417
h# 020000 \ Class Code: 0x020000 (Ethernet)
h# f2ed \ Rev-Level
]tokenizer
SET-REV-LEVEL
pci-header

fcode-version2

headers
hex

: hello/goodbye ( hello? -- )
if
." Hello, you big beautiful world!"
else
." Goodbye, cruel world. Gggga-a-a-ackkkkk!"
then
;

: I-say-hello true hello/goodbye ;
: You-say-goodbye false hello/goodbye ;


fcode-end

DeTokenizer output with no options selected:


With no options selected, the DeTokenizer output looks like this:

\  PCI Header identified
\ Offset to Data Structure = 0x001c (28)
\ PCI Data Structure identified
\ Data Structure Length = 0x0018 (24)
\ Vendor ID: 0x17d5
\ Device ID: 0x5417
\ Class Code: 0x020000 (Ethernet controller)
\ Image Revision: 0xf2ed
\ Code Type: 0x01 (Open Firmware)
\ Image Length: 0x0001 blocks (512 bytes)
\ Last PCI Image.
start1 ( 16-bit offsets)
format: 0x08
checksum: 0x33d5 (Ok)
len: 0x009e ( 158 bytes)
named-token hello/goodbye 0x800
b(:)
b?branch 0x0028 ( =dec 40)
b(") ( len=0x1f [31 bytes] )
" Hello, you big beautiful world!"
type
bbranch 0x0030 ( =dec 48)
b(>resolve)
b(") ( len=0x29 [41 bytes] )
" Goodbye, cruel world. Gggga-a-a-ackkkkk!"
type
b(>resolve)
b(;)
named-token I-say-hello 0x801
b(:)
-1
hello/goodbye
b(;)
named-token You-say-goodbye 0x802
b(:)
0
hello/goodbye
b(;)
end0
\ Detokenization finished normally after 158 bytes.
\ PCI Image padded with 302 bytes of zero
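
The padding figure in the last line can be cross-checked against the header comments. The sketch below is only an illustration: it assumes that the PCI Data Structure starts at the reported offset, that the FCode block follows it immediately, and that an image block is 512 bytes (as the "Image Length" line reports). The function name `pci_padding` is made up for this example and is not part of detok.

```python
BLOCK_SIZE = 512  # bytes per image block, per "0x0001 blocks (512 bytes)"


def pci_padding(image_blocks, data_struct_offset, data_struct_len, fcode_len):
    """Bytes of zero padding left over after the FCode in a PCI image.

    Assumes the layout: PCI header, PCI Data Structure, FCode, padding.
    """
    used = data_struct_offset + data_struct_len + fcode_len
    return image_blocks * BLOCK_SIZE - used


# Values taken from the sample output: 0x1c + 0x18 + 0x9e = 210 bytes used.
print(pci_padding(image_blocks=1, data_struct_offset=0x1C,
                  data_struct_len=0x18, fcode_len=0x9E))  # 302
```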

DeTokenizer output with the "verbose" option selected:

The "verbose" option adds a display of the hex value of each token processed (as well as a signature block), thus:

\  Welcome to the OpenBIOS detokenizer v0.6.1
\ detok Copyright(c) 2001-2005 by Stefan Reinauer.
\ Written by Stefan Reinauer, <stepan@openbios.org>
\ This program is free software; you may redistribute it under the terms of
\ the GNU General Public License. This program has absolutely no warranty.
\
\ (C) Copyright 2005 IBM Corporation. All Rights Reserved.
\ PCI Header identified
\ Offset to Data Structure = 0x001c (28)
\ PCI Data Structure identified
\ Data Structure Length = 0x0018 (24)
\ Vendor ID: 0x17d5
\ Device ID: 0x5417
\ Class Code: 0x020000 (Ethernet controller)
\ Image Revision: 0xf2ed
\ Code Type: 0x01 (Open Firmware)
\ Image Length: 0x0001 blocks (512 bytes)
\ Last PCI Image.
start1 ( 0x0f1 ) ( 16-bit offsets)
format: 0x08
checksum: 0x33d5 (Ok)
len: 0x009e ( 158 bytes)
named-token ( 0x0b6 ) hello/goodbye 0x800
b(:) ( 0x0b7 )
b?branch ( 0x014 ) 0x0028 ( =dec 40)
b(") ( 0x012 ) ( len=0x1f [31 bytes] )
" Hello, you big beautiful world!"
type ( 0x090 )
bbranch ( 0x013 ) 0x0030 ( =dec 48)
b(>resolve) ( 0x0b2 )
b(") ( 0x012 ) ( len=0x29 [41 bytes] )
" Goodbye, cruel world. Gggga-a-a-ackkkkk!"
type ( 0x090 )
b(>resolve) ( 0x0b2 )
b(;) ( 0x0c2 )
named-token ( 0x0b6 ) I-say-hello 0x801
b(:) ( 0x0b7 )
-1 ( 0x0a4 )
hello/goodbye ( 0x800 )
b(;) ( 0x0c2 )
named-token ( 0x0b6 ) You-say-goodbye 0x802
b(:) ( 0x0b7 )
0 ( 0x0a5 )
hello/goodbye ( 0x800 )
b(;) ( 0x0c2 )
end0 ( 0x000 )
\ Detokenization finished normally after 158 bytes.
\ PCI Image padded with 302 bytes of zero

DeTokenizer output with the "offsets" option selected:

The "offsets" option shows the position of the tokens relative to the start of the first FCode block after a PCI header (if one is present) and the destination-offset of each branch. If more than one FCode header follows a single PCI header, the offset-counter will continue; if a new PCI header is encountered, the offset-counter will be reset and will begin counting again from zero after the end of the latest PCI header.

Without the "verbose" option, i.e., with just the "offsets" option by itself, the DeTokenizer output looks like this:

\  PCI Header identified
\ Offset to Data Structure = 0x001c (28)
\ PCI Data Structure identified
\ Data Structure Length = 0x0018 (24)
\ Vendor ID: 0x17d5
\ Device ID: 0x5417
\ Class Code: 0x020000 (Ethernet controller)
\ Image Revision: 0xf2ed
\ Code Type: 0x01 (Open Firmware)
\ Image Length: 0x0001 blocks (512 bytes)
\ Last PCI Image.
0: start1 ( 16-bit offsets)
1: format: 0x08
2: checksum: 0x33d5 (Ok)
4: len: 0x009e ( 158 bytes)
8: named-token hello/goodbye 0x800
25: b(:)
26: b?branch 0x0028 ( =dec 40 dest = 67 )
29: b(") ( len=0x1f [31 bytes] )
" Hello, you big beautiful world!"
62: type
63: bbranch 0x0030 ( =dec 48 dest = 112 )
66: b(>resolve)
67: b(") ( len=0x29 [41 bytes] )
" Goodbye, cruel world. Gggga-a-a-ackkkkk!"
110: type
111: b(>resolve)
112: b(;)
113: named-token I-say-hello 0x801
128: b(:)
129: -1
130: hello/goodbye
132: b(;)
133: named-token You-say-goodbye 0x802
152: b(:)
153: 0
154: hello/goodbye
156: b(;)
157: end0
\ Detokenization finished normally after 158 bytes.
\ PCI Image padded with 302 bytes of zero
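
The "dest" values shown for the two branches can be reproduced by hand. The sketch below is inferred from the sample output rather than taken from the detok source: it assumes that a branch offset is a signed value counted from the position of the offset field itself, which lies one byte after the single-byte branch token. The function name `branch_destination` is hypothetical.

```python
def branch_destination(token_offset, raw_offset, offset_bits=16):
    """Destination of a b?branch or bbranch, as shown after "dest =".

    token_offset -- position of the branch token, as printed by "offsets"
    raw_offset   -- the offset field as printed, e.g. 0x0028
    offset_bits  -- 16 here, because start1 selects 16-bit offsets
    """
    sign_bit = 1 << (offset_bits - 1)
    # Reinterpret the field as a two's-complement signed number
    # (assumption: backward branches are encoded as negative offsets).
    signed = (raw_offset ^ sign_bit) - sign_bit
    # The offset field starts one byte after the branch token.
    return token_offset + 1 + signed


print(branch_destination(26, 0x0028))  # 67
print(branch_destination(63, 0x0030))  # 112
```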

DeTokenizer output with both the "verbose" and "offsets" options selected:

Combining the "verbose" and "offsets" options results in something that looks like this:

\  Welcome to the OpenBIOS detokenizer v0.6.1
\ detok Copyright(c) 2001-2005 by Stefan Reinauer.
\ Written by Stefan Reinauer, <stepan@openbios.org>
\ This program is free software; you may redistribute it under the terms of
\ the GNU General Public License. This program has absolutely no warranty.
\
\ (C) Copyright 2005 IBM Corporation. All Rights Reserved.
\ PCI Header identified
\ Offset to Data Structure = 0x001c (28)
\ PCI Data Structure identified
\ Data Structure Length = 0x0018 (24)
\ Vendor ID: 0x17d5
\ Device ID: 0x5417
\ Class Code: 0x020000 (Ethernet controller)
\ Image Revision: 0xf2ed
\ Code Type: 0x01 (Open Firmware)
\ Image Length: 0x0001 blocks (512 bytes)
\ Last PCI Image.
0: start1 ( 0x0f1 ) ( 16-bit offsets)
1: format: 0x08
2: checksum: 0x33d5 (Ok)
4: len: 0x009e ( 158 bytes)
8: named-token ( 0x0b6 ) hello/goodbye 0x800
25: b(:) ( 0x0b7 )
26: b?branch ( 0x014 ) 0x0028 ( =dec 40 dest = 67 )
29: b(") ( 0x012 ) ( len=0x1f [31 bytes] )
" Hello, you big beautiful world!"
62: type ( 0x090 )
63: bbranch ( 0x013 ) 0x0030 ( =dec 48 dest = 112 )
66: b(>resolve) ( 0x0b2 )
67: b(") ( 0x012 ) ( len=0x29 [41 bytes] )
" Goodbye, cruel world. Gggga-a-a-ackkkkk!"
110: type ( 0x090 )
111: b(>resolve) ( 0x0b2 )
112: b(;) ( 0x0c2 )
113: named-token ( 0x0b6 ) I-say-hello 0x801
128: b(:) ( 0x0b7 )
129: -1 ( 0x0a4 )
130: hello/goodbye ( 0x800 )
132: b(;) ( 0x0c2 )
133: named-token ( 0x0b6 ) You-say-goodbye 0x802
152: b(:) ( 0x0b7 )
153: 0 ( 0x0a5 )
154: hello/goodbye ( 0x800 )
156: b(;) ( 0x0c2 )
157: end0 ( 0x000 )
\ Detokenization finished normally after 158 bytes.
\ PCI Image padded with 302 bytes of zero

There is also a "line numbers" option, which simply numbers the lines of output.  It is easy enough to describe, and so needs no illustration.
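
The token positions in the "offsets" listings above can also be checked by adding up the size of each item. The byte layouts below are inferred from the sample output and are assumptions, not a statement of how detok is implemented: a named-token is taken to occupy one token byte, one length byte, the name itself and a two-byte FCode number, and a b(") string one token byte, one length byte and the text. Both function names are made up for this example.

```python
def named_token_size(name):
    """Bytes occupied by a named-token entry (layout inferred from the sample)."""
    return 1 + 1 + len(name) + 2  # token, length byte, name, 16-bit FCode number


def string_token_size(text_length):
    """Bytes occupied by a b(") string whose reported length is text_length."""
    return 1 + 1 + text_length  # token, length byte, text


# named-token hello/goodbye sits at offset 8; the next token, b(:), at 25.
print(8 + named_token_size("hello/goodbye"))  # 25
# b(") at offset 29 with len=0x1f; the following "type" token at 62.
print(29 + string_token_size(0x1F))  # 62
```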

Command-Line Format

The command-line format is simply:

detok [options] fc-file [fc-file ...]

The output of this DeTokenizer is directed to STDOUT, so there is no "Output file" option per se.  To keep the results, simply redirect the output to a file, using the standard shell conventions.
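
When detok is driven from a script rather than from a shell, the same redirection can be done by handing the child process a file as its STDOUT. The following is a minimal sketch, assuming that detok is on the PATH; the helper names `detok_command` and `detok_to_file` are made up for this example, and only the -o and -f switches mentioned in this guide are used.

```python
import subprocess


def detok_command(fcode_files, offsets=False, fcode_list=None):
    """Build an argument list of the form: detok [options] fc-file [fc-file ...]"""
    argv = ["detok"]
    if offsets:
        argv.append("-o")
    if fcode_list is not None:
        argv += ["-f", fcode_list]  # the one switch that takes a file name
    argv += list(fcode_files)
    return argv


def detok_to_file(argv, out_path):
    """Run detok and keep its STDOUT in out_path, like "> out_path" in a shell."""
    with open(out_path, "w") as out:
        return subprocess.run(argv, stdout=out).returncode


print(detok_command(["driver.fc"], offsets=True))
# ['detok', '-o', 'driver.fc']
```

The file names here are placeholders; `detok_to_file` is defined but not called, so the sketch can be read and run without detok installed.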

Command-Line Options

Command-Line option Switches are case-sensitive.  Only one option takes an argument, and that argument is a file name, whose case sensitivity is, of course, dependent on the Host Operating System.

Switches

  -h    Print a brief help message and then exit.

  -v    Verbose  --  display additional information:  the hex value of each token processed, as well as a signature block.

  -o    Offsets  --  display the positions of the tokens relative to the start of the first FCode block after a PCI header (if one is present), and the destination-offset of each branch.

Note that the combination of the Verbose and Offsets options yields the maximum amount of useful information.

  -n    Line Numbers  --  display the sequential number of each line of output.

Note that the -n and -o options are mutually exclusive; if both are specified, -o will be favoured.

  -a    Process All input.  Do not stop when end0 has been encountered.  This option is usually not needed, but may be useful in cases where a file has been corrupted or when something very strange has been Tokenized...

  -f file
        Pre-load Additional FCodes from the named file before processing.  These might be, for instance, a set of vendor-specific FCodes that were generated for a specific vendor's products by a Tokenizer customized for that specific vendor.  The "Additional FCodes" file is discussed in detail in its own section.

The "Additional FCodes" file

Some vendors' FCode drivers contain non-standard FCode tokens.  In order to accommodate those situations, provision is made to specify the names of the FCodes in question.  The -f command-line option permits the user to specify an "Additional FCodes List" file, which will be read before detokenization begins and which will contain the list of "Additional FCodes" to be recognized.

The format of the file is as follows:

  1. Each line holds one entry, consisting of an FCode Number and its Name.  The FCode Number is given first, in the form of a hex number, preceded by an optional 0x or 0X.  (Thus 0x602, 0X602 and simply 602 are all equivalent.)  At least one blank space separates the FCode Number from the Name, which must be on the same line; any number of blanks is permitted.  Any text that follows the Name is permitted and will be ignored.
  2. Blank lines are permitted and will be ignored.
  3. Comment lines are permitted and will be ignored.  A comment-line starts with either a pound-sign ( # ) or a backslash ( \ ).
  4. FCode Numbers are limited to the range 0x10..0x7ff.  Numbers smaller than 0x10 are the leading byte of a two-byte FCode, and numbers from 0x800 and up are assigned by the Tokenizer.  Lines with numbers outside the permitted range will be ignored, and a message will be printed.
  5. FCode Numbers that are already assigned will not be permitted to be overwritten.  Lines with numbers that are already assigned will be ignored, and a message will be printed.

If the file cannot be read, that will be regarded as an immediate failure and cause the program to exit.
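
The rules above can be summarized in a short parser. This is an illustrative sketch in Python, not the code detok itself uses (detok is a separate program with its own implementation). The function names, the wording of the messages, the treatment of malformed lines and the decision to accept comment markers after leading blanks are all assumptions made for the example.

```python
import string
import sys

FCODE_MIN, FCODE_MAX = 0x10, 0x7FF  # permitted range for Additional FCodes


def parse_fcode_list(lines, assigned):
    """Add entries from lines to assigned, a dict of {FCode number: name}.

    Returns the messages produced for the lines that were ignored.
    """
    messages = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # rule 2: blank line
        if fields[0][0] in "#\\":
            continue  # rule 3: comment line
        digits = fields[0]
        if digits[:2] in ("0x", "0X"):
            digits = digits[2:]  # rule 1: the prefix is optional
        is_hex = bool(digits) and all(c in string.hexdigits for c in digits)
        if len(fields) < 2 or not is_hex:
            # Assumption: the guide does not say how malformed lines are handled.
            messages.append(f"line {lineno}: malformed entry, ignored")
            continue
        number, name = int(digits, 16), fields[1]  # text after the name is dropped
        if not FCODE_MIN <= number <= FCODE_MAX:
            messages.append(f"line {lineno}: 0x{number:x} out of range, ignored")  # rule 4
        elif number in assigned:
            messages.append(f"line {lineno}: 0x{number:x} already assigned, ignored")  # rule 5
        else:
            assigned[number] = name
    return messages


def load_fcode_list(path, assigned):
    """Read the list file; an unreadable file is an immediate failure."""
    try:
        with open(path) as list_file:
            return parse_fcode_list(list_file, assigned)
    except OSError as err:
        sys.exit(f"cannot read {path}: {err}")
```

For example, with `known = {0x90: "type"}`, parsing the lines `0x602 vendor-a some remark`, `604 vendor-c` and `0x90 clash` adds 0x602 and 0x604 to `known` and returns one message for the already-assigned 0x90.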

Special Functions

In addition to non-standard FCode tokens with simple behavior, some vendors' FCode drivers also contain non-standard FCode tokens with complex behavior.  An example that was recently encountered is "double(lit)" which precedes a double-length (i.e., 64-bit) literal.  This DeTokenizer is structured to allow the creation of a list of pre-defined Special Function names, each of which has a special behavior associated with it.  When one of those names occurs in the "Additional FCodes List" file, it will be recognized; the FCode Number given with it is assigned to it.  When that FCode number is encountered, the assigned special behavior will be exercised.

Adding to the list of Special Function names, and associating a new behavior with the added function, requires modifying the DeTokenizer code, but the infrastructure that is already in place should make this a manageable task for even a modestly skilled programmer.

At the present writing, only one such Special Function name is supported, and that one is, of course, double(lit).

Its associated special behavior is to collect the next eight bytes from the FCode input stream and display them as a double-length literal.
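
The idea of a table of Special Function names, each tied to a behavior, can be pictured as follows. This is a conceptual sketch in Python and does not reproduce detok's own code; the names `SPECIAL_FUNCTIONS`, `read_double_lit` and `special_handlers` are invented for the example. Reading the eight bytes most-significant byte first is also an assumption, chosen because FCode numbers are stored big-endian.

```python
def read_double_lit(fcode, pos):
    """Collect the eight bytes that follow a double(lit) token.

    pos is the index of the first byte after the token.
    Returns the literal and the index of the next unread byte.
    """
    chunk = fcode[pos:pos + 8]
    if len(chunk) < 8:
        raise ValueError("FCode stream ends inside a double(lit) literal")
    # Assumption: most-significant byte first.
    return int.from_bytes(chunk, "big"), pos + 8


# Pre-defined Special Function names and their behaviors.
SPECIAL_FUNCTIONS = {"double(lit)": read_double_lit}


def special_handlers(additional_fcodes):
    """Map FCode numbers to behaviors for names that are Special Functions."""
    return {number: SPECIAL_FUNCTIONS[name]
            for number, name in additional_fcodes.items()
            if name in SPECIAL_FUNCTIONS}


# 0x602 is an arbitrary example number, as a list file might assign it.
handlers = special_handlers({0x602: "double(lit)", 0x603: "vendor-b"})
value, next_pos = handlers[0x602](bytes.fromhex("0123456789abcdef00"), 0)
print(f"0x{value:016x}", next_pos)  # 0x0123456789abcdef 8
```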

If you modify the DeTokenizer to recognize additional Special Function names, please update this document to list them and describe their special behaviors.  Thank you. 


End Of Document