Interpreting a Sentence
An Implementation of J
;: 'sum =:+/_6.95*i.3 4' ┌───┬──┬─┬─┬─────┬─┬──┬───┐ │sum│=:│+│/│_6.95│*│i.│3 4│ └───┴──┴─┴─┴─────┴─┴──┴───┘ The source code for word formation is in the files w*.c. The process is controlled by the function wordil (word index and length) and the table state. Rows of state correspond to 10 states; columns to 9 character classes. Each table entry is a (new state, function) pair. Starting at state S, a sentence is scanned from left to right one character at a time; the table entry corresponding to the current state and character class is applied.
New State/Function States Character ClassesXN S AN NN AN 9N XN XN QN S Space X Other XI SI AI NI AI 9I X X QI X Other S Space XI SI A A A A X X QI A Alphanumeric A Letters excl. NB XI SI A A NB A X X QI N N N The letter N XI SI A A A A NZ X QI NB NB B The letter B Z Z Z Z Z Z X X Z NZ NB. 9 Digits and _ XI SI 9 9 9 9 9 X QI 9 Numeric D . Period Q Q Q Q Q Q Q Q QQ Q Quote C : Colon XI SI AI NI AI 9I XI XI Q QQ Even Quotes Q ' Quote Z Z Z Z Z Z Z Z Z Z Trailing Comment X S A N B 9 D C Q Function I j=:i [ Emit(j,i-1) N j=:iEmit(j,i-1) produces a pair of indices delimiting a word in the string. i is the current index, and j is an internal register; if the current word is a number immediately following a numeric list (one or more numbers), Emit combines their indices to form a single word. At the end of the string, Emit(j,i-1) is executed.
As an example, this process is applied to sum =:+/_6.95*i.3 4, the sentence used above. In the following table, the columns are: index, character in the string, the (current state, character class) pair, the (new state, function) pair, and the action. For example, the first step is step 0, the letter is s, the current (and initial) state is S, and the character class is A. From the table, the entry in row S and column A is AN, meaing the new state is A and the function code is N. The action assigns 0 to j.
State / | New State / | ||||
i | Char | Char Class | Function | Action | |
0 | s | S A | AN | j=:0 | |
1 | u | A A | A | ||
2 | m | A A | A | ||
3 | A S | SI | j=:3 [ Emit(0,2) | sum | |
4 | = | S X | XN | j=:4 | |
5 | : | X C | X | ||
6 | + | X X | XI | j=:6 [ Emit(4,5) | =: |
7 | / | X X | XI | j=:7 [ Emit(6,6) | + |
8 | _ | X 9 | 9I | j=:8 [ Emit(7,7) | / |
9 | 6 | 9 9 | 9 | ||
10 | . | 9 D | 9 | ||
11 | 9 | 9 9 | 9 | ||
12 | 5 | 9 9 | 9 | ||
13 | * | 9 X | XI | j=:13 [ Emit(8,12) | _6.95 |
14 | i | X A | AI | j=:14 [ Emit(13,13) | * |
15 | . | A D | X | ||
16 | 3 | X 9 | 9I | j=:16 [ Emit(14,15) | i. |
17 | 9 S | SI | j=:17 [ Emit(16,16) | 3 | |
18 | 4 | S 9 | 9N | j=:18 | |
19 | Emit(18,18) | 4 |
... #define CLPAR '(' /* 40 050 28 */ #define CRPAR ')' /* 41 051 29 */ #define CSTAR '*' /* 42 052 2a */ #define CPLUS '+' /* 43 053 2b */ ... #define CASGN '\200' /* 128 200 80 =. */ #define CGASGN '\201' /* 129 201 81 =: */ #define CFLOOR '\202' /* 130 202 82 <. */ #define CMIN '\202' /* 130 202 82 <. */ #define CLE '\203' /* 131 203 83 <: */ #define CCEIL '\204' /* 132 204 84 >. */ #define CMAX '\204' /* 132 204 84 >. */ #define CGE '\205' /* 133 205 85 >: */ ...Using mnemonics such as CPLUS and CASGN instead of '+' and '\200' makes the source code more readable and more amenable to automatic manipulation.
static C spell[3][68]={ '=', '<', '>', '_', '+', '*', ..., CASGN, CFLOOR, CCEIL, 1, CPLUSDOT,CSTARDOT, ..., CGASGN, CLE, CGE, CFCONS, CPLUSCO, CSTARCO, ..., };For example, the first column specifies that =. has the ID CASGN (assignment) and =: the ID CGASGN (global assignment).
Parsing
Parsing occurs after word formation and is controlled by
function parse and table cases in file p.c. cases
is a direct translation of the parse table in Section II E of the dictionary:
#define AVN ( ADV+VERB+NOUN) #define CAVN (CONJ+ADV+VERB+NOUN) #define EDGE (MARK+ASGN+LPAR) PT cases[] = { EDGE, VERB, NOUN, ANY, monad, ..., 1,2, ..., EDGE+AVN, VERB, VERB, NOUN, monad, ..., 2,3, ..., EDGE+AVN, NOUN, VERB, NOUN, dyad, ..., 1,3, ..., EDGE+AVN, VERB+NOUN, ADV, ANY, adv, ..., 1,2, ..., EDGE+AVN, VERB+NOUN, CONJ, VERB+NOUN, conj, ..., 1,3, ..., EDGE+AVN, VERB, VERB, VERB, trident, ..., 1,3, ..., EDGE, CAVN, CAVN, CAVN, trident, ..., 1,3, ..., EDGE, CAVN, CAVN, ANY, bident, ..., 1,2, ..., NAME+NOUN, ASGN, CAVN, ANY, is, ..., 0,2, ..., LPAR, CAVN, RPAR, ANY, punc, ..., 0,2, ..., };The sentence to be parsed is prefaced with a marker and placed on a queue, and as parsing proceeds words are moved from the right end of the queue onto a stack. The classes of the first four words on the stack are compared to the patterns in columns 0 to 3 of cases. The first row matching in all four columns is selected; the action in column 4 is applied to the words on the stack indicated by the inclusive indices in columns 8 and 9, with the result replacing those words. If none of the rows match, the word at the end of the queue is moved onto the stack by the function move. Scanning for a matching pattern then begins anew. The process terminates when the queue is empty and none of the rules are applicable. At that time, the stack should have exactly two words: the marker and a noun, verb, adverb, or conjunction; anything else signals syntax error.
§((i.#y)=i.~y)#y original sentence §((i.#y)=i.~y)# 'aba' 13 move §((i.#y)=i.~y) #'aba' 13 move §((i.#y)=i.~y )#'aba' 13 move §((i.#y)=i.~ 'aba')#'aba' 13 move §((i.#y)=i. ~'aba')#'aba' 13 move §((i.#y)= i.~'aba')#'aba' 13 move §((i.#y) =i.~'aba')#'aba' 13 move §((i.#y) =v0 'aba')#'aba' 3 adv v0=: i.~ §((i.#y )=v0 'aba')#'aba' 13 move §((i.# 'aba')=v0 'aba')#'aba' 13 move §((i. #'aba')=v0 'aba')#'aba' 13 move §(( i.#'aba')=v0 'aba')#'aba' 13 move §( (i.#'aba')=v0 'aba')#'aba' 13 move §( (i.3)=v0 'aba')#'aba' 1 monad 3 -: #'aba' §( (0 1 2)=v0 'aba')#'aba' 0 monad 0 1 2 -: i.3 §( 0 1 2=v0 'aba')#'aba' 12 punc §( 0 1 2=0 1 0)#'aba' 1 monad 0 1 0 -: v0 'aba' § (0 1 2=0 1 0)#'aba' 13 move § (1 1 0)#'aba' 2 dyad 1 1 0 -: 0 1 2=0 1 0 § 1 1 0#'aba' 12 punc §1 1 0#'aba' 13 move §'ab' 2 dyad
Trains
A train is an isolated phrase not interpreted by the parsing
rules pertaining to verbs, adverbs, and conjunctions, and
(as a matter of language design) may be assigned any meaning whatsoever.
Iverson and McDonnell [1989]
defined a train of three verbs as a
fork and a train of
two verbs as a hook. That is,
if f, g, and h are verbs, then
so are (f g h) and (g h), and:
Fork Hook g g g g / \ / \ / \ / \ f h f h y h x h | | /\ /\ | | y y x y x y y yParsing rules 5, 6, and 7 deal with trains. (See Parsing.) A consequence of the rules is that a train of verbs is resolved by repeated forming a fork from the rightmost three verbs, with a final hook if the train is of even length. Likewise, a train of adverbs and conjunctions is assigned a meaning, and is resolved by repeatedly forming a group from the leftmost three adverbs or conjunctions, with a final group of two if the train is of even length. Trains are implemented by functions and variables of file cf.c. The main functions are folk and hook. (fork conflicts with UNIX usage.)
Name Resolution
During parsing, words are moved from the queue to the stack.
Suppose a name xyz is being moved.
If xyz is immediately to the left of a copula, it (as a name)
is put on the stack. Otherwise, if xyz denotes a noun,
that noun is put on the stack; if xyz denotes a verb,
adverb, or conjunction, 'xyz'~ is put on the stack,
to be evaluated when the verb, adverb, or conjunction is applied.
Names and their assigned values are stored in symbol tables.
A symbol table is an array of type SYMB whose atoms
are pairs (name,value). Functions and variables in the files s*.c work with
symbols tables. In particular, symbis(a,w,symb) assigns
the name a to w in the symbol table symb,
and symbrd(w) "reads" the value of the name w.