DIHYPH Hyphenation Interfacing




DHINT.C is the interface between the textsystem and all DIHYPH hyphenations.
It not only recodes text characters into hyphenation code, but also takes into
account all:
- language specialities
- typographic rules
- compound words
- special character handling
- letter standardization
- text-/typesetting-commands



Calling hyphenation



Before calling hyphenation, the following parameters have to be / may be set
by textsystem (see also "DHDEF.C" and "DHEXT.H" ):


Parameters to be set each time !
to describe the position of the "word" to be hyphenated within text array "line":
afc      = index of first letter of "word".

eol      = index of letter in "word" exceeding right margin                                           (s

alc      = index of last  letter of "word".
           To get all possible hyphens, set:  eol >= alc;
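
The three position parameters can be derived from the line and the right
margin. The following is only a minimal sketch: the struct "dh_pos" and the
functions "dh_find_word" and "dh_want_all_hyphens" are hypothetical helpers
(in DIHYPH itself afc, eol and alc are external variables, see "DHEXT.H"),
and words are assumed to be delimited by blanks.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the three position parameters. */
struct dh_pos {
    int afc;   /* index of first letter of the word          */
    int eol;   /* index of the letter exceeding right margin */
    int alc;   /* index of last letter of the word           */
};

/* Locate the blank-delimited word that contains index `margin`
   (the first character that no longer fits on the line).
   Returns 1 on success, 0 if `margin` does not fall inside a word. */
int dh_find_word(const char *line, int margin, struct dh_pos *p)
{
    assert(line != NULL && p != NULL);

    int len = (int)strlen(line);
    if (margin < 0 || margin >= len || line[margin] == ' ')
        return 0;

    int first = margin;
    while (first > 0 && line[first - 1] != ' ')
        first--;

    int last = margin;
    while (last + 1 < len && line[last + 1] != ' ')
        last++;

    p->afc = first;
    p->eol = margin;
    p->alc = last;
    return 1;
}

/* Request every possible hyphen of the word (eol >= alc). */
void dh_want_all_hyphens(struct dh_pos *p)
{
    assert(p != NULL);
    p->eol = p->alc;
}
```

The textsystem would copy the three values into the external variables
before each call.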


Parameters that may be overwritten !                  Default:

vs       = Minimum length of first syllable.          2       *)

ns       = Minimum length of last  syllable.          2       *)

minwl    = Minimum word length to be hyphenated       4       *)

spprm    = Bit 1: 4711-Splitting allowed
               2: "eol" = last hyphen position

dhpath[] = Directory name holding runtime files       \dihyph\

exfile[] = Name(s) of special exception file(s)

codspac  = RAM size for code-files DHCOnn             257L    *)

tabspac  = RAM size for one/more table-files DHTAnn   15000L  *)

excspac  = RAM size for largest exception catalogue   2000L   *)

exdspac  = RAM size for largest exception record      5000L   *)

                                 Language dependent:          *)


All default values may be overwritten by user

- directly during text-editing (when possible in textsystem).

- by editing file 'DHDFLT.CFG' before calling DIHYPH.



Textsystem calls hyphenation by:

rc = DHnn (line);            or better by:

rc = DHYPH(line, nn);        *)

rc =  0:  O.K.
     -1:  File "DHCOnn" or "DHTAnn" not found or wrong.
     -3:  Incorrect language-no.

line   =  Character array defined in text system
          holding word to be hyphenated.

nn     =  language-no. (01 =German, 02 =English, etc.).

*)        For this, "DHYPH"  (Unicode: "DHYPHEUC")  has to be linked
          in as well, containing calls for all languages available.

Once DIHYPH is installed, every update version and every new language added
to the system is just a couple of disk files.
Compiling and linking of programs is not necessary for that.
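
The return codes listed above can be handled in one place. This is a sketch
under assumptions: the names "dh_rc_text", "dh_call" and "dh_stub" are
hypothetical, and the parameter types of the entry point (char array plus
int language number) are guessed from the call "rc = DHYPH(line, nn)". The
stub only stands in for the real routine so that the sketch compiles alone.

```c
#include <assert.h>
#include <string.h>

/* Return codes as documented above. */
enum { DH_OK = 0, DH_FILE_ERROR = -1, DH_BAD_LANGUAGE = -3 };

/* Translate a return code into a readable message. */
const char *dh_rc_text(int rc)
{
    switch (rc) {
    case DH_OK:           return "O.K.";
    case DH_FILE_ERROR:   return "File DHCOnn or DHTAnn not found or wrong";
    case DH_BAD_LANGUAGE: return "Incorrect language-no.";
    default:              return "Unknown return code";
    }
}

/* Assumed shape of the hyphenation entry point. */
typedef int (*dh_entry)(char *line, int nn);

/* Call the entry point and report the message for its return code. */
int dh_call(dh_entry hyphenate, char *line, int nn, const char **msg)
{
    assert(hyphenate != NULL && line != NULL && msg != NULL);
    int rc = hyphenate(line, nn);
    *msg = dh_rc_text(rc);
    return rc;
}

/* Stand-in for the real routine: accepts languages 01 and 02 only
   and performs no hyphenation at all. */
int dh_stub(char *line, int nn)
{
    (void)line;
    return (nn == 1 || nn == 2) ? DH_OK : DH_BAD_LANGUAGE;
}
```

In a real textsystem "dh_stub" would be replaced by the linked-in DHYPH.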



Returning from hyphenation



Three external parameters (defined in hyphenation) have to be evaluated by
textsystem (see: "Hyphenation quality ranking" and "Evaluation of array RADR"):
had     = integer index of "line" character to be shifted to next line
          before inserting hyphen at this position
          (had = 0: no hyphenation possible !).
hpw     = see: Evaluation of array RADR.
ic      = character holding letter to be inserted at "had".

Note: Index of first "line"-character is 0, etc.

Some examples:      [ ] means "delete"     ( ) means "insert"

Letter      Return-
index:      parameters:
0123456789  had hpw ic  print line after  H & J
----------  --- --- --  ---------------------------

aber        0   0       ...........................
                        aber.......................

Jo-Ann      3   1       ........................Jo-
                        Ann........................

Schiffahrt  5   2   f   ..................Schif(f-)
                        fahrt......................

asszony     2   2   z   .....................as(z-)
                        szony......................

Couve-Flor  5   3   -   ...................Couve(-)
                        -Flor......................

Dackel      3   6   k   ..................Da[c](k-)
                        kel........................
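
The examples above can be reproduced with a small routine. This is a sketch,
not DIHYPH code: the function "dh_split" and the three bit names are
hypothetical, and it assumes that hpw carries the hyphenation bits "h"
described under "Evaluation of array RADR" (1 = no hyphen, 2 = insert "ic",
4 = erase letter), which is what the table values suggest.

```c
#include <assert.h>
#include <string.h>

#define DH_NO_HYPHEN 0x1  /* split without inserting a hyphen       */
#define DH_INSERT_IC 0x2  /* insert character ic before the split   */
#define DH_ERASE     0x4  /* erase the letter in front of the split */

/* Split `word` in front of index `had`.  `head` receives the part that
   stays on the current line, `tail` the part shifted to the next line.
   Returns 1 on success, 0 if no split is possible or a buffer is too
   small. */
int dh_split(const char *word, int had, int hpw, char ic,
             char *head, size_t headsz, char *tail, size_t tailsz)
{
    assert(word != NULL && head != NULL && tail != NULL);

    size_t len = strlen(word);
    if (had <= 0 || (size_t)had >= len)
        return 0;                       /* had = 0: no hyphenation */

    size_t n = (size_t)had;
    if (n + 3 > headsz || len - n + 1 > tailsz)
        return 0;

    memcpy(head, word, n);
    if (hpw & DH_ERASE)
        n--;                            /* drop last letter of head */
    if (hpw & DH_INSERT_IC)
        head[n++] = ic;
    if (!(hpw & DH_NO_HYPHEN))
        head[n++] = '-';
    head[n] = '\0';

    strcpy(tail, word + had);
    return 1;
}
```

Called with ("Dackel", 3, 6, 'k') the sketch yields "Dak-" and "kel", with
("Couve-Flor", 5, 3, '-') it yields "Couve-" and "-Flor".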


Hyphenation demonstration resulting from 8 different
"end-of-line" (right margin) conditions.
Sample-word... 4711(System.22)-NN./AB2'Processor
right             :  :      : : :  :   :      :
margin........... 1  2      3 4 5  6   7      8
:
: results:
:              .................................
1              4711(System.22)-NN./AB2'Processor

2              .................................
               4711(System.22)-NN./AB2'Processor

3              ........................4711(Sys-
               tem.22)-NN./AB2'Processor........

4              ........................4711(Sys-
               tem.22)-NN./AB2'Processor........

5              .................4711(System.22)-
               NN./AB2'Processor................

6              .............4711(System.22)-NN./
               AB2'Processor....................

7              .............4711(System.22)-NN./
               AB2'Processor....................

8              ..4711(System.22)-NN./AB2'Proces-
               sor..............................



Action Codes



Before entering DIHYPH hyphenation logic every text character is automatically
converted into language specific DIHYPH Action Code by interface program
DHINT.C using one of following code files:
DHnnACOD.C    DHnnCO.C    or    DHCOnn   (nn=language-no.).

Text Code                Action-Code (hex.)

letters:                       01 - 1E  depending on language

others :
       Ignored character       00
       Space                   20
'      Apostrophe              22
*      Asterisk                23
-  /   Hyphen characters       24
       Forbidden hyphen        25
+      Plus                    26
#      Number sign             27
.      Point                   28
,      Comma                   29
(      Bracket on              2A
)      Bracket off             2B
       Wanted hyphen           2C
0 - 9  Numbers                 30 - 39

`      Accent grave            41
'      Accent acute            42
¨      Accent dieresis         43
°      Accent angstrom         44
       other  accents          00

       all other characters    21



Code Files


DHCOnn            are ASCII / ANSI disk files read in during runtime
                  ( changeable by user ! ).

DHnnACOD.C        are used only in older DIHYPH versions !


DHnnCO.C          are compiled and linked in.



DHCOnn code files

(nn =language no.) have a specific construction:


Lines starting with:
1)     Blank   or    ---   are treated as comment lines

1.1)   ---A    Any non-hyphen character in position four of first line
               means that letters A to Z and a to z are standard
               ASCII (hex. 41 - 5A and 61 - 7A).  Else see point 3.1 .


2)     '       (Apostrophe) have a maximum length of 25 characters each.
               These lines have to start and end with ' apostrophes,
               the text between them is error text used in exception
               dictionary programs.
               This text may be translated into other languages.


Construction of action code
3)    UUAA  An Action Code is represented by 4 hexadecimal digits.
            Lower two hex. digits  (AA) are described above
            (see: Action Code).

3.1)        Higher two hex. digits (UU) are used for 'special'
            lower-/uppercase conversion:
            Every lowercase non-standard letter holds the hex.
            position of the corresponding uppercase letter in its first
            two hex. digits  (UU).

            Sample Action Code  8E01  means:
            Letter  ä  gets Action Code  01  and is converted to
            letter  Ä  on hex. position  8E  (see the sample code
            file DHCO02 below).

Note
Special command codes from word composition systems are not allowed in
DHCOnn files !



Sample code file DHCO02


---ASCII-(a-z)-----------------------------------------------------------------
   DIHYPH- DHCO02:  Oxford-English  code table
   0  1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
-------------------------------------------------------------------------------

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
                          
0000 0000 0000 0000 0000 0021 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
  Sp  !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
0020 0021 0021 0027 0021 0021 0021 0022 002A 002B 0023 0026 0029 0024 0028 0024
   0  1    2    3    4    5    6    7    8    9    :    ;     <    =    >   ?
0021 0031 0032 0033 0034 0035 0036 0037 0038 0039 0021 0021 0021 0021 0021 0021
   @  A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
0021 0001 0008 0009 000A 0002 000B 000C 000D 0003 000E 000F 0010 0011 0012 0004
   P  Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
0013 0014 0015 0016 0017 0005 0018 0019 001A 0006 001B 0021 0021 0021 0021 0021
   '  a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
0022 0001 0008 0009 000A 0002 000B 000C 000D 0003 000E 000F 0010 0011 0012 0004
   p  q    r    s    t    u    v    w    x    y    z    {    |    }    ~
0013 0014 0015 0016 0017 0005 0018 0019 001A 0006 001B 0021 0021 0021 0021 0021
   Ç  ü    é    â    ä    à    å    ç    ê    ë    è    ï    î    ì    Ä    Å
0009 9A05 9002 0001 8E01 0001 8F01 8009 0002 0002 0002 0003 0003 0003 0001 0001
   É  æ    Æ    ô    ö    ò    û    ù    ÿ    Ö    Ü    ¢
0002 9201 0001 0004 9904 0004 0005 0005 0006 0004 0005 0009 0021 0021 0021 0021
   á  í    ó    ú    ñ    Ñ
0001 0003 0004 0005 A512 0012 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021

0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021

0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021

0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
      sz                                                   
0021 E116 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
                                                  
0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
-------------------------------------------------------------------------------
' Word is too long '
' Incorrect character '
' Incorrect word-start '
' Word is too short '
-------------------------------------------------------------------------------
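
Reading such a file line by line can be sketched as follows. The names
"dh_classify" and "dh_parse_codes" are hypothetical, and treating an empty
line as a comment is an assumption made here (the sample file contains empty
lines between code rows).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum dh_line_kind { DH_COMMENT, DH_ERRTEXT, DH_CODES };

/* Classify one line of a DHCOnn file by its first character(s). */
enum dh_line_kind dh_classify(const char *s)
{
    assert(s != NULL);

    if (s[0] == '\0' || s[0] == '\n' || s[0] == ' ' ||
        strncmp(s, "---", 3) == 0)
        return DH_COMMENT;              /* blank start or --- */
    if (s[0] == '\'')
        return DH_ERRTEXT;              /* ' error text '     */
    return DH_CODES;                    /* row of UUAA codes  */
}

/* Parse up to 16 four-digit hex. codes from one code line.
   Returns the number of codes stored in `out`. */
int dh_parse_codes(const char *s, unsigned short out[16])
{
    assert(s != NULL && out != NULL);

    int n = 0;
    while (n < 16) {
        char *end;
        unsigned long v = strtoul(s, &end, 16);
        if (end == s || v > 0xFFFFul)
            break;                      /* nothing more to read */
        out[n++] = (unsigned short)v;
        s = end;
    }
    return n;
}
```

Sixteen consecutive code lines of 16 codes each fill the 256 character
positions of the table.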



Hyphenation quality ranking



DIHYPH hyphenations not only hyphenate with highest accuracy
but also return the ranking of every hyphen.

Although grammatically correct, some hyphenations are much better than others.
So very often it is better not to select the hyphen next to right margin but another
one if its ranking is better.
In addition, a hyphen ranked better than 4 is in all probability not an incorrect one.

Note: Do not look for best rankings only !

Hyphen ranking is either defined by algorithm and program tables or may be
inserted as numbers 1 to 5 in exception dictionary.

Some examples showing hyphenation rankings
( 1, 2 = good      3 = quite good      4 = acceptable      5 = bad):

DIHYPH hyphenation with ranking is much better  than without ranking
----------------------------------------------  --------------------

English:
auto-1mo-5bile           auto-mobile            automo-bile
chemo-1ther-4apy         chemo-therapy          chemother-apy
ex-1ca-5vate             ex-cavate              exca-vate
micro-1or-5gan-4ism      micro-organism         microor-ganism
mid-1sum-4mer            mid-summer             midsum-mer
mis-1in-5formed          mis-informed           misin-formed
mon-1ox-5ide             mon-oxide              monox-ide
per-2se-5cute            per-secute             perse-cute

French:
bis-1an=5nuel            bis-annuel             bisan-nuel
cis-1al=5pine            cis-alpine             cisal-pine
co-2ad-5ju=4teur         co-adjuteur            coad-juteur
trans-1al=5pine          trans-alpine           transal-pine

German:
ent-1ge-5gen-1tre-4ten   ent-gegentreten        entge-gentreten
Fahr-1er-5laub-4nis      Fahr-erlaubnis         Fahrer-laubnis
Non-4nen-2klo-4ster      Nonnen-kloster         Nonnenklo-ster
See-1ad-5ler             See-adler              Seead-ler
Volks-1or-5che-4ster     Volks-orchester        Volksor-chester
wohl-1er-5ge-4hen        wohl-ergehen           wohler-gehen
Zi-4vil-1an-5zug         Zivil-anzug            Zivilan-zug
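
One way a textsystem might weigh ranking against distance to the right
margin is sketched below. The cost formula, the struct "dh_hyphen" and the
parameter "rank_weight" are illustrative assumptions and not part of DIHYPH;
they merely show how "not the nearest hyphen, but not the best ranking only
either" could be balanced.

```c
#include <assert.h>
#include <stddef.h>

/* One possible hyphen of a word (hypothetical representation). */
struct dh_hyphen {
    int pos;    /* number of letters kept on the current line */
    int rank;   /* ranking 1 (good) ... 5 (bad)               */
};

/* Pick the hyphen with the lowest cost among those that fit into
   `room` letters.  Cost = unused columns + rank_weight * (rank - 1).
   Returns the index of the chosen hyphen or -1 if none fits. */
int dh_pick_hyphen(const struct dh_hyphen *h, size_t n,
                   int room, int rank_weight)
{
    assert(h != NULL || n == 0);

    int best = -1;
    int best_cost = 0;
    for (size_t i = 0; i < n; i++) {
        if (h[i].pos > room)
            continue;                   /* exceeds right margin */
        int cost = (room - h[i].pos) + rank_weight * (h[i].rank - 1);
        if (best < 0 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

For "auto-1mo-5bile" with room for 7 letters and rank_weight 2, the sketch
prefers "auto-mobile"; with rank_weight 0 it falls back to the hyphen nearest
the margin, "automo-bile".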



Evaluation of array RADR

on return from hyphenation


Every letter (and combined-word-hyphen) is described within RADR by one
"RADR word" (= 2 integer fields) starting with RADR field 1: the first field (iiii)
holds the position of the letter relative to start of word (afc + iiii), the second
field holds hyphen bits (h), hyphen ranking (q) and the letter possibly to be
inserted (ic).

RADR-field 0 (bbbb) is the index to the RADR-field that holds the parameters for
hyphenation next to right margin (see values: had, hpw, ic).
Variable "CAP" is index to end of "RADR"-array.

Following example demonstrates connection between textword and RADR-array,
dots (...) in example word symbolizing possible textcommands or characters
ignored by hyphenation.
Meaning of  "radr[nn]"  int-words:

int radr[nn]
         nn =  00     01   02     03   04     05   06 ... "cap"
              bbbb   iiii hqic   iiii hqic   iiii hqic

radr-word  description:
---------------------------------------------------------------------
 bbbb      index to radr word holding hyphenation         | one
           next to right text-margin                      | int-word
---------------------------------------------------------------------
 iiii      index to text character                        |
                                                          |
--------                                                  | one text-
 hqic      h   =hyphenation bits:                         | character
                 0000  insert hyphen, split (standard)    | field
                 0001    no   hyphen, split (compound)    |
                 0011  insert "ic" but no hyphen, split   | =
                 0010  insert "ic", insert hyphen, split  |
                 0100  erase letter, insert hyphen, split | two
                 1xxx  hyphen from exception dictionary   | integer
                                                          | words.
            q  =hyphenation quality (ranking 1-5)         |
                                                          |
           ic  =character to be inserted (00 = no)        |
---------------------------------------------------------------------


Sample word:   . . D a c k e l - . . S c h i f f a h r t e n . . .
Index:        00  02   04     08    0B      0F      13    16


sample                       m e a n i n g
word   nn nn  radr[nn]   erase: insert: quality:
-----  -- --  ---- ----  ------ ------- --------
              bbbb:
       00     0013

              iiii hqic:
  D    01 02  0002 0000
  a    03 04  0003 0000
  c    05 06  0004 646B     c     k -      4
  k    07 08  0005 0000
  e    09 0A  0006 0000
  l    0B 0C  0007 0000
  -    0D 0E  0008 1100                    1
  S    0F 10  000B 0000
  c    11 12  000C 0000
  h    13 14  000D 0000
  i    15 16  000E 0000
  f    17 18  000F 2166           f -      1
  f    19 1A  0010 0000
  a    1B 1C  0011 0000
  h    1D 1E  0012 0000
  r    1F 20  0013 0400             -      4
  t    21 22  0014 0000
  e    23 24  0015 0000
  n    25 26  0016 0000
----   ----------------
       27  =  'cap'



Resulting
hyphenation:   . . D a k- k e l - . . S c h i f f- f a h r- t e n . . .
                       1        2               3        4  

1   erase  'c',
    insert 'k',   insert '-',   split behind '-' of quality 4.
2                               split behind '-' of quality 1.
3   insert 'f',   insert '-',   split behind '-' of quality 1.
4                 insert '-',   split behind '-' of quality 4.

Attention:
As the example above shows, hyphenation bits may be combined
(e.g. 0110 = erase letter, insert "ic", insert hyphen, split).




Contact