DIHYPH Hyphenation Interfacing
DHINT.C is the interface between the text system and all DIHYPH hyphenation modules.
It not only recodes text characters into hyphenation codes, but also takes into
account all:
- language specialities
- typographic rules
- compound words
- special character handling
- letter standardization
- text-/typesetting-commands
Calling hyphenation
Before calling hyphenation, the text system must set some of the following
parameters and may set others (see also "DHDEF.C" and "DHEXT.H"):
Parameters to be set on every call!
They describe the position of the "word" to be hyphenated within the text array "line":
afc = index of first letter of "word".
eol = index of letter in "word" exceeding right margin (s
alc = index of last letter of "word".
To get all possible hyphens, set: eol >= alc;
Parameters that may be overwritten!                          Default:
vs       = Minimum length of first syllable.                 2 *)
ns       = Minimum length of last syllable.                  2 *)
minwl    = Minimum word length to be hyphenated.             4 *)
spprm    = Bit 1: 4711-Splitting allowed
               2: "eol" = last hyphen position
dhpath[] = Directory name holding runtime files              \dihyph\
exfile[] = Name(s) of special exception file(s)
codspac  = RAM size for code-files DHCOnn                    257L *)
tabspac  = RAM size for one/more table-files DHTAnn          15000L *)
excspac  = RAM size for largest exception catalogue          2000L *)
exdspac  = RAM size for largest exception record             5000L *)
                                       *) = language dependent
All default values may be overwritten by user
- directly during text-editing (when possible in textsystem).
- by editing file 'DHDFLT.CFG' before calling DIHYPH.
The text system calls hyphenation by:
     rc = DHnn (line);          or better by:
     rc = DHYPH(line, nn);  *)
     rc   =  0: O.K.
            -1: File "DHCOnn" or "DHTAnn" not found or wrong.
            -3: Incorrect language-no.
     line = Character array defined in text system
            holding word to be hyphenated.
     nn   = language-no. (01 = German, 02 = English, etc.).
*) For this, "DHYPH" (Unicode: "DHYPHEUC") has to be linked
in as well, containing calls for all languages available.
Once DIHYPH is installed, every update version and every new language added
to the system consists of just a couple of disk files.
No compiling or linking of programs is necessary for that.
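The return codes listed above can be turned into readable diagnostics by the
calling text system. The following is only a minimal sketch: the enum names
and the function dh_rc_text() are invented for illustration and are not part
of DHEXT.H; only the numeric values 0, -1 and -3 come from this description.

```c
/* Return codes documented for DHnn() / DHYPH().
   The symbolic names are hypothetical, only the values are documented. */
enum dh_rc { DH_OK = 0, DH_FILE_ERROR = -1, DH_BAD_LANGUAGE = -3 };

/* Translate a DIHYPH return code into a short diagnostic text. */
const char *dh_rc_text(int rc)
{
    switch (rc) {
    case DH_OK:           return "O.K.";
    case DH_FILE_ERROR:   return "file DHCOnn or DHTAnn not found or wrong";
    case DH_BAD_LANGUAGE: return "incorrect language-no.";
    default:              return "undocumented return code";
    }
}
```

A caller would typically set afc, eol and alc, run rc = DHYPH(line, nn); and
report dh_rc_text(rc) whenever rc is not 0.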
Returning from hyphenation
Three external parameters (defined in hyphenation) have to be evaluated by the
text system (see: "Hyphenation quality ranking" and "Evaluation of array RADR"):
had = integer index of "line" character to be shifted to next line
before inserting hyphen at this position
(had = 0: no hyphenation possible !).
hpw = see: Evaluation of array RADR.
ic = character holding letter to be inserted at "had".
Note: Index of first "line"-character is 0, etc.
Some examples:     [ ] means "delete"     ( ) means "insert"
Letter           Return-
index:           parameters:
0123456789       had  hpw  ic    print line after H & J
----------       ---  ---  --    ---------------------------
aber              0    0         ...........................
                                 aber.......................
Jo-Ann            3    1         ........................Jo-
                                 Ann........................
Schiffahrt        5    2   f     ..................Schif(f-)
                                 fahrt......................
asszony           2    2   z     .....................as(z-)
                                 szony......................
Couve-Flor        5    3   -     ...................Couve(-)
                                 -Flor......................
Dackel            3    6   k     ..................Da[c](k-)
                                 kel........................
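The examples above can be reproduced with a few lines of C. This is a sketch
under one assumption: the hpw values in the table (1, 2, 3, 6) are read as the
hyphenation bits described under "Evaluation of array RADR" (1 = no hyphen,
2 = insert "ic", 4 = erase the letter before the split). The function name
dh_apply_split() and the HB_ macros are hypothetical.

```c
#include <string.h>

/* Hyphenation bits, interpreted from the examples; names are made up. */
#define HB_NO_HYPHEN 0x1  /* split without inserting '-'             */
#define HB_INSERT_IC 0x2  /* insert character "ic" before the split  */
#define HB_ERASE     0x4  /* erase the letter just before the split  */

/* Build both parts of a word split at index "had" (0-based).
   head receives the text staying on the current line, tail the text
   shifted to the next line. Returns 0 if no split is possible, else 1.
   Both buffers must hold at least strlen(word) + 3 bytes. */
int dh_apply_split(const char *word, int had, int hpw, char ic,
                   char *head, char *tail)
{
    size_t n;

    if (had <= 0 || (size_t)had >= strlen(word)) {
        strcpy(head, word);          /* had = 0: no hyphenation possible */
        tail[0] = '\0';
        return 0;
    }
    n = (size_t)had;
    if (hpw & HB_ERASE)
        n--;                         /* drop the letter before the split */
    memcpy(head, word, n);
    if (hpw & HB_INSERT_IC)
        head[n++] = ic;
    if (!(hpw & HB_NO_HYPHEN))
        head[n++] = '-';
    head[n] = '\0';
    strcpy(tail, word + had);
    return 1;
}
```

With the table values, "Dackel" (had 3, hpw 6, ic k) yields "Dak-" and "kel",
and "Couve-Flor" (had 5, hpw 3, ic -) yields "Couve-" and "-Flor".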
Hyphenation demonstration resulting from 8 different
"end-of-line" (right margin) conditions.
Sample-word... 4711(System.22)-NN./AB2'Processor
right : : : : : : : :
margin........... 1 2 3 4 5 6 7 8
results:
  1  .................................
     4711(System.22)-NN./AB2'Processor
  2  .................................
     4711(System.22)-NN./AB2'Processor
  3  ........................4711(Sys-
     tem.22)-NN./AB2'Processor........
  4  ........................4711(Sys-
     tem.22)-NN./AB2'Processor........
  5  .................4711(System.22)-
     NN./AB2'Processor................
  6  .............4711(System.22)-NN./
     AB2'Processor....................
  7  .............4711(System.22)-NN./
     AB2'Processor....................
  8  ..4711(System.22)-NN./AB2'Proces-
     sor..............................
Action Codes
Before entering the DIHYPH hyphenation logic, every text character is automatically
converted into a language-specific DIHYPH Action Code by the interface program
DHINT.C, using one of the following code files:
DHnnACOD.C, DHnnCO.C or DHCOnn (nn = language no.).
Text Code                      Action-Code (hex.)
letters:                       01 - 1E   depending on language
others:
       Ignored character       00
       Space                   20
'      Apostrophe              22
*      Asterisk                23
- /    Hyphen characters       24
       Forbidden hyphen        25
+      Plus                    26
#      Number sign             27
.      Point                   28
,      Comma                   29
(      Bracket on              2A
)      Bracket off             2B
       Wanted hyphen           2C
0 - 9  Numbers                 30 - 39
`      Accent grave            41
'      Accent acute            42
       Accent dieresis         43
       Accent angstrom         44
       other accents           00
       all other characters    21
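The ranges in the table above lend themselves to a simple classification of
an Action Code. The following is a sketch only; the enum and the function
name dh_classify() are hypothetical and merely group the documented values.

```c
/* Coarse classes of an Action Code (low two hex digits).
   Enum and function names are illustrative, not part of DIHYPH. */
enum dh_class { DH_IGNORED, DH_LETTER, DH_DIGIT, DH_ACCENT, DH_OTHER };

enum dh_class dh_classify(unsigned aa)
{
    if (aa == 0x00)               return DH_IGNORED; /* ignored, other accents */
    if (aa >= 0x01 && aa <= 0x1E) return DH_LETTER;  /* language dependent     */
    if (aa >= 0x30 && aa <= 0x39) return DH_DIGIT;   /* 0 - 9                  */
    if (aa >= 0x41 && aa <= 0x44) return DH_ACCENT;  /* grave ... angstrom     */
    return DH_OTHER;          /* space, punctuation, hyphen codes, 21 */
}
```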
Code Files
DHCOnn      are ASCII / ANSI disk files read in at runtime
            (changeable by user!).
DHnnACOD.C  are used only in older DIHYPH versions!
DHnnCO.C    are compiled and linked in.
DHCOnn code files (nn = language no.) have a specific construction.
Lines starting with:
1)    Blank or ---   are treated as comment lines.
1.1)  ---A           Any non-hyphen character in position four of the first line
                     means that letters A to Z and a to z are standard
                     ASCII (hex. 41 - 5A and 61 - 7A). Otherwise see point 3.1.
2)    ' (Apostrophe) have a maximum length of 25 characters each.
                     These lines have to start and end with an apostrophe;
                     the text between them is error text used in exception
                     dictionary programs.
                     This text may be translated into other languages.
Construction of action code
3)    UUAA           An Action Code is represented by 4 hexadecimal digits.
                     The lower two hex. digits (AA) are described above
                     (see: Action Codes).
3.1)                 The higher two hex. digits (UU) are used for 'special'
                     lowercase/uppercase conversion:
                     Every lowercase non-standard letter holds the hex.
                     position of its corresponding uppercase letter in the
                     first two hex. digits (UU).
                     Sample: Action Code 8E01 means that
                     letter ä gets Action Code 01 and is converted to
                     letter Ä at hex. position 8E (see the sample code
                     file DHCO02 below).
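Splitting a UUAA code and using UU for uppercase conversion might look as
follows. This is a sketch, not the DHINT.C implementation: the function names
are hypothetical, the code table is assumed to be held as 256 16-bit values
indexed by character code, and UU = 00 is assumed to mean "no special
conversion" (standard a to z are then converted the ASCII way).

```c
/* Low two hex digits: the Action Code proper (AA). */
unsigned dh_action(unsigned code)    { return code & 0xFFu; }

/* High two hex digits: position of the uppercase letter (UU). */
unsigned dh_upper_pos(unsigned code) { return (code >> 8) & 0xFFu; }

/* Convert character code ch to uppercase using a 256-entry code table. */
unsigned dh_to_upper(const unsigned short table[256], unsigned ch)
{
    unsigned uu = dh_upper_pos(table[ch & 0xFFu]);

    if (uu != 0)
        return uu;                 /* non-standard letter, e.g. 8E01 */
    if (ch >= 'a' && ch <= 'z')
        return ch - 0x20;          /* standard ASCII letter          */
    return ch;                     /* everything else is unchanged   */
}
```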
Note
Special command codes from word composition systems are not allowed in
DHCOnn files !
Sample code file DHCO02
---ASCII-(a-z)-----------------------------------------------------------------
DIHYPH- DHCO02: Oxford-English code table
0 1 2 3 4 5 6 7 8 9 A B C D E F
-------------------------------------------------------------------------------
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0021 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
Sp ! " # $ % & ' ( ) * + , - . /
0020 0021 0021 0027 0021 0021 0021 0022 002A 002B 0023 0026 0029 0024 0028 0024
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0021 0031 0032 0033 0034 0035 0036 0037 0038 0039 0021 0021 0021 0021 0021 0021
@ A B C D E F G H I J K L M N O
0021 0001 0008 0009 000A 0002 000B 000C 000D 0003 000E 000F 0010 0011 0012 0004
P Q R S T U V W X Y Z [ \ ] ^ _
0013 0014 0015 0016 0017 0005 0018 0019 001A 0006 001B 0021 0021 0021 0021 0021
' a b c d e f g h i j k l m n o
0022 0001 0008 0009 000A 0002 000B 000C 000D 0003 000E 000F 0010 0011 0012 0004
p q r s t u v w x y z { | } ~
0013 0014 0015 0016 0017 0005 0018 0019 001A 0006 001B 0021 0021 0021 0021 0021
0009 9A05 9002 0001 8E01 0001 8F01 8009 0002 0002 0002 0003 0003 0003 0001 0001
0002 9201 0001 0004 9904 0004 0005 0005 0006 0004 0005 0009 0021 0021 0021 0021
0001 0003 0004 0005 A512 0012 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
sz
0021 E116 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021 0021
-------------------------------------------------------------------------------
' Word is too long '
' Incorrect character '
' Incorrect word-start '
' Word is too short '
-------------------------------------------------------------------------------
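Reading such a file can be sketched in two small steps: classify each line by
its first character, then read the 4-digit hex codes of a code line. The names
below are hypothetical. The sketch also assumes that the character label rows
(such as the one starting with "Sp") begin with a blank in the real file and
therefore count as comment lines.

```c
#include <stdlib.h>
#include <string.h>

enum dh_line { DH_COMMENT, DH_ERRTEXT, DH_CODES };

/* Classify one line of a DHCOnn file by the rules for its first character. */
enum dh_line dh_line_kind(const char *s)
{
    if (s[0] == '\0' || s[0] == '\n' || s[0] == ' ' ||
        strncmp(s, "---", 3) == 0)
        return DH_COMMENT;           /* blank or --- : comment line */
    if (s[0] == '\'')
        return DH_ERRTEXT;           /* ' ... ' : error text        */
    return DH_CODES;                 /* row of UUAA codes           */
}

/* Read up to max hex codes from a code line; returns the number read. */
int dh_parse_codes(const char *s, unsigned short *out, int max)
{
    int n = 0;
    char *end;

    while (n < max) {
        unsigned long v = strtoul(s, &end, 16);
        if (end == s)
            break;                   /* no further hex number found */
        out[n++] = (unsigned short)v;
        s = end;
    }
    return n;
}
```

Sixteen code lines of sixteen codes each would fill the complete table of 256
character positions.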
Hyphenation quality ranking
DIHYPH hyphenation not only hyphenates with the highest accuracy, but also
returns a ranking for every hyphen.
Although all are grammatically correct, some hyphenations are much better than others.
So it is very often better not to select the hyphen next to the right margin, but
another one whose ranking is better.
In addition, a hyphen ranked better than 4 is in all probability not an incorrect one.
Note: Do not look for the best rankings only!
Hyphen ranking is either defined by the algorithm and program tables or may be
inserted as a number from 1 to 5 in the exception dictionary.
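One way a text system could weigh ranking against line fill is sketched below.
The policy is an assumption made for illustration and is not prescribed by
DIHYPH: take the rightmost hyphen that fits, but prefer a hyphen further left
if it has a strictly better rank and costs at most "slack" characters. The
function name and all parameters are hypothetical.

```c
/* Pick a hyphen from n candidates.
   pos[]  = hyphen positions in ascending order,
   rank[] = ranking 1 (best) to 5 (worst),
   limit  = largest position that still fits before the right margin,
   slack  = characters of line fill that may be given up for a better rank.
   Returns the index of the chosen candidate, or -1 if none fits. */
int dh_pick_hyphen(const int *pos, const int *rank, int n,
                   int limit, int slack)
{
    int i, last = -1, best;

    for (i = 0; i < n && pos[i] <= limit; i++)
        last = i;                    /* rightmost hyphen that fits */
    if (last < 0)
        return -1;
    best = last;
    for (i = last - 1; i >= 0 && pos[last] - pos[i] <= slack; i--)
        if (rank[i] < rank[best])
            best = i;                /* strictly better rank wins  */
    return best;
}
```

For "micro-1or-5gan-4ism" (positions 5, 7, 10 with ranks 1, 5, 4), a limit of
8 and a slack of 3 select the rank 1 hyphen after "micro" instead of the
rank 5 hyphen after "microor".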
Some examples showing hyphenation rankings
( 1, 2 = good 3 = quite good 4 = acceptable 5 = bad):
DIHYPH hyphenation with ranking is much better than without ranking
---------------------------------------------- --------------------
English:
auto-1mo-5bile auto-mobile automo-bile
chemo-1ther-4apy chemo-therapy chemother-apy
ex-1ca-5vate ex-cavate exca-vate
micro-1or-5gan-4ism micro-organism microor-ganism
mid-1sum-4mer mid-summer midsum-mer
mis-1in-5formed mis-informed misin-formed
mon-1ox-5ide mon-oxide monox-ide
per-2se-5cute per-secute perse-cute
French:
bis-1an=5nuel bis-annuel bisan-nuel
cis-1al=5pine cis-alpine cisal-pine
co-2ad-5ju=4teur co-adjuteur coad-juteur
trans-1al=5pine trans-alpine transal-pine
German:
ent-1ge-5gen-1tre-4ten ent-gegentreten entge-gentreten
Fahr-1er-5laub-4nis Fahr-erlaubnis Fahrer-laubnis
Non-4nen-2klo-4ster Nonnen-kloster Nonnenklo-ster
See-1ad-5ler See-adler Seead-ler
Volks-1or-5che-4ster Volks-orchester Volksor-chester
wohl-1er-5ge-4hen wohl-ergehen wohler-gehen
Zi-4vil-1an-5zug Zivil-anzug Zivilan-zug
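The ranked notation in the first column can be read mechanically: a hyphen
mark followed by a digit 1 to 5 stands for one hyphen and its ranking. The
sketch below treats '-' and '=' alike, since the difference between the two
marks is not explained here; the function name dh_parse_ranked() is
hypothetical.

```c
/* Parse a ranked word such as "auto-1mo-5bile".
   word receives the plain word ("automobile"), pos[] the number of letters
   in front of each hyphen, rank[] its ranking. At most max hyphens are
   stored. Returns the number of hyphens stored.
   word must hold at least as many bytes as the input string. */
int dh_parse_ranked(const char *s, char *word, int *pos, int *rank, int max)
{
    int n = 0, len = 0;

    while (*s) {
        if ((*s == '-' || *s == '=') && s[1] >= '1' && s[1] <= '5') {
            if (n < max) {
                pos[n] = len;
                rank[n] = s[1] - '0';
                n++;
            }
            s += 2;                  /* skip hyphen mark and digit */
        } else {
            word[len++] = *s++;      /* ordinary character         */
        }
    }
    word[len] = '\0';
    return n;
}
```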
Evaluation of array RADR
on return from hyphenation
Every letter (and every combined-word hyphen) is described within RADR by one
"RADR word" (= 2 integer fields), starting with RADR field 1. The first field (iiii)
holds the position of the letter relative to the start of the word (afc + iiii); the
second field holds the hyphenation bits (h), the hyphen ranking (q) and the letter
possibly to be inserted (ic).
RADR field 0 (bbbb) is the index of the RADR field that holds the parameters for the
hyphenation next to the right margin (see values: had, hpw, ic).
Variable "CAP" is the index to the end of the "RADR" array.
The following example demonstrates the connection between text word and RADR array;
the dots (...) in the example word symbolize possible text commands or characters
ignored by hyphenation.
Meaning of "radr[nn]" int-words:
int radr[nn]
nn =   00   01   02   03   04   05   06  ...  "cap"
      bbbb iiii hqic iiii hqic iiii hqic
radr-word   description:
---------------------------------------------------------------------
bbbb        index to radr word holding hyphenation       | one
            next to right text-margin                    | int-word
---------------------------------------------------------------------
iiii        index to text character                      |
                                                         |
--------                                                 | one text-
hqic        h  = hyphenation bits:                       | character
                 0000 insert hyphen, split (standard)    | field
                 0001 no hyphen, split (compound)        |
                 0011 insert "ic" but no hyphen, split   | =
                 0010 insert "ic", insert hyphen, split  |
                 0100 erase letter, insert hyphen, split | two
                 1xxx hyphen from exception dictionary   | integer
                                                         | words.
            q  = hyphenation quality (ranking 1-5)       |
                                                         |
            ic = character to be inserted (00 = no)      |
---------------------------------------------------------------------
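Unpacking one hqic field might look like the sketch below. The bit layout is
inferred from sample values such as 646B and 2166 in this description (one hex
digit h, one hex digit q, two hex digits ic) and is not stated explicitly; the
struct and function names are hypothetical.

```c
/* Parts of one hqic field; names are illustrative only. */
struct dh_hyphen {
    unsigned h;    /* hyphenation bits (0 to F)            */
    unsigned q;    /* hyphenation quality, ranking 1 to 5  */
    unsigned ic;   /* character to be inserted, 00 = none  */
};

/* Split a 16-bit hqic value into h, q and ic. */
struct dh_hyphen dh_decode_hqic(unsigned hqic)
{
    struct dh_hyphen r;

    r.h  = (hqic >> 12) & 0xFu;
    r.q  = (hqic >> 8) & 0xFu;
    r.ic = hqic & 0xFFu;
    return r;
}
```

Decoding 646B this way gives h = 6 (erase letter and insert "ic", with
hyphen), q = 4 and ic = 6B, the letter k.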
Sample word: . . D a c k e l - . . S c h i f f a h r t e n . . .
Index:       00  02  04      08    0B      0F      13    16
sample                        m e a n i n g
word   nn nn   radr[nn]       erase:  insert:  quality:
-----  -- --   ---- ----      ------  -------  --------
bbbb:
       00      0013
iiii hqic:
D      01 02   0002 0000
a      03 04   0003 0000
c      05 06   0004 646B      c       k -      4
k      07 08   0005 0000
e      09 0A   0006 0000
l      0B 0C   0007 0000
-      0D 0E   0008 1100                       1
S      0F 10   000B 0000
c      11 12   000C 0000
h      13 14   000D 0000
i      15 16   000E 0000
f      17 18   000F 2166              f -      1
f      19 1A   0010 0000
a      1B 1C   0011 0000
h      1D 1E   0012 0000
r      1F 20   0013 0400              -        4
t      21 22   0014 0000
e      23 24   0015 0000
n      25 26   0016 0000
               ---- ----------------
       27 = 'cap'
Resulting
hyphenation: . . D a k- k e l - . . S c h i f f- f a h r- t e n . . .
                      1       2                3        4
 1  erase 'c', insert 'k', insert '-', split behind '-' of quality 4.
 2  split behind '-' of quality 1.
 3  insert 'f', insert '-', split behind '-' of quality 1.
 4  insert '-', split behind '-' of quality 4.
Attention:
As you may see from example above hyphenation bit combinations are possible !
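Walking the RADR array of the example can be sketched as follows. Two points
are assumptions drawn from the sample values rather than from a stated rule:
a field with q = 0 is taken to mean "no hyphen at this letter", and q sits in
bits 8 to 11 of the hqic field. The array contents are copied from the sample
above; the function name dh_list_hyphens() is hypothetical.

```c
/* radr[] of the sample word "..Dackel-..Schiffahrten...", cap = 0x27. */
static const int sample_radr[0x27] = {
    0x0013,                                   /* bbbb                */
    0x0002, 0x0000,  0x0003, 0x0000,          /* D  a                */
    0x0004, 0x646B,  0x0005, 0x0000,          /* c  k                */
    0x0006, 0x0000,  0x0007, 0x0000,          /* e  l                */
    0x0008, 0x1100,  0x000B, 0x0000,          /* -  S                */
    0x000C, 0x0000,  0x000D, 0x0000,          /* c  h                */
    0x000E, 0x0000,  0x000F, 0x2166,          /* i  f                */
    0x0010, 0x0000,  0x0011, 0x0000,          /* f  a                */
    0x0012, 0x0000,  0x0013, 0x0400,          /* h  r                */
    0x0014, 0x0000,  0x0015, 0x0000,          /* t  e                */
    0x0016, 0x0000                            /* n                   */
};

/* Collect all hyphens: text index (iiii) and quality (q) of every
   RADR word whose quality is not 0. Returns the number stored. */
int dh_list_hyphens(const int *radr, int cap,
                    int *text_index, int *quality, int max)
{
    int i, n = 0;

    for (i = 1; i + 1 < cap; i += 2) {        /* pairs iiii, hqic */
        int q = (int)(((unsigned)radr[i + 1] >> 8) & 0xFu);
        if (q != 0 && n < max) {
            text_index[n] = radr[i];
            quality[n] = q;
            n++;
        }
    }
    return n;
}
```

Applied to sample_radr with cap = 0x27, this finds the four split points of
the example, behind text indexes 04, 08, 0F and 13, with qualities 4, 1, 1
and 4.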