DIHYPH / DITECT Exception Dictionary
D I H Y P H exception writing
An exception may have a minimum length of 3 and a maximum length of 49 characters.
Only letters (no special characters!) are permitted in the first two positions
of an exception, except that - and ' are accepted in the second position
(e.g., A-bend or c'est).
The abbreviation asterisk * may only be placed at the end of a word.
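The formal rules above can be checked mechanically before an entry is stored. The following is only a minimal sketch in Python, not part of DIHYPH: the function name is hypothetical, and it assumes that the length limit counts every character of the entry, including hyphens and the trailing *.

```python
def check_dihyph_exception(exc: str) -> list[str]:
    """Return a list of rule violations; an empty list means the entry passes."""
    problems = []
    # Assumption: the 3..49 limit applies to the entry as written.
    if not 3 <= len(exc) <= 49:
        problems.append("length must be 3 to 49 characters")
    # Position 1: letters only.
    if not exc[:1].isalpha():
        problems.append("position 1 must be a letter")
    # Position 2: a letter, or one of the two tolerated characters - and '.
    second = exc[1:2]
    if not (second.isalpha() or second in ("-", "'")):
        problems.append("position 2 must be a letter, - or '")
    # The abbreviation asterisk may only be the last character.
    if "*" in exc[:-1]:
        problems.append("* is only allowed at the end")
    return problems
```

For example, A-bend, c'est and lau-f* pass, while la*u-f is reported because the asterisk is not at the end.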
For an exception ending with an asterisk *, every hyphenation up to the * is
applied as specified. All succeeding characters of the word are processed by the
hyphenation logic, starting at the last hyphen found in the exception.
Example: lau-f* ==> lauf lau-fe lau-fen lau-fend lau-fen-den
but:            ==> lau-fsper-re (and other errors!)
To avoid errors like the one above, for exceptions with fewer than two letters
between the last hyphen and the concluding *, the program logic of some
languages checks whether these syllable-start letters are allowed.
If not, the last hyphen is corrected automatically:
         lau-fsperre ==> lauf-sperre   (has been corrected)
Example: Kam-bo-d*   ==> Kam-bod-scha  (has been corrected)
but:     Kam-bo-ds*  ==> Kam-bo-dscha  (has been accepted)
An exception not ending with * is found if it has the same length as the word to
be hyphenated, or is up to three characters shorter.
Example: lau-f ==> lauf lau-fe lau-fen lau-fend
but not: laufende (4 characters longer!)
An exception ending with ) must be exactly as long as the word.
Example: pre-cede) ==> pre-cede
but not: prec-e-dence, pre-ced-ent
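The three length rules (ending *, no ending, ending with ")") can be summarized as a small sketch. This is an illustration in Python under assumptions, not the DIHYPH implementation: the function names are hypothetical, hyphens are assumed not to count toward the length, and matching is assumed to ignore case.

```python
def letters_only(exc: str) -> str:
    """Drop the trailing * or ) marker and all hyphens, keeping the spelled word."""
    return exc.rstrip("*)").replace("-", "")

def exception_applies(exc: str, word: str) -> bool:
    """Decide whether an exception entry is used for the given word."""
    stem = letters_only(exc).lower()   # assumption: case is ignored
    w = word.lower()
    if not w.startswith(stem):
        return False
    extra = len(w) - len(stem)         # characters following the exception
    if exc.endswith("*"):
        return True                    # abbreviation: any number may follow
    if exc.endswith(")"):
        return extra == 0              # fixed length: nothing may follow
    return extra <= 3                  # default: up to three may follow
```

With these rules lau-f applies to laufend (3 characters longer) but not to laufende (4 characters longer), and pre-cede) applies only to precede.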
The following characters are permitted in exceptions:
Letters:
  xXyY  All characters are allowed in both upper and lower case.
Special characters:
  *  exception abbreviation  : lau-f*       ==> lau-fen-den
  )  exception length fixed  : lauf)        ==> lauf  but: lau-fe
  -  hyphen                  : Tren-nun-gen ==> Tren-nun-gen
  +  hyphen with addition    : Voll+a*      ==> Voll-la-den...
  #  hyphen with elimination : Taxie#tje    ==> Taxi-tje
  .  any vowel               : Zei-t.n      ==> Zei-ten  Zei-tung
  ,  any consonant           : lo-,en       ==> lo-ben  lo-ten
  '  apostrophe              : c'est
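The placeholders . (any vowel) and , (any consonant) behave much like character classes. The sketch below translates an exception without * into a regular expression in Python. It is an illustration only: the function name is hypothetical, the vowel set is an assumption (the real set is language-dependent), and the special hyphens + and # and the ranking digits are not handled.

```python
import re

VOWEL = "[aeiouäöü]"                      # assumed vowel set
CONSONANT = r"(?![aeiouäöü])[^\W\d_]"     # any letter outside the vowel set

def exception_to_regex(exc: str) -> re.Pattern:
    """Build a pattern for the spelling described by an exception without *."""
    parts = []
    for ch in exc:
        if ch == "-":
            continue                      # hyphens mark breaks, not letters
        elif ch == ".":
            parts.append(VOWEL)
        elif ch == ",":
            parts.append(CONSONANT)
        else:
            parts.append(re.escape(ch))
    # An exception without * also covers words up to three characters longer.
    return re.compile("".join(parts) + r"\w{0,3}", re.IGNORECASE)
```

Used with fullmatch, the pattern for Zei-t.n accepts Zeiten and Zeitung, and the pattern for lo-,en accepts loben and loten.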
The numbers 1 to 5 may be inserted instead of, or following, the hyphen
in order to define the hyphenation and its quality ranking.
Example: auto1mo5bile
or:      auto-1mo-5bile
The hyphen characters + and # may not be replaced by quality ranking numbers.
Their quality ranking is determined automatically by the program logic.
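Both notations for ranked hyphens carry the same information, which a short sketch can make explicit. The Python function below is hypothetical and only illustrates the notation: it returns the plain word together with each break offset and its ranking number (None for a plain hyphen). The special hyphens + and # are not handled.

```python
def parse_breaks(exc: str):
    """Return (word, breaks); breaks is a list of (offset, rank or None)."""
    letters = []
    breaks = []
    for ch in exc:
        if ch == "-":
            breaks.append((len(letters), None))
        elif ch in "12345":
            offset = len(letters)
            if breaks and breaks[-1] == (offset, None):
                breaks[-1] = (offset, int(ch))    # digit following a hyphen
            else:
                breaks.append((offset, int(ch)))  # digit instead of a hyphen
        else:
            letters.append(ch)
    return "".join(letters), breaks
```

Both auto1mo5bile and auto-1mo-5bile yield the word automobile with breaks after the 4th and 6th letter, ranked 1 and 5.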
Example
exception     produces the following word splitting
------------------------------------------------------------------
vill+ad       vill-laden  vill-ladung   ( 3rd L is added)
        but:  vil-la-den-de             ( > 3 characters follow!)
              vel-la-den                ( other spelling!)
vall+ad*      vall-la-den-der           ( 3rd L is added,
                                          remainder hyph. by logic)
taxie#tje     taxi-tje                  ( elimination of e)
        but:  te-xiet-je                ( other spelling!)
st.,,-        stopp-end  Stach-eln
        but:  spuc-ken                  ( other spelling!)
                                        ( German c-k is changed
                                          to k-k by textsystem!)
The special hyphen signs + and # are language-dependent!
Letter addition + only operates for languages with corresponding grammatical
rules, e.g. German.
Character elimination # only works in Dutch.
D I T E C T exception writing
Correct use of initial capital and lower-case letters is important.
In the first position of a word only letters are allowed.
From the second position onward, seven other characters are also allowed:
Blank - / * # . '
With the exception of the asterisk (*), dot (.) and slash (/), special characters
within words are treated as part of the word and have no special meaning.
If an abbreviation ends with a period, it must be stored that way; otherwise
DITECT is unable to determine whether the period marks the end of an abbreviation
or the end of a sentence!
A word may be stored into exception
file(s) for the following reasons:           Writing style
------------------------------------------   -------------
1. "abcd" is unknown to DITECT               abcd
2. "crude" is unwanted                       crude/
   "crude" or "crudely" is unwanted          crude#
   "crude" or "crudeness" is unwanted        crude*
3. "Photo" is unwanted                       Photo/Foto/
   "Photo" or "Photos" is unwanted           Photo/Foto/#
   "Photo" or "Photograph" is unwanted       Photo/Foto/*
4. Special expression "qrst"                 qrst/uvwx/.
   to be replaced by "uvwx" automatically
Explanations
to 1:
Words unknown to DITECT are stored into exception file(s) one expression
per line.
to 2:
If an expression is unwanted by the user, or DITECT does not recognize it as
incorrect, the user may reject it by storing it in the exception dictionary with
a concluding *, # or /.
The part of the word preceding * or # may be a complete or an abbreviated word.
With ending # an abbreviation may be followed by at most 2 more letters, while
with ending * the number of following letters is unlimited, e.g.
crude# causes crudely to be marked as incorrect but not crudeness, while
crude* causes all words starting with crude to be marked as incorrect.
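The difference between the three endings can be expressed in a few lines. This is a sketch in Python with a hypothetical function name, not DITECT code. It compares case-sensitively, since correct capitalization is stated to be significant for DITECT.

```python
def is_rejected(entry: str, word: str) -> bool:
    """Check whether a reject entry (ending in /, # or *) marks the word."""
    stem, ending = entry[:-1], entry[-1]
    if ending == "/":
        return word == stem                       # exactly this word
    if ending == "#":
        # the stem plus at most 2 more letters
        return word.startswith(stem) and len(word) - len(stem) <= 2
    if ending == "*":
        return word.startswith(stem)              # any continuation
    return False                                  # not a reject entry
```

So crude# marks crudely (2 more letters) but not crudeness (4 more letters), while crude* marks both.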
to 3:
For every incorrect expression found by DITECT, a list of proposed substitute
words is displayed, and the user may click on one of them as a replacement.
But
- when DITECT is unable to display the correct proposal or
- when the user doesn't wish to search a list of several proposals,
such a rejected expression (e.g. Photo) may be extended by a proposal
(e.g. Foto) like this: Photo/Foto/*
With ending * or # the proposal is expanded (if necessary) and is displayed in
the proposal list.
For example, for the text word Photoatelier the proposal Fotoatelier is displayed.
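The expansion of a proposal can be pictured as carrying the remainder of the text word over to the proposed spelling. The following Python sketch illustrates this for entries of the form wrong/proposal/ending. The function name is hypothetical, and the automatic-replacement ending "." is deliberately left out.

```python
def propose(entry: str, word: str):
    """Return the expanded proposal for word, or None if the entry does not apply."""
    wrong, proposal, ending = entry.split("/")
    if ending not in ("", "#", "*"):
        raise ValueError("only the endings /, /# and /* are handled here")
    if not word.startswith(wrong):
        return None
    rest = word[len(wrong):]           # part of the word after the wrong stem
    if ending == "" and rest:
        return None                    # plain / : the exact word only
    if ending == "#" and len(rest) > 2:
        return None                    # # : at most 2 more letters
    return proposal + rest             # expand the proposal by the remainder
```

With the entry Photo/Foto/* the text word Photoatelier yields Fotoatelier, while Photo/Foto/# covers Photos but not Photograph.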
to 4:
Same as described in 3 above, but now the ending is a period (.) instead of an
asterisk (*), and the publishing system automatically replaces the first
expression "qrst" by the second expression "uvwx"!
Sample Exceptions        Meaning
---------------------    -----------------------------------------------
Dr.                      Correct abbreviation of the word "Doctor"
pj's                     Correct abbreviation
Louisville               Correct name of city unknown to DITECT.
Photograph               Accepted word instead of the following
Photog*                  unwanted writing of "Photog..."
Kusine/Cousine/*         "Kusine" is unwanted, "Cousine" is proposal.
am Besten/am besten/*    "am Besten" is incorrect, "am besten" proposed.
fc/fan club/.            "fc" is automatically replaced by "fan club" *)
Barbra Streisand         "Barbra" is allowed if followed by "Streisand"
---------------------    -----------------------------------------------
*) The replacement must be done by the publishing system, not by DITECT!
DIHYPH / DITECT exception dictionaries
During word processing several exception files may be used simultaneously.
The standard file is always searched first (if not switched off).
DIHYPH file names must always start with the letters dh, e.g. "dhspec".
The DIHYPH standard exception file used automatically is dhex??.txt
DITECT file names must always start with the letters dt, e.g. "dtspec".
The DITECT standard exception file used automatically is dtex??.txt
(?? = language-no.).
A special exception file is searched when a word could not be found in the
standard exception file.
The special exception file name is defined by the user and stored in the array
"exfile[ ]" by the textsystem (see: DHDEF.C or DHEXT.H), without extension!
The first special exception file name defined is preceded by the key character
+ or =:
"+filename"     Standard file plus this file are used.
"+"             Only the standard file is used.
"=filename"     Only this file is used, without the standard file.
"="             No exception file is used.
Several special exception files are separated by a space character:
"+file1 file2"  Standard file + file1 + file2 are used in this sequence.
"=file1 file2"  Only file1 and file2 are used.
Switching exception files on or off during runtime is only possible if the
special exception file(s) are defined before the first call of the exception
program, as follows:
"+file1 file2"
To change access to base or special exception files, the following values may
be set in the array "exfile[ ]" during runtime after the first call:
"++"  Base plus special exception file(s) are used.
"=+"  Only the special exception file(s) are used.
"+"   Only the base exception file is used.
"="   No exception file is used.
Every change of program (DIHYPH, DITECT or language) is treated as a first call!
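The meaning of the "exfile[ ]" strings can be summarized in a short sketch. The Python below is an illustration only: the names parse_exfile and RUNTIME_SWITCH are hypothetical, and the real evaluation is done inside the exception program.

```python
def parse_exfile(spec: str):
    """Interpret the first-call string.

    Returns (use_standard_file, [special file names in search order]).
    """
    key, names = spec[:1], spec[1:].split()
    if key not in ("+", "="):
        raise ValueError("the first file name must be preceded by + or =")
    return key == "+", names

# Values that may be set after the first call:
# value -> (use base file, use special file(s))
RUNTIME_SWITCH = {
    "++": (True, True),
    "=+": (False, True),
    "+": (True, False),
    "=": (False, False),
}
```

For example, "+file1 file2" selects the standard file followed by file1 and file2, and "=" selects no exception file at all.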
DIHYPH / DITECT exception programs:
DMEXCAT.EXE creates an access catalog "file.CAT" for the text file "file.TXT".
It may be used for DIHYPH or DITECT exception files and it handles
1-byte-code as well as Unicode files.
For Unicode the following program modules are needed:
DMRDWTUC.C
DMCREPUC.C together with file
DMCREPUC
If only 1-byte code is used, these three DM????UC files may be replaced by the
dummy file DMCREPUD.C
DMEXCAT is started with the following parameters:
Program call              Meaning
----------------------    ---------------------------------------------
DMEXCAT nn h              for DIHYPH ( b DHEXnn.TXT = Default )
DMEXCAT nn t              for DITECT ( b DTEXnn.TXT = Default )
DMEXCAT nn h b 1B-File    for DIHYPH with 1-Byte-Code special file
DMEXCAT nn t u UC-File    for DITECT with Unicode special file
DMEXCAT nn t 8 U8-File    for DITECT with UTF-8 code special file
        |  | | |_____ Name of special exception file (without .txt)
        |  | |
        |  | |_______ b = 1-Byte-Code exception file (default)
        |  |          u = Unicode exception file
        |  |          8 = UTF-8 code exception file
        |  |
        |  |_________ h = DIHYPH exception file
        |             t = DITECT exception file
        |
        |____________ nn = language-no. 1-99
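A calling script may want to check a DMEXCAT call against the parameter table before running it. The Python sketch below does only that; it does not start DMEXCAT. The function name is hypothetical, and it assumes that only the three-part and five-part forms listed in the table are valid.

```python
PROGRAMS = {"h": "DIHYPH", "t": "DITECT"}
CODES = ("b", "u", "8")     # 1-byte code, Unicode, UTF-8

def parse_dmexcat_call(call: str) -> dict:
    """Split a DMEXCAT call into its parameters and check their values."""
    parts = call.split()
    # Assumption: code and special file name are given together or not at all.
    if len(parts) not in (3, 5) or parts[0].upper() != "DMEXCAT":
        raise ValueError("expected: DMEXCAT nn h|t [b|u|8 name]")
    lang_no = int(parts[1])
    if not 1 <= lang_no <= 99:
        raise ValueError("language-no. must be 1-99")
    if parts[2] not in PROGRAMS:
        raise ValueError("third parameter must be h or t")
    # Without further parameters the 1-byte-code standard file is the default.
    code, special = ("b", None) if len(parts) == 3 else (parts[3], parts[4])
    if code not in CODES:
        raise ValueError("code must be b, u or 8")
    return {"language": lang_no, "program": PROGRAMS[parts[2]],
            "code": code, "special": special}
```

A call such as "DMEXCAT 12 t u UC-File" is accepted, while a language number outside 1-99 or an unknown code letter raises ValueError.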
Before the catalog is created, DMEXCAT.EXE performs the following operations:
Exceptions are checked automatically, and wrong words are marked with an error
message enclosed in ' ... '.
Finally an appropriate message is displayed.
Using a text editor's search function, the errors enclosed in ' can then be
jumped to directly and corrected (the 'error message' attached to a word is
later removed by the program automatically).
Duplicate exceptions are suppressed automatically by the program.
When no errors are found, the exceptions are stored in sorted order and a
direct-access catalog (filename.CAT) is created for use by DIHYPH or DITECT.
DMEXINCT.C may be used by the publishing system to insert a new exception word
into an exception file immediately and to create the catalog automatically.
This tool must not be used by clients when the exception file is on the server!
The header lines of the DMEXINCT.C file show how to use the program.