Why not Word?
Open letter to all colleagues working on the ASP/RMIS project in Tirane, Albania
Markus Lepper, April 2006

Dear all!
Often I am asked why my colleagues from the scientific community and I prefer not to use integrated word processors like "Microsoft Word".

Since this question arises frequently, I will try to answer it here once and for all !-)


First of all, as computer scientists we try to cleanly separate a "model" from its (possibly numerous different) "views".

A text, seen as a model, is a mathematical "term", following a well-defined "syntax grammar", together with (one or more) "semantics", which are mappings (or "interpretations") from syntactical structures to a purely mathematical model (mostly an algebraic system, based on sets, functions and/or relations).

This allows an "objective" view of a text as a data item, or "object", where ...

  1. no information is hidden!
    The text object of which you see a graphical representation is totally identical to what the text object is semantically, i.e. to its information content.
    The contrast to the "graphical" approach becomes obvious whenever you copy something into the clipboard in a Windows environment: the clipboard contents may contain many invisible things, like binary data, text formatting mark-up, invisible mark-up, and all kinds of viruses.
  2. every (syntactical) transformation made on the text data is totally under our control, and its semantics are well-defined according to the different, but always well-defined semantical interpretations.

As a consequence of separating model and representations, a "source text" seen as such a model ...

  1. can be rendered to very different "back-ends", e.g. XHTML, pdf, ps, as its "views",
  2. does contain "semantical", not "physical" mark-up.
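What semantic mark-up buys us can be sketched in a few lines of shell (the file name "doc.xml" and its contents are invented for illustration): because the mark-up says *what* something is, not how it looks, "all personal names in this document" becomes a trivial query. With purely physical mark-up ("italic", "10pt small caps") the same query is impossible without guessing.

```shell
# Create a toy document with semantic mark-up (invented example data).
cat > doc.xml <<'EOF'
<text>A sentence mentioning <person><name>Gates</name></person>
and another one mentioning <person><name>Lepper</name></person>.</text>
EOF

# Extract every personal name, then strip the tags.
grep -o '<name>[^<]*</name>' doc.xml | sed -e 's/<[^>]*>//g'
# prints:
# Gates
# Lepper
```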

Especially the second point is totally mixed up in integrated word processors like "ms-word" (-- at least it cannot be remedied without a substantial programming effort, using internal APIs and tools you have to pay for !-)

WYSIWYG is a chimaera: if you open a Word document on a computer with a different default printer than the computer the document was created on, "ms-word" will ask "save changes?" when you close the document, even if it was opened in "read only" mode! That gets on my nerves!

Any tool which professes to follow the WYSIWYG principle implicitly declares that it does not operate on text objects! A text is an intellectual structure inside your brain, and no human being has ever "seen" a text (except perhaps metaphorically, as the sudden artistic vision of e.g. a new poem).

In our approach we use only semantic mark-up. For human beings, we can write e.g. using "d2d" (the author's XML front-end):

This is a sentence mentioning #pers Bill Gates.

This will be translated automatically into XML:

This is a sentence mentioning 
  <person><input>Bill Gates</input><prename>Bill</prename><name>Gates</name></person>

This "object" in turn will be ...

  1. compiled e.g. to PDF, using a "small-caps" font for each personal name,
  2. rendered to a Word document, printing all personal names in yellow or pink,
  3. compiled (using XSL-T) into XHTML, where each personal name is a hyper-ref to the person's e-mail account, or into an appendix listing all persons' criminal records (which of course may be empty !-),
  4. and so on, processed in any thinkable way, --- transparently and explicitly.

All these very different rendering processes can be carried out arbitrarily often, independently and in parallel, since they do not affect the text model itself, the information source object. This is different when using an "integrated word processor".


Using "pure" text objects in this sense allows you to treat them with whatever tool you want to.


Suppose we have created a whole bunch of documents in which the abbreviation "DoPA" appears frequently.
Proof-reading these documents we find that (concentrating on content, not on spelling) we typed it differently in places, e.g. "DOPA", "Dopa", etc.

What we will do for one document "t1.txt" is simply to issue

$  sed -e"s/\bDoPA\b/DoPA/ig" t1.txt > x ; mv -f x t1.txt

from the command line, and all occurrences will be unified. (The "\b" is required since we do not want to replace e.g. "dopant" or "dopamine".)
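The effect of "\b" can be checked directly on the command line (GNU sed; the sample words are invented):

```shell
# Only the stand-alone (case-insensitive) occurrence is rewritten;
# "dopant" and "dopamine" are protected by the word boundaries.
echo "dopa dopant dopamine" | sed -e 's/\bdopa\b/DoPA/ig'
# prints: DoPA dopant dopamine
```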

Since we have a proper "shell" (and not a thing like "command.com"), we can write

$  for f in *.txt ; do sed -e"s/\bDoPA\b/DoPA/ig" "$f" > x ; mv -f x "$f" ; done

which fixes the spelling of this abbreviation in all our documents immediately.

The most important feature is that from a written command line you can make an abstraction, which is not possible with "gui actions". Abstraction is the heart of all programming! Since such a correction is probably required for more than one abbreviation, we abstract it and create a "program" with one single command line (without needing any editor or mouse-clicking !-)

$  echo 'for f in $2 ; do sed -e "s/\b$1\b/$1/ig" "$f" > x ; mv -f x "$f" ; done' > unify
$  chmod a+x unify

Now we can write

$ ./unify DoPA "*.txt"
$ ./unify USA "*.txt"
$ ./unify EC "*.txt"
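The whole workflow can be tried end to end in a scratch directory (the file "t1.txt" and its contents are invented; here the "unify" script is written via a here-document, with the same body the echo command above is meant to produce, just to keep the sketch self-contained):

```shell
# Write the "unify" script and make it executable.
cat > unify <<'EOF'
#!/bin/sh
for f in $2 ; do sed -e "s/\b$1\b/$1/ig" "$f" > x ; mv -f x "$f" ; done
EOF
chmod a+x unify

# A toy document with inconsistent spellings of the abbreviation.
printf 'We wrote DOPA and Dopa but also dopamine.\n' > t1.txt

# Unify the spelling in all *.txt files; the pattern is quoted so that
# the glob is expanded inside the script, not by the calling shell.
./unify DoPA "*.txt"
cat t1.txt
# prints: We wrote DoPA and DoPA but also dopamine.
```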

Of course we can make further abstractions, e.g. collecting all these corrections into a single command file, so that they can be repeated after every editing cycle.
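Such a command file is again just a few lines (the name "unify-all" is our invention); it simply replays every unification:

```shell
# Collect all corrections in one executable file, to be re-run
# after each editing cycle.
cat > unify-all <<'EOF'
#!/bin/sh
./unify DoPA "*.txt"
./unify USA  "*.txt"
./unify EC   "*.txt"
EOF
chmod a+x unify-all
```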

This flexibility in applying tools and abstracting over their application can obviously only be achieved with true and well-defined text objects, not with binary encoded don't-know-what-to-call-it, containing proprietary secrets. (The byte sequence representing "DoPA" could match very different binary encoded information!)


Last but not least: we do not want our fingers to leave the keyboard, --- we want to do everything simply by typing, with no need to switch our brain from the "literal=structural=mathematical" mode into a "graphical=geometric" mode. Such a switch always costs valuable micro-seconds, and painfully interrupts the flow of thinking.


Of course, graphical interfaces have their areas of application where they are more sensible than a text representation, but in our experience this is mostly on the output side, for proof-reading mathematical structures created by text input, e.g. the page layout produced by LaTeX, or graphs which should not contain negative edges/cycles, etc.


That is the way in which my colleagues and I work with a computer.

I hope to have answered the question "Why do we prefer not to use integrated word processors?" satisfactorily.


made    2012-02-28_11h02   by    lepper   on    heine

produced with eu.bandm.metatools.d2d    and    XSLT