Theoretical Stance

And Resolution of Theory Conflict


22 May 1989


All markup schemes implicitly rely upon a theory of textual objects. This document discusses six approaches to the problem of developing a useful conceptual basis for markup in fields marked by diversity of theoretical approach. They are:
  1. reliance upon a single theory
  2. pluralism (informal description of several theories)
  3. formal pluralism (formal descriptions of several theories)
  4. eclecticism
  5. controlled semantics
  6. polytheoretical consensus

1. The Problem of Theoretical Diversity

Textual features, their attributes, and their structural relations cannot be postulated in a conceptual vacuum. They require some theoretical basis. In many fields (e.g. classical metrics) scholars agree on many or all of the pertinent features and their characteristics; in others, several divergent theories posit different sets of textual features; in still others, scholars disagree without the theoretical bases of the disagreement becoming visible.
In the extreme cases, theoretical diversity poses no problems for the working committees: on the one hand, a clear scholarly consensus can readily be translated into a single list of textual features, while on the other a lack of theoretical clarity will make it virtually impossible to elicit any consensus as to the textual features at stake, and no tag set can be developed at all.
The middle case, however, presents the working committees with a delicate problem.

2. Harmonization of Theoretical Conflict

At one extreme, tags might be provided only to represent the features tagged by a given theory, without any consideration given to their relation to similar features used by other theories. At the other extreme, a tag set might express a consensus among representatives of various theories and provide a “theory neutral” or “poly-theoretical” notation for the expression of analytic results.
Such harmonization or resolution of theoretical diversity takes place over a set of “systems.” The universe of systems to be considered comprises:
  • different theories current in the field
  • different practices current in the field
  • schemes used in the various corpora relevant to the field
Six levels of theoretical harmonization can be specified; work may, but need not, progress through the six levels sequentially. In developing a tag set to encode the analytic results of a field, a number of possibilities exist:
  1. Choice of a single theory: provide tags for a single system, ignoring the others in the field.
  2. Pluralism (informal): elicit, for each system, a full description of the system (as it applies to text encoding), including
    1. list of features
    2. examples of each feature
    3. structural properties of each feature (especially combinations with other features and the like)
    4. test criteria for the recognition of the feature
    5. formalizations available for the system
    This will be possible for different theories in different measure.
  3. Pluralism (formal): generate an SGML formalization for the feature set of each system. If SGML syntax does not suffice, the metalanguage committee must be asked to consider or develop extensions to handle the recalcitrant features.
    Note that at this low level of formalization, different theories will have separate and incongruent tag sets. The meaning of any tag will be defined only in natural language, and users of different theoretical orientations will be responsible for any translation into their own terms. The same generic identifier might be used for textual features postulated by different theories, and thus be ambiguous when viewed in isolation. The following two approaches handle this ambiguity differently.
  4. Eclecticism: define a single tag set created by the union of all system-specific tag sets, eliminating ambiguity by giving each theoretically distinct textual feature a unique generic identifier (“tag”). Users of the scheme will be expected to tag some subset of the features in the set, mixing and matching as they wish.
  5. Controlled semantics: eliminate ambiguity of generic identifiers by the explicit formal definition of the linguistic and computational meaning of the tags. Different usages of the same term must at this level be reduced to a finite list of questions with enumerable sets of possible answers. Ultimately, of course, the meaning of these questions and their answers will be expressed in natural languages, so that this cannot amount to a full specification of meaning. The formal system, however, will be constrained more fully by these formal definitions than at the lower levels of conflict resolution. [1]
    Now, for any given finite list of questions, with well-defined ranges for each answer, a set of SGML tags and attributes can be created which specifies answers to each question. An SGML processor could then ascertain that answers are specified for each question (although it could not necessarily check the consistency of the answers with the actual practice in the encoding).
  6. Polytheoretical consensus: provide the smallest possible single set of features which includes as a subset each set of features used by an existing system, or else define explicit mappings from the tag sets provided for one system into the tag sets provided for another. Unlike an eclectic tag/feature set, a polytheoretical set avoids all redundancy and is expected to be used as a unit, with all features tagged, so that a text tagged with such a set is useful to researchers of widely varying theoretical persuasions.
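The step from formal pluralism (level 3) to eclecticism (level 4) can be illustrated with a minimal Python sketch. The theory names, the generic identifiers, and the prefixing convention are all hypothetical, invented only for illustration; nothing above prescribes how unique identifiers are to be formed. The sketch also assumes that an identifier shared by two systems denotes theoretically distinct features, which in practice only the working committee can judge.

```python
# Hypothetical per-system tag sets (level 3: formal pluralism).
# Theory names and generic identifiers are invented for illustration.
SYSTEMS = {
    "theoryA": {"clause", "phrase", "word"},
    "theoryB": {"phrase", "word", "morph"},
}


def ambiguous_identifiers(systems):
    """Map each generic identifier used by more than one system to its users."""
    owners = {}
    for system, tags in systems.items():
        for tag in tags:
            owners.setdefault(tag, set()).add(system)
    return {tag: sorted(used_by) for tag, used_by in owners.items() if len(used_by) > 1}


def eclectic_tag_set(systems):
    """Union of all system tag sets (level 4: eclecticism).

    Assumption: a shared identifier names distinct features, so each
    occurrence is made unique by prefixing the (hypothetical) system name.
    """
    clashes = ambiguous_identifiers(systems)
    result = set()
    for system, tags in systems.items():
        for tag in tags:
            result.add(f"{system}.{tag}" if tag in clashes else tag)
    return result
```

A polytheoretical set (level 6) would differ from this union in that features judged equivalent across systems are merged rather than duplicated; that judgment is a scholarly one and is not something the sketch attempts to automate.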
Each working committee must decide on the basis of its own knowledge how much theoretical harmonization is possible in a given area. The working committees should strive for the highest level of harmonization they believe feasible, given the theoretical climate and the resources at hand.


[1] As an example of formalization at this level, let us consider the preparation of lemmatized frequency lists. Lemmatizing practice varies on issues such as the forms of the definite article (are French le, la, l', les one lemma, two, three, or four? do the tokens au, aux contain one lemma each or two?), but it ranges over a finite number of possible solutions to a finite number of standard problems. The lemmatization practice of existing lists may thus be fully defined by the answers to a finite number of questions (e.g. “Are articles of different gender reduced to separate lemmata, or the same lemma?” or “Are words which combine a preposition and an article reduced to one lemma or two?”), such that each answer is drawn from a well-defined range of possible answers. In practice, these questions and answers provide a sufficient definition of the meaning of the tag <lemma>.
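
The check described for controlled semantics, that an answer from the permitted range is specified for each question, can be sketched in Python as follows. The question keys and answer values are hypothetical labels for the two example questions above, not part of any actual tag set. As noted in the text, such a check confirms only that answers are present and well-formed, not that the encoding actually follows them.

```python
# Hypothetical questionnaire fixing the meaning of a lemma tag for one list.
# Keys and permitted answers are illustrative labels, not a real tag set.
QUESTIONS = {
    "article_gender": {"same_lemma", "separate_lemmata"},
    "preposition_article_contraction": {"one_lemma", "two_lemmata"},
}


def check_practice(answers, questions=QUESTIONS):
    """Return a list of problems; an empty list means every question
    has an answer drawn from its permitted range."""
    problems = []
    for question, allowed in questions.items():
        if question not in answers:
            problems.append(f"unanswered: {question}")
        elif answers[question] not in allowed:
            problems.append(f"out of range: {question}={answers[question]}")
    for question in answers:
        if question not in questions:
            problems.append(f"unknown question: {question}")
    return problems


# One list's declared practice: a single lemma for all article forms,
# and contracted forms such as "au" analysed as two lemmata.
declared = {
    "article_gender": "same_lemma",
    "preposition_article_contraction": "two_lemmata",
}
problems = check_practice(declared)
```

In an SGML encoding the same information would presumably be carried by attributes with enumerated value ranges, so that the parser itself performs the equivalent of this check.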