In the second part of the chapter, 5.2 Workshop, we will shift attention to the program again and investigate in detail how the program relates to the topics discussed in the first part, 5.1 Cognitive Linguistics. Before we move on to this second part, we briefly sketch the design of the case study that will be used there as an example. The case study can be found in the workshop "c:\abundant\user\vers_wnt.wrk". We invite the reader to read the remainder of this chapter with Abundantia Verborum running and the workshop loaded. Only the main features of the workshop appear in this text, not the details.
The workshop is a repetition of a case study by Geeraerts. It is basically carried out within the same framework as the original one (although there are some differences, such as a more rigorous adherence to a bottom-up approach), so that it is relatively easy to monitor what is and what is not made easier or made more difficult with the program. In the caption of the workshop we read the following.
This workshop is installed with Abundantia Verborum as an example of a case study. Its setup mimics the "vers" analysis described in the first chapters of: Geeraerts, D. (1989) Wat er in een woord zit. Facetten van de lexicale semantiek. Peeters: Leuven. We chose to do over an existing study so that the two can be compared, which may reveal information about possible advantages, disadvantages or other effects of working with the program. The study was carried out in early 1996. The topic is the semasiological profile of the Dutch adjective "vers" in 19th century Dutch. The study is based on data collected for the redaction of the WNT: Woordenboek der Nederlandse Taal (1865- ) 's Gravenhage/Leiden : Martinus Nijhoff. Although the study by Geeraerts implicitly is based on more examples, it mentions only 50 observations. These are the data we use in the current study. The data were collected automatically with Abundantia Verborum from the manuscript of Geeraerts (1989). However, the origin fields of the workshop were manually adjusted to specify that the actual origin of the quotations was the WNT-database. The labeling was carried out by Dirk Speelman (University of Leuven). The function of the labels is explained in the caption fields of the label groups. Sometimes side remarks are annotated in the caption fields of the individual labels. The label groups mention both a subject and a linguist. In fact, both roles were filled by the same person. It goes without saying that this is not the most preferable situation.
As the caption says, we chose the example so that it can be compared with the original study. The choice does, however, have some less favorable consequences. Unlike Geeraerts, we had no access to the complete collection of quotations that was compiled for the redaction of the gigantic historical dictionary of Dutch, the Woordenboek der Nederlandsche Taal (1865- ), Martinus Nijhoff: 's Gravenhage/Leiden. We were therefore forced to restrict our attention to the quotations that are given explicitly in Geeraerts 1989b.
For the period the study was about (19th and early 20th
century) Geeraerts found about 150 quotations,
but he presents only 50 in his text, in such a way that all
the different senses he distinguishes are represented. The fact
that our study is only based on these 50 examples that are selected
so that all senses are (almost) equally represented,
deprives Abundantia Verborum of one of its major benefits,
namely easy presentation of frequency distributions. However,
later on we will show that, in spite of this unfavorable starting point,
a lot of quantitative information still resides in the data,
albeit information of a different nature. A final remark: of the
50 examples Geeraerts gives, eight are not from the period
the study is about. We chose to include them anyway, but to
introduce a period label group, so that
we can filter out these examples whenever we want.
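The filtering idea can be illustrated outside the program with a minimal Python sketch. It is not how Abundantia Verborum stores or filters observations: the `Observation` class, the `in_study_period` helper and the placeholder quotations are all hypothetical, and only the label name "1899-1920" is taken from the period group of the workshop.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Label of the period the study is about, as named in the "period" group.
STUDY_PERIOD = "1899-1920"


@dataclass
class Observation:
    """Hypothetical stand-in for one quotation plus its labels per label group."""
    quotation: str
    labels: Dict[str, Set[str]] = field(default_factory=dict)


def in_study_period(obs: Observation) -> bool:
    """True if the observation carries the study-period label in the 'period' group."""
    return STUDY_PERIOD in obs.labels.get("period", set())


# Placeholder data, not taken from the workshop.
observations = [
    Observation("placeholder quotation A", {"period": {"1899-1920"}}),
    Observation("placeholder quotation B", {"period": {"1600-1699"}}),
    Observation("placeholder quotation C", {"period": {"1899-1920"}}),
]

# Keep only the observations from the period under study.
kept = [obs for obs in observations if in_study_period(obs)]
```

Because the older examples stay in the data and are only excluded by a label test, the same collection can serve analyses with and without them.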
As the caption says, the origin fields contain a reference to the original source of the quotation, rather than to the location it was physically retrieved from by Abundantia Verborum (since the latter, the digital manuscript of the book, is not, and most likely never will be, a publicly available source). The reference to the original source is made in an informal way because we cannot think of any scenario in which the program itself, sooner or later, could make use of this reference. It is there merely for the human user.
The assignment of labels was carried out in a bottom-up way, by which we mean that we started from the interpretation of individual observations and only afterwards tried to detect a global structure in this collection of individual interpretations. This approach is motivated by our belief that the spontaneous interpretation of an individual observation belongs to the "language use" we study, rather than to the language analysis by the linguist. These are two separate levels, just as the unreflected, spontaneous use of syntactic constructions in language use is something quite different from the conscious analysis of the syntactic structure of a sentence by a linguist.
We believe that if one is interested in how language is actually being used, and possibly in which psychological processes are actually at work in spontaneous language use, then data collection is most pure when these two levels are clearly separated. For the first level, operational tests should be designed to translate unreflected intuition into labels. The most suitable framework is probably that of "blind tests": an experimental setup in which subjects are presented with an observation and are given carefully designed tasks that result in behaviour that the linguist can translate, in a more or less straightforward way, into label assignments.
Admitting that we took the easy way of presenting only a flawed simulation of what we see as an interesting two-step data collection method (first retrieving utterances from corpora, and then retrieving interpretations through experiments), we do find it important for understanding the case study at hand that it was carried out with the principle in mind of separating the two levels. It should be clear, however, that the resulting bottom-up approach is not enforced by the program. It is a feature of this particular case study, not of cognitive lexical semantics in general, nor of Abundantia Verborum in general.
The workshop contains five label groups. The first of
the two levels just described, namely the level of additional
data collection, is represented in the workshop by the group
semantics. The data are obtained through
'introspection as flawed surrogate for experimental setup'.
The second level, that of the actual analysis, is represented
by the groups schema, context,
higher_sem and highest_sem.
All information taken into account is of a semantic nature.
The study could, of course, be further refined by
adding extra information, e.g. of a syntactical nature,
so that correlations between syntactic and semantic
use could be detected. In the remainder of this text we briefly explain
the function of the individual label groups.
Actually, there is one group with non-semantic information,
namely the group period, which contains meta-information
about the period the observations originate from.
The caption of the semantics group is as follows:
The subject was confronted with each individual observation, and each time asked which information he felt was contributed explicitly to the overall message in the observation by the use of "vers". Afterwards the linguist has reworked the original labels to make them conform to the principle that each label in the workshop represents a unique, atomic, unambiguous piece of information that does not overlap with the semantics of other labels.
As stated in the caption, the operational test we used to obtain semantic labels was the confrontation of the pseudo-subject with the question "which information according to you is contributed explicitly to the overall message by the use of the word vers?". The pseudo-subject attempted to simulate spontaneous language use when reading the observations, thereby refraining from deliberate searches for inspiration in the other study (but how pure and unreflected can the language use of a linguist be?). The goal was to obtain a list of all information, either linguistic or world-knowledge, that subjects feel is in the "intension" of the lexical item, i.e. the meaning of the lexical item at the intensional level. Instead of seeing an "intension" as a set of criteria, we treat it as a bundle of information. As will be explained later, this information-perspective is just as open to a distinction between a classical and a non-classical semasiological profile as the criteria-perspective is.
Since the information was reworked by the linguist in the same label group, the result of the additional data collection is not retained in its pure state. In cases, such as experiment-based or interview-based data collection, where the additional data collection would have been much more labour intensive, and the result more precious, it would be wiser to do the reworking in a separate label group, so that the pure result of the additional data collection step would be easy to re-use or re-consult.
The labels in the semantics group are
the following:
- far from gone bad, tasty
- rich in refreshing ingredients
- having few noxious ingredients
- recently fabricated
- not washed away yet
- new in context
- just prepared for use
- recently acquired
- recently picked or slaughtered
- not conserved artificially
- having many useful ingredients
- recently come into existence
- still strong/fit
- not yet used
- not yet stained or polluted
- still unfamiliar
- recently used
- wet
- cool
- in supply
The group schema is an attempt to translate the information
in the semantics group to a schema-like description.
This process implies an extra level of linguistic interpretation
rather than being a direct transformation of identical
information. The schemata are the result of the linguist's
attempt to find logical coherence in the global semasiological
profile. There are no clear criteria to evaluate this work, except
the principle that more straightforward and coherent resulting
representations can be considered to be (psychologically) more
plausible.
We use implication rules to specify the relation "X is a schematization
of Y". The following is the content of the caption of the group.
This information was completely added by the linguist, as an a posteriori interpretation of the structure that underlies the information in the "semantics" group, and more precisely as an attempt to find a plausible model of the cognitive mechanisms that could be at work behind the scenes. The term schema here, in a theory-neutral way, refers to general scenarios, i.e. "sequences of events that are applicable to many situations that language describes". The linguist went through several 'what-if' trains of thought to eventually select the type of schemata that in his experience best motivates cognitive coherence in the structures and label configurations he found in the "semantics" group.
The labels in the schema group are
the following (for their meaning, please load the workshop
to consult their caption and to consult the label
implication rules):
- 2_flourish/decline
- 2_flourish/decline_once
- 2_flourish/decline_cyclic
- 2_abundance/used up_once
- 2_powerful/weak_once
- 2_quality/decline_once
- 2_young/old_once
- 2_pure/polluted_once
- 2_fit/tired_cyclic
- 2_prepared/used_cyclic
- 2_added/decline_cyclic
- 2_humidity
- 2_temperature
The next group, context, forms the
basis for investigating whether the semantic use of
"vers" depends upon or, more neutrally, correlates
with the type of noun "vers" is attributed to.
The caption of the group is given below.
It could be argued that this type of information
should be based on an additional data collection step
similar to the one used for semantics,
rather than leaving it all up to the
interpretation of the linguist. We chose the
latter.
The use of an expression in a specific sense often only seems possible in (and seems to be motivated by) the context of a particular domain. For adjectives this context can often most easily be formulated as a hyperonym of the type of entity referred to by the (possibly implicit) noun that the adjective predicates over. This information was added by the linguist who, for each observation, asked himself in which context the use of "vers", as it was understood by the subject, occurred. The context was defined as generally as possible, without subsuming contexts in which, according to the linguist's intuition, the reading would not be (as easily) applicable. The purpose of introducing the group was to investigate correlations between senses and contexts, and thus possibly refine the picture of the semasiological profile of "vers" that emerges from the data (as they are interpreted by the subject).
The labels in the context group are
the following:
- clothing
- organic food
- organism
- thought/feeling
- matter that can disappear
- sheets
- entity
- entity that can disappear
- food
- air
- water
- human
- animal
- plant
- ink
- event
- artifact
- physical object
- earth
- dose
- strength/courage
- situation
- message
- compost
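The kind of correlation between senses and contexts that this group is meant to support can be sketched as a simple cross-tabulation. The Python below is only an illustration and does not reproduce any function of Abundantia Verborum: the helper names are hypothetical, and although the label names come from the context and semantics groups, the pairings between them are invented for the example and are not results of the case study.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One (context label, semantics label) pair per observation.
# Label names are from the workshop; the pairings are invented placeholders.
pairs: List[Tuple[str, str]] = [
    ("food", "recently picked or slaughtered"),
    ("food", "not conserved artificially"),
    ("food", "recently picked or slaughtered"),
    ("air", "cool"),
    ("ink", "wet"),
]


def cross_tabulate(pairs: List[Tuple[str, str]]) -> Counter:
    """Count how often each (context, sense) combination occurs."""
    return Counter(pairs)


def senses_for(context: str, table: Counter) -> Dict[str, int]:
    """Return the sense counts observed within one context."""
    return {sense: n for (ctx, sense), n in table.items() if ctx == context}


table = cross_tabulate(pairs)
food_senses = senses_for("food", table)
```

A context whose counts concentrate on one or two senses would suggest that the reading depends on the type of noun; with only 50 observations, such counts are indicative at best.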
As a bottom-up approach should, it goes up
in the end. The labels in the groups
higher_sem and highest_sem
contain, at two different levels of abstraction,
bundles of labels from semantics,
thus forming high-level semantic labels.
This is, of course, specified using implication rules.
The captions of the groups are presented below.
As principles for clustering we chose those aspects
of our schema-analysis that unify,
or reveal coherence in, the semantics information
more than other candidate principles.
For higher_sem the caption is:
HIGHER SEMANTICS Because there are too many labels in the "semantics" group to display them all together in graphs, the linguist has created the "higher_sem" group to simplify the semantic information to a reduced set of labels. Implication rules from "semantics" to "higher_sem" specify how the labels in "higher_sem" are defined to be bundles of related labels in "semantics". The rules are such that a label in "semantics" may occur in several bundles of "higher_sem". The "higher_sem:(rest)" class bundles "semantics:wet" and "semantics:cool". Whereas in the "schema" group as many candidate structuring principles as possible are explored, the "higher_sem" group only retains those principles that are most suggested by the "semantics" labels themselves, i.e. those that are applicable to many "semantics" labels.
For highest_sem the caption is:
HIGHEST SEMANTICS A further generalization of "higher_sem". Here we abstract over whether focus is on the confirmation of stage 2 or on the negation of stage 3.
The labels in the higher_sem group are
the following (for their meaning, please load the workshop
to consult their caption and to consult the label
implication rules):
- good qualities
- no bad qualities
- hardly used
- not over-used
- recent
- not old
- (rest)
The labels in the highest_sem group are
the following (for their meaning, please load the workshop
to consult their caption and to consult the label
implication rules):
- high quality
- lack of use
- recency
- (rest)
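The bundling described in the two captions can be sketched as a pair of lookup tables applied in sequence. This is a minimal Python illustration and not the rule format of Abundantia Verborum. Only the rules sending "wet" and "cool" to "(rest)" are stated in the higher_sem caption; every other rule below, including the mapping from higher_sem to highest_sem, is an assumption made for the example, and the actual rules should be consulted in the workshop.

```python
from typing import Dict, Set

# Implication rules from "semantics" to "higher_sem".
SEMANTICS_TO_HIGHER: Dict[str, Set[str]] = {
    "wet": {"(rest)"},                   # stated in the higher_sem caption
    "cool": {"(rest)"},                  # stated in the higher_sem caption
    "recently fabricated": {"recent"},   # assumed for illustration
    "not yet used": {"hardly used"},     # assumed for illustration
    # Assumed example of one label occurring in several bundles.
    "still strong/fit": {"good qualities", "not over-used"},
}

# Implication rules from "higher_sem" to "highest_sem" (all assumed).
HIGHER_TO_HIGHEST: Dict[str, Set[str]] = {
    "good qualities": {"high quality"},
    "no bad qualities": {"high quality"},
    "hardly used": {"lack of use"},
    "not over-used": {"lack of use"},
    "recent": {"recency"},
    "not old": {"recency"},
    "(rest)": {"(rest)"},
}


def apply_rules(labels: Set[str], rules: Dict[str, Set[str]]) -> Set[str]:
    """Collect every label implied by any of the given labels."""
    implied: Set[str] = set()
    for label in labels:
        implied |= rules.get(label, set())
    return implied


# Climb two levels of abstraction from one observation's semantics labels.
higher = apply_rules({"wet", "still strong/fit"}, SEMANTICS_TO_HIGHER)
highest = apply_rules(higher, HIGHER_TO_HIGHEST)
```

Representing each rule target as a set is what allows a semantics label to belong to several higher_sem bundles, as the caption permits.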
The decision to take the 19th century
and the first two decades of the 20th century as one
historical period was adopted
from the study by Geeraerts.
The labels in the group period are the following:
- 1500-1599
- 1600-1699
- 1700-1799
- 1899-1920