Abundantia Verborum

5. Workshops and cognitive linguistics

5.1 Cognitive linguistics


5.1.4 The case study "vers"

In the second part of the chapter, 5.2 Workshop, we will shift attention to the program again and investigate in detail how the program relates to the topics discussed in the first part, 5.1 Cognitive Linguistics. But before we move to this second part, we briefly sketch the design of the case study that will be used there as an example. The case study can be found in the workshop "c:\abundant\user\vers_wnt.wrk". We invite the reader to read the remainder of this chapter with Abundantia Verborum running and the workshop loaded. Only the main features of the workshop appear in this text, not the details.

The caption of the workshop

The workshop replicates a case study by Geeraerts. It is basically carried out within the same framework as the original one (although there are some differences, such as a more rigorous adherence to a bottom-up approach), so that it is relatively easy to monitor what the program makes easier and what it makes more difficult. In the caption of the workshop we read the following.

This workshop is installed with Abundantia Verborum as an
example of a case study. Its setup mimics the "vers" analysis
described in the first chapters of: Geeraerts, D. (1989) Wat
er in een woord zit. Facetten van de lexicale semantiek.
Peeters: Leuven. We chose to redo an existing study so
that the two can be compared, which may reveal information
about possible advantages, disadvantages or other effects of
working with the program. The study was carried out in early
1996.

The topic is the semasiological profile of the Dutch adjective
"vers" in 19th century Dutch. The study is based on data
collected for the compilation of the WNT: Woordenboek der
Nederlandsche Taal (1865- ) 's Gravenhage/Leiden :
Martinus Nijhoff.

Although the study by Geeraerts is implicitly based on more
examples, it mentions only 50 observations. These are the
data we use in the current study.

The data were collected automatically with Abundantia
Verborum from the manuscript of Geeraerts (1989). However,
the origin fields of the workshop were manually adjusted to
specify that the actual origin of the quotations was the
WNT-database.

The labeling was carried out by Dirk Speelman (University of
Leuven). The function of the labels is explained in the caption
fields of the label groups. Sometimes side remarks are
annotated in the caption fields of the individual labels. The
label groups mention both a subject and a linguist. In fact,
both roles were filled by the same person. It goes without
saying that this is not the most preferable situation.

The observations

As the caption says, we chose the example so that it can be compared with the original study. The choice does, however, have some less favorable consequences. Unlike Geeraerts, we had no access to the complete collection of quotations that was compiled for the editing of the gigantic historical dictionary of Dutch, the Woordenboek der Nederlandsche Taal (1865- ), Martinus Nijhoff: 's Gravenhage/Leiden. We were therefore forced to restrict our attention to the quotations that are given explicitly in Geeraerts 1989b.

For the period the study covers (19th and early 20th century) Geeraerts found about 150 quotations, but he presents only 50 in his text, chosen so that all the different senses he distinguishes are represented. Because our study is based only on these 50 examples, which are selected so that all senses are (almost) equally represented, Abundantia Verborum is deprived of one of its major benefits, namely easy presentation of frequency distributions. However, later on we will show that, in spite of this unfavorable starting point, a lot of quantitative information still resides in the data, albeit information of a different nature. A final remark: of the 50 examples Geeraerts gives, eight are not from the period the study covers. We chose to include them anyway, but to introduce a period label group, so that we can filter out these examples whenever we want.
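
The filtering idea can be illustrated outside the program. The following Python fragment is only a minimal sketch and does not reflect the internal data model of Abundantia Verborum: the Observation class, its fields, the "study period" placeholder label and the two sample records are hypothetical. Only the three out-of-period label names are taken from the period group listed at the end of this chapter.

```python
from dataclasses import dataclass, field

# Labels of the "period" group that fall outside the period under study.
OUT_OF_PERIOD = {"1500-1599", "1600-1699", "1700-1799"}


@dataclass
class Observation:
    """Hypothetical stand-in for one quotation in the workshop."""
    text: str
    labels: dict = field(default_factory=dict)  # label group -> set of labels


def in_study_period(obs):
    """True when the observation carries no out-of-period label."""
    return not (obs.labels.get("period", set()) & OUT_OF_PERIOD)


# Invented placeholder records; they are not quotations from the workshop.
observations = [
    Observation("quotation A", {"period": {"1700-1799"}}),
    Observation("quotation B", {"period": {"study period"}}),
]

kept = [obs for obs in observations if in_study_period(obs)]
print([obs.text for obs in kept])
```

Because the out-of-period examples are marked rather than deleted, the same collection can serve both the restricted and the unrestricted analysis.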

As the caption says, the origin fields contain a reference to the original source of the quotation, rather than to the location it was physically retrieved from by Abundantia Verborum (since the latter, the digital manuscript of the book, is not, and most likely never will be, a publicly available source). The reference to the original source is made in an informal way because we cannot think of any scenario in which the program itself, now or later, could make use of this reference. It is there merely for the human user.

Bottom-up analysis

The assignment of labels was carried out in a bottom-up way, by which we mean that we started from the interpretation of individual observations and only afterwards tried to detect a global structure in this collection of individual interpretations. This approach is motivated by our belief that the spontaneous interpretation of an individual observation belongs to the "language use" we study, rather than to the language analysis by the linguist. These are two separate levels, just as the unreflective, spontaneous use of syntactic constructions in language use is something quite different from the conscious analysis of the syntactic structure of a sentence by a linguist.

We believe that if one is interested in how language is actually used, and possibly in which psychological processes are actually at work in spontaneous language use, then data collection is purest when these two levels are clearly separated. For the first level, operational tests should be designed to translate unreflective intuition into labels. The most suitable framework is probably that of "blind tests": an experimental setup in which subjects are presented with an observation and are given carefully designed tasks that result in behaviour that the linguist can translate into label assignments in a more or less straightforward way.

Admitting that we took the easy way of presenting only a flawed simulation of what we see as an interesting two-step data collection method (first retrieving utterances from corpora, and then retrieving interpretations through experiments), we do find it important for understanding the case study at hand that it was carried out with the principle in mind of separating the two levels. It should be clear, however, that the resulting bottom-up approach is not enforced by the program. It is a feature of this particular case study, not of cognitive lexical semantics in general, nor of Abundantia Verborum in general.

The label groups

The workshop contains five label groups with semantic information, plus one group with meta-information. The first of the two levels just described, namely the level of additional data collection, is represented in the workshop by the group semantics. The data are obtained through 'introspection as flawed surrogate for experimental setup'. The second level, that of the actual analysis, is represented by the groups schema, context, higher_sem and highest_sem. All information taken into account in these groups is of a semantic nature. The study could, of course, be further refined by adding extra information, e.g. of a syntactic nature, so that correlations between syntactic and semantic use could be detected. The one group with non-semantic information is the group period, which contains meta-information about the period the observations originate from. In the remainder of this text we briefly explain the function of the individual label groups.

semantics

The caption of this group is as follows:

The subject was confronted with each
individual observation, and each time
asked which information he felt was
contributed explicitly to the overall
message in the observation by the use of
"vers". Afterwards the linguist has reworked
the original labels to make them conform to
the principle that each label in the workshop
represents a unique, atomic, unambiguous
piece of information that does
not overlap with the semantics of other
labels.

As stated in the caption, the operational test we used to obtain semantic labels was to confront the pseudo-subject with the question "which information according to you is contributed explicitly to the overall message by the use of the word vers?". The pseudo-subject attempted to simulate spontaneous language use when reading the observations, and therefore refrained from deliberately searching for inspiration in the other study (but how pure and unreflective can the language use of a linguist be?). The goal was to obtain a list of all information, whether linguistic or world knowledge, that subjects feel is in the "intension" of the lexical item, i.e. the meaning of the lexical item at the intensional level. Instead of seeing an "intension" as a set of criteria, we treat it as a bundle of information. As will be explained later, this information perspective is just as open to a distinction between classical and non-classical semasiological profiles as the criteria perspective is.

Since the information was reworked by the linguist in the same label group, the result of the additional data collection is not retained in its pure state. In cases, such as experiment-based or interview-based data collection, where the additional data collection would have been much more labour intensive, and the result more precious, it would be wiser to do the reworking in a separate label group, so that the pure result of the additional data collection step would be easy to re-use or re-consult.

The labels in the semantics group are the following:

  - far from gone bad, tasty
  - rich in refreshing ingredients
  - having few noxious ingredients
  - recently fabricated
  - not washed away yet
  - new in context
  - just prepared for use
  - recently acquired
  - recently picked or slaughtered
  - not conserved artificially
  - having many useful ingredients
  - recently come into existence
  - still strong/fit
  - not yet used
  - not yet stained or polluted
  - still unfamiliar
  - recently used
  - wet
  - cool
  - in supply

schema

The group schema is an attempt to translate the information in the semantics group into a schema-like description. This process implies an extra level of linguistic interpretation rather than being a direct transformation of identical information. The schemata result from the linguist's attempt to find logical coherence in the global semasiological profile. There are no clear criteria to evaluate this work, except the principle that more straightforward and coherent representations can be considered (psychologically) more plausible. We use implication rules to specify the relation "X is a schematization of Y". The following is the content of the caption of the group.

This information was completely added by
the linguist, as an a posteriori interpretation
of the structure that underlies the
information in the "semantics" group, and
more precisely as an attempt to find a
plausible model of the cognitive
mechanisms that could be at work behind
the scenes.

The term schema here, in a theory-neutral
way, refers to general scenarios, i.e.
"sequences of events that are applicable to
many situations that language describes".

The linguist went through several 'what-if'
trains of thought to eventually select the
type of schemata that in his experience
best motivates cognitive coherence in the
structures and label configurations he
found in the "semantics" group.

The labels in the schema group are the following (for their meaning, please load the workshop to consult their caption and to consult the label implication rules):

  - 2_flourish/decline
  - 2_flourish/decline_once
  - 2_flourish/decline_cyclic
  - 2_abundance/used up_once
  - 2_powerful/weak_once
  - 2_quality/decline_once
  - 2_young/old_once
  - 2_pure/polluted_once
  - 2_fit/tired_cyclic
  - 2_prepared/used_cyclic
  - 2_added/decline_cyclic
  - 2_humidity
  - 2_temperature

context

The next group, context, forms the basis for investigating whether the semantic use of "vers" depends upon or, more neutrally, correlates with the type of noun that "vers" modifies. The caption of the group is given below. It could be argued that this type of information should be based on an additional data collection step similar to the one used for semantics, rather than leaving it all up to the interpretation of the linguist. We chose the latter.

The use of an expression in a specific
sense often seems possible only in (and
seems to be motivated by) the context of a
particular domain.
For adjectives this context can often most
easily be formulated as a hyperonym of
the type of entity referred to by the (possibly
implicit) noun that the adjective predicates
over.

This information was added by the linguist
who, for each observation, asked himself in
which context the use of "vers", as it was
understood by the subject, occurred.

The context was defined as generally as
possible, without subsuming contexts in
which, according to the linguist's intuition,
the reading would not be (as easily)
applicable.

The purpose of introducing the group was to
investigate correlations between senses
and contexts, and thus possibly refine the
obtained picture of the semasiological
profile of "vers" that emerges from the data
(as they are interpreted by the subject).

The labels in the context group are the following:

  - clothing
  - organic food
  - organism
  - thought/feeling
  - matter that can disappear
  - sheets
  - entity
  - entity that can disappear
  - food
  - air
  - water
  - human
  - animal
  - plant
  - ink
  - event
  - artifact
  - physical object
  - earth
  - dose
  - strength/courage
  - situation
  - message
  - compost
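
The kind of correlation this group is meant to expose can be sketched as a simple cross-tabulation of sense labels against context labels. The Python fragment below is a hedged illustration only: the label names come from the lists in this chapter, but the pairings and their counts are invented and say nothing about the actual workshop data.

```python
from collections import Counter

# Each pair is (semantics label, context label) for one observation.
# The pairings are invented for illustration; only the label names
# come from the lists in this chapter.
pairs = [
    ("recently picked or slaughtered", "organic food"),
    ("recently picked or slaughtered", "organic food"),
    ("not yet used", "sheets"),
    ("still strong/fit", "human"),
]

# Count how often each sense co-occurs with each context.
crosstab = Counter(pairs)

for (sense, context), count in sorted(crosstab.items()):
    print(f"{sense:32} {context:14} {count}")
```

A sense that co-occurs with only one context, or a context that admits only one sense, is the kind of pattern the group was introduced to detect.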

higher_sem and highest_sem

As a bottom-up approach should, it goes up in the end. The labels in the groups higher_sem and highest_sem contain, at two different levels of abstraction, bundles of labels from semantics, thus forming high-level semantic labels. This is, of course, specified using implication rules. The captions of the groups are presented below. As principles for clustering we chose those aspects of our schema analysis that unify, or reveal coherence in, the semantics information more than other candidate principles do.

For higher_sem the caption is:

HIGHER SEMANTICS

Because there are too many labels in the
"semantics" group to display them all
together in graphs, the linguist has created
the "higher_sem" group to simplify the
semantic information to a reduced set of
labels.

Implication rules from "semantics" to
"higher_sem" specify how the labels in
"higher_sem" are defined to be bundles of
related labels in "semantics".
The rules are such that a label in
"semantics" may occur in several bundles
of "higher_sem". The "higher_sem:(rest)"
class bundles "semantics:wet" and
"semantics:cool".

Whereas in the "schema" group as many
candidate structuring principles as possible
are explored, the "higher_sem" group only
retains those principles that are most
suggested by the "semantics" labels
themselves, i.e. those that are applicable to
many "semantics" labels.

For highest_sem the caption is:

HIGHEST SEMANTICS

A further generalization of "higher_sem".
Here we abstract over whether focus is on
the confirmation of stage 2 or on the
negation of stage 3.

The labels in the higher_sem group are the following (for their meaning, please load the workshop to consult their caption and to consult the label implication rules):

  - good qualities
  - no bad qualities
  - hardly used
  - not over-used
  - recent
  - not old
  - (rest)

The labels in the highest_sem group are the following (for their meaning, please load the workshop to consult their caption and to consult the label implication rules):

  - high quality
  - lack of use
  - recency
  - (rest)
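
The bundling by implication rules can be pictured as two chained lookups. The Python fragment below is a rough sketch under stated assumptions, not the rule syntax of Abundantia Verborum. Only the rule that "(rest)" bundles "wet" and "cool" is stated in the captions. The other semantics rules, and the pairing of higher_sem labels into highest_sem labels, are assumptions made for this illustration; the actual rules should be consulted in the workshop.

```python
# Implication rules as "source label -> set of implied labels".
# Only the (rest) rule is stated in the captions; the other entries
# are assumptions made for this illustration.
SEMANTICS_TO_HIGHER = {
    "wet": {"(rest)"},                           # stated in the caption
    "cool": {"(rest)"},                          # stated in the caption
    "recently fabricated": {"recent"},           # assumed
    "not yet used": {"hardly used", "not old"},  # assumed; one label, two bundles
}

HIGHER_TO_HIGHEST = {                            # assumed pairing
    "good qualities": {"high quality"},
    "no bad qualities": {"high quality"},
    "hardly used": {"lack of use"},
    "not over-used": {"lack of use"},
    "recent": {"recency"},
    "not old": {"recency"},
    "(rest)": {"(rest)"},
}


def imply(labels, rules):
    """Return every label implied by `labels` under `rules`."""
    implied = set()
    for label in labels:
        implied |= rules.get(label, set())
    return implied


higher = imply({"wet", "not yet used"}, SEMANTICS_TO_HIGHER)
highest = imply(higher, HIGHER_TO_HIGHEST)
print(sorted(higher))
print(sorted(highest))
```

Mapping each label to a set rather than to a single label mirrors the remark in the caption that a semantics label may occur in several bundles.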

period

The decision to treat the 19th century and the first two decades of the 20th century as one historical period was adopted from the study by Geeraerts. The labels in the group period are the following:

  - 1500-1599
  - 1600-1699
  - 1700-1799
  - 1800-1920
