Abundantia Verborum

3. Tutorial

3.2 Classifying the data


3.2.1 Using labels

The observations in a workshop are classified by assigning labels to them. Labels are formal, atomic pieces of information, and they are organized into label groups. Each group contains anywhere from a few to a few thousand labels that all represent information of a particular type. An example of a label group might be "SYNTACTIC CATEGORY"; labels in this group could be "VERB", "NOUN", etc. The inventory of label groups and labels is managed with the Label Browser, a tool for viewing and editing the pool of labels used in a particular workshop. Before a label can be assigned to an observation, it must first be added either to a label group that already exists in the Label Browser or to a newly created label group. After that, the label can be assigned to any set of observations in the workshop. Labels are assigned either to a whole set of observations at once in the Observation Browser, or to an individual observation in the Observation Editor. The next four sections illustrate common techniques for assigning labels to observations. But first, some basic concepts.

Basic Concepts

You can think of label groups as parameters, and of the labels in a particular group as the possible values of that parameter. Keep in mind, though, that an observation may be assigned no label at all, or more than one label, from a particular label group, and that this is often useful. The latter situation is particularly interesting for groups that contain semantic information, since it is often convenient to describe the semantics of a linguistic phenomenon as a set of smaller pieces of semantic information that may co-occur in variable numbers.
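The parameter metaphor can be made concrete with a small Python sketch. This is an illustration of the idea only, not the program's internal format: the "SEMANTICS" group, its labels, and the assign function are hypothetical names invented for the example. The point it shows is that each observation holds a set of labels per group, so a group may contribute no label, one label, or several.

```python
from collections import defaultdict

# Hypothetical label groups ("parameters") and their labels ("values").
# Only "SYNTACTIC CATEGORY", "VERB" and "NOUN" come from the tutorial text;
# the "SEMANTICS" group and its labels are invented for illustration.
label_groups = {
    "SYNTACTIC CATEGORY": {"VERB", "NOUN"},
    "SEMANTICS": {"MOTION", "CAUSATION", "STATE"},
}

def assign(observation, group, label):
    """Attach a label to an observation, refusing labels not in the group."""
    if label not in label_groups.get(group, set()):
        raise KeyError(f"{label!r} is not in label group {group!r}")
    observation[group].add(label)

# An observation maps each group name to a *set* of labels, so it can hold
# zero, one, or several labels from the same group.
obs = defaultdict(set)
assign(obs, "SYNTACTIC CATEGORY", "VERB")
assign(obs, "SEMANTICS", "MOTION")
assign(obs, "SEMANTICS", "CAUSATION")

print(sorted(obs["SEMANTICS"]))       # ['CAUSATION', 'MOTION']
print(len(obs["SYNTACTIC CATEGORY"]))  # 1
```

Using a set rather than a single value per group is what distinguishes this model from an ordinary parameter table, where each parameter takes exactly one value.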

We tend to think of a workshop as an n-dimensional space, n being the summed cardinality of all label groups, that is, the total number of labels in the workshop. In each dimension there are two possible locations, namely point 0 for 'label absent' and point 1 for 'label present'. The observations are spread over this n-dimensional space in such a way that the location of each observation reflects which labels it has and which it lacks.
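The spatial metaphor can be sketched in a few lines of Python. Again, this is only an illustration under assumed names (the "SEMANTICS" group, its labels, and the location function are hypothetical), not a description of how the program stores its data. Every (group, label) pair becomes one dimension, and an observation's coordinates are 1 where the label is present and 0 where it is absent.

```python
# Hypothetical label groups; lists are used so the dimension order is fixed.
label_groups = {
    "SYNTACTIC CATEGORY": ["VERB", "NOUN"],
    "SEMANTICS": ["MOTION", "CAUSATION", "STATE"],  # invented for illustration
}

# One dimension per (group, label) pair, so n is the sum of the group sizes.
dimensions = [(group, label)
              for group, labels in label_groups.items()
              for label in labels]
n = len(dimensions)

def location(assigned):
    """Return an observation's coordinates: 1 = label present, 0 = absent."""
    return tuple(1 if dim in assigned else 0 for dim in dimensions)

# An observation described as the set of (group, label) pairs assigned to it.
obs = {
    ("SYNTACTIC CATEGORY", "VERB"),
    ("SEMANTICS", "MOTION"),
    ("SEMANTICS", "CAUSATION"),
}

print(n)              # 5
print(location(obs))  # (1, 0, 1, 1, 0)
```

Two observations with the same coordinates carry exactly the same labels, which is what makes the space useful for grouping and comparing observations.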

Whichever metaphor you prefer, what is essential is that the information assigned to observations is split up into, and restricted to, atomic units of information, the function of which should be 'clear and distinct'. For this purpose labels have not only a name but also a description field, in which their function can be described in full text, explicitly and unambiguously. These description fields serve a function similar to that of the "Caption" fields of workshops, virtual corpora, queries, etc. Label groups also have a description field.

