In the foregoing sections, 3.2.2 Adding labels via queries, 3.2.3 Adding labels via filters and 3.2.4 Adding labels via zooming, we saw techniques for automatically adding labels to sets of observations on the basis of some formal criterion. Unfortunately, the implicit information we want to make explicit is often hard or even impossible to derive automatically from formal criteria. If this is the case, we have to go through the observations one by one and decide for each observation whether certain labels should be assigned or not. We call this manually adding labels. It is a common and, unfortunately, time-consuming practice.
In order not to make this tutorial more tedious than necessary, some example manual labeling work has been done for you. The result is in "c:\abundant\user\demowork.wrk", which is a demonstration workshop that has been installed on your system together with the program. Open this workshop now (after having closed whatever window was still open on the Abundantia Verborum desktop)! Next, open the Observation Browser! This should render a rather familiar sight, since "demowork.wrk" is an only slightly modified copy of the previous example "over_gen.wrk", the workshop you have created at the end of section 3.2.3 Adding labels via filters.
The difference between "over_gen.wrk" and "demowork.wrk" is that
in the latter, next to <query> and COMPAR,
two new label groups have been added, namely
SEM and SAID-OF. The group SEM,
short for "semantics",
contains encountered meaning aspects of the
occurrences of "old". There are several linguistic techniques,
empirical as well as introspective ones, for
arriving at a representation of what an expression means in a particular
observation. In this tutorial
theoretical motives behind one choice or another at that level
are not the issue. Much more important is the methodological
point that whatever choices we make, we can and should describe them
in the "Caption" of the workshop and in the
description fields of its labels and label groups.
As you can read in that location in "demowork.wrk", we have stuck to the simplest linguistic approach:
introspection.
For each observation we have asked ourselves: which information
is added to the overall message by the word "old", and how can
we represent this information by means of a preferably small set of
unambiguous, clear units of information (the labels)?
After a first pass, i.e. having asked this question for each
individual observation
and having made a first attempt at formulating an answer,
each time creating new labels on the fly whenever the
existing ones were not appropriate, the work was far from done. Consecutive
passes and revisions revealed that some labels were simply
not adequate representations of the meaning aspects they were intended
to represent and had to be replaced by others, or that some meaning
aspects first described with different
labels on second
thought
were facets of the same thing and should be represented with the same label,
or that some observations first tagged with
the same label on second thought did not reflect one and the same
meaning aspect and therefore should not get the same treatment, etc.
The other new group, SAID-OF, contains encountered types of
entities that are called "old" in the data. Here, once again relying
upon introspection, we asked ourselves: what type of entity are the
meaning aspects we represent with SEM labels
being attributed to? As far as the level of specificity is concerned,
we chose to be as general as possible
without crossing the border of the field to which we judged the
observed use of the word to be applicable. To give an example:
if in a certain observation "old" was said of yogurt, having a meaning
with which we judged the use of the word invariably applicable to various types of food, but
not to non-food, we used
SAID-OF:FOOD as value, rather than either the too general
SAID-OF:CONCRETE OBJECT or the too specific SAID-OF:YOGURT.
As was the case for the SEM group, here too consecutive
passes were needed to fine-tune the descriptions.
Both new groups differ from the two old ones in that
their values are not mutually exclusive. In SEM
values can co-occur because they can collectively represent
a meaning. In SAID-OF values can co-occur because
entities belong to several classes. Abundantia Verborum does
not distinguish between groups with mutually exclusive labels
and groups with combinable labels. It is good practice for
the user to make the distinction in the description field of the
groups.
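Since the program itself does not enforce the distinction, a check outside the program could. The following is a minimal sketch in Python, assuming a hypothetical representation in which each observation maps a group name to the set of labels assigned from that group. This is not Abundantia Verborum's internal format, and the function name, the contents of `EXCLUSIVE_GROUPS` and the label values are illustrative assumptions.

```python
# Hypothetical model, not the program's own data format:
# an observation is a dict mapping a group name to a set of label names.

# Assumption: the user has documented these groups as mutually exclusive.
EXCLUSIVE_GROUPS = {"<query>", "COMPAR"}


def exclusivity_violations(labels_by_group, exclusive_groups=EXCLUSIVE_GROUPS):
    """Return the names of exclusive groups that carry more than one label."""
    return sorted(
        group
        for group, labels in labels_by_group.items()
        if group in exclusive_groups and len(labels) > 1
    )


# Illustrative observation: SEM and SAID-OF may hold several labels at once,
# <query> and COMPAR may not.
observation = {
    "<query>": {"<1>"},
    "COMPAR": {"COM"},
    "SEM": {"having old age", "turned bad"},
    "SAID-OF": {"FOOD"},
}

print(exclusivity_violations(observation))  # prints []
```

Two SEM labels on one observation are accepted here, whereas two COMPAR labels would be reported.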
So much for the contents. Back to the program! We left you there in the Observation Browser. Double click on the first observation! Having arrived in the Observation Editor you read, at the bottom of the dialog box, that you're currently looking at observation 1 of 32. These numbers correspond to those in the list in the Observation Browser. Being in the Observation Editor, looking at observation 1 of several, is the typical starting point of a manual pass. By a "pass" we mean the act of going over all observations, one by one, each time carrying out the same tests or actions. A complete manual labeling job typically consists of several passes.
Let us have a look at
the values
for the groups SEM
and SAID-OF that have been added manually to
the observations in "demowork.wrk",
starting with the currently displayed observation 1 of 32.
The bottom part of the Observation Editor displays that
the current observation has four labels,
namely:
<query>:<1>
COMPAR:COM
SEM:from other era
SAID-OF:person
Click on "Edit Labels"! You end up in the Set Observation's Labels dialog box. This dialog box looks a lot like the Add Label to Observations dialog box we've already encountered (cf. "Tag all" in 3.2.3 Adding labels via filters). The major difference between the two is that the Add Label to Observations dialog box serves for making a single change at once to a set of observations, whereas the Set Observation's Labels dialog box is designed for making several changes at once to a single observation. In light of the time-consuming nature of manually adding labels, the dialog box has been optimized for fast editing.
In the top panel, the "Groups" panel, you can switch between groups. In the bottom panel, the "Labels" panel, the labels of the currently selected group are displayed. Clicking on a label toggles its state between "on" and "off". The state "on" means that the current observation is assigned the label. This state is indicated in the "Labels" panel by a plus sign before the name of the label.
If you, like many people who like to work fast, prefer the keyboard to the mouse, here are some tips. You can switch between the "Groups" list and the "Labels" list with Shift+Tab and Tab (for up and down respectively). This works as follows. At any time one of the objects in a dialog box is the so-called active object. It is depicted slightly differently from the others; for instance, it might have an extra little frame. Pressing the Tab key makes another object active, pressing it again yet another, and so on, until all objects that are selectable with this mechanism have been active in turn. Then yet another Tab makes the first one active again. Try this! The key combination Shift+Tab does the same thing, only it travels in the opposite direction.
When you open the Set Observation's Labels dialog box, the "Labels" list is active. Afterwards, in practically all common circumstances, one of the two list objects is the active object, and a single Tab or Shift+Tab brings you from one list to the other. When a particular object is the active object, keystrokes act on that object. If the "Groups" list is the active object, you can navigate between the groups with the arrow keys. If the "Labels" list is the active object, you can navigate between the labels with the arrow keys and select or deselect labels with the space bar. A final keyboard tip: in any dialog box, push buttons can be activated with the keyboard by pressing Alt+X, where X stands for the underlined character in the text on the button.
An alternative, usually slower but also applicable when there is no underlined character, is to first navigate to the button with a series of Tabs (or Shift+Tabs) and then press the space bar to activate it.
We invite you to experiment with the dialog box, changing the state of a few labels in a few groups. Afterwards, leave the Set Observation's Labels dialog box with "Cancel" (Alt+"c")! You're back in the Observation Editor. Since you left Set Observation's Labels with "Cancel", none of the changes you made there have taken effect. They would have taken effect if you had left Set Observation's Labels with "OK".
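The "OK" versus "Cancel" behaviour can be pictured as editing a working copy that is only written back on "OK". The sketch below shows that idea in Python. The class `LabelDialog` and its methods are hypothetical names invented for this illustration, and nothing here is taken from the program's actual implementation.

```python
class LabelDialog:
    """Hypothetical stand-in for a dialog that edits a working copy of labels."""

    def __init__(self, observation_labels):
        # The observation's real labels: a set of (group, label) pairs.
        self._target = observation_labels
        # The working copy that the toggles operate on.
        self._draft = set(observation_labels)

    def toggle(self, group, label):
        """Switch a label between 'on' and 'off' in the working copy only."""
        self._draft ^= {(group, label)}

    def ok(self):
        """Write the working copy back to the observation."""
        self._target.clear()
        self._target.update(self._draft)

    def cancel(self):
        """Discard the working copy; the observation stays untouched."""
        self._draft = set(self._target)


labels = {("SEM", "from other era"), ("SAID-OF", "person")}

dialog = LabelDialog(labels)
dialog.toggle("SEM", "turned bad")      # switched on in the draft
dialog.toggle("SAID-OF", "person")      # switched off in the draft
dialog.cancel()
unchanged_after_cancel = set(labels)    # still the original two labels

dialog = LabelDialog(labels)
dialog.toggle("SEM", "turned bad")
dialog.ok()                             # now the observation has three labels
```

Keeping the draft separate from the observation is what makes free experimentation in the dialog box safe.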
Click on the ">" button (or press Alt+">")! This brings you to
the next observation, observation 2 of 32. In most cases simply pressing the space bar
has the same effect, because ">" by default is the active object
in the Observation Editor. Techniques for going to the previous
observation are clicking on the "<" button, pressing Alt+"<"
or, given the default situation that ">" is the active object,
first pressing Shift+Tab and then pressing the
space bar. Knowing all this, navigate to observation 4 of 32! In this example
you see that sometimes we have assigned more than one label
from the SEM group to an observation, in this
case SEM:having old age and SEM:turned bad. This
does not mean that we judged the example to be ambiguous. We rather
judged the semantics of "old" in this example to be decomposable into
units that may, but do not necessarily have to, co-occur.
Whether or not these units indeed sometimes show up in isolation from each other, and which
combinations are more frequent than others would be a typical question
to ask in the analysis phase
(cf. 3.3 Displaying statistics).
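To make that kind of question concrete, here is a minimal Python sketch that counts how often each combination of SEM labels occurs and which labels ever occur on their own. The observations are invented sample data, not the contents of "demowork.wrk", and the dictionary representation is an assumption made for this illustration only.

```python
from collections import Counter


def sem_combinations(observations):
    """Count how often each combination of SEM labels occurs."""
    return Counter(frozenset(obs.get("SEM", ())) for obs in observations)


def labels_seen_in_isolation(observations):
    """Return the SEM labels that are the only SEM label of some observation."""
    return {
        next(iter(combo))
        for combo in sem_combinations(observations)
        if len(combo) == 1
    }


# Invented sample data: each observation maps a group name to a set of labels.
sample = [
    {"SEM": {"having old age", "turned bad"}},
    {"SEM": {"having old age"}},
    {"SEM": {"from other era"}},
    {"SEM": {"having old age", "turned bad"}},
]

combos = sem_combinations(sample)
isolated = labels_seen_in_isolation(sample)
```

In this sample "turned bad" never shows up without "having old age", which is the kind of pattern the analysis phase would bring to light.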
Now click on "OK" in the Observation Editor (or press Alt+"o")! You're in the Observation Browser again now, and you see that the Observation Browser has followed the steps you've been taking in the Observation Editor: in the list observation 4 is now highlighted. You can also do the opposite: you can navigate through the observations in the list in the Observation Browser and then enter the Observation Editor, which will have followed the steps you took in the Observation Browser. Press the Arrow Down key twice! Observation 6 is highlighted now. Press the Enter key! Pressing the Enter key activates the so-called default button of a dialog box. Often this is the "OK" button. Here it is the "Edit Selected Item" button (notice that its border is a little thicker than that of the other buttons). After pressing the Enter key you end up in the Observation Editor again, looking at observation 6 of 32.
One final important remark about navigation concerns the question of when the changes you make become definitive and to what extent they can be undone. The answer is threefold:
We're still in the Observation Editor, looking at observation 6 of 32. Its contents are copied below:
<CONSTITUENT>As the bus crosses the Orontes River into
this city's <MATCH>old</MATCH>
Moslem neighborhoods,
the only sound from its Syrian passengers is an occasional muffled
gasp.</CONSTITUENT>
When we constructed the workshop at hand this example posed a problem because we did not have enough context to infer what "old" means here. Does it mean "what used to be Moslem neighborhoods"? Or rather "Moslem neighborhoods built a long time ago"? Or still something completely different? Therefore we looked up the larger context. This is done with the information in the origin field. The origin field is copied below:
<corpus name="c:\abundant\user\democorp.vic" dmy="20/1/1997" h="5:9">
<file name="c:\abundant\corpora\wsj\wsj23b.txt" id=2>
<constituent id=000001182 view="s.vcv">
</constituent></file></corpus>
The crucial parts are the name attribute of the corpus element and the id attributes of the file and constituent elements. Collectively they tell us that the passage was found in constituent 1182 of file 2 of virtual corpus "democorp.vic". If you don't like remembering numbers, then select "1182" in the origin field (by clicking right before the first "1", keeping the mouse button pressed while moving the mouse pointer to right after the "2", and then releasing the mouse button) and copy this selection into working memory (by pressing Ctrl+Insert).
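If an origin field has been copied out of the program, the same three pieces of information can also be extracted mechanically. The Python sketch below does this with regular expressions, because the field as shown uses unquoted attribute values (id=2) and therefore is not well-formed XML. The function name `parse_origin` is a hypothetical helper, and the patterns are only assumed to fit origin fields shaped like the one above.

```python
import re
from pathlib import PureWindowsPath

# The origin field of observation 6, as shown above.
ORIGIN = r'''<corpus name="c:\abundant\user\democorp.vic" dmy="20/1/1997" h="5:9">
<file name="c:\abundant\corpora\wsj\wsj23b.txt" id=2>
<constituent id=000001182 view="s.vcv">
</constituent></file></corpus>'''


def parse_origin(origin):
    """Extract corpus file name, file id and constituent id from an origin field."""
    corpus = re.search(r'<corpus\s+name="([^"]*)"', origin)
    file_id = re.search(r'<file\b[^>]*\bid="?(\d+)"?', origin)
    constituent_id = re.search(r'<constituent\b[^>]*\bid="?(\d+)"?', origin)
    if not (corpus and file_id and constituent_id):
        raise ValueError("origin field does not have the expected shape")
    return {
        # Keep only the file name of the Windows-style path.
        "corpus": PureWindowsPath(corpus.group(1)).name,
        "file": int(file_id.group(1)),
        # int() drops the leading zeros of the constituent id.
        "constituent": int(constituent_id.group(1)),
    }


location = parse_origin(ORIGIN)
print(location)
```

The leading zeros of the constituent id are padding only, so converting to an integer yields the 1182 mentioned in the text.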
Close the Observation Editor with "Cancel", and also close the Observation Browser with "Close"! Then choose "Workshop | Run Query..."! First, make "constitu.que" the current query with the "Set" button in the "QUERY" panel! This query, which is one of the files installed together with the program, is matched by any complete constituent. Next, make "democorp.vic" the current virtual corpus with the "Set" button in the "CORPUS" panel! Then choose the following runtime parameters:
<CONSTITUENT><MATCH>As the bus crosses the Orontes River
into this city's old Moslem neighborhoods, the only sound from its
Syrian passengers is an occasional muffled gasp.
</MATCH></CONSTITUENT>
There is rubble in every direction.
To the left, many of the buildings have been flattened by artillery
fire. Bulldozers now are clearing away the debris.
To the right are the empty shells of what until February were
homes, shops, apartment buildings, offices and mosques.
From the four trailing constituents we learn that the neighborhoods the passage is about have been destroyed. We might have needed still more trailing or leading constituents, but we're lucky: we know what we wanted to know. As you see, the complete contents field is highlighted. This is because the text is selected, and therefore ready to be copied (with the Ctrl+Insert technique we just explained). Rather than copying it as a whole, select only the trailing constituents and then press Ctrl+Insert to copy them into working memory! Next choose "Abort Query" and in the Query Report click "OK"! Note that the query parameters we used were such that the query run has left no traces in our workshop, neither in the "Caption" nor in the <query> label group. We merely illustrate this approach; we do not plead for it. It is a choice with pros and cons, and going into them would lead us too far.
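The idea of looking at a matched constituent together with a number of leading and trailing constituents can be sketched in a few lines of Python. The function `with_context`, its parameter names and the zero-based list index are assumptions made for this illustration, and they do not reflect how the program stores or numbers constituents.

```python
def with_context(constituents, index, leading=0, trailing=0):
    """Return (leading constituents, matched constituent, trailing constituents).

    The context windows are clipped at the beginning and end of the list.
    """
    if not 0 <= index < len(constituents):
        raise IndexError("constituent index out of range")
    start = max(0, index - leading)
    end = min(len(constituents), index + 1 + trailing)
    return constituents[start:index], constituents[index], constituents[index + 1:end]


# The passage shown above, one list item per constituent.
passage = [
    "As the bus crosses the Orontes River into this city's old Moslem "
    "neighborhoods, the only sound from its Syrian passengers is an "
    "occasional muffled gasp.",
    "There is rubble in every direction.",
    "To the left, many of the buildings have been flattened by artillery fire.",
    "Bulldozers now are clearing away the debris.",
    "To the right are the empty shells of what until February were homes, "
    "shops, apartment buildings, offices and mosques.",
]

before, match, after = with_context(passage, 0, trailing=4)
```

Here `after` holds the four trailing constituents that settle the meaning of "old" in this observation, and `before` is empty because no leading context was requested.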
Open the Observation Browser again and double click on the sixth observation! In the Observation Editor press Tab to make the contents field the active object. You see that the complete text is selected. Click right after the text to deselect it and at the same time place the text cursor at the end of the text. Now press Shift+Insert! The four trailing constituents are pasted into the field, at the place of the cursor. Note that if you paste text into a text field where text is currently selected, the pasted text replaces the selected text.