In Abundantia Verborum, data classification means making implicit information explicit, so that after this phase everything about an observation that we want to take into account in our analysis is represented explicitly and unambiguously; more precisely, it is represented by a label.
In sections 3.2.2 Adding labels via queries, 3.2.3 Adding labels via filters and 3.2.4 Adding labels via zooming we saw which techniques can be used to make explicit any information that has some formal counterpart or trace in the physical data, so that the process of label assignment can be automated. In 3.2.5 Manually adding labels we saw how to add labels ourselves when the computer, so to speak, is not intelligent enough to be taught how to do it.
In this section we discuss a last type of information that
can be made explicit. Implicit information can reside not only in
the data from the corpus, but also in the labels we have already
assigned. If this is the case, it is usually easier to make
such information explicit automatically on the basis of the labels
already assigned than on the basis of the actual data.
Let us take an example from the previous
section. If a particular observation has been assigned
the label SAID-OF:food, then we humans
can infer from this that "old" is said here of an organic entity
(at least for most ingredients), and when
SAID-OF:human has been assigned, then we
can infer that "old" is said of an animate entity (at least
in some stage of its existence). There is a structural,
as opposed to accidental, implication relation between
being food and being mostly organic, and between
being human and being animate. Such structural implication
rules can be taught to Abundantia Verborum.
If it is not already open, then open the workshop "demowork.wrk", and choose "Workshop | Browse Labels..."! In the Label Browser, click on "Implication Rules"! The Label Hierarchy Table appears. In this environment we can structure our labels into one or several label taxonomies, either partially or completely. Let us take the following taxonomy as an example. It differs a bit from what we find in "demowork.wrk".
                           +---SAID-OF:human
SAID-OF:(animate entity)---|
                           +---SAID-OF:(animal)---SAID-OF:bird
Without wanting to go into matters such as whether in a good taxonomy
'human' belongs under or next to 'animal', we use the above
miniature example merely to illustrate the implication rule mechanism.
For each "X is a child of Y" relation in a taxonomy you have to
create an implication rule X->Y. In such a rule X is called
the implying label, and Y is called the implied label. To specify
the above taxonomy you need the rules listed below.
SAID-OF:human -> SAID-OF:(animate entity)
SAID-OF:bird -> SAID-OF:(animal)
SAID-OF:(animal) -> SAID-OF:(animate entity)
The philosophy behind using implication rules is that
assigning a leaf label to an observation automatically
implies that its parent label and all its ancestor labels are
being assigned too. Given the above rules, assigning SAID-OF:bird
means assigning the high level labels SAID-OF:(animal)
and SAID-OF:(animate entity) as well. What we gain
is that in the analysis phase we can incorporate high level features
in our calculations. We will be able to ask the program
questions such as: is the distribution of such and such
semantic labels similar when "old" is said of animate entities
compared to when it is said of inanimate entities?
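The way a leaf label pulls in its parent and all its ancestors can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how Abundantia Verborum works internally: the names RULES and implied_labels are hypothetical, and the rules are simply the three listed above written as (implying label, implied label) pairs.

```python
# Illustrative sketch only. RULES and implied_labels are hypothetical names,
# not part of Abundantia Verborum.

# Each pair reads: (implying label, implied label).
RULES = [
    ("SAID-OF:human", "SAID-OF:(animate entity)"),
    ("SAID-OF:bird", "SAID-OF:(animal)"),
    ("SAID-OF:(animal)", "SAID-OF:(animate entity)"),
]


def implied_labels(label, rules):
    """Return every label implied by `label`, directly or via other rules."""
    implied = set()
    pending = [label]
    while pending:
        current = pending.pop()
        for implying, implied_label in rules:
            # Follow each rule whose left-hand side matches, once per label.
            if implying == current and implied_label not in implied:
                implied.add(implied_label)
                pending.append(implied_label)
    return implied


# Assigning SAID-OF:bird brings in its parent and its grandparent.
print(sorted(implied_labels("SAID-OF:bird", RULES)))
```

Because the function follows rules repeatedly until nothing new turns up, only the direct parent of each label has to be written down; the higher ancestors come for free.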
To conclude this section, let us see what implied labels look
like in observations. Close all dialog boxes, if any are open,
and open the Observation Browser! Double click on the fourth (!)
observation in the list of observations!
The Observation Editor opens, containing
this observation. You see that in the list of assigned
labels there is the label
SAID-OF:food, but if you scroll down you also find
the labels
SAID-OF:(organic entity),
SAID-OF:(physical entity) and
SAID-OF:(entity), all three being marked
as "<implied>". Click on the "Graph" button to get a better picture!
The Label Inclusion Graph appears. In such a tree all
non-leaf nodes (apart from the root) represent implied labels.
They are depicted in red (if you have a colour screen).
The tree shows you that
the observation is marked as
SAID-OF:food, hence
SAID-OF:(organic entity), hence
SAID-OF:(physical entity), hence
SAID-OF:(entity).
Now navigate to the first observation and look at
the Label Inclusion
Graph of this observation. This is an example of
multiple taxonomies. The label SAID-OF:person is
specified to be both an instantiation of
SAID-OF:(organic entity) and of
SAID-OF:(animate entity). In this particular case
the two categories come together again at a higher level.
The label SAID-OF:(organic entity) implies
SAID-OF:(physical entity) and both
SAID-OF:(physical entity) and
SAID-OF:(animate entity) imply
SAID-OF:(entity).
Taxonomies need not come together in this way. It is possible to specify
multiple partial taxonomies that do not link up into a bigger
structure.
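Whether two branches come together again at a higher level can be checked by comparing the sets of labels they imply. The following Python sketch is an assumption-laden illustration, not the program's own mechanism: the SAID-OF rules are reconstructed from the description of the first observation above, the REGISTER labels are invented purely to show an unlinked partial taxonomy, and the function names are hypothetical.

```python
# Illustrative sketch only. All names are hypothetical; the REGISTER labels
# are invented here to stand for a second, unlinked partial taxonomy.

RULES = [
    # SAID-OF:person has two parents: it sits in two taxonomies at once.
    ("SAID-OF:person", "SAID-OF:(organic entity)"),
    ("SAID-OF:person", "SAID-OF:(animate entity)"),
    ("SAID-OF:(organic entity)", "SAID-OF:(physical entity)"),
    ("SAID-OF:(physical entity)", "SAID-OF:(entity)"),
    ("SAID-OF:(animate entity)", "SAID-OF:(entity)"),
    # A separate partial taxonomy that never links up with the one above.
    ("REGISTER:slang", "REGISTER:(informal)"),
]


def ancestors(label, rules):
    """Return all labels implied by `label`, following every parent."""
    found = set()
    pending = [label]
    while pending:
        current = pending.pop()
        for implying, implied_label in rules:
            if implying == current and implied_label not in found:
                found.add(implied_label)
                pending.append(implied_label)
    return found


def shared_ancestors(first, second, rules):
    """Labels implied by both `first` and `second`; empty if they never meet."""
    return ancestors(first, rules) & ancestors(second, rules)


# The two parents of SAID-OF:person meet again at SAID-OF:(entity).
print(shared_ancestors("SAID-OF:(organic entity)",
                       "SAID-OF:(animate entity)", RULES))
# The invented REGISTER taxonomy shares nothing with the SAID-OF one.
print(shared_ancestors("SAID-OF:person", "REGISTER:slang", RULES))
```

An empty result for a pair of labels means that, under the given rules, the two labels belong to partial taxonomies that do not link up.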