Whenever we have used examples from the SAID-OF
group in diagrams so far, we have been using the "Set Labels" button in
the Graph Settings dialog box, never the "Add Group"
button. The reason is that there are 16 labels in the
SAID-OF group, and that the graph mechanism
in Abundantia Verborum does not allow for more than
8 displayed labels. A diagram with 16 displayed
labels would have to be able to display 2 to the power of
16, which is 65536, possible nodes (or regions, if you want). In other words, the
complexity of the diagrams grows exponentially with the number of
displayed labels. If you want to get a feel for this complexity,
just create a Hasse diagram with 8 displayed labels and
with the threshold off. You'll agree that this is too much for
one screen and that the display threshold is needed for making sense
out of this multitude of information.
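The growth described above can be checked with a few lines of Python. This is only an illustrative sketch and not part of Abundantia Verborum; the function name possible_nodes is invented here, and each subset of the displayed labels stands for one potential node.

```python
from itertools import combinations


def possible_nodes(labels):
    """Return every subset of the displayed labels (one potential node each)."""
    nodes = []
    for size in range(len(labels) + 1):
        for combo in combinations(labels, size):
            nodes.append(frozenset(combo))
    return nodes


# Three displayed labels already give 2**3 = 8 potential nodes.
small = possible_nodes(["1", "2", "3"])
print(len(small))  # 8

# The count doubles with every extra displayed label.
for n in (8, 16):
    print(n, 2 ** n)  # 8 256, then 16 65536
```

The empty subset is included on purpose: it corresponds to the node {} that holds observations carrying none of the displayed labels.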
The current section is about a technique for creating diagrams for parameters (groups) with more than eight values (labels). The technique is to display high level implied labels rather than the low level labels that were actually assigned to the observations. Remember that implied labels were introduced in section 3.2.6 Label Taxonomies.
A global picture of a big parameter can be obtained by grouping the actual low level labels into subgroups, treating the disjunction of the labels in each subgroup as a single, more general label, and using these more general labels as displayed labels. Make sure "demowork.wrk" is open and is the current workshop on your Abundantia Verborum desktop! Next, set the following displayed labels:
SAID-OF:(animate entity)
SAID-OF:(organic entity)
SAID-OF:(non-organic physical entity)
SAID-OF:(physical entity)
SAID-OF:(non-physical entity)
These are all the high level labels in
the group SAID-OF, apart from SAID-OF:(entity).
We do not include this one, because it does not introduce any extra
information (it is implied by all other high level labels), and it
would complicate the diagram. After you have selected the labels,
set the diagram type to Hasse and the display threshold to 0% (enabled)!
Then click OK!
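How implied high level labels follow from the assigned low level ones can be sketched in Python. The rule table below is an assumption made for illustration only: the label SAID-OF:person is invented, and the links between the high level labels are guessed from the text, not read from "demowork.wrk". The function name implied_labels is likewise hypothetical.

```python
# Assumed rule table: each label maps to the labels it directly implies.
RULES = {
    "SAID-OF:person": ["SAID-OF:(animate entity)"],            # invented low level label
    "SAID-OF:(animate entity)": ["SAID-OF:(organic entity)"],  # assumed link
    "SAID-OF:(organic entity)": ["SAID-OF:(physical entity)"],  # assumed link
    "SAID-OF:(non-organic physical entity)": ["SAID-OF:(physical entity)"],
    "SAID-OF:(physical entity)": ["SAID-OF:(entity)"],
    "SAID-OF:(non-physical entity)": ["SAID-OF:(entity)"],
}


def implied_labels(assigned, rules):
    """Return the assigned labels plus every label reachable through the rules."""
    result = set(assigned)
    pending = list(assigned)
    while pending:
        label = pending.pop()
        for target in rules.get(label, []):
            if target not in result:
                result.add(target)
                pending.append(target)
    return result


labels = implied_labels({"SAID-OF:person"}, RULES)
for label in sorted(labels):
    print(label)
```

Under these assumed rules a single assigned label ends up carrying several high level labels, and SAID-OF:(entity) is reached from every branch, which is why displaying it would add no information.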
A first thing we can read from the diagram is that, surprisingly,
the node {} is not empty. Apparently, and contrary to
our intentions,
our high level labels do not cover the complete workshop.
You could investigate what the problem is by setting the filter
to NOT(SAID-OF:(entity)) and then using
"Workshop | Browse Observations..."
to check which
observations are still visible in the Observation Browser,
because these will be the ones that populate the current node {}.
You don't have to do this now. In section
3.3.6 Zooming in on diagram parts we will
learn about a faster, more direct technique for exploring the
population of a node. But if you did, you would find that
a single observation, tagged with SAID-OF:feeling,
causes the problem. Apparently, we have forgotten to link this label
to the higher layers of the taxonomy. If you added the rule
SAID-OF:feeling ---> SAID-OF:(non-physical entity)
to the workshop, the diagram would immediately lose its bottom node {}.
This is one of the advantages of Hasse diagrams: the node {}
is very useful for signaling classification phase errors. These errors
would be much harder to detect on the basis of Schematic diagrams.
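The check described above amounts to looking for observations that carry none of the displayed labels. A minimal Python sketch follows; the observation names and the helpers node_of and apply_rule are invented for illustration and say nothing about how Abundantia Verborum works internally.

```python
DISPLAYED = [
    "SAID-OF:(animate entity)",
    "SAID-OF:(organic entity)",
    "SAID-OF:(non-organic physical entity)",
    "SAID-OF:(physical entity)",
    "SAID-OF:(non-physical entity)",
]


def node_of(labels, displayed=DISPLAYED):
    """Return the 1-based positions of the displayed labels present in `labels`."""
    return frozenset(
        pos for pos, name in enumerate(displayed, start=1) if name in labels
    )


def apply_rule(labels, source, target):
    """Return `labels` plus `target` when `source` is present (one implication rule)."""
    return labels | {target} if source in labels else labels


# Toy observations (invented): the labels known for each one.
observations = {
    "obs-1": {"SAID-OF:feeling"},
    "obs-2": {"SAID-OF:(physical entity)"},
}

# Observations in the node {} carry none of the displayed labels.
uncovered = [name for name, labels in observations.items() if not node_of(labels)]
print(uncovered)  # ['obs-1']

# After adding the missing rule, the node {} is empty.
fixed = {
    name: apply_rule(labels, "SAID-OF:feeling", "SAID-OF:(non-physical entity)")
    for name, labels in observations.items()
}
uncovered_after = [name for name, labels in fixed.items() if not node_of(labels)]
print(uncovered_after)  # []
```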
Another thing we can see on the diagram, on the basis of the
existence of the node {4}, is that at least one
observation is tagged as
SAID-OF:(physical entity), but is neither
SAID-OF:(organic physical entity) nor
SAID-OF:(non-organic physical entity).
As you will learn to verify for yourself in
3.3.6 Zooming in on diagram parts,
the node {4} is inhabited by one observation.
This observation is SAID-OF:line product, and
since a line product can be either organic or non-organic, we judged
that the best solution was to introduce the rule
SAID-OF:line product -> SAID-OF:(physical entity).
These first two findings reflect classification
errors or decisions. But most of all we want to learn
about features of the actual data, rather than about
features of our own classification activities. Therefore let us
look at the gray scales in the nodes. What we learn
is that the following is the SAID-OF top three:
by far the most frequent use of "old" is in the context
of animate, organic entities. In second place we have
non-organic physical entities. In third place come the
non-physical entities. Of course, this example merely
illustrates the technique of generating such statistics.
Content-wise it cannot serve as a proper example, because the
workshop is far too small.
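The ranking that the gray scales convey visually can also be sketched as a simple count per node. The Python below is an illustration under stated assumptions: the sample counts are invented and are not the figures from "demowork.wrk", the helper names node_key and rank_nodes are hypothetical, and animate entities are assumed to be organic and physical as well.

```python
from collections import Counter

DISPLAYED = [
    "SAID-OF:(animate entity)",
    "SAID-OF:(organic entity)",
    "SAID-OF:(non-organic physical entity)",
    "SAID-OF:(physical entity)",
    "SAID-OF:(non-physical entity)",
]


def node_key(labels):
    """Return the node of an observation as a tuple of 1-based label positions."""
    return tuple(
        pos for pos, name in enumerate(DISPLAYED, start=1) if name in labels
    )


def rank_nodes(observations):
    """Count observations per node and return the nodes, most populated first."""
    counts = Counter(node_key(labels) for labels in observations)
    return counts.most_common()


# Invented label sets, already expanded with their implied labels.
ANIMATE = {DISPLAYED[0], DISPLAYED[1], DISPLAYED[3]}
NON_ORGANIC = {DISPLAYED[2], DISPLAYED[3]}
NON_PHYSICAL = {DISPLAYED[4]}

sample = [ANIMATE] * 3 + [NON_ORGANIC] * 2 + [NON_PHYSICAL]
for node, count in rank_nodes(sample):
    print(node, count)
```

With a realistic workshop the same count would simply run over far more observations; the ranking logic stays the same.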
In the example all high level labels more or less fit
into one taxonomy and therefore it makes sense to include them
all in one diagram. But this does not have to be the case.
It is possible to encode multiple hierarchies into the implication
rules, and then investigate these alternatives one by one in
separate diagrams. For instance, you could expand the current workshop
with high level SAID-OF labels that
classify the entities according to their size or shape (using
"not applicable" as a catch-all), or
you could expand the workshop with high level
SAID-OF labels that
classify the entities according to positive, negative or neutral
connotation.
After you have obtained a first global picture with one or several diagrams that contain nothing but high level labels, it may be useful to test your taxonomy by replacing one high level label with the low level labels it encompasses. The resulting hybrid diagram can reveal statistically remarkable phenomena in the data that are hidden in the high level diagram. Such findings may then lead to modifications of the taxonomies, or to preferring one taxonomy over another.
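Building the label list for such a hybrid diagram can be sketched as follows. This is a hypothetical Python illustration: the helper expand_label and the label SAID-OF:idea are invented, and the only facts taken from the text are the rule for SAID-OF:feeling and the limit of 8 displayed labels.

```python
MAX_DISPLAYED = 8  # limit on displayed labels stated earlier in the text

# Toy rule table: low level label -> high level labels it directly implies.
RULES = {
    "SAID-OF:feeling": ["SAID-OF:(non-physical entity)"],
    "SAID-OF:idea": ["SAID-OF:(non-physical entity)"],  # invented label
}

DISPLAYED = [
    "SAID-OF:(animate entity)",
    "SAID-OF:(organic entity)",
    "SAID-OF:(non-organic physical entity)",
    "SAID-OF:(physical entity)",
    "SAID-OF:(non-physical entity)",
]


def expand_label(displayed, target, rules):
    """Replace `target` in `displayed` by the labels that directly imply it."""
    children = [source for source, targets in rules.items() if target in targets]
    hybrid = []
    for label in displayed:
        if label == target:
            hybrid.extend(children)
        else:
            hybrid.append(label)
    if len(hybrid) > MAX_DISPLAYED:
        raise ValueError("too many displayed labels for one diagram")
    return hybrid


hybrid = expand_label(DISPLAYED, "SAID-OF:(non-physical entity)", RULES)
for label in hybrid:
    print(label)
```

The length check matters in practice: a high level label that covers many low level labels can push the list past the limit, in which case only part of the subgroup can be expanded at a time.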
One final remark about diagrams and implied labels is
that in "demowork.wrk" we have stored both low level
and high level "said-of" labels in one and the same group.
This is not obligatory. You could reserve the group SAID-OF
for low level labels and introduce e.g. the group (SAID-OF)
for the high level labels. You could even have several
high level groups, in case you have several alternative
taxonomies.
Technically there is no problem, because implication rules
can cross label groups. Content-wise,
it is in a sense even a neater solution.
First of all, there is a more explicit distinction between
actually assigned labels and implied labels.
Secondly, it may be judged to be closer to the spirit of
using groups as containers of comparable things.