Abundantia Verborum

3. Tutorial

3.3 Displaying statistics


3.3.4 Diagrams and implied labels

Whenever we have used examples from the SAID-OF group in diagrams so far, we have used the "Set Labels" button in the Graph Settings dialog box, never the "Add Group" button. The reason is that there are 16 labels in the SAID-OF group, and the graph mechanism in Abundantia Verborum does not allow more than 8 displayed labels. A diagram with 16 displayed labels would have to be able to display 2 to the power of 16, which is 65536, possible nodes (or regions, if you prefer). In other words, the complexity of the diagrams grows exponentially with the number of displayed labels. If you want to get a feel for this complexity, just create a Hasse diagram with 8 displayed labels and with the threshold off. You'll agree that this is too much for one screen and that the display threshold is needed to make sense of this multitude of information.
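The exponential growth can be checked with a few lines of Python. This is only an illustrative sketch: the function name possible_nodes and the placeholder label names are assumptions made for this example, not part of Abundantia Verborum. Every subset of the displayed labels, including the empty set {}, counts as one potential node.

```python
from itertools import combinations


def possible_nodes(labels):
    """Return every subset of the displayed labels; each subset is one potential node."""
    return [
        frozenset(combo)
        for size in range(len(labels) + 1)
        for combo in combinations(labels, size)
    ]


# Enumerating the subsets of three placeholder labels gives 2**3 = 8 nodes.
print(len(possible_nodes(["A", "B", "C"])))

# For larger label sets the count is simply 2 to the power of the number of labels.
for n in (8, 16):
    print(n, "displayed labels:", 2 ** n, "possible nodes")
```

With 8 displayed labels there are already 256 potential nodes, which is why the display threshold matters even within the limit.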

The current section is about a technique for creating diagrams for parameters (groups) with more than eight values (labels). The technique is to display high level implied labels rather than the low level labels that were actually assigned to the observations. Remember that implied labels were introduced in section 3.2.6 Label Taxonomies.

Global high level diagrams

A global picture of a big parameter can be obtained by grouping the actual low level labels into subgroups, treating the disjunction of the labels in one subgroup as a single, more general label, and using these more general labels as displayed labels. Make sure "demowork.wrk" is open and is the current workshop on your Abundantia Verborum desktop! Next, set the following displayed labels:

These are all the high level labels in the group SAID-OF, apart from SAID-OF:(entity). We do not include this one, because it does not introduce any extra information (it is implied by all other high level labels), and it would complicate the diagram. After you have selected the labels, set the diagram type to Hasse and the display threshold to 0% (enabled)! Then click OK!

The first thing we can read from the diagram is that, surprisingly, the node {} is not empty. Apparently, and contrary to our intentions, our high level labels do not cover the complete workshop. You could investigate what the problem is by setting the filter to NOT(SAID-OF:(entity)) and then using "Workshop | Browse Observations..." to check which observations are still visible in the Observation Browser, because these are the ones that populate the current node {}. You don't have to do this now. In section 3.3.6 Zooming in on diagram parts we will learn about a faster, more direct technique for exploring the population of a node. But if you did, you would find that a single observation, tagged with SAID-OF:feeling, causes the problem. Apparently, we have forgotten to link this label to the higher layers of the taxonomy. If you added the rule SAID-OF:feeling ---> SAID-OF:(non-physical entity) to the workshop, the diagram would instantaneously lose its bottom node {}. This is one of the advantages of Hasse diagrams: the node {} is very useful for signaling classification phase errors. These errors would be much harder to detect on the basis of Schematic diagrams.
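The role of the node {} can also be sketched outside the program. The Python fragment below is an illustration under assumptions: the rules dictionary, the function names and the reduced set of displayed labels are invented for this example and say nothing about how Abundantia Verborum stores a workshop internally. It closes the assigned labels under the implication rules and then places an observation in the node that corresponds to the set of displayed labels it carries.

```python
def implied_closure(assigned, rules):
    """Return the assigned labels plus everything they imply, directly or indirectly."""
    closure = set(assigned)
    todo = list(assigned)
    while todo:
        label = todo.pop()
        for target in rules.get(label, ()):
            if target not in closure:
                closure.add(target)
                todo.append(target)
    return closure


def node_of(assigned, rules, displayed):
    """Return the node (a set of displayed labels) that an observation falls into."""
    return frozenset(implied_closure(assigned, rules) & set(displayed))


# Toy rule set (an assumption); only the label names come from the tutorial text.
rules = {
    "SAID-OF:line product": ["SAID-OF:(physical entity)"],
    "SAID-OF:(physical entity)": ["SAID-OF:(entity)"],
    "SAID-OF:(non-physical entity)": ["SAID-OF:(entity)"],
}
displayed = ["SAID-OF:(physical entity)", "SAID-OF:(non-physical entity)"]

# Without a rule for SAID-OF:feeling, the observation lands in the node {}.
before = node_of(["SAID-OF:feeling"], rules, displayed)

# Adding the missing rule moves the observation out of the node {}.
rules["SAID-OF:feeling"] = ["SAID-OF:(non-physical entity)"]
after = node_of(["SAID-OF:feeling"], rules, displayed)

print(sorted(before))  # []
print(sorted(after))   # ['SAID-OF:(non-physical entity)']
```

An observation ends up in {} exactly when none of its assigned or implied labels is displayed, which is why a populated {} points to a missing implication rule.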

Another thing we can see in the diagram, from the existence of the node {4}, is that at least one observation is tagged as SAID-OF:(physical entity), but is neither SAID-OF:(organic physical entity) nor SAID-OF:(non-organic physical entity). As you will learn to verify for yourself in 3.3.6 Zooming in on diagram parts, the node {4} is inhabited by one observation. This observation is SAID-OF:line product, and since line products can be either organic or non-organic, we judged the best solution to be to introduce the rule SAID-OF:line product -> SAID-OF:(physical entity).

Our first two observations reflect classification errors or decisions. But most of all we want to learn about features of the actual data, rather than about features of our own classification activities. Therefore let us look at the gray scales in the nodes. What we learn is the following SAID-OF top three: by far the most frequent use of "old" is in the context of animate, organic entities. In second place we have non-organic physical entities. In third place come the non-physical entities. Of course, this example merely illustrates the technique of generating such statistics. Content-wise it cannot serve as a proper example, because the workshop is far too small.

Multiple hierarchies

In the example, all high level labels more or less fit into one taxonomy, and therefore it makes sense to include them all in one diagram. But this does not have to be the case. It is possible to encode multiple hierarchies into the implication rules, and then investigate these alternatives one by one in separate diagrams. For instance, you could expand the current workshop with high level SAID-OF labels that classify the entities according to their size or shape (using "not applicable" as a catch-all), or you could expand the workshop with high level SAID-OF labels that classify the entities according to positive, negative or neutral connotation.

Introducing low level elements in the diagrams

After you have obtained a first global picture with one or several diagrams that contain nothing but high level labels, it may be useful to test your taxonomy by replacing one high level label with the low level labels it encompasses. The resulting hybrid diagram may help to detect statistically remarkable phenomena in the data that are hidden in the high level diagram. Such observations may then lead to modifications of the taxonomies, or to a preference for one taxonomy over another.
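Building the label set for such a hybrid diagram can be sketched as follows. This is a minimal illustration with assumed names: expand_label, the rules dictionary and the label SAID-OF:idea are hypothetical and serve only this example, and the rule for SAID-OF:feeling is the one suggested earlier in this section, not necessarily present in "demowork.wrk". The sketch replaces one high level label with the labels that directly imply it and checks the result against the limit of 8 displayed labels.

```python
MAX_DISPLAYED = 8  # limit on displayed labels stated at the start of this section


def expand_label(displayed, high_level, rules):
    """Replace one high level label with the labels that directly imply it."""
    children = sorted(
        source for source, targets in rules.items() if high_level in targets
    )
    hybrid = [label for label in displayed if label != high_level] + children
    if len(hybrid) > MAX_DISPLAYED:
        raise ValueError("more than %d displayed labels" % MAX_DISPLAYED)
    return hybrid


# Toy rule set (an assumption); SAID-OF:idea is a made-up low level label.
rules = {
    "SAID-OF:feeling": ["SAID-OF:(non-physical entity)"],
    "SAID-OF:idea": ["SAID-OF:(non-physical entity)"],
    "SAID-OF:line product": ["SAID-OF:(physical entity)"],
}
displayed = ["SAID-OF:(physical entity)", "SAID-OF:(non-physical entity)"]

hybrid = expand_label(displayed, "SAID-OF:(non-physical entity)", rules)
print(hybrid)
# ['SAID-OF:(physical entity)', 'SAID-OF:feeling', 'SAID-OF:idea']
```

Because each expansion adds labels, the limit check matters in practice: it is usually only feasible to expand one high level label at a time.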

Implied labels and label groups

One final remark about diagrams and implied labels is that in "demowork.wrk" we have stored both low level and high level "said-of" labels in one and the same group. This is not obligatory. You could reserve the group SAID-OF for low level labels and introduce, for example, the group (SAID-OF) for the high level labels. You could even have several high level groups, in case you have several alternative taxonomies. Technically there is no problem, because implication rules can cross label groups. Content-wise it is, in a sense, even a neater solution. First of all, there is a more explicit distinction between actually assigned labels and implied labels. Secondly, it may be judged to be closer to the spirit of using groups as containers of comparable things.

