In 3.3.2 Venn, Hasse and Schematic diagrams we presented the three basic diagram types that Abundantia Verborum provides for displaying statistics about workshops. The display threshold, the topic of this section, is a device for simplifying complex graphs, retaining only the most important information. To set the display threshold is to specify which information is important enough to be displayed. It can be used with all three diagram types, but the mechanism is not equally crucial in each of them. Incidentally, the order in which the diagram types appeared in the previous section is the reverse of the degree to which they depend on the display threshold mechanism. Therefore we reverse the order this time, starting with the more important uses of the threshold mechanism.
In the last part of the previous section 3.3.2 Venn, Hasse and Schematic diagrams we saw that Schematic diagrams are inherently linked to the display threshold mechanism, and we explained how this diagram type behaves with the default threshold settings (which is: threshold on, and set to zero percent). Open the "demowork.wrk" workshop and maximize it! If you followed the instructions of the previous section in detail, the workshop should now redisplay the Schematic diagram we worked with in that section. Current graph settings are saved together with a workshop and restored when the workshop is opened again. This is also true for the settings in the other panels of a workshop window. For instance, the current filter is also saved together with a workshop and restored when the workshop is opened again. Another example is the current display threshold.
The "Display threshold" panel, the middle panel in the left
part of the workshop window, should read
"Display threshold: 0%". What this means, in the
context of Schematic diagrams, is that the condition for admitting
a node to the diagram is that its weight is strictly higher than 0%.
Now, the concept of the weight of a node is somewhat
complex in a Schematic diagram. Rather than being calculated
on the basis of the node's own population, it is calculated on the
basis of the population of its counterpart node in the equivalent
Hasse diagram. Let us take node "1" as an example.
The population of this node is the set of all observations
in the (filtered) workshop that are matched by the condition
SEM:having old age. This set includes all
reading A cases
(cf. 3.3.2 Venn, Hasse and Schematic diagrams),
including those in node "1,5".
However, the reason for the existence of
node "1" itself is not
the frequency of reading A cases as such, but rather
the frequency of reading A cases not already
represented by other nodes, such as node "1,5".
The latter frequency can be calculated as the population of node
{1}, the
Hasse diagram counterpart of node "1".
Let us have a closer look at the current diagram.
There are eight displayed labels, the labels
of the SEM group. The display threshold
is set to 0%. When there are eight displayed labels,
there are 256 candidate nodes, since the
set {1,2,3,4,5,6,7,8} has 256 subsets.
Of those 256 candidate nodes only 11 are displayed.
The others have a weight of 0%, which is not strictly higher than the display
threshold. For instance, node "1,3,6" does
not occur, because the Hasse diagram node "{1,3,6}" has a
population of zero percent, or in other words, because none
of the inhabitants of the (filtered) workshop have
exactly those three out of the eight displayed labels.
Actually, there is one exception to the rule. The root node
of a Schematic diagram is always displayed, even if the Hasse
diagram node {} is insufficiently populated (which
is the case in the current diagram).
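The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each observation is represented as the set of displayed label numbers it carries, and the function names and toy data are hypothetical, not part of Abundantia Verborum or of "demowork.wrk".

```python
from collections import Counter
from itertools import combinations


def hasse_populations(observations):
    """Count how many observations carry exactly each label combination."""
    return Counter(frozenset(obs) for obs in observations)


def candidate_nodes(labels):
    """Enumerate every subset of the displayed labels (2**n candidates)."""
    labels = sorted(labels)
    return [frozenset(c)
            for r in range(len(labels) + 1)
            for c in combinations(labels, r)]


def displayed_nodes(observations, labels, threshold_pct=0.0):
    """Keep a node when its Hasse weight is strictly higher than the threshold.

    The root (the empty label set) is always kept, as described in the text.
    """
    counts = hasse_populations(observations)
    total = len(observations)
    kept = {frozenset()}
    for node in candidate_nodes(labels):
        weight = 100.0 * counts[node] / total
        if weight > threshold_pct:
            kept.add(node)
    return kept


# Hypothetical toy observations, each reduced to its displayed label numbers.
toy = [{1}, {1}, {1, 5}, {2}, {3, 4}]

print(len(candidate_nodes(range(1, 9))))  # 256
print(sorted(map(sorted, displayed_nodes(toy, range(1, 9)))))
# [[], [1], [1, 5], [2], [3, 4]]
```

With eight labels the sketch enumerates all 256 candidates, but only the combinations that actually occur (plus the root) pass a threshold of 0%.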
Let us neglect the really rare cases in our workshop and
create a diagram that disregards all situations that occur only once.
Click on the little
triangle pointing to the right in the scroll bar in the
"Display threshold" panel! The threshold is now at 1%,
which means that a situation has to occur more than 0.32 times (1 percent
of all 32 observations). Of course, the diagram
does not change, since every displayed node already stands for at least one observation. Click on the triangle a second, a third and
a fourth time! The threshold is now at 4%, which means
that situations occurring only once are disregarded
(1 is less than or equal to 1.28). The diagram is reduced to five nodes (apart from the root).
Disregarding all phenomena that occur only once, the SEM
information for "old" can nicely be classified in four non-overlapping
readings. Set the threshold to 7% to further reduce the schema! Only three
nodes survive (apart from the root), none of which have more than one
label in their name. In other words, SEM
label clusters tend to be less populated than
isolated labels.
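The arithmetic behind these threshold steps can be checked with a small helper. This is a sketch that assumes whole-percent threshold values, as produced by the scroll bar steps described above; the function name is hypothetical.

```python
def min_count_to_survive(threshold_pct, total):
    """Smallest number of observations whose share is strictly higher
    than threshold_pct percent of total (whole-percent thresholds assumed)."""
    return (threshold_pct * total) // 100 + 1


for pct in (0, 1, 4, 7):
    print(pct, min_count_to_survive(pct, 32))
# 0 1
# 1 1
# 4 2
# 7 3
```

In a workshop of 32 observations, thresholds of 0% and 1% both admit situations that occur once, 4% requires at least two occurrences, and 7% requires at least three.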
The weight calculation algorithm for Schematic diagrams we just explained is the default algorithm in Abundantia Verborum, but it is only one of the two types supported by the program. It is the most straightforward but also the rougher variant (hence the epithet "rough" in the title of the "Graph" panel). The finer variant renders a cleaner picture, but requires more complex, time-consuming calculations. Close "demowork.wrk" and open the workshop "schemata.wrk". This is a small, artificial dummy workshop, installed with the program to illustrate the difference between rough and fine calculation of Schematic diagrams. The table below illustrates the structure of the workshop.
Table 1: the structure of "schemata.wrk"
The difference between rough and fine calculation is that in rough calculation the weight of a node is calculated locally, solely on the basis of features of the node itself, whereas in fine calculation the weight of a node can be influenced by its environment. More precisely, in fine calculation, whenever a node is canceled out by the threshold, its weight adds to the weight of its father(s). If the father(s) too would be canceled out, the weight is further propagated. The propagation algorithm starts at the leaves and progresses towards the root. One final subtlety in the algorithm is that care is taken that extra weight originating from a particular node does not contribute more than once to the weight of a predecessor in case there is more than one path leading from the former to the latter.
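The propagation just described can be sketched as follows. This is a minimal illustration, not the program's actual implementation: it assumes nodes are given as sets of label numbers, that the father relation is supplied explicitly (how Abundantia Verborum determines fathers is not specified here), and that the root is left out because it is always displayed. All names are hypothetical.

```python
def fine_survivors(own_weight, fathers, threshold_pct):
    """Return {node: weight} for the nodes that survive fine calculation.

    own_weight: {frozenset of labels: weight in percent of the Hasse counterpart}
    fathers:    {node: list of father nodes}, root omitted
    """
    # contributions[n] maps each originating node to the weight it brings to n.
    contributions = {n: {n: w} for n, w in own_weight.items()}
    survivors = {}
    # Larger label sets lie further from the root, so visit them first.
    for node in sorted(own_weight, key=len, reverse=True):
        total = sum(contributions[node].values())
        if total > threshold_pct:
            survivors[node] = total
        else:
            for father in fathers.get(node, ()):
                if father in contributions:
                    # Keyed by origin, so weight that arrives along several
                    # paths is counted only once.
                    contributions[father].update(contributions[node])
    return survivors


def name(node):
    return ",".join(str(label) for label in sorted(node))


# Assumed toy chain: three nodes holding 1, 2 and 3 out of 6 observations.
n1, n12, n123 = frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})
weights = {n1: 100 / 6, n12: 200 / 6, n123: 300 / 6}
chain = {n12: [n1], n123: [n12]}

for pct in (0, 17, 34, 50, 84):
    print(pct, [name(n) for n in sorted(fine_survivors(weights, chain, pct), key=len)])
# 0 ['1', '1,2', '1,2,3']
# 17 ['1,2', '1,2,3']
# 34 ['1', '1,2,3']
# 50 ['1,2']
# 84 ['1']
```

Storing contributions per originating node, rather than as a single running sum, is one simple way to honour the rule that extra weight never counts twice for the same predecessor.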
In "schemata.wrk", add the group "dummy" to the displayed labels, choose a horizontal Schematic diagram, and make sure the display threshold is on and set to zero percent! The resulting Schematic diagram is a tree that consists of one branch:
[root]---[1]---[1,2]---[1,2,3]
Now increase the threshold to 17%! Since the population
of the Hasse node {1} consists of 1 out of 6
observations, which is 16.67% of the workshop, the
Schematic node "1" has disappeared. The new diagram
looks like:
[root]---[1,2]---[1,2,3]
Now increase the threshold to 34%. Since the population
of the Hasse node {1,2} consists of 2 out of 6
observations, which is 33.33% of the workshop, the
Schematic node "1,2" has also disappeared. The new diagram
looks like:
[root]---[1,2,3]
Now increase the threshold to 50%. Since the population
of the Hasse node {1,2,3} consists of 3 out of 6
observations, which is 50% of the workshop and thus not strictly
higher than the threshold, the
Schematic node "1,2,3" has also disappeared. The new diagram
looks like:
[root]
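The four rough diagrams above follow directly from comparing each node's Hasse weight with the threshold. The sketch below reproduces them; the populations 1, 2 and 3 out of 6 are taken from the walkthrough, while the function name and data layout are hypothetical.

```python
# (Schematic node name, population of its Hasse counterpart)
CHAIN = [("1", 1), ("1,2", 2), ("1,2,3", 3)]
TOTAL = 6  # observations in the workshop


def rough_diagram(threshold_pct):
    """Rough calculation: a node stays only if its own Hasse weight is
    strictly higher than the threshold; the root is always displayed."""
    # Integer comparison of pop/TOTAL > threshold_pct/100 avoids rounding issues.
    kept = [node for node, pop in CHAIN if 100 * pop > threshold_pct * TOTAL]
    return "---".join(f"[{node}]" for node in ["root"] + kept)


for pct in (0, 17, 34, 50):
    print(f"{pct:>2}%  {rough_diagram(pct)}")
#  0%  [root]---[1]---[1,2]---[1,2,3]
# 17%  [root]---[1,2]---[1,2,3]
# 34%  [root]---[1,2,3]
# 50%  [root]
```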
Now choose the menu command "Options | Preferences..."! The User Preferences dialog box appears. Click on "Load" and load the settings called "finecalc.ini"! The settings specified by this file are exactly the same as the Abundantia Verborum default settings, except that they select the fine algorithm for calculating Schematic diagrams. Close the User Preferences dialog box with "OK"! Nothing happens yet, but all subsequent calculations will use the fine algorithm.
Set the display threshold to zero again! The resulting Schematic diagram, depicted below, is again the one branch tree we started out with above in the rough calculation. The only difference is that this time the title of the "Graph" panel signals that the program uses the "fine" algorithm.
[root]---[1]---[1,2]---[1,2,3]
Now increase the threshold to 17%! Since the population
of the Hasse node {1} consists of 1 out of 6
observations, which is 16.67% of the workshop, the
Schematic node "1" has disappeared. The new diagram
looks like:
[root]---[1,2]---[1,2,3]
The first difference with the rough calculation comes if you
increase the threshold to 34%! As was expected,
the Schematic node "1,2" has disappeared,
since the population
of the Hasse node {1,2} consists of 2 out of 6
observations, which is 33.33% of the workshop.
But at the same time "1" has reappeared. This is because
the {1,2} cases have moved up the tree, adding to
the weight of "1", which now has a weight of
16.67% + 33.33% = 50%, and therefore survives the threshold.
The reasoning behind the algorithm is that the inhabitants of
{1,2} are also node "1" cases and, for lack
of a node "1,2", can give node "1" its reason for existence.
In rough calculation the population of {1,2} is neglected
as soon as "1,2" disappears.
In fine calculation these cases are
'recovered' at a higher level.
In the dummy example, the new diagram looks like:
[root]---[1]---[1,2,3]
Now increase the threshold to 50%! Since the population
of the Hasse node {1,2,3} consists of 3 out of 6
observations, which is 50% of the workshop, the
Schematic node "1,2,3" has disappeared.
At the same time "1,2" has reappeared
since the {1,2,3} cases have moved up the tree, adding to
the weight of "1,2", which now has a weight of
33.33% + 50% = 83.33%, and therefore survives the threshold. Node
"1" finally has disappeared again, since this time
it is no longer fed with extra weight, and its own weight
of 16.67% is far below the display threshold.
The new diagram
looks like:
[root]---[1,2]
In fine calculation the picture cannot be reduced to its utter limit. The maximum threshold value is the point where no node fits the threshold in its own right. This is just a limit imposed by the current implementation of Abundantia Verborum. In theory one could continue, working with nothing but cluster nodes, i.e. nodes that take their right of existence from the weight of disappeared descendants. The next step in the example would be a threshold value of 84%. The diagram would be:
[root]---[1]
In this diagram node "1" has a weight of 16.67% +
33.33% + 50% = 100%. Making this last node disappear would take a
display threshold of 100%.
The dummy workshop "schemata.wrk" is an extreme case, maximizing the
difference between the two types of calculation. In practice the resulting
diagrams will not differ this much, especially if one sticks to
modest thresholds (which normally is the case). Nevertheless it
is important to understand the conceptual difference between the
two, in order to be able to judge which method is most
appropriate for one's own case study. To give an example of
the interpretation of the difference:
applied to the SEM example we used above, fine calculation
would allow readings to be bundles of dispersed related cases, whereas rough
calculation would demand that readings have a strong nucleus.
To conclude this subsection, let us restore the default settings of
the program. Choose "Options | Preferences..."! In the User Preferences
dialog box click on "Restore Default" and then click on "OK" to close
the dialog box! Note that more about the topic "User Preferences"
can be found in the online Abundantia Verborum Help.
We saw that the display threshold can have a drastic impact on Schematic diagrams, especially in fine calculation. Increasing the threshold can make nodes disappear and reappear again and can fundamentally change the overall shape of the diagram. In Hasse diagrams this is different. Here a display threshold behaves like an eraser. Parts of the diagram may disappear, but the rest of the picture remains intact. As in Schematic diagrams, the criterion for a node to stay in is its weight. In Hasse diagrams the weight of a node is a straightforward concept. It is the population of the node, relative to the population of the whole (filtered) workshop. If a node is not depicted, links to that node disappear too.
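The eraser behaviour can be sketched as a plain filter over nodes and links. The data below is a hypothetical toy diagram, not taken from any installed workshop, and the function name is an assumption made for illustration.

```python
def apply_threshold(populations, links, total, threshold_pct):
    """Erase Hasse nodes whose weight is not strictly higher than the
    threshold, together with every link that touches an erased node."""
    kept = {node for node, pop in populations.items()
            if 100 * pop > threshold_pct * total}
    kept_links = [(a, b) for a, b in links if a in kept and b in kept]
    return kept, kept_links


# Hypothetical toy diagram with 16 observations.
populations = {"{}": 0, "{1}": 10, "{2}": 5, "{4}": 0, "{1,4}": 1}
links = [("{}", "{1}"), ("{}", "{2}"), ("{}", "{4}"),
         ("{1}", "{1,4}"), ("{4}", "{1,4}")]

nodes, edges = apply_threshold(populations, links, 16, 0)
print(sorted(nodes), edges)
# ['{1,4}', '{1}', '{2}'] [('{1}', '{1,4}')]
```

The surviving nodes keep their own weights and positions; nothing is recalculated, which is what distinguishes this from the Schematic case.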
Close "schemata.wrk" and open "demowork.wrk" again! Open the Graph settings dialog box, and set the following displayed labels:
COMPAR:POS
COMPAR:COMP
COMPAR:SUP
SAID-OF:machine
SAID-OF:person
We combine displayed labels from different groups. This is one technique to look for interesting correlations between parameters (supposing that groups stand for parameters). In general, it is not the easiest technique. More straightforward approaches are discussed in 3.3.5 Filtered diagrams. The currently explained approach has the advantage that it compresses a lot of information into one diagram, but has the drawback that the resulting diagrams are not always easy to interpret.
Since we have 5 displayed labels and the display threshold is off, the diagram contains 2 to the power of 5, that is 32, nodes. Hasse diagrams of this complexity are difficult to take in at a glance. To get a clearer picture, open the Graph Settings dialog box again and enable the display threshold (also make sure its value is set to 0%)! Then click "OK"! The resulting picture, which has 6 out of the original 32 nodes, still contains all the information that was in the previous one. Only the empty nodes have been erased, together with all links to these empty nodes. This illustrates the major function of the threshold in Hasse diagrams. It serves to clear up a picture. Empty nodes are only informative in case you're interested in questions such as: what are all the theoretically possible label combinations, and which of those do not occur? But if you're more interested in what does occur, you may as well leave out the empty nodes.
How can we interpret the resulting diagram? First of all, the
absence of {} in combination with the fact that
all present nodes have either "1" or "2" in their names tells us
that the observations in the workshop are all either
COMPAR:POS or COMPAR:COMP.
Further you see that both these types show up in three different
SAID-OF contexts, namely either SAID-OF:machine
(cases {1,4} and {2,4}) or
SAID-OF:person
(cases {1,5} and {2,5})
or neither one of these two
(cases {1} and {2}).
Interesting is that both in the person cases and in the neither
person nor machine cases positive use is more frequent than comparative use,
whereas in the machine cases comparative use is more frequent than positive use.
Perhaps
people have less difficulty with qualifying machines as "older" than
they have with doing the same thing for other categories,
especially for people?
Whatever the reason for the phenomenon, if we wanted to go into it,
the first step would be to corroborate it with data that have
more statistical weight. After all, here we're looking at only 32 observations.
As in Schematic diagrams, we can also use the threshold for canceling out the really rare cases. Set the threshold to 5%! This gets rid of all nodes with only one inhabitant (1/32 = 3.125%, which is less than or equal to 5%). Now only four nodes remain. From this diagram we can read things like: disregarding extremely rare phenomena (below 5%), the only significant use of "old", when applied to machines, is comparative use.
Click on the "Graph" panel with your RIGHT mouse button!
A popup menu appears, listing the different diagram types. Choose
"Venn Diagram"! This is the fastest way to switch between diagram
types, and it is very practical if you want to change nothing but the
diagram type. Currently the shortcut doesn't help us much,
because we are going to change some other graph settings as well.
Our Venn diagram is incomplete, since there
are five displayed labels, so let us choose a simpler diagram.
Open the Graph Settings
dialog box! Click on COMPAR:COMP in the list
of displayed labels, and then click on "Delete Selection" to
remove this element from the list! Then click on
"Delete Selection" again to remove COMPAR:SUP from the list!
Also make sure the display threshold is enabled and set to 0%.
Finally click "OK"!
The resulting diagram has only three displayed labels, but
since we already learnt from the data that in the current workshop
not having COMPAR:POS implies having COMPAR:COMP
(and vice versa),
the diagram still contains all information that was in the
5 label Hasse diagram we used above. We invite the reader to
interpret the diagram, hereby admitting that interpreting diagrams
with heterogeneous labels is not straightforward.
The function of the display threshold in Venn diagrams is still more modest than in Hasse diagrams. It does not make the picture clearer (Why should it? Big Venn diagrams don't become complex, they rather become incomplete). It merely blanks out regions that, according to their weight and the threshold value, are judged to be insufficiently significant. The weight of a region is its population, relative to the population of the whole (filtered) workshop. Just increase the threshold value and see what happens!