The two most fundamental diagram types used by Abundantia Verborum, Venn and Hasse diagrams, basically serve the same purpose in the program. The Venn diagram is the more familiar of the two, but it has some drawbacks. The Hasse diagram provides an alternative that scores better on these points. The third diagram type, the Schematic diagram, provides a more linguistics-oriented perspective on the data.
Venn diagrams need no elaborate introduction. For the last few decades they have been the standard way of graphically representing sets and set membership relations. Ellipse-shaped figures represent sets. They function like containers. Although not always actually depicted, the members of a set are taken to be located within the boundaries of the ellipse. Non-members are located outside of the ellipse. If depicted, elements are often represented as dots. The overlap of two or more ellipses represents the intersection of these sets. The members of this intersection are located within this overlapping region. Non-members are outside of this region.
In Abundantia Verborum a Venn-diagram representation of a set always has the name of a label, and it represents the set of all observations in a workshop that have been assigned this label. If the workshop is filtered, in other words, if there is a filter active, then only the observations that are matched by the filter are taken to belong to the workshop at that moment. The filter mechanism was introduced in section 3.2.3 Adding labels via filters. Its relation to diagrams will be the topic of section 3.3.5 Filtered diagrams. The labels assigned to an observation include both the labels that were added explicitly and the labels that were added implicitly through implication rules. For implied labels, see section 3.2.6 Label taxonomies. Using implied labels in diagrams is discussed in section 3.3.4 Diagrams and implied labels.
Make sure the workshop "demowork.wrk" is open and active on the Abundantia Verborum desktop and make sure its filter is empty! Then choose "Workshop | Set Graph..."! The Graph Settings dialog box appears. Set the graph type to "Venn diagram"! In order to do so first click on the box with the little triangle that points downwards and then click on the item of your choice. Next make sure that "Disable Threshold" is checked! Finally you have to set the most crucial part of the graph settings, namely the displayed labels. Whatever settings you choose for graph type and for display threshold, as long as you have not selected any displayed labels, there is no diagram.
The displayed labels are the building blocks of your diagram, the
criteria for making the piles. Click on "Add Group"! In the
Select Label Group dialog box select COMPAR and
click "OK"! Back in the Graph Settings you see that
the labels of the group COMPAR are now the
current displayed labels. Click "OK" to apply the new
graph settings!
The workshop window now looks quite different from how it looked before. Three of the four panels in the window have changed. You might want to maximize the window to obtain an optimal overview. The "Display threshold" panel, the middle one on the left, signals that the display threshold is off. For the moment, we leave this information aside. The "Displayed labels" panel, the bottom panel on the left, lists the displayed labels, and assigns a number to each. These numbers reappear in the actual diagram in the "Graph" panel on the right. In this diagram the ellipses, or rather circles, are named 1, 2 and 3. Their full names can be looked up in the "Displayed labels" panel. Instead of using dots for representing elements, gray scales are used to represent the 'population' of regions. Dark regions contain many observations, light regions few.
What we can read on the diagram is that the largest group of
observations has the label COMPAR:POS and
that a second, smaller group has the label COMPAR:COMP.
Further we see, on the basis of the white regions, that no observations
contain the label COMPAR:SUP, that no observations
contain more than one of the displayed labels, and that there
are no observations that contain none of the displayed labels.
These facts, of course, do not come as a surprise. They're just
a first exercise in interpreting diagrams.
Let us create another diagram. Click, with the left mouse button,
anywhere in either the "Display threshold" panel or the
"Displayed labels" panel! You'll see that the Graph Settings
dialog box appears. Indeed, clicking on either one of these
panels is a shortcut for "Workshop | Set Graph...", just as
clicking on the "Workshop filter" panel is a shortcut for
"Workshop | Set Filter...". In the Graph Settings dialog
box remove the label COMPAR:COMP from
the list of displayed labels! You do this by clicking
on it to select it, and then clicking on "Delete selection". After
you have done this, click "OK". The new diagram displays
the distribution of the observations over the sets
COMPAR:POS and COMPAR:SUP. In contrast to
the first diagram, now the region outside of both sets is
non-empty. You would expect this region to be gray, but
instead of making everything outside of the sets
gray (which would hide the numbers of the sets), only a small triangle
in the upper left-hand corner is made gray. So this is
an important little corner of the picture.
In the beginning of this section we spoke of a drawback of Venn diagrams. This drawback shows up when you have more than three sets being displayed. Open the Graph Settings dialog box again and remove all displayed labels with "Clear All"! Now select the labels listed below with "Set Labels"! In the Select Labels dialog box you first have to select the correct group before you can select a label. Selecting and deselecting a label is done by clicking on it. Selected labels have a plus sign before their name (cf. the Set Observation's Labels dialog box in 3.2.5 Manually adding labels). The order in which you select the labels determines the order in which they will appear in the list of displayed labels.
SAID-OF:person
SAID-OF:line product
SAID-OF:food
SAID-OF:buildings
Hasse diagrams are used in mathematics to represent, among other things, lattices and Boolean algebras. They also show up in other sciences. For instance, they are used to represent crystal structures. In Abundantia Verborum they are used to chart label configurations. They turn out to be a valuable alternative to Venn diagrams. Before we look at them, first some theory.
Hasse diagrams are graphs, consisting of nodes, represented as circles, and links, represented as lines between the circles. In Abundantia Verborum the nodes of a Hasse diagram represent, and carry the name of, the different subsets of the set of displayed labels. There are as many nodes as there are different subsets. Suppose the set of displayed labels contained the following labels:
1. SAID-OF:person
2. SAID-OF:line product
3. SAID-OF:food
4. SAID-OF:buildings
The Hasse diagram would then have the following sixteen nodes:
{1,2,3,4}
{1,2,3} {1,2,4} {1,3,4} {2,3,4}
{1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
{1} {2} {3} {4}
{}
In Abundantia Verborum the nodes of a Hasse diagram are the counterparts of the regions in the Abundantia Verborum Venn diagrams. Just like these regions, nodes too are thought of as containing observations. More precisely, a node contains those observations of a (filtered) workshop that have all the labels in the node's name and that do not have any displayed labels that are not in the node's name. As in Venn diagrams, the 'population' of a node is represented by a gray scale, dark being crowded. We conclude the 'theory' with the remark that what was said for Venn diagrams about filtered out observations and about implied labels also applies to Hasse diagrams, which means that the former are not, and the latter are, taken into account when calculating frequencies.
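The membership rule for Hasse nodes can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the program's own code: the observations are invented toy data, and the names hasse_nodes and hasse_population are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each observation is the set of displayed-label
# numbers it carries (1 = SAID-OF:person, 2 = SAID-OF:line product,
# 3 = SAID-OF:food, 4 = SAID-OF:buildings).
observations = [
    {1},
    {1},
    {1, 3},
    {2},
    set(),
    {2, 4},
]
displayed = [1, 2, 3, 4]


def hasse_nodes(labels):
    """Return every subset of the displayed labels, largest first."""
    nodes = []
    for size in range(len(labels), -1, -1):
        for combo in combinations(labels, size):
            nodes.append(frozenset(combo))
    return nodes


def hasse_population(node, observations, labels):
    """Count observations whose displayed labels are exactly the node's name."""
    shown = set(labels)
    return sum(1 for obs in observations if obs & shown == node)


nodes = hasse_nodes(displayed)
counts = {node: hasse_population(node, observations, displayed) for node in nodes}
```

Because every observation matches exactly one subset of the displayed labels, the populations of all nodes add up to the number of observations in the (filtered) workshop.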
Back to practice. Open the Graph Settings dialog box, clear the list of
displayed labels, and once again add all labels
of the COMPAR group with "Add Group"! As you start using
the program for your own work you'll notice that you typically
will want many if not all labels from the same label group
in a diagram. Therefore the button "Add Group" is often a convenient
tool. This being said, nothing prevents you from mixing labels from
different groups in the same diagram. But such diagrams are likely
to be more difficult to interpret.
Select the graph type "Hasse diagram" and click "OK" to activate
the new diagram! This diagram is the counterpart of
the first Venn diagram we tried out above. Notice the similarity
of the diagram with the icon of Abundantia Verborum.
For a moment think of what is depicted as a three-dimensional cubical
construction with little spheres attached to its corners.
Let us call the direction going from node {} to node {1} the height of the object, the
direction going from node {} to node {2} the depth of the object and
the direction going from node {} to node {3} the width of the
object. In section
3.2.1 Using labels we introduced the following
metaphor: "You can think of a workshop as
an n-dimensional space,
n being the summed cardinality of all label groups. In each
dimension there are two possible locations, namely point 0 for 'label
absent' and point 1 for 'label present'. The observations are spread
over this n-dimensional space in such a way that their location corresponds
to which labels they do and which they do not have."
A Hasse diagram can be looked at in a similar way, with the difference
that in the diagrams only a few dimensions are displayed so that the information
becomes representable. In our example height is the dimension of
COMPAR:POS. Being in some corner of the ceiling of the
construction implies having the label COMPAR:POS. Being
somewhere on the floor implies not having this label. In a
similar way depth is the dimension of COMPAR:COMP and
width is the dimension of COMPAR:SUP.
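The cube metaphor amounts to giving each observation one 0/1 coordinate per displayed label. The sketch below illustrates this under the numbering used in the example; the function names corner and node_name are hypothetical and do not come from the program.

```python
# Assumed numbering, as in the example: 1 = COMPAR:POS (height),
# 2 = COMPAR:COMP (depth), 3 = COMPAR:SUP (width).
DISPLAYED = ["COMPAR:POS", "COMPAR:COMP", "COMPAR:SUP"]


def corner(observation_labels, displayed=DISPLAYED):
    """Map an observation's labels to a 0/1 coordinate per displayed label."""
    return tuple(1 if label in observation_labels else 0 for label in displayed)


def node_name(coordinates):
    """Turn a corner such as (1, 0, 0) into a node name such as '{1}'."""
    numbers = [str(i + 1) for i, bit in enumerate(coordinates) if bit]
    return "{" + ",".join(numbers) + "}"


# Labels from other groups are ignored: they are not displayed dimensions.
print(node_name(corner({"COMPAR:POS", "SEM:having age"})))  # {1}
print(node_name(corner(set())))                             # {}
```

An observation with COMPAR:POS only thus ends up in the ceiling corner {1}, and an observation without any COMPAR label ends up on the floor in corner {}.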
At first it will seem counter-intuitive to map COMPAR-information
on a compound scale consisting of three component binary scales, rather
than on one scale with three positions POS, COMP and SUP.
The reason for mapping information related to a group on a compound scale that bundles the
component scales of the individual labels of the group,
is that it yields one general type of representation, applicable
to all sorts of groups or other label sets. For
instance, it allows for observations to have compound values, or
to have no value at all for a particular group. Compound values
are often interesting for linguistic information. We have already
used them in the SAID-OF and the SEM
groups.
Now click on the button in the speed bar that looks like a
small tree diagram! This button is the speed bar equivalent of
"Workshop | Set Graph...". We introduce the speed bar button now
because the representation
in its icon is a midget version of the type of diagram we
treat next, the Schematic diagram. In the Graph Settings dialog
box clear all current
displayed labels and then add the group SEM!
Next select "Horizontal Schematic Diagram" as graph type! Then
make sure the "Display Threshold" is enabled! We're aware of the
fact that display thresholds have not been explained yet. Please
be patient. We'll refer back to this passage in
3.3.3 The display threshold.
Finally click "OK"!
A tree-like diagram appears. You can think of this diagram
as a rudimentary attempt by the machine to divide the observations
in the workshop into different readings of "old", on the
basis of their SEM labels. You could call it
a first proposal for a dictionary entry structure for
the lemma "old". According to the diagram there are six basic readings.
They are the sons of the root of the tree, namely:
"1" (reading A)
"2" (reading B)
"3" (reading C)
"6" (reading D)
"7" (reading E)
"8" (reading F)
Written out as labels, these are:
SEM:having old age (reading A)
SEM:having age (reading B)
SEM:not most recent type (reading C)
SEM:no longer existing (reading D)
SEM:dating back a long time (reading E)
SEM:from other era (reading F)
The first reading, "1", has one son
that represents a more specific sub-case: "1,5". The name of this node can be paraphrased
as AND(SEM:having old age, SEM:turned bad). Paraphrasing with
filter syntax is appropriate here, because in a sense the nodes in a
Schematic diagram function as an
additional filter, on top of the one (if any) specified in the
"Workshop filter" panel. You can think of node
"1"
as a container of all observations in the
(filtered) workshop that are matched by the
additional filter SEM:having old age, and likewise you can think of
node "1,5" as a container of all
observations in the (filtered) workshop that are
matched by the additional filter
AND(SEM:having old age, SEM:turned bad).
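Read as an additional filter, a Schematic node contains every observation that carries all labels in its name, whatever other labels the observation may have. The following sketch illustrates that reading with invented observations; matches_and and schematic_node are hypothetical names, not functions of the program or its filter language.

```python
# Hypothetical observations, each given as the set of SEM labels it carries.
observations = [
    {"SEM:having old age"},
    {"SEM:having old age", "SEM:turned bad"},
    {"SEM:having age"},
    {"SEM:having old age", "SEM:turned bad"},
]


def matches_and(observation, required_labels):
    """AND(...) semantics: the observation must carry every required label."""
    return set(required_labels) <= observation


def schematic_node(observations, required_labels):
    """Return the observations contained in the node named by these labels."""
    return [obs for obs in observations if matches_and(obs, required_labels)]


node_1 = schematic_node(observations, ["SEM:having old age"])
node_1_5 = schematic_node(observations, ["SEM:having old age", "SEM:turned bad"])
print(len(node_1), len(node_1_5))  # 3 2
```

In this toy data the two observations in node "1,5" are also counted in node "1", which is the main contrast with the exact-match rule of Hasse nodes.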
The second reading has no further subdivisions. The third one
has two, namely AND(SEM:not most recent type, SEM:outdated)
and
AND(SEM:not most recent type, SEM:no longer existing).
The latter can also be seen as a subtype of reading D,
SEM:no longer existing. This is indicated by the
red link. The link is red to indicate that it violates the
tree-nature of the graph. In a proper tree a node cannot be the
son of more than one father. The presence of the red links in the
example indicates that the distribution of SEM labels
in the workshop does not reflect a purely classical,
i.e. hierarchical semantic structure. The readings overlap.
To finish the description of the example: reading D has two
subdivisions; reading E has none; reading F finally has one, which also is a subclass
of reading D. Note that the fact that the diagram depicts this subclass
in the subtree of reading D, and only links it with an oblique
red link to reading F, should not be interpreted as
an indication of preferred classification.
It is an arbitrary choice by the program, and
could just as well have been the opposite way. The same is true for all
red links in Schematic diagrams.
So much for the informal presentation of the diagram type. How is
the diagram constructed? Any Schematic diagram has at least one node, namely the node root,
which is always depicted. The maximum number of nodes that a Schematic
diagram can have is equal
to the number of nodes the corresponding Hasse diagram has
(with display threshold off). If we had used
the following displayed labels...
1. SAID-OF:person
2. SAID-OF:line product
3. SAID-OF:food
4. SAID-OF:buildings
...the candidate nodes would have been:
1,2,3,4
1,2,3
1,2,4
1,3,4
2,3,4
1,2
1,3
1,4
2,3
2,4
3,4
1
2
3
4
root
An important difference between Schematic diagrams and
the other two diagram types is that Schematic diagrams inherently rest
upon the
display threshold principle, which is the principle that
candidate pieces of the graph that do not meet a specific condition
are not displayed.
In the current diagram in the program there are 256 candidate nodes,
since there are eight displayed labels, and a set with eight elements has
two to the power of eight, which is 256, subsets. So why are
only 11 displayed? A lot of information, if not most,
in Schematic diagrams is in the presence and absence of nodes.
For example, restricting our attention to reading A
for a moment (and shifting again to a less formal level of
explanation), what
does it mean that
both the nodes "1" and "1,5" are displayed in the
subtree of this reading, and no others?
First of all, it means
that in the (filtered) workshop displayed label 1
does co-occur with displayed label 5 in at least one observation.
Otherwise the leaf node "1,5" would not be there.
Second, it means that in the (filtered) workshop
displayed label 1 does not co-occur with any other
displayed label but 5.
Otherwise node "1" would have other descendants than
"1,5".
Finally, it means that there are observations that have
displayed label 1 and do not have displayed label 5. Otherwise
node "1" would not be there and node "1,5" would
be attached directly to the root. Of course, the lack of cross links
to other readings, and moreover the fact that neither 1 nor 5
occur anywhere in the names of nodes outside of the reading A subtree, are
also informative: these facts show that
reading A is clearly isolated from the other readings.
Summarizing all this we conclude that
reading A consists of the cluster 1+5, in which
1 seems to be obligatory and 5 seems to be optional, and that the
reading does
not share any labels with other readings. The rest of the
diagram can be interpreted in a similar vein. More technical details
about the display threshold are given in
3.3.3 The display threshold.
The second important difference between Schematic diagrams and the
other two diagram types is that in Schematic diagrams observations
do not necessarily have a unique location. In Schematic diagrams the gray
scale of node A
reflects the percentage of observations in the (filtered)
workshop that are matched by the additional filter that paraphrases A.
To take reading A again as an example, it is clear that all
observations matched by
AND(SEM:having old age, SEM:turned bad), the
paraphrase of "1,5", are also matched by
SEM:having old age, the
paraphrase of "1". The general rule is that
in a Schematic diagram the inhabitants
of node A by definition also inhabit all ancestors of A. Or in other
words, the population of a node is a (specific) subclass of the
population of its father node. By extrapolation the root
node represents all observations in the (filtered) workshop,
but this small fact doesn't become relevant until section
3.3.6 Zooming in on diagram parts.
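The nesting of populations can be checked with a small calculation. The sketch below uses invented observations and a hypothetical function share; it only computes the percentage that the gray scale is said to reflect, and makes no assumption about how the program turns that percentage into a shade.

```python
# Hypothetical data: sets of displayed-label numbers per observation.
observations = [{1}, {1, 5}, {1, 5}, {2}, {3, 6}]


def share(node, observations):
    """Percentage of observations that carry all labels in the node's name."""
    hits = sum(1 for obs in observations if node <= obs)
    return 100.0 * hits / len(observations)


root = frozenset()          # the empty name matches every observation
father = frozenset({1})
son = frozenset({1, 5})

print(share(root, observations))    # 100.0
print(share(father, observations))  # 60.0
print(share(son, observations))     # 40.0
```

A son can never score higher than its father, because adding a label to the node's name can only narrow the set of matching observations.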
We've come to the end of the presentation of the different diagram types. Before you go to the next section, save the workshop by clicking on the first button in the speed bar (the one with an arrow pointing to a disk) and then close the workshop!