In the previous section we concluded that in studies such as "vers_wnt.wrk" the computer basically does preparative work, and that in the last phase of the case study the linguist, owing to the non-quantitative nature of what needs to be done in this phase, is more or less on his own and has to rely on complementary analyses. Strategies for taking the functionality of the program one step further could take several forms. We consider three that are inspired by the "vers_wnt.wrk" case, but that we judge to have a more general applicability as well. Finally we introduce a fourth, of a more general nature.
A first family of extensions relates to assisting the user in what remains to be done in the last phase, after the preparative calculations by the computer have been carried out. One could opt for including drawing functionality in the program, so that the user could enrich the diagrams with the results of complementary analyses. Technically such extensions would not pose any major problems. The difficulty would rather be in the choice of what is supported. Candidates are:
A second family of extensions relates to taking the preparative calculations a step further, introducing new functionality. In the previous section we already mentioned the possibility of having the program suggest a division of uses over senses, on the basis of the overall frequency-based structure of the semasiological profile.
Heuristics for this could be implemented on top of the existing graph mechanism. The following is a candidate algorithm, in simplified pseudo-code, that is based on Hasse diagrams (in the form of complete hypercubes mapped on the two-dimensional plane), thought of as a landscape in which weight represents the (third) dimension of height. We apologize for the formality of the description, but we judged it to be a condicio sine qua non, or at least the most straightforward method, for obtaining an accurate description. A less accurate description would be that the program divides the nodes, which are seen as points in the plane ("uses"), over hills/mountains ("senses"), which are identified by their highest peak (the prototype). The heuristic is based on the hypothesis that senses are shaped as hills/mountains. It consists of three steps:
The reader who is not interested in the exact behaviour of the algorithm may choose to 'jump' to the discussion of an alternative approach, which immediately follows the pseudo-code.
The pseudo-code is written in the spirit of the object-oriented paradigm, but a few liberties are taken to reduce the need for explaining technicalities (such as the use of messages and the consistent use of parameters). The parts between curly braces are comments. The diagram itself is described as an object of the type HASSEDIAGRAM.
The algorithm is described as the method HASSEDIAGRAM.ProposeSenses. By "method" we mean a procedure dedicated to a specific type of object. In this case the method ProposeSenses is dedicated to the object type HASSEDIAGRAM. Basically the code is a cluster detection algorithm that is fine-tuned to what we expect to be typical structures in semasiological profiles. At present, however, there are not enough data available to either support or contradict this expectation. In short, the algorithm has not yet proved its usefulness.
{specification of the object type NODE}
NODE is an object type with the fields
  -Height, containing a Real number
  -Summit, containing a Node
  -FutureSummit, containing a Node
  -HillSize, containing a Real number
  -AdjustedHeight, containing a Real number
and the methods
  -AssignSummit(X), of which the parameter X is a Node
  -GoToNextState
end
{specification of the method NODE.AssignSummit(X), which
makes X the new summit of the hill that the NODE belongs to;
the change only takes effect at the next use of NODE.GoToNextState;
this method modifies the field GlobalActionCounter of the
diagram the node is part of}
NODE.AssignSummit(X) is a method containing the steps
begin
  -increment the field GlobalActionCounter of the diagram by 1
  -set the field NODE.FutureSummit to X
end
{specification of the method NODE.GoToNextState, which updates
the state of the object to reflect any uses of NODE.AssignSummit(X)
made after the previous use of NODE.GoToNextState}
NODE.GoToNextState is a method containing the single step
begin
  -set the field NODE.Summit to the value of the field NODE.FutureSummit
end
{specification of the object type HASSEDIAGRAM}
HASSEDIAGRAM is an object type with the fields
  -Nodes, containing a set of objects of the type NODE
  -GlobalActionCounter, containing an Integer number
  -MinimalHillSize, containing a Real number
and the method
  -ProposeSenses
and the hidden auxiliary methods
  -StepOne
  -StepTwo
  -StepThree
  -TimeTick
  -DetermineMinimalHillSize
end
{specification of the auxiliary hidden method HASSEDIAGRAM.TimeTick
that simulates the simultaneous, instantaneous evolution of the
whole diagram to its next state}
HASSEDIAGRAM.TimeTick is a method containing the single compound step
begin
  -for each node X in the field HASSEDIAGRAM.Nodes do
    begin
      -X.GoToNextState
    end
end
{specification of the auxiliary hidden method HASSEDIAGRAM.StepOne,
the first phase of HASSEDIAGRAM.ProposeSenses, in which
a first calculation is made of the landscape. The landscape
consists of hills, which are made up of local maxima and the areas they
dominate; the result of the method is that each node in
the diagram contains in its Summit field the top of the hill
it is calculated to be part of on the basis of local slopes
in the landscape; for simplicity, the pseudo-code doesn't explain how
to determine which are neighbouring nodes. Graphically they can
be identified as those that have links between them in the Hasse diagram}
HASSEDIAGRAM.StepOne is a method containing the steps
begin
  {initialisation of the fields; each node initially is a separate hill}
  -set HASSEDIAGRAM.GlobalActionCounter to 0
  -for each node X in HASSEDIAGRAM.Nodes do
    begin
      -X.AssignSummit(X)
    end
  -HASSEDIAGRAM.TimeTick
  {clustering of the nodes into larger hills, on the basis of the slope}
  -while (HASSEDIAGRAM.GlobalActionCounter is higher than 0) do
    begin
      -set HASSEDIAGRAM.GlobalActionCounter to 0
      -for each node X in HASSEDIAGRAM.Nodes do
        begin
          -for each node Y in HASSEDIAGRAM.Nodes that is a neighbour of X do
            begin
              -if (Y.Height is higher than or equal to X.Height) and
                  (Y.Summit.Height is higher than X.Summit.Height) then
                begin
                  -X.AssignSummit(Y.Summit)
                end
            end
        end
      -HASSEDIAGRAM.TimeTick
    end
end
{specification of the auxiliary hidden method HASSEDIAGRAM.StepTwo,
the second phase of HASSEDIAGRAM.ProposeSenses, in which
it is determined which hills seem important enough to receive
the status of sense (or at least nucleus of a sense);
the size of a hill is the sum of the heights of its nodes and is
stored in the HillSize field of its summit;
the method HASSEDIAGRAM.DetermineMinimalHillSize, which is supposed
to result in the field HASSEDIAGRAM.MinimalHillSize containing
a threshold value for accepting a hill as a sense, is not elaborated;
it could be some heuristic, or a dialog with the user}
HASSEDIAGRAM.StepTwo is a method containing the steps
begin
  -for each node X in HASSEDIAGRAM.Nodes do
    begin
      -set X.HillSize to 0
      -for each node Y in HASSEDIAGRAM.Nodes do
        begin
          -if (Y.Summit is X) then
            begin
              -increase X.HillSize by Y.Height
            end
        end
    end
  -HASSEDIAGRAM.DetermineMinimalHillSize
end
{specification of the auxiliary hidden method HASSEDIAGRAM.StepThree,
the third phase of HASSEDIAGRAM.ProposeSenses, in which
the areas of the hills that do not meet the conditions set in
phase two are redistributed over the hills that do meet these
conditions, on the basis of local features of the landscape}
HASSEDIAGRAM.StepThree is a method containing the steps
begin
  {reinitialisation of hills that were not accepted}
  -set HASSEDIAGRAM.GlobalActionCounter to 0
  -for each node X in HASSEDIAGRAM.Nodes do
    begin
      -if (X.Summit.HillSize is less than HASSEDIAGRAM.MinimalHillSize) then
        begin
          -set X.AdjustedHeight to 0
          -X.AssignSummit(X)
        end
      else
        begin
          -set X.AdjustedHeight to X.Height
        end
    end
  -HASSEDIAGRAM.TimeTick
  {clustering of the nodes into larger hills, on the basis of the slope}
  -while (HASSEDIAGRAM.GlobalActionCounter is higher than 0) do
    begin
      -set HASSEDIAGRAM.GlobalActionCounter to 0
      -for each node X in HASSEDIAGRAM.Nodes do
        begin
          -for each node Y in HASSEDIAGRAM.Nodes that is a neighbour of X do
            begin
              -if (Y.AdjustedHeight is higher than or equal to X.AdjustedHeight) and
                  (Y.Summit.AdjustedHeight is higher than X.Summit.AdjustedHeight) then
                begin
                  -X.AssignSummit(Y.Summit)
                end
            end
        end
      -HASSEDIAGRAM.TimeTick
    end
end
{specification of the method HASSEDIAGRAM.ProposeSenses, which
carries out the three phases in order}
HASSEDIAGRAM.ProposeSenses is a method containing the steps
begin
  -HASSEDIAGRAM.StepOne
  -HASSEDIAGRAM.StepTwo
  -HASSEDIAGRAM.StepThree
end
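For readers who prefer executable code, the three phases of the pseudo-code can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the program's implementation. The function names are hypothetical. Nodes are represented as bitmasks over the displayed labels, and neighbours are assumed to be the nodes that differ in exactly one label (the links of a complete hypercube). The threshold is passed in directly, since DetermineMinimalHillSize is not elaborated. One deliberate deviation: where several neighbours qualify within one time tick, the sketch lets a node adopt the highest summit on offer, so that the outcome does not depend on the order in which neighbours are visited.

```python
def hypercube_neighbours(n_labels):
    """Assumption: nodes are bitmasks over n_labels labels, and two nodes
    are neighbours when they differ in exactly one label."""
    return {x: [x ^ (1 << i) for i in range(n_labels)]
            for x in range(2 ** n_labels)}


def cluster(height, neighbours, summit):
    """Repeat synchronous time ticks until no node changes its summit."""
    while True:
        future = dict(summit)  # plays the role of the FutureSummit fields
        for x in height:
            for y in neighbours[x]:
                # Look only at neighbours that are not lower than x, and adopt
                # their summit if it is higher than the best summit found so far.
                if (height[y] >= height[x]
                        and height[summit[y]] > height[future[x]]):
                    future[x] = summit[y]
        if future == summit:   # no action taken in this tick: stable state
            return summit
        summit = future        # TimeTick: all nodes switch state at once


def hill_sizes(height, summit):
    """Step two: the size of a hill is the summed height of its nodes."""
    sizes = {}
    for x, s in summit.items():
        sizes[s] = sizes.get(s, 0.0) + height[x]
    return sizes


def propose_senses(height, neighbours, minimal_hill_size):
    """Return a mapping from each node to the summit of its proposed sense."""
    # Step one: every node starts as its own hill, then hills are merged.
    summit = cluster(height, neighbours, {x: x for x in height})
    # Step two: measure the hills.
    sizes = hill_sizes(height, summit)
    # Step three: flatten rejected hills and redistribute their nodes.
    rejected = {x for x in height if sizes[summit[x]] < minimal_hill_size}
    adjusted = {x: (0.0 if x in rejected else height[x]) for x in height}
    restart = {x: (x if x in rejected else summit[x]) for x in height}
    return cluster(adjusted, neighbours, restart)


# Invented example weights for a diagram with three displayed labels.
example_heights = {0: 1, 1: 9, 2: 2, 3: 4, 4: 3, 5: 2, 6: 8, 7: 1}
example_result = propose_senses(example_heights, hypercube_neighbours(3), 10)
```

With these invented weights and a threshold of 10, the sketch proposes two hills, one around node 1 (total weight 19) and one around node 6 (total weight 11). Raising the threshold to 12 rejects the smaller hill, whose nodes are then absorbed by the hill around node 1.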
An alternative to starting from diagrams would be to use classical clustering strategies to cluster the observations of a workshop, on the basis of vector representations of the label structure of the individual observations, and using singular value decomposition to reduce the dimensions of the vectors (cf. Golub and van Loan 1989). An advantage of this alternative would be that the results could be compared in a straightforward way to existing strategies in automatic word sense disambiguation that use similar techniques on the basis of vector representations of what in our terminology would be the contents field of the individual observations, rather than of the labels (cf. Schütze 1992).
A third family of extensions relates to modifying the features of the program's preparative work in order to improve the current functionality. A candidate in this category would be to explore the possibilities of displaying more complex graphs. One example has already been mentioned: by working with asymmetrical forms and profiting from the principle that empty regions need not be depicted, one could delay the moment where incompleteness occurs in Venn diagrams. Another possibility would be to make the maximum number of displayed labels dependent on the diagram type. This could, for instance, allow for raising the limit in the case of Schematic diagrams, in which, given a specific number n of displayed labels, the complexity actually depicted is in practice typically far less than the theoretical maximum (2 to the power of n). This modification is logically linked to a second one. Allowing for much higher numbers of displayed labels (and, consequently, longer calculation times) would introduce the need for an option to have graphs recalculated only on demand, as an alternative to the current situation of instantaneous recalculation whenever changes to the workshop have been made.
A completely different range of possibilities lies at the level of introducing functionality that involves several workshops. For instance, consider the situation that several case studies have been carried out in different workshops, but on the basis of the same labeling conventions. This situation could be served by the development and implementation of formulas for calculating e.g. the degree of similarity between the semasiological profiles in different workshops (which under certain conditions could serve as a quantification of 'degree of synonymy'). We stop here, because we do not want to write more about what is not in the program than about what is in the program. Nevertheless we hope to have shared our conviction that what is already there is a solid base for many interesting extensions (by which, for once, we do not mean semantic relations).
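By way of illustration of the cross-workshop similarity mentioned above, one conceivable formula is the cosine of the angle between two profiles, each seen as a vector of weights indexed by label combination. The sketch below is an assumption on our part, not a formula implemented in the program. The function name and the representation of a profile as a mapping from sets of labels to weights are hypothetical, and the labels in the usage lines are invented.

```python
import math


def profile_similarity(profile_a, profile_b):
    """Cosine similarity between two semasiological profiles.

    Assumption: a profile maps a frozenset of labels to a non-negative
    weight, and both workshops follow the same labeling conventions.
    """
    keys = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(k, 0) * profile_b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented example profiles over hypothetical labels.
word_one = {frozenset({"concrete"}): 3, frozenset({"concrete", "human"}): 4}
word_two = {frozenset({"concrete"}): 3}
similarity = profile_similarity(word_one, word_two)
```

The value lies between 0 (no shared label combinations) and 1 (identical relative weights). Whether such a value can be read as a degree of synonymy depends on the conditions referred to above.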