Abundantia Verborum

5. Workshops and cognitive linguistics

5.2 Workshops


5.2.4 Possible extensions

In the previous section we concluded that in studies such as "vers_wnt.wrk" the computer basically does preparative work, and that in the last phase of the case study the linguist is more or less on his own and has to rely on complementary analyses, because what needs to be done in this phase is not quantitative in nature. Strategies for taking the functionality of the program one step further could take several forms. We consider three that are inspired by the "vers_wnt.wrk" case, but that we judge to have a more general applicability as well. Finally we introduce a fourth, of a more general nature.

drawing facilities

A first family of extensions relates to assisting the user in what remains to be done in the last phase, after the preparative calculations by the computer have been carried out. One could opt for including drawing functionality in the program, so that the user could enrich the diagrams with the results of complementary analyses. Technically such extensions would not pose any major problems. The difficulty would rather be in the choice of what is supported. Candidates are:

cluster detection

A second family of extensions relates to taking the preparative calculations a step further, introducing new functionality. In the previous section we already mentioned the possibility of having the program suggest a division of uses over senses, on the basis of the overall frequency-based structure of the semasiological profile.

Heuristics for this could be implemented on top of the existing graph mechanism. The following is a candidate algorithm, in simplified pseudo-code, that is based on Hasse diagrams (in the form of complete hypercubes mapped on the two-dimensional plane), thought of as a landscape in which weight represents the (third) dimension of height. We apologize for the formality of the description, but we judged it to be a condicio sine qua non, or at least the most straightforward method, for obtaining an accurate description. A less accurate description would be that the program divides the nodes, which are seen as points in the plane ("uses"), over hills/mountains ("senses"), which are identified by their highest peak (the prototype). The heuristic is based on the hypothesis that senses are shaped as hills/mountains. It consists of three steps: first, the nodes are grouped into hills by following the local slopes towards local maxima; second, it is determined which hills are large enough to be accepted as senses (or at least as nuclei of senses); third, the nodes of the hills that were not accepted are redistributed over the hills that were.

The reader who is not interested in the exact behaviour of the algorithm may choose to 'jump' to the discussion of an alternative approach, which immediately follows the pseudo-code.

The pseudo-code is written in the spirit of the object-oriented paradigm, but a few liberties are taken to reduce the need for explaining technicalities (such as the use of messages and the consistent use of parameters). The parts between curly braces are comments. The diagram itself is described as an object of the type HASSEDIAGRAM. The algorithm is described as the method HASSEDIAGRAM.ProposeSenses. By "method" we mean a procedure dedicated to a specific type of object. In this case the method ProposeSenses is dedicated to the object type HASSEDIAGRAM. Basically the code is a cluster detection algorithm that is fine-tuned to what we expect to be typical structures in semasiological profiles. At present, however, there are not enough data available to either support or contradict this expectation. In short, the algorithm has not yet proved its use.

{specification of the object type NODE}
  NODE is an object type with the fields
   -Height, containing a Real number
   -Summit, containing a Node
   -FutureSummit, containing a Node
   -HillSize, containing a Real number
   -AdjustedHeight, containing a Real number
  and the methods
   -AssignSummit(X), of which the parameter X is a Node
   -GoToNextState
  end

{specification of the method NODE.AssignSummit(X), which
 makes X the new summit of the hill that the NODE belongs to;
 this method modifies the field GlobalActionCounter of the
 diagram the node is part of}
  NODE.AssignSummit(X) is a method containing the steps
  begin
   -increment the field GlobalActionCounter of the diagram by 1
   -set the field NODE.FutureSummit to X
  end

{specification of the method NODE.GoToNextState, which updates
 the state of the object to reflect any use of NODE.AssignSummit(X) made
 after the previous use of NODE.GoToNextState}
  NODE.GoToNextState is a method containing the single step
  begin
   -set the field NODE.Summit to the value of the field NODE.FutureSummit
  end

{specification of the object type HASSEDIAGRAM}
  HASSEDIAGRAM is an object type with the fields
   -Nodes, containing a set of objects of the type NODE
   -GlobalActionCounter, containing an Integer number
   -MinimalHillSize, containing a Real number
  and the method
   -ProposeSenses
  and the hidden auxiliary methods
   -StepOne
   -StepTwo
   -StepThree
   -TimeTick
   -DetermineMinimalHillSize
  end

{specification of the auxiliary hidden method HASSEDIAGRAM.TimeTick
 that simulates the simultaneous, instantaneous evolution of the
 whole diagram to its next state}
  HASSEDIAGRAM.TimeTick is a method containing the single compound step
  begin
   -for each node X in the field HASSEDIAGRAM.Nodes do
    begin
     -X.GoToNextState
    end
  end

{specification of the auxiliary hidden method HASSEDIAGRAM.StepOne,
 the first phase of HASSEDIAGRAM.ProposeSenses, in which
 a first calculation is made of the landscape. The landscape
 consists of hills, which are made up of local maxima and the areas they
 dominate; the result of the method is that each node in the
 diagram contains in its Summit field the top of the hill
 it is calculated to be part of on the basis of local slopes
 in the landscape; for simplicity, the pseudo-code doesn't explain how
 to determine which are neighbouring nodes. Graphically they can
 be identified as those that have links between them in the Hasse diagram}
  HASSEDIAGRAM.StepOne is a method containing the steps
  begin
  {initialisation of the fields; each node initially is a separate hill}
   -set HASSEDIAGRAM.GlobalActionCounter to 0
   -for each node X in HASSEDIAGRAM.Nodes do
    begin
     -X.AssignSummit(X)
    end
   -HASSEDIAGRAM.TimeTick
  {clustering of the nodes into larger hills, on the basis of the slope }
   -while (HASSEDIAGRAM.GlobalActionCounter is higher than 0) do
    begin
     -set HASSEDIAGRAM.GlobalActionCounter to 0
     -for each node X in HASSEDIAGRAM.Nodes do
      begin
       -for each node Y in HASSEDIAGRAM.Nodes that is a neighbour of X do
        begin
         -if (Y.Height is higher than or equal to X.Height) and
             (Y.Summit.Height is higher than X.Summit.Height) then
          begin
           -X.AssignSummit(Y.Summit)
          end
        end
      end
     -HASSEDIAGRAM.TimeTick
    end
  end

{specification of the auxiliary hidden method HASSEDIAGRAM.StepTwo,
 the second phase of HASSEDIAGRAM.ProposeSenses, in which
 it is determined which hills seem important enough to receive
 the status of sense (or at least nucleus of a sense);
 the method HASSEDIAGRAM.DetermineMinimalHillSize, which is supposed
 to result in the field HASSEDIAGRAM.MinimalHillSize containing
 a threshold value for accepting a hill as a sense, is not elaborated;
 it could be some heuristic, or a dialog with the user}
  HASSEDIAGRAM.StepTwo is a method containing the steps
  begin
   -for each node X in HASSEDIAGRAM.Nodes do
    begin
     -set X.HillSize to 0
     -for each node Y in HASSEDIAGRAM.Nodes do
      begin
       -if (Y.Summit is X) then
        begin
         -increase X.HillSize by Y.Height
        end
      end
    end
   -HASSEDIAGRAM.DetermineMinimalHillSize
  end

{specification of the auxiliary hidden method HASSEDIAGRAM.StepThree,
 the third phase of HASSEDIAGRAM.ProposeSenses, in which
 the areas of the hills that do not meet the conditions set in
 phase two, are redistributed over the hills that do meet these
 conditions, on the basis of local features of the landscape}
  HASSEDIAGRAM.StepThree is a method containing the steps
  begin
  {reinitialisation of hills that were not accepted}
   -set HASSEDIAGRAM.GlobalActionCounter to 0
   -for each node X in HASSEDIAGRAM.Nodes do
    begin
     -if (X.Summit.HillSize is less than HASSEDIAGRAM.MinimalHillSize) then
      begin
       -set X.AdjustedHeight to 0
       -X.AssignSummit(X)
      end
      else
      begin
       -set X.AdjustedHeight to X.Height
      end
    end
   -HASSEDIAGRAM.TimeTick
  {clustering of the nodes into larger hills, on the basis of the slope}
   -while (HASSEDIAGRAM.GlobalActionCounter is higher than 0) do
    begin
     -set HASSEDIAGRAM.GlobalActionCounter to 0
     -for each node X in HASSEDIAGRAM.Nodes do
      begin
       -for each node Y in HASSEDIAGRAM.Nodes that is a neighbour of X do
        begin
         -if (Y.AdjustedHeight is higher than or equal to X.AdjustedHeight) and
             (Y.Summit.AdjustedHeight is higher than X.Summit.AdjustedHeight) then
          begin
           -X.AssignSummit(Y.Summit)
          end
        end
      end
     -HASSEDIAGRAM.TimeTick
    end
  end
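
For readers who prefer executable code, the three steps of the pseudo-code can be sketched in Python roughly as follows. This is a minimal sketch under our own assumptions, not part of the program: the diagram is passed in as two plain dictionaries (heights and neighbours), the function names climb and propose_senses are hypothetical, the threshold is supplied as a parameter rather than by DetermineMinimalHillSize, and the example landscape is a simple chain of six nodes rather than a real Hasse diagram.

```python
def climb(heights, neighbours, summit):
    """Let summits spread in synchronous rounds until nothing changes.

    Mirrors the while-loops of StepOne and StepThree: every node inspects
    the current state, and all new summits take effect together (TimeTick).
    """
    changed = True
    while changed:
        changed = False
        future = dict(summit)
        for x in heights:
            for y in neighbours[x]:
                if (heights[y] >= heights[x]
                        and heights[summit[y]] > heights[summit[x]]):
                    future[x] = summit[y]
                    changed = True
        summit = future
    return summit


def propose_senses(heights, neighbours, minimal_hill_size):
    """Return a dict that maps every node to the summit of its hill."""
    # Step one: each node starts as a separate hill; hills then merge uphill.
    summit = climb(heights, neighbours, {x: x for x in heights})

    # Step two: the size of a hill is the summed height of its nodes.
    hill_size = {x: 0.0 for x in heights}
    for node, top in summit.items():
        hill_size[top] += heights[node]

    # Step three: flatten the rejected hills and redistribute their nodes.
    adjusted, reset = {}, {}
    for x in heights:
        accepted = hill_size[summit[x]] >= minimal_hill_size
        adjusted[x] = heights[x] if accepted else 0.0
        reset[x] = summit[x] if accepted else x
    return climb(adjusted, neighbours, reset)


# Hypothetical landscape: a chain a-b-c-d-e-f with peaks at b and e.
heights = {"a": 1, "b": 5, "c": 2, "d": 1, "e": 3, "f": 1}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
              "d": ["c", "e"], "e": ["d", "f"], "f": ["e"]}

print(propose_senses(heights, neighbours, minimal_hill_size=3))
print(propose_senses(heights, neighbours, minimal_hill_size=5))
```

In this toy landscape the hill around b has size 9 and the hill around e has size 4. With a threshold of 3 both hills are kept as proposed senses; with a threshold of 5 the smaller hill is rejected and its nodes are absorbed by the hill around b.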

An alternative approach to starting from diagrams would be to use classical clustering strategies to cluster the observations of a workshop, on the basis of vector representations of the label structure of the individual observations, and using singular value decomposition to reduce the dimensions of the vectors (cf. Golub and van Loan 1989). An advantage of this alternative would be that the results could be compared in a straightforward way to existing strategies in automatic word sense disambiguation that use similar techniques on the basis of vector representations of what in our terminology would be the contents field of the individual observations, rather than of the labels (cf. Schütze 1992).
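
To make the idea of vector representations of the label structure more concrete, the following Python fragment is a minimal sketch under our own assumptions. Each observation is turned into a binary vector over a label vocabulary, and the vectors are grouped by single-linkage clustering on Hamming distance. The singular value decomposition step is deliberately left out to keep the example free of external libraries, and the choice of single linkage, the distance threshold, the label names and the function names are all hypothetical.

```python
def label_vector(labels, vocabulary):
    """Binary vector: 1 where the observation carries the label, else 0."""
    return [1 if label in labels else 0 for label in vocabulary]


def hamming(u, v):
    """Number of positions in which two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def single_linkage(vectors, max_distance):
    """Merge clusters as long as two members lie within max_distance."""
    clusters = [[i] for i in range(len(vectors))]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(hamming(vectors[a], vectors[b]) <= max_distance
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] += clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return [sorted(cluster) for cluster in clusters]


# Hypothetical labels and observations, for illustration only.
vocabulary = ["a", "b", "c", "d"]
observations = [{"a", "b"}, {"a"}, {"c", "d"}, {"d"}, {"a", "b"}]
vectors = [label_vector(obs, vocabulary) for obs in observations]

print(single_linkage(vectors, max_distance=1))
```

The clusters are returned as lists of observation indices. In a fuller version the binary vectors would first be projected onto a small number of dimensions before the distances are computed.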

more displayed labels

A third family of extensions relates to modifying the features of the program's preparative work in order to improve the current functionality. A candidate in this category would be to explore the possibilities of displaying more complex graphs. One example has already been mentioned: by working with asymmetrical forms and profiting from the principle that empty regions need not be depicted, one could delay the moment where incompleteness occurs in Venn diagrams. Another possibility would be to make the maximum number of displayed labels dependent on the diagram type. This could, for instance, allow for raising the limit in the case of Schematic diagrams, in which, given a specific number n of displayed labels, the complexity that is actually depicted is in practice typically far less than the theoretical maximum complexity (2 to the power of n). This modification is logically linked to a second one. Allowing for much higher numbers of displayed labels (and, consequently, longer calculation times) would introduce the need for an option to have graphs re-calculated only on demand, as an alternative to the current situation of instantaneous recalculation whenever changes to the workshop have been made.
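
The gap between depicted and theoretical complexity can be illustrated with a small Python fragment. It is a sketch under our own assumptions: observations are modelled as sets of labels, the function name occupied_regions and the label names are hypothetical, and a "region" is simply a distinct combination of displayed labels that occurs in at least one observation.

```python
def occupied_regions(observations, displayed_labels):
    """Distinct combinations of displayed labels that actually occur."""
    shown = set(displayed_labels)
    return {frozenset(labels) & shown for labels in observations}


# Hypothetical observations over four displayed labels.
displayed = ["a", "b", "c", "d"]
observations = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}, {"a", "b", "c"}]

regions = occupied_regions(observations, displayed)
print(len(regions), "occupied regions out of", 2 ** len(displayed))
```

Here only four of the sixteen theoretically possible regions are occupied, so a diagram type that depicts occupied regions only would remain small even though n is 4.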

crossing the boundaries of one workshop

A completely different range of possibilities lies at the level of introducing functionality that involves several workshops. For instance, consider the situation that several case studies have been carried out in different workshops, but on the basis of the same labeling conventions. This situation could be served by the development and implementation of formulas for calculating e.g. the degree of similarity between the semasiological profiles in different workshops (which under certain conditions could serve as a quantification of 'degree of synonymy'). We stop here, because we do not want to write more about what is not in the program than about what is in the program. Nevertheless we hope to have shared our conviction that what is already there is a solid base for many interesting extensions (by which, for once, we do not mean semantic relations).
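
Purely by way of illustration, one conceivable similarity formula is sketched below in Python. It is an assumption on our part and not a proposal that has been tested: each workshop is reduced to the relative frequencies of its label combinations, and two such profiles are compared by cosine similarity. The function names and the example data are hypothetical, and other measures could be substituted without changing the overall set-up.

```python
from collections import Counter
from math import sqrt


def profile(observations):
    """Relative frequency of each label combination in one workshop."""
    counts = Counter(frozenset(labels) for labels in observations)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two profiles (0.0 if either is empty)."""
    dot = sum(value * q.get(combo, 0.0) for combo, value in p.items())
    norm = (sqrt(sum(v * v for v in p.values()))
            * sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


# Hypothetical workshops that share the same labelling conventions.
workshop_1 = [{"a"}, {"a"}, {"a", "b"}, {"c"}]
workshop_2 = [{"a"}, {"a", "b"}, {"a", "b"}, {"c"}]

print(cosine_similarity(profile(workshop_1), profile(workshop_2)))
```

Identical profiles yield a value of 1 and profiles without any shared label combination yield 0. Whether such a value can be read as a 'degree of synonymy' depends on the conditions mentioned above.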

