Abundantia Verborum

4. Virtual corpora and corpus linguistics

4.2 Virtual corpora


4.2.3 Users and encapsulation

Abundantia Verborum is designed to be usable in a multi-user environment. More specifically, we consider the following local area network setting to be the typical configuration:

In such a configuration only the system administrator has to know all the technical details about the centrally stored objects (notably the corpora and the virtual corpora). Other users may be spared some of these details. This is why we can say that Abundantia Verborum supports encapsulation to some degree. The current implementation supports only a modest form: in many situations users can work with objects of which, strictly speaking, they only have to know the name and the caption field (but if they want to they can inspect the objects, and if they have the access rights they can even edit them). A strong version of encapsulation would imply that users are not allowed to see anything but the name and the caption field of the objects on the server, let alone modify anything. It would not take much work to adjust the current program to support this. It would merely require replacing a set of "edit X" buttons with "see X's caption" buttons.

This section addresses encapsulation only in the context of virtual corpora. We do not go into the details of setting up a multi-user configuration for the program. Neither do we address encapsulation for objects other than virtual corpora (once again we refer to the online help, Abundantia Verborum Help).

macro level encapsulation

At the department of linguistics at the University of Leuven the following situation is being tested. Virtual corpora on the workstation, e.g. in "c:\abundant\user", are pointers to virtual corpora with the same name in a central place on a server, called "s:\abundant\user". We say virtual corpus X is a pointer to virtual corpus Y if X contains nothing but Y in its file list and has no explicit view attached to itself. In such a situation searching X will have the same net effect as searching Y. The virtual corpora in "s:\abundant\user" are themselves also pointers, namely to the actual virtual corpora, which are typically stored in a subdirectory "abundant" of the directory that contains the actual data. The actual data can be in the most diverse locations.

The purpose of the latter 'indirection', from "s:\abundant\user" to all sorts of places, is that the system becomes easy to control for the administrator:

The other indirection, from "c:\abundant\user" to "s:\abundant\user", facilitates the work of the user. He can find all objects he needs, both his own creations and the other ones, in the same directory, namely his own user directory "c:\abundant\user".
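The two levels of indirection can be pictured with a minimal sketch. It is not the program's actual implementation: the class VirtualCorpus, the functions is_pointer and resolve, the corpus name "news" and the data directory "d:\data\news" are all hypothetical and serve only to illustrate the pointer definition given above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VirtualCorpus:
    # Hypothetical in-memory model; the real storage format is not described here.
    file_list: List[str] = field(default_factory=list)
    view: Optional[str] = None


def is_pointer(vc: VirtualCorpus, registry: Dict[str, VirtualCorpus]) -> bool:
    # X is a pointer to Y if X lists nothing but Y and has no explicit view.
    return (
        vc.view is None
        and len(vc.file_list) == 1
        and vc.file_list[0] in registry
    )


def resolve(path: str, registry: Dict[str, VirtualCorpus]) -> str:
    # Follow pointers until a virtual corpus that is not a pointer is reached.
    seen = set()
    while True:
        if path in seen:
            raise ValueError(f"circular pointer chain at {path}")
        seen.add(path)
        vc = registry[path]
        if not is_pointer(vc, registry):
            return path
        path = vc.file_list[0]


# Hypothetical example: workstation -> server -> directory next to the data.
registry = {
    r"c:\abundant\user\news": VirtualCorpus([r"s:\abundant\user\news"]),
    r"s:\abundant\user\news": VirtualCorpus([r"d:\data\news\abundant\news"]),
    r"d:\data\news\abundant\news": VirtualCorpus(
        [r"d:\data\news\1998.txt", r"d:\data\news\1999.txt"], view="pos-view"
    ),
}

target = resolve(r"c:\abundant\user\news", registry)
```

In this sketch, moving the data only requires the administrator to change the single entry under "s:\abundant\user"; the entries on the workstations stay as they are.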

micro level encapsulation

The possibilities of encapsulation at the micro level depend on the type of markup that is being used in the corpora. As long as there is hardly any markup or the user does not need the markup, the situation is relatively simple. But as soon as one wants to benefit from micro-level markup it becomes clear that Abundantia Verborum is far away from true encapsulation at this level. Let us once again consider the example of Part Of Speech (POS) markup we first encountered in section 4.2.1 Virtual corpus views. The example is repeated below.

<s>This&DD1; article&NN1; talks&VVZ; about&RP; itself&PPX1; .&PUN;</s>
<s>It&PPH1; consists&VVZ; of&IO; two&MC; paragraphs&NN2; .&PUN;</s>

True hiding of the technicalities would be accomplished if the query language supported queries such as the following, which asks for all common nouns that begin with an "a", using an attribute-value based formalism.

WORD(appearance="RE(a.*)" POS="common noun")

The syntax is not important. The key issue is that the query is abstract, in the sense that it nowhere relies on concrete encoding schemata. For this to be feasible the program somehow has to acquire the knowledge that internally the query has to be translated into:

WORD(RE(a.*&NN.*;))
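To make the translation step concrete, here is a minimal sketch of what such knowledge could look like. The table POS_TAGS and the function translate are hypothetical; Abundantia Verborum does not currently offer anything of the kind. The only mapping taken from the example is that common nouns carry tags beginning with "NN".

```python
# Hypothetical table: abstract category name -> regular expression over tags.
# Only the "common noun" entry is derived from the example markup.
POS_TAGS = {
    "common noun": "NN.*",
}


def translate(appearance: str, pos: str, pos_tags=POS_TAGS) -> str:
    # Build the concrete query for word&TAG; markup from the abstract parts.
    tag_pattern = pos_tags[pos]
    return f"WORD(RE({appearance}&{tag_pattern};))"


concrete = translate("a.*", "common noun")
print(concrete)
```

A corpus with a different tagging scheme would only need a different table; the abstract query itself would stay the same.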

The logical location to store this information would be the view attached to the virtual corpus. Currently, however, views do not support such detailed information. Why not?

In spite of this situation, we do plan to investigate the possibility of supporting such an abstract query language in Abundantia Verborum. We will come back to this issue in section 4.3.1 Typed query language.

For the moment however, the user will have to know that the question to ask is:

WORD(RE(a.*&NN.*;))

He should find such information in the caption of the virtual corpus.
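The effect of the concrete pattern on the two example sentences can be checked with an ordinary regular expression library. The sketch below is only an approximation: it assumes that WORD(RE(...)) matches the expression against one whitespace-delimited word, tag included, which may differ from the program's actual semantics. The function matching_words is hypothetical.

```python
import re

SAMPLE = [
    "<s>This&DD1; article&NN1; talks&VVZ; about&RP; itself&PPX1; .&PUN;</s>",
    "<s>It&PPH1; consists&VVZ; of&IO; two&MC; paragraphs&NN2; .&PUN;</s>",
]


def matching_words(sentences, pattern):
    # Assumption: the pattern must cover one whole word together with its tag.
    regex = re.compile(pattern)
    hits = []
    for sentence in sentences:
        body = re.sub(r"</?s>", "", sentence)  # drop the sentence markup
        for token in body.split():
            if regex.fullmatch(token):
                hits.append(token)
    return hits


hits = matching_words(SAMPLE, r"a.*&NN.*;")
print(hits)
```

Under this assumption only "article&NN1;" is found: "about&RP;" begins with an "a" but is not tagged as a common noun, and "paragraphs&NN2;" is a common noun that does not begin with an "a".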

