Two common types of software for searching texts resemble the search tools in Abundantia Verborum: corpus analysis tools and text retrieval tools.
By corpus analysis tools we mean the classical tools of corpus linguistics, mostly developed in academic circles and designed for analyzing corpora. By far the most common tool of this type is the concordance program: software that generates concordances and word frequency lists from a corpus, either exhaustively or on the basis of a specific query.
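To make the two outputs of a concordance program concrete, here is a minimal Python sketch. It is not taken from any of the programs discussed here; the function names, the naive tokenizer and the sample text are assumptions made purely for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()

def concordance(text, keyword, width=2):
    """Return keyword-in-context triples: up to `width` tokens on each side of a hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical two-sentence "corpus", used only for this example.
corpus = "The law protects justice. Justice without law is blind."

for left, hit, right in concordance(corpus, "justice"):
    print(f"{left:>20} [{hit}] {right}")
print(frequency_list(corpus)[:2])
```

Real concordance programs add sorting on the left or right context, wildcard queries and markup handling; the sketch only shows the basic shape of the two result types.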
A list of such programs can be found on the Corpus Linguistics web site maintained by Michael Barlow, Dept of Linguistics, Rice University, Houston (http://www.ruf.rice.edu/~barlow/corpus.html).
Another type of software for performing text searches is known as text retrieval tools. These mainly commercial programs retrieve texts about a certain subject, typically from a huge collection of relatively small texts. Popular programs are ZYIndex, DtSearch and Isys, but there are many more.
Possible applications are the fast retrieval of a particular document from one's personal collection of documents, or from a company's archive of orders, notes or mail in general. A new and fast-growing application of this software is the World Wide Web, where web versions of the software can be used to retrieve web pages about a particular topic.
Because the field of application is different, this type of software has different emphases. The data these programs are meant for are not linguistic corpora but everyday documents. Consequently, text retrieval search languages are not designed to recognize typical linguistic markup. On the other hand, to be commercially attractive, they are designed to recognize many popular proprietary formats (such as word processor documents).
The most important similarities between Abundantia Verborum and the above types of software are in the syntax of the search languages. There are no standards for such languages, and practically no two programs use the same language throughout. Nevertheless some features are very frequent. Abundantia Verborum shares the following common features with other query languages:
As far as differences are concerned, the most important difference between Abundantia Verborum and both corpus analysis tools and text retrieval software is the format of the output.
Corpus linguistics primarily focuses on linguistic information of general use, a prototypical case being part-of-speech information. The objective, which is clearly justified for this type of information, is to put all linguistic information in the corpus, so that it is there for everybody to use. The main object of manipulation is the corpus, hence the focus on developing automated corpus taggers. The output of searches on the corpus is seen as a result, not as an object of manipulation. The results are merely numbers and lists of examples, and there is seldom a need for output formats other than those generated by concordance programs.
In Abundantia Verborum it is the output of searches that is the main object of manipulation, not the corpus. The main reason for this difference is that the prototypical type of study Abundantia Verborum was designed for, namely lexicological case studies, belongs to a field where there is no consensus on which annotation strategies are of the most general use. Therefore the idea of putting all results in the corpus, so that they are there for everybody to use, is more problematic here. One could stick to the canonical corpus-linguistic approach and use what is called problem-oriented tagging (cf. de Haan, 1984). In problem-oriented tagging, users add their own form of annotation to the corpus, oriented specifically towards their own research goal. For Abundantia Verborum we chose a more reserved, indirect approach. The material for a case study is retrieved from the corpus and stored in a workshop. The case study, and with it the problem-oriented annotation (or rather labeling), is carried out on the workshop. At the end of the study one may decide whether or not to copy specific linguistic information from the workshop to the corpus. The computational effort required for the latter task is reduced to a minimum by the extremely computer-friendly syntax of workshop files and by the query history and observation origin bookkeeping mechanisms in Abundantia Verborum.
The practical consequence of this difference in perspective is that although facilities for building concordances and frequency lists are also present in Abundantia Verborum (in the Basic Corpus Statistics Tools), the main focus in Abundantia Verborum is on collecting data into workshops (with the Abundantia Verborum query mechanism).
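The workshop idea can be illustrated with a small Python sketch. It does not reproduce the actual workshop file syntax or the bookkeeping mechanisms of Abundantia Verborum; the `Observation` class, the `collect` and `copy_back` functions and the sample data are all hypothetical and serve only to show why recording the origin of each observation makes it cheap to copy labels back to the corpus later.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    text_id: str      # origin: which corpus text the hit came from
    token_index: int  # origin: position of the hit in that text
    context: str      # the retrieved example
    labels: dict = field(default_factory=dict)  # problem-oriented labels

def collect(corpus, keyword):
    """Query the corpus and return a workshop (here simply a list of observations)."""
    workshop = []
    for text_id, tokens in corpus.items():
        for i, tok in enumerate(tokens):
            if tok == keyword:
                context = " ".join(tokens[max(0, i - 2):i + 3])
                workshop.append(Observation(text_id, i, context))
    return workshop

def copy_back(workshop, label_name):
    """Map each observation's origin to one label's value, ready to merge into the corpus."""
    return {(o.text_id, o.token_index): o.labels[label_name]
            for o in workshop if label_name in o.labels}

# Hypothetical mini-corpus: text identifiers mapped to token lists.
corpus = {
    "t1": "the court of law met today".split(),
    "t2": "a law of nature holds".split(),
}

workshop = collect(corpus, "law")
# The case study happens on the workshop; the corpus itself stays untouched.
workshop[0].labels["sense"] = "legal"
workshop[1].labels["sense"] = "scientific"

annotations = copy_back(workshop, "sense")
print(annotations)
```

The design choice shown here is that labeling never modifies the corpus. Only the final, optional `copy_back` step produces something that could be merged into it.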
As was mentioned above, text retrieval tools serve another goal. Their prototypical use is to retrieve complete texts in which a particular word or expression occurs, e.g. to find, in a big collection of articles, all articles that are about, or at least mention, a given subject. Therefore the results are presented as lists of texts, rather than as lists of occurrences of linguistic phenomena. As in corpus analysis tools, the results are thought of as objects to be read or printed, not to be manipulated (although it is sometimes possible to add bookmarks). In this context it should be added that the query languages of these tools are also designed to look for complete texts. Therefore, the resemblances between query languages mentioned earlier lie at the level of the components of the languages rather than at the level of the semantics of the queries: for instance, in a text retrieval tool a query like "AND(WORD(justice),WORD(law))" would typically retrieve texts in which both words occur, rather than co-occurrences of the words within some infra-textual linguistic context.
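The difference between the two readings of such a query can be sketched in Python. This is an illustration only: the function names, the sentence as the infra-textual context and the sample collection are assumptions, not the behavior of any particular tool.

```python
import re

def words(text):
    """Set of lowercase word forms in a stretch of text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve_texts(collection, w1, w2):
    """Text retrieval reading: names of texts that contain both words anywhere."""
    return [name for name, text in collection.items()
            if {w1, w2} <= words(text)]

def retrieve_cooccurrences(collection, w1, w2):
    """Occurrence reading: (name, sentence) pairs in which both words share a sentence."""
    hits = []
    for name, text in collection.items():
        # Naive sentence splitter: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if {w1, w2} <= words(sentence):
                hits.append((name, sentence))
    return hits

# Hypothetical collection of three small texts.
collection = {
    "a.txt": "Justice was served. The weather was fine. The law is old.",
    "b.txt": "There is no justice without law. We left early.",
    "c.txt": "The law is the law.",
}

print(retrieve_texts(collection, "justice", "law"))
print(retrieve_cooccurrences(collection, "justice", "law"))
```

In this sample, the text-level reading returns both a.txt and b.txt, whereas the sentence-level reading returns only the one sentence in b.txt where the two words actually appear together.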