Abundantia Verborum

3. Tutorial

3.1 Getting the data


3.1.6 Query settings

This section concludes our introduction to the Abundantia Verborum query mechanism, which was the topic of "3.1 Getting the data", and which was introduced in 3.1.1 Using queries. Filling in the top part of the Run Query dialog box, which concerns the WHAT question, was the topic of the sections 3.1.2 A first query, 3.1.3 Queries with Boolean operators and 3.1.4 Queries with wildcards. Filling in the middle part of the Run Query dialog box, which concerns the WHERE question, was the subject of section 3.1.5 Virtual corpora. This final section summarizes and complements what has already been said about the bottom part of the Run Query dialog box, which concerns the HOW question, and where the 'mode' of running is determined using all sorts of parameters or settings.

The process: counting, collecting or asking the user

The first thing you can determine concerns the general process. What happens when a hit is found? If "Explore" is checked, the program does nothing but count. The program goes through the complete virtual corpus without any user interaction. The Query Report signals how many hits were encountered. The query run adds no new observations to the workshop. This behaviour could be described as skip all.

The opposite behaviour, keep all, is obtained by having both "Explore" and "Prompt on Hit" unchecked. All hits that are encountered are automatically copied to the workshop. Once again the program goes through the complete virtual corpus without any user interaction. After the run the workshop contains a new observation for each match.

The middle way, asking the user each time whether to skip or keep, is obtained by having "Explore" unchecked and "Prompt on Hit" checked. Each time the program encounters a hit, it pauses to ask the user whether this particular hit should be skipped or kept.
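The three behaviours can be summarized as a small decision rule. The Python sketch below only illustrates the description above; it is not the program's actual code. The function name hit_action and its return values are invented for this example, and we assume that "Explore" takes precedence when both boxes are checked, since an exploring run does nothing but count.

```python
def hit_action(explore: bool, prompt_on_hit: bool) -> str:
    """Return what happens to each hit: 'skip', 'keep' or 'ask'.

    Hypothetical helper; the names are not part of Abundantia Verborum.
    """
    if explore:
        # "Explore" checked: hits are only counted (skip all).
        # Assumption: this holds whatever the state of "Prompt on Hit".
        return "skip"
    if prompt_on_hit:
        # "Explore" unchecked, "Prompt on Hit" checked: ask the user each time.
        return "ask"
    # Both unchecked: every hit becomes an observation (keep all).
    return "keep"


if __name__ == "__main__":
    for explore in (True, False):
        for prompt in (True, False):
            print(explore, prompt, hit_action(explore, prompt))
```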

Adding query meta-information to the workshop

You can keep a log, in the workshop itself, of all query runs carried out from within that workshop. If "Update Caption" is checked when running a query, the run is logged in the "Caption" field of the workshop. The "Caption" field of a workshop contains a textual description of the contents of the workshop. Its function can be compared to the "Caption" field of a query, or the "Caption" field of a virtual corpus. We first discussed these "Caption" fields while introducing the Boolean operator AND in section 3.1.3 Queries with Boolean operators.

If "Add New Query Label" is checked when running a query, the program tags all observations added to the workshop by this query run with the information "added by such-and-such query run". This technique is discussed in more detail in section 3.2.2 Adding labels via queries.

The context of a hit

Another thing that can be controlled by the user is how a hit is turned into an observation. When we looked at the results of our first query run in section 3.1.2 A first query, we mentioned that the "Contents" field of observations contains text copied literally from the corpus, typically a few sentences. We also said that the match was enclosed in the tags <MATCH> and </MATCH>, and that the sentence containing the match was enclosed in the tags <CONSTITUENT> and </CONSTITUENT>.

With the information we have gained since then, we can now be more accurate. When a match is turned into an observation, the "Contents" field of the observation contains at least the complete constituent in which the hit was found. This constituent is enclosed in the tags <CONSTITUENT> and </CONSTITUENT>. In "democorp.vic" constituents happen to be sentences. Matches or, in the case of compound queries, component matches are enclosed in the tags <MATCH> and </MATCH> according to the rules explained in the sections 3.1.2 A first query and 3.1.3 Queries with Boolean operators.

One constituent is the minimal setting for the "Contents" field. It is possible to specify that more context is to be added. This additional context is always a number of complete constituents. Click on the button "Settings" in the Run Query dialog box! The Query Settings dialog box appears. The amount of context is specified next to the texts "Number of Leading Constituents" and "Number of Trailing Constituents". These stand, respectively, for the number of constituents immediately preceding the one with the hit and the number of constituents immediately following it that you want to see included in the "Contents" field. Fill in '2' for each and click "OK"!

These new settings are the current ones until you either change them again or close the program. If you want your new settings also to be the default settings in future Abundantia Verborum sessions, then you should save them as follows. Close whatever dialog box is open, if any! Choose "Options | Preferences"! The User Preferences dialog box appears. Click on "Save" to save all current settings and then on "OK" to close the User Preferences dialog box! The settings you just saved will be activated again the next time you start Abundantia Verborum.
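To make the effect of the leading and trailing settings concrete, here is a minimal Python sketch of how a "Contents" field could be assembled from a list of constituents. It is an illustration under stated assumptions, not the program's implementation: the function build_contents and its parameters are hypothetical, the constituents are joined with a single space, the context is clipped at the beginning and end of the list, and only the first occurrence of the match text is tagged.

```python
def build_contents(constituents, hit_index, match, leading=0, trailing=0):
    """Assemble an illustrative "Contents" string for one hit.

    constituents: list of constituent texts (sentences in the demo corpus)
    hit_index:    0-based index of the constituent containing the hit
    match:        the matched text inside that constituent
    leading:      number of preceding constituents to include
    trailing:     number of following constituents to include
    """
    # Clip the context window at the edges of the list (an assumption).
    start = max(0, hit_index - leading)
    stop = min(len(constituents), hit_index + trailing + 1)

    parts = []
    for i in range(start, stop):
        text = constituents[i]
        if i == hit_index:
            # Tag the first occurrence of the match, then the whole constituent.
            text = text.replace(match, "<MATCH>" + match + "</MATCH>", 1)
            text = "<CONSTITUENT>" + text + "</CONSTITUENT>"
        parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    demo = ["A one.", "B two.", "C three.", "D four.", "E five."]
    print(build_contents(demo, 2, "three", leading=1, trailing=1))
```

With both settings at 0 the result is the hit constituent alone, which corresponds to the minimal setting described above.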

Choose "Workshop | Run Query..." and then click "Settings"! Right underneath "Number of Leading Constituents" and "Number of Trailing Constituents" there is the parameter "Allow for multiple hits per constituent". This too is a parameter concerning the context of a hit. The question here is not "how much context goes into an observation" but rather "how many observations can come from this context". The best setting for this parameter depends on the size of a constituent and the nature of the study. Sometimes a second occurrence of a phenomenon shortly after the first one is merely a repetition of the first one and therefore the two should be regarded as one observation: for instance, a word being used twice in the same sentence to refer to the same item.
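The difference between the two settings can be pictured as a filter on the list of hits. The following Python sketch is an illustration only. The function observations_from_hits is hypothetical, and we assume, without confirmation from the program itself, that when multiple hits are not allowed the first hit in a constituent is the one that yields the observation.

```python
def observations_from_hits(hits, allow_multiple):
    """Return the hits that would become observations.

    hits: list of (file_no, constituent_no, match_text) tuples in search order
    allow_multiple: state of "Allow for multiple hits per constituent"
    """
    if allow_multiple:
        # Every hit yields its own observation.
        return list(hits)

    kept = []
    seen = set()
    for file_no, constituent_no, match_text in hits:
        location = (file_no, constituent_no)
        if location in seen:
            # A later hit in an already used constituent is ignored
            # (assumption: the first hit wins).
            continue
        seen.add(location)
        kept.append((file_no, constituent_no, match_text))
    return kept


if __name__ == "__main__":
    hits = [(1, 4, "bank"), (1, 4, "bank"), (1, 9, "bank")]
    print(len(observations_from_hits(hits, True)))
    print(len(observations_from_hits(hits, False)))
```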

Searching only parts of a virtual corpus

We stay in the Query Settings dialog box. We skip the middle panel, which is not so interesting. The bottom part, on the contrary, does deserve some attention. A virtual corpus consists of files, and each of these files consists of constituents. When searching a virtual corpus the program proceeds file by file and within each file constituent by constituent. If you run a query with the bottom part of the Query Settings dialog box empty, the search begins at the first constituent of the first file and ends at the last constituent of the last file. It is possible, though, to overrule both the default point where the program begins searching and the default point where the program stops searching. You can specify both these points in terms of 'constituent number so-and-so in file number so-and-so'. The boundary points themselves are included in the search. For instance, if you specify that the search should start at constituent 1 of file 1 and stop at constituent 1 of file 1, then exactly one constituent is searched (if we may take for granted that a normal corpus contains at least one constituent).
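The inclusive boundaries can be illustrated with a short Python sketch. It is a model of the description above rather than the program's own code: the function constituents_to_search is hypothetical, a virtual corpus is represented simply as a list of files, each file being a list of constituents, and files and constituents are assumed to be numbered from 1.

```python
def constituents_to_search(corpus, start=None, stop=None):
    """Yield (file_no, constituent_no, text) for every constituent searched.

    corpus: list of files, each file a list of constituent texts
    start:  optional (file_no, constituent_no), included in the search
    stop:   optional (file_no, constituent_no), included in the search
    """
    for file_no, constituents in enumerate(corpus, start=1):
        for constituent_no, text in enumerate(constituents, start=1):
            position = (file_no, constituent_no)
            # Tuples compare file number first, then constituent number.
            if start is not None and position < start:
                continue  # before the starting point
            if stop is not None and position > stop:
                return    # past the stopping point
            yield file_no, constituent_no, text


if __name__ == "__main__":
    corpus = [["a", "b"], ["c", "d", "e"]]
    # Start and stop at the same location: one constituent is searched.
    print(list(constituents_to_search(corpus, (1, 1), (1, 1))))
```

Leaving both start and stop empty corresponds to the default run from the first constituent of the first file to the last constituent of the last file.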

This "start at; stop at" feature has two purposes. One is fast retrieval of a location whose coordinates you already know, for instance if you want to look at the broader context of a known passage. The other is to use it in combination with the "Prompt on Hit" parameter. Working with "Prompt on Hit" checked is a slow process, so it is handy to be able to interrupt a search by aborting the query run, and later resume the search by rerunning the same query with "start at" set to the location where you interrupted the previous query run.

