About the corpus search

Two text corpora can be searched in full on this website: the Corpus of Contemporary Irish and a parallel English-Irish corpus of legislation. The information on this page will assist the user in making the best use of these resources.

Performing a search

Use the search box at the top of the page to perform a search. A single search term or a multi-word phrase can be entered in this box. Press the search button (the magnifying glass icon), or the enter key on your keyboard, to begin the search.

Search results, if found, will be displayed after a brief period. Note that the time it takes to retrieve search results is affected by the specificity of the search terms. It takes longer to search for a single simple preposition, for example, than for a phrase with multiple words. The total number of results found and the total number of documents in which results were found are displayed above the list of search results. If there are many search results, the list will be divided across several pages. Controls at both the top and bottom of the search results list enable the user to move from page to page or to jump to a specific page.

Only the top 10,000 results that best match the query will be retrieved. If the corpus contains more than 10,000 relevant results, this will be indicated by a total result count of ‘10,000+’. This is done to ensure that the data processing load associated with any one query will not adversely affect the experience of other users.
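The capped result count can be illustrated with a minimal sketch. The function name `format_result_count` and the constant `RESULT_CAP` are hypothetical, chosen for illustration only; the website's actual implementation is not documented on this page.

```python
RESULT_CAP = 10_000  # the limit stated on this page


def format_result_count(total_matches: int, cap: int = RESULT_CAP) -> str:
    """Return the count label shown above the list of results.

    Hypothetical helper: when the corpus holds more matches than the cap,
    the label is the cap followed by a plus sign.
    """
    if total_matches > cap:
        return f"{cap:,}+"
    return f"{total_matches:,}"


print(format_result_count(532))
print(format_result_count(25_000))
```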

For the same reason, it is not permitted to perform the following types of searches (they will return no results):

  • Searches for a single character.
  • Searches where the search term(s) contain punctuation characters only.
  • Searches for the indefinite or definite article only (e.g. ‘a’, ‘an’, ‘na’, ‘the’).
  • Searches for a single pronoun (e.g. ‘í’, ‘you’, ‘them’).
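The restrictions listed above could be checked along the following lines. This is only a sketch: the function `is_permitted` and the word lists are assumptions built from the examples on this page, not the website's real lists, and treating an empty query as not permitted is also an assumption.

```python
import unicodedata

# Illustrative lists taken from the examples on this page; the real lists
# used by the website are not published here.
ARTICLES = {"a", "an", "na", "the"}
PRONOUNS = {"í", "you", "them"}


def is_permitted(query: str) -> bool:
    """Return False for the kinds of query that this page says return no results."""
    text = query.strip().lower()
    if len(text) <= 1:
        return False  # single character (an empty query is assumed to be rejected too)
    if all(ch.isspace() or unicodedata.category(ch).startswith("P") for ch in text):
        return False  # Unicode punctuation characters only
    if text in ARTICLES or text in PRONOUNS:
        return False  # an article or a pronoun on its own
    return True


print(is_permitted("the"))
print(is_permitted("teach mór"))
```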

Search modes

The search mode can be selected below the search box. The choice of search mode affects the results retrieved. Two search modes are available:

  • ‘The phrase as is’ - Only results which contain the search term(s), with the same spelling and in the same order as they were input in the search box, will be retrieved.
  • ‘Broad search’ - As well as the kinds of results retrieved when ‘The phrase as is’ is selected as search mode, results containing inflected and alternate forms of the search terms will be retrieved. Results may be retrieved in which the order of the search terms in the result does not match the order of the search terms in the query.
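The difference between the two modes can be sketched as follows. Everything here is illustrative: the `VARIANTS` table stands in for whatever linguistic data the website uses to recognise inflected and alternate forms, and the function names are hypothetical. The sketch also assumes that a broad search requires every search term to be present in some form, which this page does not state.

```python
# Hypothetical table of inflected and alternate forms for two Irish words.
VARIANTS = {
    "bád": {"bád", "bhád", "báid"},
    "mór": {"mór", "mhór", "móra", "mhóra"},
}


def matches_phrase_as_is(query: str, text: str) -> bool:
    """True if the query words occur in the text, same spelling, same order, adjacent."""
    q, t = query.split(), text.split()
    n = len(q)
    return n > 0 and any(t[i:i + n] == q for i in range(len(t) - n + 1))


def matches_broad(query: str, text: str) -> bool:
    """True if each query word, or one of its listed forms, occurs anywhere in the text."""
    words = set(text.split())
    q = query.split()
    # Assumption: every term must be found; word order is ignored.
    return bool(q) and all(VARIANTS.get(term, {term}) & words for term in q)


print(matches_phrase_as_is("bád mór", "tá na báid mhóra sa chuan"))
print(matches_broad("bád mór", "tá na báid mhóra sa chuan"))
```

In this sketch, any text matched by `matches_phrase_as_is` is also matched by `matches_broad`, which mirrors the statement that ‘Broad search’ retrieves the ‘The phrase as is’ results as well.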

Note that characters such as quotation marks, parentheses and the asterisk are excluded from the search, regardless of which search mode is selected. Using these characters will not prevent a query from succeeding, but they will not be used to refine the search results.
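The effect of this rule can be pictured as a clean-up step applied to the query before the search runs. The sketch below is an assumption about how such a step might look: the name `strip_ignored` is hypothetical, and the character set covers only double quotation marks, parentheses and the asterisk, because the full list of excluded characters is not given on this page. Single quotation marks are left out of the sketch since they can also serve as apostrophes.

```python
# Assumed set of ignored characters; the complete list is not documented here.
IGNORED_CHARS = "\"“”()*"


def strip_ignored(query: str) -> str:
    """Remove the ignored characters and collapse any leftover whitespace."""
    table = str.maketrans("", "", IGNORED_CHARS)
    return " ".join(query.translate(table).split())


print(strip_ignored('"bád mór"'))
print(strip_ignored("teach*"))
```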

Filtering results

A more specific search may be performed by using one or more of the available filters. On mobile devices, the filters are found below the search mode options; on laptop or desktop computers, they are found on the right-hand side of the screen. The following filters are available:

  • ‘Collections’ - Results are filtered with respect to the collection of texts to which they belong. Collections are generally associated with a specific publication or publisher.
  • ‘Word forms’ - The search mode must be set to ‘Broad search’ in order to use this filter. This filter enables the exclusion of certain word forms contained in the search terms from the query. This can be useful, for example, to exclude plural forms of a search term from a query.
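As a rough picture of the ‘Word forms’ filter, the sketch below expands a search term into a list of forms and then drops the forms the user has excluded. The table `WORD_FORMS` and the function `forms_to_search` are hypothetical and only illustrate the idea; they do not describe the website's internals.

```python
# Hypothetical list of forms that a broad search might consider for one term.
WORD_FORMS = {"bád": ["bád", "bhád", "báid"]}


def forms_to_search(term: str, excluded=()) -> list:
    """Return the forms of a term that remain after the user's exclusions."""
    forms = WORD_FORMS.get(term, [term])  # unknown terms are searched as typed
    return [form for form in forms if form not in excluded]


# Excluding the plural form 'báid' leaves only the singular forms.
print(forms_to_search("bád", excluded={"báid"}))
```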

Having selected a filter option, press the ‘Filter’ button, or the search button (the magnifying glass), to run the search.

Listing order of results

The search aims to return the texts that best match the query first in the list of search results. Search results are sorted according to a number of criteria including:

  • Similarity between the search terms and the words in the text - Texts which contain the exact search terms or which contain the majority of the search terms (depending on the chosen search mode) are ranked higher.
  • Frequency of the search terms in the text - Texts in which the search terms, in whole or in part, occur more frequently are ranked higher.
  • Proximity of the search terms in the text - Texts in which the search terms, in whole or in part, occur closer to each other are ranked higher.
  • Text length - Shorter texts where the search terms comprise a significant portion of the text are ranked higher.
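To show how criteria like these can combine into a single ordering, here is a toy scoring function. It is a sketch under stated assumptions, not the ranking formula used by the website: the name `score`, the way each criterion is measured and the weights are all invented for illustration.

```python
def score(query: str, text: str) -> float:
    """Toy relevance score combining the four criteria listed above."""
    q, t = query.split(), text.split()
    positions = [i for i, word in enumerate(t) if word in q]
    if not q or not positions:
        return 0.0
    coverage = len(set(q) & set(t)) / len(set(q))  # similarity: share of terms found
    frequency = len(positions)                     # number of matching words
    span = positions[-1] - positions[0] + 1        # window covering all matches
    proximity = len(positions) / span              # 1.0 when the matches are adjacent
    density = len(positions) / len(t)              # favours shorter texts
    # Arbitrary weights, chosen only to make the example easy to follow.
    return coverage + 0.1 * frequency + proximity + density


texts = [
    "tá bád ar an loch",
    "tá bád ar an loch mór",
    "tá bád mór ar an loch inniu",
    "bád mór",
]
# Sort so that the best-scoring text comes first.
ranked = sorted(texts, key=lambda text: score("bád mór", text), reverse=True)
print(ranked)
```

With these invented weights, the short text consisting only of the search terms is listed first and the text containing only one of the two terms is listed last.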