If-conditionals as modal colligations: A corpus-based investigation
The weak claim motivating this study is that if-conditionals are strong modality attractors, due to the conditional (i.e. modal) meaning of if, with modality appearing in the if-clause, the main clause, or both. The strong claim is that if-conditionals can be regarded as modal colligations. The weak claim can be supported if it is shown that if-conditionals contain modality at a significantly higher than average frequency. Before examining the conditions under which the strong claim can be supported, we need to turn our attention to the notions of modality, collocation, colligation and semantic preference, which inform the notion of modal colligation introduced in this paper. In: Davies, M., Rayson, P., Hunston, S., & Danielsson, P. (eds). Proceedings of the Corpus Linguistics Conference: Corpus Linguistics 2007
The academic Web-as-Corpus
As a result of the European Union’s pressure towards internationalization, universities in many countries find themselves increasingly urged to provide information on their requirements and services and to promote themselves in English on the web. Hence the need for corpus resources and studies of institutional academic English used as an international language (or lingua franca) on the web. This paper introduces “acWaC-EU” (an acronym for “academic Web-as-Corpus in Europe”), a corpus of web pages in English crawled from the websites of European universities and annotated with contextual metadata. The corpus contains approximately 40 million words from native English universities and a similar number of words from universities based in all other European countries, in which English is used as a lingua franca. Thanks to the metadata, it is possible to re-group texts for comparison based, e.g., on the language family of the native language spoken in the country where the text was produced. The paper describes and evaluates the corpus construction pipeline and the corpus itself, presents a case study on the use of modal and semi-modal verbs in lingua franca vs. native texts, and looks at future developments, in particular as concerns simple heuristics for topic-/genre-oriented subcorpus construction
Letting in the light and working with the Web: A dynamic corpus development approach to interpreting metaphor
Setting up corpora is a laborious process, requiring time and resources. One problem is that, within a very short time of a corpus being created, its static nature may mean it no longer reflects the way language is currently used. This raises the question of how to make corpus-building a dynamic process. Sharoff (2006) refers to ‘open-source corpora’, making use of the Internet in order to collect data which can constantly be updated following the trends of language change. Clearly, this process must be made rapid and efficient for research purposes. This paper firstly describes the initial development of a tool specifically designed to facilitate the search for linguistic data in a series of steps: searching for an initial, small amount of “thematic” linguistic data on the web; manually examining and analysing the collected data; and, last, automatically extending the analysis, by means of analogical comparison to what was manually analysed, to allow the further extraction of a wider sample of data. Secondly, a small-scale study aims to examine the ways in which such an approach can be exploited in a specific context. Here attention is focused on the linguistic patterns characterising the metaphorical use of LIGHT = UNDERSTANDING (cf. Lakoff and Johnson 1980). We will illustrate the process of acquiring relevant contexts of such usages from the web and will outline the crucial step of the acquisition process: the analogy-based mechanism that extracts examples of figurative usages of LIGHT from the web, discriminating these from literal usages on the basis of analogical similarity to the manual analysis carried out on the initial data collection
Love - a familiar or a devil? An exploration of key domains in Shakespeare's Comedies and Tragedies
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
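The difference between the two counting schemes in the abstract above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical, and the sketch assumes that an author pair is counted once per citing paper whenever both authors appear among the counted authors of that paper's references (including co-authors of the same cited work).

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations (illustrative sketch, not the paper's code).

    citing_papers: list of reference lists; each reference is a list of
        author names in byline order.
    max_authors: how many leading authors of each cited work are counted
        (1 = traditional first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        # Authors this citing paper is treated as citing.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Assumption: each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, each citing two works.
papers = [
    [["White", "Griffith"], ["Small"]],
    [["White"], ["Small", "McCain"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting only the pair of first authors is recorded, whereas counting up to five authors also brings in pairs involving second authors, so the resulting co-citation matrix covers more authors per cited work.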
REVERE: support for requirements synthesis from documents
Documents are important sources of system requirements. This is particularly true of domains that are document-centric in terms of their operational and development processes. For system evolution in organisations that have been subject to organisational change and loss of organisational memory, documents may be the major source of key requirements. Hence, systems engineers often face a daunting task of synthesising crucial requirements from a range of documents that include standards, interview transcripts and legacy specifications. The goal of REVERE was to investigate support for this task which has been described as document archaeology (Robertson and Robertson, 1999). This paper describes the resulting REVERE toolset, its utility for document archaeology and for other tasks that have emerged in the course of our experiments with the toolset
VARD versus WORD: A comparison of the UCREL variant detector and modern spellcheckers on English historical corpora
