Termontological framework

Introduction

Termontology is a linguistic approach that aims to describe language within a broader framework: that of human cognition. The theory develops formalisms and terminological methods to represent linguistic information in general and its connection to other levels of communication. In particular, it describes the relation between language (the linguistic level) and meaning (the semantic level).
  • The linguistic level includes morphosyntactic information about lexical units, their synonyms, acronyms, style features and collocation framework. It also includes examples that clarify the use of each lexical unit in a specific sense.
  • The semantic level connects lexical units to an ontological dimension.
The linguistic level is particularly interesting for translators, interpreters and anyone interested in language production. The semantic level is especially relevant for knowledge engineers and other specialists in the area of semantics.

Lexical units and the lexicon

Lexical units can have one or more senses. A sense is a relatively stable set of semantic features describing a certain idea. The stability is relative because, in reality, features are constantly adapting as usage changes. A lexical unit can have different or similar morphosyntactic features for each of its senses. Each sense is linked to a semantic category or object at the semantic level.

A key element in the termontology lexicon is the collocational framework that describes how a lexical unit combines with other lexical units and what terminological preferences it shows.
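
The structure described in this section could be modelled roughly as follows. This is a minimal sketch and not the actual schema of any termontological database: the class names, field names, concept identifiers and the two senses given for "film" are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One relatively stable reading of a lexical unit (hypothetical model)."""
    concept_id: str                      # link to an item at the semantic level
    part_of_speech: str                  # one morphosyntactic feature
    collocates: set[str] = field(default_factory=set)   # collocational framework
    examples: list[str] = field(default_factory=list)   # usage examples


@dataclass
class LexicalUnit:
    """A lexical unit in one language, with one or more senses."""
    lemma: str
    language: str
    senses: list[Sense] = field(default_factory=list)

    def concepts(self) -> set[str]:
        """Return the semantic-level items this unit can refer to."""
        return {sense.concept_id for sense in self.senses}


# Illustrative records only; the concept identifiers are made up.
film_en = LexicalUnit(
    lemma="film",
    language="en",
    senses=[
        Sense("concept:motion_picture", "noun", {"watch", "direct"},
              ["We watched a film."]),
        Sense("concept:thin_layer", "noun", {"thin", "plastic"},
              ["A thin film of oil."]),
    ],
)
film_nl = LexicalUnit(
    lemma="film",
    language="nl",
    senses=[Sense("concept:motion_picture", "noun", {"kijken"})],
)

# Units from different languages meet at the semantic level.
shared = film_en.concepts() & film_nl.concepts()
print(shared)
```

Because each sense points to a language-independent identifier, lexical units from different languages can be related without linking them to each other directly.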

Language and Semantics

Termontology establishes formal mechanisms by which lexical units from any language can be connected to a semantic description of the world.

The semantic level contains the objects and instances that languages "mean". The information at the semantic level can exist independently from any specific natural language, but it is always generated by convention within the interaction of one or more language communities. The semantic level contains relationships between objects and instances. It also contains attributes inherent to every item.

In natural language processing, the semantic level can be used to help determine the actual sense in which a lexical unit is employed in a text. Collocational and other linguistic features from the linguistic level, on the other hand, can be used to expand information on the semantic level.
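
As an illustration of how collocational information might help select a sense, the sketch below picks the concept whose typical collocates overlap most with the words surrounding a lexical unit. It is a deliberately simple heuristic chosen for the example, not the method used by any particular tool; the sense inventory, the concept identifiers and the function name are assumptions.

```python
import re
from typing import Optional

# Hypothetical sense inventory for "film": concept identifier -> typical collocates.
SENSES = {
    "concept:motion_picture": {"watch", "director", "cinema", "actor"},
    "concept:thin_layer": {"thin", "oil", "plastic", "surface"},
}


def disambiguate(context: str, senses: dict[str, set[str]]) -> Optional[str]:
    """Return the concept whose collocates overlap most with the context words.

    Returns None when no collocate of any sense occurs in the context.
    """
    tokens = set(re.findall(r"\w+", context.lower()))
    scores = {concept: len(collocates & tokens)
              for concept, collocates in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("A thin film of oil covered the surface", SENSES))
print(disambiguate("The director presented her new film at the cinema", SENSES))
```

A realistic system would weight collocates, lemmatise the context and draw on relations at the semantic level, but the principle of matching linguistic evidence against sense descriptions is the same.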

Figure 1. Linguistic-semantic model

Figure 1 shows an overview of some of the key elements of termontographic data with the example of records for the term “film” in English and Dutch.

Methods for termontographic work

Figure 2 shows the basic steps for building multilingual termontological databases. First, the domain is defined. Then, a domain-specific corpus of representative texts is created for each language. Seed terms representing key concepts are selected by examining existing categories, taxonomies and the structure of representative documents.

All this can be used to establish the initial taxonomy. The Termonto Platform helps to automate parts of these processes. The platform, which contains several natural language processing modules, generates statistics and rules that can be used to detect further seed terms, extract terms and suggest translations. One of its tools, the Termonto Spider, uses seed terms to browse Wikipedia and extract new articles based on the existing Wikipedia categories, the interconnection between terms and the Wikipedia translations, among other parameters.
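
The general idea of seed-based traversal can be sketched as a breadth-first walk that starts from seed articles and only follows links from articles that belong to a domain category. The code below is not the Termonto Spider: it runs on a small in-memory graph instead of Wikipedia, and the article titles, category assignments, function name and depth limit are all assumptions made for the example.

```python
from collections import deque

# Toy stand-in for article links; a real spider would retrieve these from Wikipedia.
LINKS = {
    "Film": ["Cinematography", "Film director", "Popcorn"],
    "Cinematography": ["Camera", "Film"],
    "Film director": ["Film", "Screenplay"],
    "Popcorn": ["Maize"],
}

# Toy category assignments used as the relevance filter.
CATEGORIES = {
    "Film": {"Film"},
    "Cinematography": {"Film"},
    "Film director": {"Film"},
    "Screenplay": {"Film"},
    "Camera": {"Technology"},
    "Popcorn": {"Food"},
    "Maize": {"Food"},
}


def crawl(seeds, domain_categories, max_depth=2):
    """Breadth-first traversal from seed articles, keeping in-domain articles only."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    collected = []
    while queue:
        title, depth = queue.popleft()
        if not CATEGORIES.get(title, set()) & domain_categories:
            continue  # out of domain: neither collect nor expand
        collected.append(title)
        if depth == max_depth:
            continue  # depth limit reached: collect but do not expand
        for linked in LINKS.get(title, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return collected


print(crawl(["Film"], {"Film"}))
```

Filtering before expansion keeps the traversal from drifting into unrelated areas: here "Popcorn" is linked from "Film" but is neither collected nor followed.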

Figure 2. Modeling process

Users define the semantic relations that are most relevant for the domain. Then they add linguistic rules that support statistical approaches to term and relation extraction.
The Termonto Platform uses these rules, together with statistics about general corpora and the domain corpora, to produce candidates for terms, concepts and relations, which users validate. The validated data is delivered to the semantic parsers to be used as a base for the ontological model.
The Termonto Platform also offers a module, Linsigna, that provides natural language analysis and sense disambiguation for use in an interface between humans and semantic processes.
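
One common way to combine statistics from a domain corpus and a general corpus is to rank words by how much more frequent they are in the domain texts. The sketch below uses a smoothed relative-frequency ratio for this purpose. It is an assumed, simplified stand-in for the candidate-generation step described above, not the platform's actual algorithm; the function names, the threshold and the two miniature corpora are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_candidates(domain_text, general_text, min_ratio=2.0):
    """Rank words by how much more frequent they are in the domain corpus."""
    domain = Counter(tokenize(domain_text))
    general = Counter(tokenize(general_text))
    domain_total = sum(domain.values())
    general_total = sum(general.values())
    vocab_size = len(set(domain) | set(general))
    scored = []
    for word, count in domain.items():
        # Add-one smoothing so words absent from the general corpus
        # do not cause a division by zero.
        p_domain = (count + 1) / (domain_total + vocab_size)
        p_general = (general[word] + 1) / (general_total + vocab_size)
        ratio = p_domain / p_general
        if ratio >= min_ratio:
            scored.append((word, ratio))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Miniature corpora for illustration only.
domain_corpus = "the film director edits the film and the film crew"
general_corpus = "the cat and the dog and the bird saw the sun"

candidates = term_candidates(domain_corpus, general_corpus)
for word, ratio in candidates:
    print(f"{word}: {ratio:.2f}")
```

Function words such as "the" and "and" occur in both corpora and fall below the threshold, while "film" ranks first. The resulting list is only a set of candidates, which users would still need to validate.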
