Termontological framework


Termontology is a linguistic approach that aims to describe language within a broader framework: that of human cognition. The theory develops formalisms and terminological methods to represent linguistic information in general and its connection to other levels of communication. In particular, it describes the relation between language (the linguistic level) and meaning (the semantic level):
  • The linguistic level includes morphosyntactic information about lexical units, their synonyms, acronyms, style features and collocation framework. It also includes examples that clarify the use of each lexical unit in a specific sense.
  • The semantic level connects lexical units to an ontological dimension.
The linguistic level is particularly interesting for translators, interpreters and anyone interested in language production. The semantic level is especially relevant for knowledge engineers and other specialists in the area of semantics.

Lexical units and the lexicon

Lexical units can have one or more senses. A sense is a relatively stable set of semantic features describing a certain idea. The stability is relative because, in reality, features constantly adapt as usage changes. The morphosyntactic features of a lexical unit may differ or coincide across its senses. Each sense is linked to a semantic category or object at the semantic level.

A key element in the termontology lexicon is the collocational framework that describes how a lexical unit combines with other lexical units and what terminological preferences it shows.
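
As an illustration of the lexicon structure described above, the following is a minimal sketch in Python. The class names, field names and the identifiers such as "obj:motion_picture" are hypothetical and chosen only for this example; they are not taken from any Termontology tool. The sketch shows two lexical units, one English and one Dutch, whose senses point to items at the semantic level.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One relatively stable meaning of a lexical unit (illustrative model)."""
    definition: str
    semantic_id: str          # hypothetical link to an item at the semantic level
    part_of_speech: str       # a minimal piece of morphosyntactic information
    collocations: list = field(default_factory=list)
    examples: list = field(default_factory=list)


@dataclass
class LexicalUnit:
    """A lexical unit in one language, with one or more senses."""
    lemma: str
    language: str
    senses: list = field(default_factory=list)

    def semantic_ids(self):
        """Return every semantic-level item this unit can refer to."""
        return {sense.semantic_id for sense in self.senses}


# Illustrative records; the data is invented for this sketch.
film_en = LexicalUnit(
    lemma="film",
    language="en",
    senses=[
        Sense("a motion picture", "obj:motion_picture", "noun",
              collocations=["watch", "direct", "shoot"],
              examples=["We watched a film last night."]),
        Sense("a thin layer or coating", "obj:thin_layer", "noun",
              collocations=["thin", "coat", "plastic"],
              examples=["A thin film of oil covered the water."]),
    ],
)
film_nl = LexicalUnit(
    lemma="film",
    language="nl",
    senses=[Sense("a motion picture", "obj:motion_picture", "noun",
                  collocations=["kijken", "regisseren"])],
)

# Units from different languages meet at the semantic level.
shared = film_en.semantic_ids() & film_nl.semantic_ids()
print(shared)
```

Keeping the collocations on the sense rather than on the lexical unit reflects the idea that combination preferences can differ from one sense to another.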

Language and semantics

Termontology establishes formal mechanisms by which lexical units from any language can be connected to a semantic description of the world.

The semantic level contains the objects and instances that languages "mean". The information at the semantic level can exist independently of any specific natural language, but it is always generated by convention within the interaction of one or more language communities. The semantic level contains relationships between objects and instances. It also contains the attributes inherent to every item.
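
The semantic level described above can be pictured as a small graph of items, attributes and relationships. The sketch below is one possible in-memory representation; the names SemanticItem and SemanticLevel, the relation labels "is_a" and "instance_of", and all identifiers are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticItem:
    """An object or instance at the semantic level (illustrative model)."""
    item_id: str
    kind: str                                   # "object" or "instance"
    attributes: dict = field(default_factory=dict)


class SemanticLevel:
    """Language-independent store of items and the relationships between them."""

    def __init__(self):
        self.items = {}
        self.relations = set()                  # (source, relation, target) triples

    def add_item(self, item):
        self.items[item.item_id] = item

    def relate(self, source, relation, target):
        # Only allow relationships between items that are already known.
        if source not in self.items or target not in self.items:
            raise KeyError("both ends of a relation must be known items")
        self.relations.add((source, relation, target))

    def related(self, source, relation):
        """Return the targets reachable from source through one relation."""
        return {t for s, r, t in self.relations if s == source and r == relation}


level = SemanticLevel()
level.add_item(SemanticItem("obj:work_of_art", "object"))
level.add_item(SemanticItem("obj:motion_picture", "object",
                            {"medium": "moving images"}))
level.add_item(SemanticItem("inst:casablanca", "instance",
                            {"release_year": "1942"}))
level.relate("obj:motion_picture", "is_a", "obj:work_of_art")
level.relate("inst:casablanca", "instance_of", "obj:motion_picture")
print(level.related("inst:casablanca", "instance_of"))
```

Nothing in this structure refers to a particular language, which mirrors the claim that semantic information can exist independently of any specific natural language.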

In natural language processing, the semantic level can be used to help determine the actual sense in which a lexical unit is employed in a text. Collocational and other linguistic features from the linguistic level, on the other hand, can be used to expand information on the semantic level.
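
To make the disambiguation idea above concrete, here is a minimal sketch that picks a sense by counting how many of its typical collocates appear in a sentence. This simple overlap count is a stand-in chosen for illustration, not the method used by any Termontology tool, and the collocate sets and sense identifiers are invented.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical collocates per sense of the English lexical unit "film".
SENSE_COLLOCATIONS = {
    "obj:motion_picture": {"watch", "director", "cinema", "actor"},
    "obj:thin_layer": {"thin", "coating", "oil", "surface"},
}


def disambiguate(sentence: str,
                 sense_collocations: Dict[str, Set[str]]) -> Optional[str]:
    """Return the sense whose collocates overlap most with the sentence.

    Returns None when no collocate of any sense occurs in the sentence.
    """
    if not sense_collocations:
        return None
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense_id: len(tokens & words)
              for sense_id, words in sense_collocations.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("A thin film of oil covered the surface.", SENSE_COLLOCATIONS))
print(disambiguate("The director will watch the film at the cinema.",
                   SENSE_COLLOCATIONS))
```

A real system would also need lemmatization, weighting and a way to break ties; the sketch only shows how linguistic-level features can select an item at the semantic level.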

Figure 1. Linguistic-semantic model

Figure 1 shows an overview of some of the key elements of termontographic data with the example of records for the term “film” in English and Dutch.

Methods for termontographic work

Figure 2 shows the basic steps for building multilingual termontological databases. First, the domain is defined. Then, a domain-specific corpus of representative texts is created for each language. Seed terms representing key concepts are selected by examining existing categories, taxonomies and the structure of representative documents.

These elements are used to establish the initial taxonomy. The Termonto Platform, which contains several natural language processing modules, helps automate parts of these processes: it generates statistics and rules that can be used to detect further seed terms, extract terms and suggest translations. One of its tools, the Termonto Spider, uses seed terms to browse Wikipedia and extract new articles based on existing Wikipedia categories, the interconnection between terms and Wikipedia translations, among other parameters.
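
The general idea of expanding a set of seed terms through linked, categorized articles can be sketched as a breadth-first traversal. The example below is not the Termonto Spider: it makes no network requests and works on a small, invented in-memory graph (ARTICLES) that stands in for Wikipedia. The function name crawl and its parameters are hypothetical.

```python
from collections import deque

# Invented toy data standing in for articles, their categories and their links.
ARTICLES = {
    "Film": {"categories": {"Cinema"},
             "links": ["Film director", "Photography", "Popcorn"]},
    "Film director": {"categories": {"Cinema", "Occupations"},
                      "links": ["Screenplay"]},
    "Photography": {"categories": {"Imaging"}, "links": ["Camera"]},
    "Popcorn": {"categories": {"Snack foods"}, "links": []},
    "Screenplay": {"categories": {"Cinema"}, "links": []},
    "Camera": {"categories": {"Imaging"}, "links": []},
}


def crawl(seeds, articles, allowed_categories, max_depth=2):
    """Collect articles reachable from the seeds that stay inside the domain."""
    selected = []
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        title, depth = queue.popleft()
        article = articles.get(title)
        # Unknown or off-domain articles are neither kept nor expanded.
        if article is None or not (article["categories"] & allowed_categories):
            continue
        selected.append(title)
        if depth < max_depth:
            for linked in article["links"]:
                if linked not in seen:
                    seen.add(linked)
                    queue.append((linked, depth + 1))
    return selected


print(crawl(["Film"], ARTICLES, {"Cinema"}))
# prints ['Film', 'Film director', 'Screenplay']
```

Filtering on categories before following links keeps the traversal inside the domain, and the depth limit bounds how far it can drift from the seed terms.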

Figure 2. Modeling process

Users define the semantic relations that are most relevant for the domain. Then they add linguistic rules that support statistical approaches to term and relation extraction.
The Termonto Platform uses these rules, together with statistics about general corpora and the domain corpora, to produce candidates for terms, concepts and relations, which users validate. The data is then delivered to the semantic parsers as a basis for the ontological model.
The Termonto Platform also offers a module, Linsigna, that provides natural language analysis and sense disambiguation for use in interfaces between humans and semantic processes.
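
One common way to combine statistics from a general corpus and a domain corpus, as in the candidate extraction step described above, is to compare relative frequencies: words that are much more frequent in the domain texts are proposed as term candidates for human validation. The sketch below assumes this frequency-ratio technique with add-one smoothing; it is not a description of the Termonto Platform's actual statistics, and the function names, threshold and sample texts are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_candidates(domain_texts, general_texts, min_ratio=1.5):
    """Rank tokens that are relatively more frequent in the domain corpus."""
    domain = Counter(t for text in domain_texts for t in tokenize(text))
    general = Counter(t for text in general_texts for t in tokenize(text))
    domain_total = sum(domain.values())
    general_total = sum(general.values())
    vocabulary = len(set(domain) | set(general))
    scored = []
    for token, count in domain.items():
        # Add-one smoothing avoids division by zero for unseen tokens.
        p_domain = (count + 1) / (domain_total + vocabulary)
        p_general = (general[token] + 1) / (general_total + vocabulary)
        ratio = p_domain / p_general
        if ratio >= min_ratio:
            scored.append((token, ratio))
    # Highest ratio first; ties are broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


domain_texts = ["The film director edits the film.",
                "The film premiered at the festival."]
general_texts = ["The weather is nice.",
                 "The dog chased the cat at the park."]

candidates = term_candidates(domain_texts, general_texts)
for token, ratio in candidates:
    print(token, round(ratio, 2))
```

The output is only a list of candidates: in the workflow described above, users still validate them before they reach the ontological model.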


