Computer-based tools for improving the language mastery: authoring and using electronic dictionaries

A. Vaquero
Departamento de Sistemas Informáticos y Programación, Universidad Complutense de Madrid,
E-28040 Madrid, Spain

F. Sáenz
Departamento de Sistemas Informáticos y Programación, Universidad Complutense de Madrid,
E-28040 Madrid, Spain

A. Barco
Computing Consultant,
E-28040 Madrid, Spain



In this paper we present computer-based tools for language learning based on constructionism. We focus language learning on using and authoring multilingual dictionaries, which enables the assimilation of fundamental linguistic concepts (lexicon, meanings, semantic categories, semantic relationships, and taxonomy). We have developed tools that help in this task by allowing students to use and author bilingual dictionaries. We highlight the linguistic concepts involved in the pedagogical goals, as well as the chosen learning model; together they form the requirements for the development of both the author and the user tools. The authoring tool supports the consistency of the intended semantics of the lexicon and helps to detect omissions and inconsistencies. These tools can be advantageously used in language teaching.


Computers should improve learning and teaching in the classroom




Integrating computers in education is a mandatory challenge, but the traditional resistance of the educational world to innovation is well known. This resistance is the general source of difficulties in exploiting the potential of new technologies in education [Cornu,1994]. Some very common facts observed in educational centers illustrate this assertion: lack of hardware in schools, lack of quality software, limitations of imported software, lack of reliable and trustworthy reviews or evaluations, etc.

The use of computers in the school must be controlled by the teacher [Cuban,1987], who must therefore be prepared for and aided in assuming this control. Besides the general computer literacy that every teacher must currently have, our "user teacher" should have specific training in "Computers and Education" [Erickson,1994]. Particularly important is the aspect of language, which we mention here because of its influence on the computer-based applications involved, and not only on the direct communication between teacher and learner.

However, teachers generally lack support and appropriate training. Besides these difficulties, when teachers commit to applying computers in the classroom, they encounter a diversity of problems that inhibit the use of computers across the curriculum [Hodgson,1994]. Some common ones are: the quality of software, access to software, picking out useful software for their own teaching, the difficulty of integrating much of the existing software into teaching, the extra preparation time often needed before the computer can be used in the classroom, etc. Moreover, as the learning approach changes, new problems appear. The changes are centered on the dynamics of learning, that is, the learner’s processes for understanding. This new vision of the nature of learning is based on cognitive theories [Ausubel,1968] [Posner,1989] and it induces new ways of teaching [Tobin,1993]. Acquiring knowledge is an active construction process carried out by the learner.

With this new vision in mind, the materials needed for teaching could be constructed thinking first of the learner rather than of the curriculum [Karat,1997]. As a consequence, new computer-based materials and tools should be created to aid the learner. Computer-based tools are necessary for off-loading heavy, time-consuming and boring activities: calculators, symbolic manipulators, graphic and statistical tools, computer-based laboratories, word processors, spreadsheets, data bases, graphic editors, and so on. More importantly, the activities involved must induce motivation. The tools presented here are of this kind.

When weak domains in students' skills and knowledge are detected, it is mandatory to fill the gap by applying appropriate computer-based environments. An especially weak domain is language. There is a worrying lack of mastery of the mother tongue among young and not so young people, as recently shown by reliable surveys made in Spain, and the problem is felt more or less strongly all around the world. The key factor in language misunderstanding is the lexicon. There is experimental evidence that reading comprehension depends on vocabulary [Johnson,1978][Thorndike,1973].

In order to improve the level of language mastery, every pupil ought to handle specific tools with facilities for creating, consulting and modifying language resources such as conventional dictionaries, glossaries, thesauri, encyclopedias, etc. We claim that electronic dictionaries [Wilks,1990] could motivate the student more than paper ones, according to the learning model based on constructionism [Cabrera,1995]. Besides looking up terms easily and quickly, the computer allows the student to carry out a series of new tasks with clear pedagogical goals. The global goal to be reached is word meaning [Quillian,1967]. Writing definitions is a task aimed at learning how the meaning of a word depends on other words, and so is classifying words into semantic categories. Specific goals are diverse relationships such as polysemy and synonymy, their implications for classification, and the relations between words of different languages. All these goals can be reached in a constructive and collaborative way by the students and the teacher in the classroom. This can be done efficiently only with appropriate tools and user-friendly interfaces forming a whole responsive environment [Zeltzen,1997].

In this paper, we present computer-based tools for authoring and consulting electronic dictionaries as learning tools. Our intention is to fill a gap in the niche of constructionist tools for improving the language mastery of students in every subject; the tools can be applied to a broad range of education levels. In order to situate our tools in their correct instructional place, one must distinguish between constructionist learning in user-controlled environments (fully free environments) and navigation in hypermedia ones [Norman,1994]. Our tools belong to the first of these two models of learning; the second is more appropriate for learning parts of language other than the lexicon [Goldman,1996]. Nonetheless, both are complementary and not absolutely separate [Teusch,1996][Fernández,1999]. To the best of our knowledge, no tools similar to ours are described in the literature.

This paper is organized as follows. In section 2, we highlight some linguistic concepts involved in multilingual dictionaries which are useful for language learning. Our first computer-based tool, a user tool for querying a bilingual dictionary, is presented in section 3. In section 4 we present our second computer-based tool, the author tool for creating bilingual dictionaries, which allows more learning goals to be fulfilled. In section 5, we present the development of the tools at a conceptual level without technical details, which is illustrative for formalizing the learning concepts of multilingual dictionaries. Finally, section 6 summarizes some conclusions and provides hints for future and related work.


Linguistic Concepts Incorporated in our Proposal

We present here some linguistic concepts incorporated in our proposal which can advantageously be exploited in language learning. The next sections will show how these concepts are embodied in the core of our proposal.

Order, Classification, and Ontology

Typically, monolingual dictionaries show an alphabetical order that implies a simple term classification: terms are classified in singletons by their lexicographic form. Other possible, less naïve classifications are derivative (root-shape), grammatical, and semantic. Derivative classifications [MaríaMoliner] are not common, and grammatical classifications are not intended for dictionaries. Finally, semantic classification groups terms by semantic categories (for instance, synonym and antonym dictionaries, or ideological dictionaries [Casares]). Semantic categories allow not only meaning classification, but also the more meaningful taxonomy of meanings. Conventional lexical data bases, such as WordNet [Miller,1995], have term classifications such as synonymy (terms are grouped in the so-called synsets). Ontologies go beyond this by playing the role of meaning taxonomy [Nirenburg,1995]. Our tools support this important concept, as will be explained throughout the paper.

Semantic categories are useless for first-term lookups since meanings correspond, in general, to a set of (synonym) terms. However, they play an important role in learning by both using and authoring dictionaries, because each meaning of a given term (polysemy and/or homonymy) is precisely identified by its semantic category (category from now on, for the sake of brevity), instead of the usual meaningless sequential number. Therefore, we have a taxonomy or classification for meanings, but not a first-term order, since meanings are abstract ideas that cannot in general be expressed by one distinctive word. It is commonly acknowledged that the best order for lookups is lexicographic (a derivative classification is a counterexample, but it still keeps a lexicographic order by repeating entries and adding links). Figure 2.1 summarizes the order for taxonomies in a hierarchy; it shows a taxonomy of categories along with the set of terms belonging to each category. From this point of view, there is a complete lexicographic order (provided categories are identified with terms or phrases). A hierarchy is a natural structure for meaning classification. Each node in the hierarchy corresponds to a category. In principle, every category in the hierarchy can be used, regardless of its level. Note that every category in the hierarchy contains at least the term which names the category, so that all categories are non-empty. On the other hand, the creation of new categories as the intersection of several predefined ones should be avoided, in order to keep the taxonomy compact.

Figure 2.1 A Taxonomy



However, from an educational point of view, the goal is not to develop a general dictionary (in fact, this is a huge task which linguistic researchers are carrying out nowadays), but specialized dictionaries that restrict the linguistic domain in order to make the categorization of meanings and the definition of the taxonomy easier. There are a number of advantages in classifying meanings as a taxonomy. First, a meaning taxonomy is a useful facility for an electronic dictionary because categories embody additional semantics which provide more information to the reader (more than the sequential numbers noted above). Second, the system gains a new dimension because specialized dictionaries can be generated automatically under different categories (a sports dictionary may yield soccer, tennis, or baseball dictionaries). Third, it helps to develop a balanced dictionary by adding enough terms from different fields: once the terms are classified, it is easy to check how many terms are under a given category. Fourth, the work can be distributed among several authors by assigning categories to authors. A team of authors may develop a complete specialized dictionary by dividing the work by categories, so that collaborative work among students is promoted. Finally, all this means that the categories must be defined, which is an added bonus for educational purposes, since students have to organize ideas in a formal way, supported by the implementation of the author tool (covered in section 4).
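The second and third advantages can be illustrated with a minimal Python sketch. It is not the implementation of the tools; the class Category, the function specialized_dictionary, and the sample sports terms are hypothetical names chosen only for illustration. It assumes that a specialized dictionary is the alphabetical list of all terms found in a category and in its subcategories.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node of the taxonomy (hypothetical structure, for illustration)."""
    name: str
    terms: set[str] = field(default_factory=set)
    children: list["Category"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every category contains at least the term that names it,
        # so no category is empty.
        self.terms.add(self.name)

    def add_child(self, child: "Category") -> "Category":
        self.children.append(child)
        return child

    def all_terms(self) -> set[str]:
        """Terms of this category and of every category below it."""
        collected = set(self.terms)
        for child in self.children:
            collected |= child.all_terms()
        return collected

    def find(self, name: str) -> "Category | None":
        """Depth-first search for the category called `name`."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


def specialized_dictionary(root: Category, category_name: str) -> list[str]:
    """Alphabetical term list for the subtree rooted at `category_name`."""
    node = root.find(category_name)
    return sorted(node.all_terms()) if node is not None else []


# Sample data (assumed, not taken from the tools).
sports = Category("sports", {"referee"})
soccer = sports.add_child(Category("soccer", {"goal", "penalty"}))
tennis = sports.add_child(Category("tennis", {"racket", "set"}))

soccer_dictionary = specialized_dictionary(sports, "soccer")
# Term counts per category help to check that the dictionary is balanced.
terms_per_category = {c.name: len(c.all_terms()) for c in (sports, soccer, tennis)}
```

Counting terms over whole subtrees, rather than over single nodes, is a design choice of this sketch; a per-node count would only need len(c.terms).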


Polysemy and Synonymy

In every language there exists the well-known naming problem [Katzenberg,1993], which consists of two elements. One is polysemy (from the synchronic point of view, that is, covering both polysemy itself and homonymy), by which a term can have several meanings. The other is synonymy, by which one meaning can be assigned different terms, as can be observed in Figure 2.2. In this Figure, Term 1 and Term 2 are synonyms and share a meaning, and so do Term 2 and Term 3 under another meaning. Moreover, Term 2 is polysemic since it has two possible meanings.

Figure 2.2 Polysemy and Synonymy



We make here some remarks about the relationships between categories, meanings and terms. On the one hand, a given term can belong to several categories under different meanings. On the other hand, a given term can belong to several categories under the same meaning. Figure 2.3 shows two categories (C1 and C2) which respectively contain the meanings {M11, M12, M} and {M, M21, M22}. Each meaning has one or more terms associated with it. The term T2 is associated with meanings M12 and M21, which respectively belong to categories C1 and C2. We also show the term T that is assigned to meaning M, which belongs to both categories C1 and C2. As can be seen, polysemy is present in T2, and synonymy is present in T3 and T4. T1 is neither polysemic nor a synonym of another term. TC1 and TC2 are the terms used to denote categories C1 and C2, respectively.

Figure 2.3 Relationships among categories, meanings and terms. Extensional definition

In this figure, the set of meanings {M11, M12, M} in C1 is the extensional definition of category C1. We must also note that a category has a meaning described by a definition, a fact that this figure does not embody. In order to embody the meanings related to categories, we transform the scheme of Figure 2.3 into the one depicted in Figure 2.4. Now, C1 is the meaning of the category C1, and TC1 is the term assigned to such meaning, and the same applies to C2 and TC2. Then, we have one more meaning in each category. This meaning is the intensional definition of the category.

Figure 2.4 Relationships among categories, meanings and terms. Intensional definition

For a given language, we have a set of terms that holds the relationships with categories and meanings shown in Figure 2.4. If we now think of several languages, the same applies for each one. Then, relationships between terms from different languages come from considering at the same time the involved schemes.
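The relationships of Figure 2.3 can be sketched in Python for a single language as follows. This is only an illustrative sketch: the text states which meanings T2 and T are attached to, but not those of T1, T3 and T4, so their assignments below are assumptions, marked as such in the comments. The names term_meanings and categories_of are hypothetical.

```python
from collections import defaultdict

# Category -> set of meanings (the extensional definition of Figure 2.3).
categories = {
    "C1": {"M11", "M12", "M"},
    "C2": {"M", "M21", "M22"},
}

# (term, meaning) pairs. T2 and T follow the text; the others are assumed.
term_meanings = [
    ("T1", "M11"),  # assumed: the text only says T1 has a single meaning
    ("T2", "M12"),
    ("T2", "M21"),
    ("T", "M"),
    ("T3", "M22"),  # assumed: T3 and T4 share some meaning
    ("T4", "M22"),  # assumed
]

meanings_of = defaultdict(set)  # term -> its meanings
terms_of = defaultdict(set)     # meaning -> its terms
for term, meaning in term_meanings:
    meanings_of[term].add(meaning)
    terms_of[meaning].add(term)

# A term is polysemic when it has more than one meaning.
polysemic = sorted(t for t, ms in meanings_of.items() if len(ms) > 1)

# Terms are synonyms when they share a meaning.
synonym_sets = sorted(sorted(ts) for ts in terms_of.values() if len(ts) > 1)


def categories_of(term):
    """Categories a term belongs to, through any of its meanings."""
    meanings = meanings_of.get(term, set())
    return sorted(c for c, ms in categories.items() if ms & meanings)
```

Here T2 reaches both categories through two different meanings, while T reaches both through the single meaning M; these are the two situations distinguished above.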


User Tool

The user tool is a (simple) query interface for easily retrieving information about both English and Spanish terms, as well as their relationships, from the so-called terminological data base. This interface allows the user to navigate the semantic categories and to retrieve the relevant information about any term (definition, other related terms, translation, synonyms, …).

The Start window of this tool allows the user to select the base language (i.e., the source language for translations and for presenting dialogs) among the available languages by pressing its button (from now on, we consider a bilingual dictionary, so that the source and target languages need not be selected separately).

This action pops up the Semantic Category window, as shown in Figure 3.1; its left pane shows the semantic categories structured as a tree, and its right pane shows all the words under the highlighted semantic category. The total number of terms is shown on top of the right pane. The nodes in the tree can be clicked in order to expand or collapse semantic category subtrees. A text box is used for term lookups, so that the word closest to the typed substring is shown in the right pane. Pressing Enter or double-clicking the highlighted word leads to the Query window. This window shows the relevant information about the selected term: its definition, comments, the list of semantic categories it belongs to (the one corresponding to the shown definition is highlighted), the synonym set and the list of related terms. It also displays a navigation history. It is possible to select another semantic category in this window, which results in updating all the relevant information. Double-clicking gives direct access to the terms in both the synonym and related terms lists.

Figure 3.1 Semantic Category Window
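How the tool decides which word is closest to the typed substring is not detailed here. One plausible reading, sketched below in Python under that assumption, is to show the first term that is alphabetically at or after the typed text. The function closest_term and the sample terms are hypothetical, and terms are assumed to be stored in lower case.

```python
from bisect import bisect_left


def closest_term(sorted_terms, typed):
    """Return the first term at or after `typed` in alphabetical order.

    Assumption of this sketch: "closest" means the alphabetical successor;
    when the typed text is past the last term, the last term is returned.
    """
    if not sorted_terms:
        return None
    index = bisect_left(sorted_terms, typed.lower())
    return sorted_terms[min(index, len(sorted_terms) - 1)]


# Sample word list for the highlighted category (assumed data).
terms = sorted(["goal", "penalty", "racket", "referee", "set", "soccer"])
```

Because the list is kept sorted, each keystroke costs a binary search rather than a full scan, which keeps the lookup responsive for large word lists.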

The Semantic Category window has a control box with buttons for returning to the Start window, navigating backwards, translating the selected word, printing, and exiting the interface. The Translate button offers one of the main functionalities of this interface, i.e., the translation from the (source) base language to the target language; when pushed, it pops up the Translation window (Figure 3.2). This window shows a first field for the term in the first language, and a second field for the term in the second language. There are also navigation buttons for searching other terms in the same semantic category in alphabetical order. It is possible to translate from the first or from the second language by using two buttons which express the two possible translation directions. Also, the Go to buttons lead to the Semantic Category window for the selected term. This completes the overall description of the functionalities of the user tool.

Figure 3.2 Translation Window
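Since meanings are language independent, translation can be thought of as going from a term to its meanings and from each meaning to the terms of the other language. The following Python sketch illustrates this idea only; the dictionary synsets, the meaning identifiers, and the sample words are assumptions made for illustration, not the data structures of the tool.

```python
# Meaning identifier -> synonym set per language (illustrative data).
synsets = {
    "m1": {"en": ["goal"], "es": ["gol"]},
    "m2": {"en": ["goal", "aim"], "es": ["meta", "objetivo"]},
}


def translate(term, source, target):
    """Map each meaning of `term` to its terms in the target language."""
    return {
        meaning: by_language.get(target, [])
        for meaning, by_language in synsets.items()
        if term in by_language.get(source, [])
    }


# A polysemic term yields one group of translations per meaning.
english_to_spanish = translate("goal", "en", "es")
spanish_to_english = translate("objetivo", "es", "en")
```

The same function serves both translation directions, mirroring the two direction buttons of the Translation window.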


Author Tool

The author tool allows the author to add new terms to the terminological data base, together with all the relevant information, such as definitions, semantic categories, meanings, synonym sets, and related terms. We have developed a Spanish user interface for this tool (easily adaptable to other languages), which consists mainly of one Author window, as shown in Figure 4.1. It has several management areas (indicated by superimposition in this figure) which are explained next.

Figure 4.1 Author Window


Semantic Category Management

This area is intended for managing all the operations related to semantic categories, as illustrated in Figure 4.2 with a fragment of a taxonomy. It has several controls: a hierarchical view of the semantic categories (with expand/hide functionality), text fields for the semantic category names (English and Spanish), and the buttons Add Category, Delete Category, and Modify Category. The insertion point when adding a new semantic category is the highlighted semantic category, and the Spanish and English texts for the semantic category name must be typed in the aforementioned text fields.

Figure 4.2 Semantic Category Management Area


Meaning Management

The area for meaning management, illustrated in Figure 4.3, consists of two lists for the meanings in both languages and the buttons Add, Delete, and Modify for the addition, deletion, and modification of meanings, as well as buttons for editing (Copy and Paste buttons). These lists show the meanings in the form Term->Definition for the highlighted category, so that one can see several meanings for the same term. Moreover, when a pair Term->Definition is selected, the corresponding Term->Definition translation is automatically highlighted; there is a one-to-one mapping between the meaning representations in all the languages. It should also be noted that meanings, which are language independent, are shown with the best representation we have in a given language, i.e., a pair Term->Definition, since there is no other pair Term->Definition2 with the same meaning (note that the term is the same in both pairs).

Figure 4.3 Meaning Management Area


Synonyms and Related Terms Management Area

The area on the right in Figure 4.1 has four lists, for the synonyms and the related terms in both languages, which correspond to the highlighted meaning in the Meaning Management area.


Data Base Control Area

This area contains the Update button, which is used to update the data base with the typed information and to obtain a report (text box Data Base Report) about the consistency of the data base (Figure 4.1). Up to now, the consistency check only detects terms lacking a textual definition, but it can be extended in order to detect other inconsistencies or omissions. This is quite important when authoring dictionaries, since a dictionary cannot be kept consistent at each step; rather, it is built constructively, from terms to relationships between terms (polysemy, synonymy). For instance, this tool can be extended to give hints for detecting circular definitions (there are commercial dictionaries with this flaw), for detecting possibly missing synonyms and related terms, and so on.
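The kind of report described above can be sketched in Python as follows. This is a hedged illustration, not the code of the tool: the functions missing_definitions and circular_pairs and the sample entries are hypothetical, terms are assumed to be single lower-case words, and the circularity hint only covers the simplest case of two terms whose definitions mention each other.

```python
import re


def missing_definitions(definitions):
    """Terms whose textual definition is absent or blank."""
    return sorted(t for t, text in definitions.items() if not (text or "").strip())


def circular_pairs(definitions):
    """Pairs of terms whose definitions mention each other (direct cycles only)."""
    mentions = {
        term: {
            word
            for word in re.findall(r"\w+", (text or "").lower())
            if word in definitions and word != term
        }
        for term, text in definitions.items()
    }
    return sorted(
        (a, b)
        for a in mentions
        for b in mentions[a]
        if a < b and a in mentions[b]
    )


# Sample entries (assumed data): term -> textual definition.
definitions = {
    "big": "Large in size.",
    "large": "Big in size.",
    "size": "",
    "racket": None,
}

report = {
    "missing": missing_definitions(definitions),
    "circular": circular_pairs(definitions),
}
```

Longer cycles (A defined by B, B by C, C by A) would require a graph traversal over the same mentions relation, which this sketch does not attempt.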


Development of the Tools

Our work in developing the tools is based on a sound conceptual model for the terminological data base which shall eventually hold the terms, definitions, meanings, and semantic categories. Since it is intended to deal with two or more languages (bilingual or multilingual dictionaries), we need to represent instances of terms, textual definitions, and textual semantic categories for each language, but, as meanings are not language dependent, we shall use unique representations for them.


Conceptual Model of the Terminological Data Base

We use the entity-relationship model to describe the conceptual model we propose, which is shown in Figure 5.1. In this Figure (following some recommendations in [Pressman,1997] [Silberschatz,1996]), entity sets are represented with rectangles, attributes with ellipses, and relationship sets with directed and undirected lines. If B has an incoming line from A, this denotes a one (A) to many (B) mapping cardinality. Double arrows denote many-to-many mapping cardinalities. Undirected lines denote one-to-one mapping cardinalities. Relationship set names (not shown in this Figure) label each line.

Figure 5.1 Entity-Relationship Model for an English-Spanish Dictionary

For the sake of clarity and conciseness, this picture shows an instance of a multilingual terminological data base for only the Spanish and English languages; it derives naturally from the general model depicted in Figure 5.3 (where Li denotes the i-th language, i ∈ {1, ..., N}). Figure 5.1 depicts the entity Meaning, the central entity on which the other entities rest. The entity SynSet denotes the English synonym set (SynSet stands for Synonym Set). The relationship set between both entities is one to one. The entity Term represents all the English terms that compose the terminological data base. The relationship set between SynSet and Term is many to many, since a synonym set contains several terms, and a term may be contained in several synonym sets (obviously, with different meanings). Figure 5.2 embodies this idea: Term 1 and Term 2 are synonyms and share a meaning, and so do Term 2 and Term 3 under another meaning. Moreover, Term 2 is polysemic.

Figure 5.2 Polysemy and Synonymy related with the synonym sets

The entity See denotes the set of English terms related under a given meaning. The relationship set between Meaning and See is one to one. The relationship set between See and Term is one to many, because a meaning may refer to several English terms. The entity Definition represents the textual definition given to a meaning. The relationship set between Meaning and Definition is one to one. The entity Category denotes the category each meaning belongs to. The relationship set between Category and Meaning is many to many, since many meanings are in a category, and a meaning can be in several categories (this situation is expected to be kept to a minimum, since the goal is to keep the classification as disjoint as possible). This relationship set embodies the fact that our classification is not lexical (there is no direct relationship between Category and Term) but semantic (we relate meanings to categories, i.e., we categorize meanings). The entity Category has two attributes: CategoryName and NombreCategoría, which correspond to the textual name of the category in each considered language, English and Spanish, respectively. Meaning has two attributes: Definition and Definición, which correspond to the textual definition in the same considered languages. The remaining entities (CoSin, Véase, Término) are homologous to the respective entities (SynSet, See, Term).

Figure 5.3 Entity-Relationship Model for a Multilingual Dictionary


Logical and Physical Models

After the conceptual model, we have developed the logical and physical models for the terminological data base. For the logical model, we ensure third normal form [Silberschatz,1996] and we have added consistency constraints for adding, modifying, and deleting records.
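As an illustration of what such a logical model might look like, the following Python sketch creates a small relational schema with the standard sqlite3 module. It is not the schema of the tools: all table and column names are assumptions, the one-to-one SynSet entity is folded into the meaning key, the See entity is omitted for brevity, and the parent_id column for the category hierarchy is also an assumption.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not those of the tools.
SCHEMA = """
CREATE TABLE meaning (
    meaning_id    INTEGER PRIMARY KEY,
    definition_en TEXT,
    definition_es TEXT
);
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name_en     TEXT NOT NULL UNIQUE,
    name_es     TEXT NOT NULL UNIQUE,
    parent_id   INTEGER REFERENCES category(category_id)
);
-- Many-to-many: a meaning may be in several categories.
CREATE TABLE meaning_category (
    meaning_id  INTEGER NOT NULL REFERENCES meaning(meaning_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    PRIMARY KEY (meaning_id, category_id)
);
CREATE TABLE term (
    term_id  INTEGER PRIMARY KEY,
    language TEXT NOT NULL,
    text     TEXT NOT NULL,
    UNIQUE (language, text)
);
-- Many-to-many: a synonym set has several terms, a term several meanings.
CREATE TABLE synset_term (
    meaning_id INTEGER NOT NULL REFERENCES meaning(meaning_id),
    term_id    INTEGER NOT NULL REFERENCES term(term_id),
    PRIMARY KEY (meaning_id, term_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces them only on request
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO meaning VALUES (1, 'A point scored in soccer.', 'Tanto marcado en fútbol.')"
)
conn.execute("INSERT INTO term VALUES (1, 'en', 'goal')")
conn.execute("INSERT INTO synset_term VALUES (1, 1)")


def rejected(statement):
    """True when a consistency constraint refuses the statement."""
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        return True
    return False


# Adding a row that points at a non-existent meaning is refused.
dangling_rejected = rejected("INSERT INTO synset_term VALUES (99, 1)")
# Deleting a meaning that is still referenced is refused as well.
delete_rejected = rejected("DELETE FROM meaning WHERE meaning_id = 1")
```

Storing both definitions as columns of meaning suits the bilingual case; for the general N-language model a separate definition table keyed by meaning and language would avoid changing the schema for each new language.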


Developing the tools

We have followed an iterative methodology for developing the tools. First, requirement analysis was carried out with the help of UML [Alhir,1998] as a notation for expressing requirements (use case diagrams). Second, functional analysis was performed by defining the business classes. Third, the design stage was carried out by defining the infrastructure classes. Finally, Visual Basic was used for coding, and testing was carried out as well.



We have developed tools to help improve language learning. These tools are a consequence of the defined pedagogical goals and the linguistic concepts involved, as explained above. Along the way, a sound entity-relationship model which embodies these linguistic concepts has been established, and it is well suited for developing lexical data bases with a rich structure.

Putting these tools into service represents a hope for educational achievement, and the experience gained will guide the additions and modifications needed to improve their efficacy.

These tools can be enhanced in several ways, of which we mention only a few. First, both the user and the author tool can be deployed in a Web context in order to allow centralized information for queries and, more importantly, collaborative work. Second, they can be extended with phonetic search. Finally, the data base control of the author tool can be improved by identifying undefined words in textual definitions, which can help to achieve completeness.

The classification of meanings, which we have emphasized before, is important for two challenging applications: first, integrating a terminological data base into a multilingual knowledge base; and second, information retrieval. A sound conceptual model is necessary to integrate a terminological data base into a multilingual lexical one. We have in mind lexical knowledge bases based on ontologies (e.g., MikroKosmos [MikroKosmos]), rather than monolingual on-line lexical resources (e.g., WordNet [Miller,1995]). The conceptual model we develop here is coherent with the concept of ontology. Therefore, implementing a terminological data base from that conceptual model should facilitate its integration into an ontology-based lexical knowledge base. We can thus expect the implementations built from this model to fulfill the conditions for being properly used as essential steps of information retrieval operations.