EVALUATION OF REUSE AND MAINTENANCE IN HYPERMEDIA
APPLICATIONS FOR EDUCATION: VALIDATION OF METRICS

Emilia Mendes,
Rachel Harrison and
Wendy Hall

e-mail: {mexm95r,rh,wh}@ecs.soton.ac.uk
fax: 0044 1703 592865

This paper reports the results of applying metrics to hypermedia authoring under the SHAPE research project. The aim of SHAPE is to help authors develop high quality large hypermedia applications for education. The quality characteristics considered are the reusability of information, the maintainability of applications and the authoring effort. Although a number of metrics for hypertext systems have been proposed, we believe that many of the measures proposed in the past lack the necessary mathematical and/or empirical justification. The metrics proposed in this paper have been developed using the Goal-Question-Metric approach, and adhere to the representational theory of measurement. We describe the development of the metrics and the results of a quantitative empirical study which compares two different hypermedia authoring systems.
 
 

INTRODUCTION

We regard measurement as important for three basic activities:

Measurement can be used to: i) support project planning; ii) determine the strengths and weaknesses of the current processes and products; iii) provide a rationale for adopting/refining techniques; iv) evaluate the quality of specific processes and products; v) assess the progress of a project during its course; vi) take corrective action based on this assessment; and finally vii) evaluate the impact of such action [Basili et al., 94].

The literature has plenty of examples of projects whose budgets and schedules overran. Software engineers have addressed software engineering problems by continually looking for new techniques and tools to improve process and product, but methodological improvements which lack corresponding empirical validation cannot be considered scientifically valid [Fenton & Pfleeger, 96].

For anyone who has been involved in software engineering it is clear that for a long time there has been little interest in any sort of evaluation to prove the usefulness of a method or tool, as pointed out by Fenton et al.:

"many research findings published can be characterised as "analytical advocacy research". That is, the authors describe a new concept in considerable detail, derive its potential benefits analytically, and recommend the concept be transferred to practice. Time passes, and other researchers derive similar conclusions from similar analyses ... Yet practitioners often seem unenthused: something important is missing from this picture: rigorous, quantitative experimentation" [Fenton et al., 94].

The representational theory of measurement seeks to formalise our intuition about the way the world works. That is, the data obtained as measures should represent attributes of the entities observed, and manipulation of the data should preserve relationships observed among the entities. Thus, intuition is the starting point for all measurement.

In Section 2 we present a survey of hypertext metrics already proposed and in Section 3 we compare those proposals and offer further discussion. In Section 4 we present the research project SHAPE and describe how we developed metrics applied to hypermedia authoring. Finally, in Section 5 we give our conclusions and comments on future work.
 

METRICS APPLIED TO SHAPE

The SHAPE project

SHAPE [Mendes & Hall, 97] is an acronym for a research project, carried out at the University of Southampton, and stands for Southampton Hypermedia Authoring Paradigm for Education. The aim of SHAPE is to aid authors in the development of high quality large hypermedia applications for education. For us, the quality characteristics considered are reusability of information, maintainability of applications and authoring effort.

Rather than defining improvements to an authoring tool and verifying their adequacy afterwards, we decided to use a more consistent and systematic approach: applying metrics to identify how well an authoring tool supports the maintainability of applications, the reuse of information in applications and a low level of authoring effort.

The principles of the metrics we developed are based on Fenton et al.’s framework for software measurement [Fenton & Pfleeger, 96], and on the guidelines from the DESMET project [Kitchenham, 96], [Kitchenham, 93], in the field of software engineering. Both have been extensively used in experiments in the software engineering field [Harrison et al., 95], [Daly, 96], [Briand et al., 96], [Briand et al., 97], [Basili & Rombach, 88], [Basili et al., 94], [McDonell, 91].

We have planned two evaluations for SHAPE. The first was a quantitative evaluation and the second will be both quantitative and qualitative. In Section 4.2 we describe and present the results of the first evaluation.

Design of the First Evaluation

For the first evaluation the stated hypotheses were:

H1-0: Microcosm applications are more maintainable and their information more reusable than applications built using a standard WWW environment

H1-L: The use of a link server allows better maintainability of applications and better reusability of information than the use of embedded links

H1-G: Generic links allow a better maintainability and reusability of information than buttons

 
We have chosen to compare Microcosm [Davis et al., 92] to the Web [Berners-Lee et al., 94] because they propose different and almost opposite ways of representing and managing links, and this seems to have a big influence on authoring [Hill et al., 95]. Microcosm is an open environment, characterised by the separation of link structures from the information being linked [Hill et al., 95]. The WWW, on the other hand, provides a simple point-to-point linking model based upon embedded links.

Procedure

The survey involved the use of questionnaires that were answered by either Microcosm or Web developers. A survey offers the following advantages [Kitchenham, 96]: i) reaches a lot of users; ii) makes use of existing experience; iii) makes use of standard statistical analysis techniques; and iv) confirms that an effect generalises to many projects/organisations. However, surveys can only confirm association, not causality, and are prone to bias.

Both questionnaires had three sections: experience, maintainability and reusability. For each section the questions were proposed with the objective of collecting the necessary data to test the hypotheses.

The experience section was based on a composition of two aspects: Entities and Hypermedia. The maintainability section was based on a composition of three aspects: Entities, Hypermedia and Maintainability. The reusability section was likewise based on a composition of three aspects: Entities, Hypermedia and Reusability.

In order to prepare both the Maintainability and the Reusability sections we had to consider possible tasks accomplished by authors in the development of hypermedia applications for education.
 

The Pilot Study

Before sending the questionnaires to both Microcosm and Web authors we carried out a pilot study because it provides an opportunity to learn from mistakes without ruining the main study [Preece et al., 94]. Feedback from colleagues prompted some changes to the questionnaire concerning ambiguous questions, unusual tasks, definitions in the appendix and the number of questions.

The Data Analysis

The survey results were analysed using standard statistical techniques to determine whether the two sets of questionnaires (from Microcosm and Web authors) were from different populations. Results from the Kruskal-Wallis one-way analysis of variance, using a level of significance of 5%, are shown in Tables 5-8. For the results presented in Table 9 we used Spearman’s correlation with a one-tailed test of significance.
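As an illustration of the test named above, the following minimal sketch computes the Kruskal-Wallis H statistic for two groups of ordinal answers, with the usual correction for ties and a chi-square approximation for the p-value. It is not the analysis code used in the study: the function names and the sample ratings are hypothetical, and the p-value formula assumes exactly two groups (one degree of freedom).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # start and end are 0-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two samples; p uses the chi-square approximation, df = 1."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups need at least one answer")
    pooled = list(group_a) + list(group_b)
    n = n_a + n_b
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[:n_a])
    sum_b = sum(ranks[n_a:])
    h = 12 / (n * (n + 1)) * (sum_a ** 2 / n_a + sum_b ** 2 / n_b) - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # every answer identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical difficulty ratings (1 = very easy, 5 = very difficult),
# NOT data from the survey.
microcosm = [1, 2, 2, 1, 3, 2]
web = [2, 3, 3, 4, 2, 3]
h, p = kruskal_wallis_two_groups(microcosm, web)
print(f"H = {h:.3f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```

With more than two groups the H statistic generalises directly, but the p-value would then need the chi-square distribution with k - 1 degrees of freedom, which this sketch does not provide.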

The level of experience of the two groups of users (Microcosm and the Web) was found to be the same, as was the structure of the hyperdocuments [Tab. 4].
 


Table 4 - Type of structure used by both groups

Structure      Microcosm   Percentage (%)   Web   Percentage (%)
Sequential         1             5.5          1          4
Hierarchical      12            67           18         64
Network            4            22            7         25
No answer          1             5.5          2          7
Total             18           100           28        100

In the questionnaire there were fifteen tasks related to actions in maintenance and reuse. The first nine are common tasks involved in maintenance and reuse and the last six are more unusual tasks, but also important considering maintenance and reuse. They are described below:
 
 

For each of these tasks, authors were asked: i) the level of difficulty to accomplish those tasks, on a scale from 1 (very easy) to 5 (very difficult) and ii) the time it would take, in minutes, using 10 different intervals given.
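The medians reported in the tables that follow can be obtained from answers of this kind. The sketch below shows one way to do so, assuming that the time answer is coded as the index of the chosen interval; the data layout, the function name and the sample answers are hypothetical and are not taken from the survey.

```python
from statistics import median

# Hypothetical answers to a single task (NOT data from the survey).
# Difficulty uses the scale from 1 (very easy) to 5 (very difficult);
# time is assumed to be coded as the index of the chosen interval (1 to 10).
responses = {
    "Microcosm": {"difficulty": [1, 2, 2, 3], "time": [1, 1, 2, 1]},
    "Web": {"difficulty": [1, 1, 2, 1], "time": [2, 3, 2, 3]},
}


def group_medians(data):
    """Return the median of every attribute for every group of authors."""
    return {
        group: {attribute: median(answers)
                for attribute, answers in attributes.items()}
        for group, attributes in data.items()
    }


for group, medians in group_medians(responses).items():
    print(group, medians)
```

The median is used rather than the mean because both answers are ordinal: the distance between two points on the scale is not assumed to be meaningful.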

When comparing tasks involving point-to-point links in both Microcosm and the Web we found that in 33% of the answers the medians for the level of difficulty were lower for Microcosm than for the Web and in 46% of the answers the time was shorter.

In 46% of the answers the time spent in both Microcosm and the Web was the same. But Web authors needed to use an auxiliary set of tools in order to accomplish the tasks in a reasonable time and with a low level of difficulty. This was not necessary using Microcosm.

Even in the 7 answers where the level of difficulty was higher for Microcosm than for the Web, there was no corresponding increase in the time spent to accomplish the tasks. As Microcosm is an open hypermedia system, the author has to edit the linkbase many times in order to maintain links. This task can be considered more difficult than changing links on the Web but, as shown by the data, there was no overhead on the time spent.

When comparing tasks involving point-to-point links in both Microcosm and the Web we also found 8 answers with a statistically significant difference. Four showed advantages for the Web and four showed advantages for Microcosm. The medians for tasks involving Microcosm point-to-point links, Web point-to-point links and the corresponding level of significance are presented in [Tab. 5]:

Table 5 - Medians for tasks involving point-to-point links in Microcosm and the Web, with corresponding level of significance.


Question   Attribute    Median point-to-point   Median point-to-point   Level of
                        Microcosm               Web                     significance
02         Time         1                       2.5                     0.04*
05         Difficulty   2                       1                       0.00*
06         Difficulty   2                       1                       0.03*
08         Time         1                       3                       0.03*
12         Difficulty   1                       2                       0.04*
13         Difficulty   1                       2                       0.00*
14         Difficulty   3                       1.5                     0.03*
15         Difficulty   2                       1                       0.00*
*denotes that the result is statistically significant at the 5% level

Questions 5, 6, 14 and 15 represent simple tasks, but for Microcosm authors they involve editing the linkbase in order to update the information about the links. We believe this was the reason for the higher level of difficulty using Microcosm. However, even with a higher level of difficulty, no statistically significant differences were found when comparing the time involved in the same tasks.

Questions 2 and 8 showed a statistically significant difference in the time spent accomplishing the tasks: the time was higher using the Web. Questions 12 and 13 showed a statistically significant difference in the level of difficulty of the tasks, and again the level was higher using the Web. In Microcosm, question 12 would be easily accomplished using generic links and question 13 using local links. Here we can see that when an application requires links to be valid within the whole application or within a particular document, the use of point-to-point links on the Web increases both the time involved and the level of difficulty in accomplishing the task.

For 13 questions that were not specifically designed to consider tasks that would be better suited for generic or local links, Microcosm authors were asked to estimate the time and level of difficulty in accomplishing the tasks if the links were either point-to-point or generic.

When comparing the answers given for generic links to those given for point-to-point links on the Web we found 8 questions (10 answers) with a statistically significant difference. All the 10 answers showed advantages for generic links. The medians for generic links, medians for point-to-point links on the Web and the corresponding level of significance are presented in [Tab. 6]:
 


Table 6 - Medians for tasks involving generic links and point-to-point links, with corresponding level of significance.

Question   Attribute    Median generic   Median point-to-point   Level of
                        Microcosm        Web                     significance
03         Time         1.0              1.5                     0.00*
04         Time         0.5              1.0                     0.04*
05         Difficulty   1.0              1.0                     0.00*
08         Time         1.0              3.0                     0.00*
09         Time         1.0              2.0                     0.03*
10         Time         1.0              2.0                     0.07**
12         Time         2.0              3.0                     0.07**
           Difficulty   1.0              2.0                     0.00*
13         Time         1.0              2.0                     0.08**
           Difficulty   1.0              2.0                     0.00*
*denotes that the result is statistically significant at the 5% level

**denotes that the result is statistically significant at the 10% level


 

We can see that in 62% of the questions considered (8 of 13), generic links allowed either a shorter time or a lower level of difficulty than accomplishing the same tasks with point-to-point links on the Web.

The only question that compared tasks involving local links to point-to-point links (question 13) showed a statistically significant difference in favour of local links. The median for local links, the median for point-to-point links on the Web and the corresponding level of significance are presented in [Tab. 7]:
 


Table 7 - Medians for tasks involving local links and point-to-point links, with corresponding level of significance.


Question   Attribute    Median local   Median point-to-point   Level of
                        Microcosm      Web                     significance
13         Time         1              2                       0.00*
           Difficulty   1              2                       0.08**
*denotes that the result is statistically significant at the 5% level

**denotes that the result is statistically significant at the 10% level

We found values of Gamma higher than 0.50 not only for the four independent variables presented in Table 7, but also for the number of links and the structure of the application. Values of Gamma equal to or higher than 0.50 indicate that there is an association between the variables compared.
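The text does not name the statistic beyond "Gamma". Assuming it is Goodman and Kruskal's gamma for ordinal association, it can be computed from the numbers of concordant and discordant pairs of observations, as in the minimal sketch below. The function name and the sample values are hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Return (C - D) / (C + D) over all pairs of observations.

    C counts concordant pairs (both variables move in the same direction),
    D counts discordant pairs; pairs tied on either variable are ignored.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ordinal values (NOT data from the survey): an independent
# variable coded 1-4 and the difficulty rating given by the same authors.
independent = [1, 1, 2, 3, 3, 4]
difficulty = [1, 2, 2, 3, 2, 4]
gamma = goodman_kruskal_gamma(independent, difficulty)
print(f"Gamma = {gamma:.2f}, association at the 0.50 threshold: {gamma >= 0.50}")
```

Because tied pairs are discarded, gamma can be large even when few pairs are informative, so the 0.50 threshold is best read together with the number of answers behind it.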
 
 

CONCLUSIONS

We have presented our approach to the development of metrics within the SHAPE research project and how they were evaluated.

The metrics were proposed to measure the maintainability and reusability of hypermedia applications for education, so that we could evaluate whether a particular hypermedia application for education was more or less maintainable or reusable than another. The metrics proposed are therefore not restricted to a particular hypermedia system, since they can be used to measure the maintainability and reusability of any hypermedia application.

In order to investigate the metrics proposed we collected the data using applications developed with both Microcosm and the Web.
 
 

The data collected showed strong evidence that the link representation, link type, highlighting of anchors, structure of the application and the author’s experience can strongly influence the maintainability of the application and the reusability of information.

We also found some evidence that the number of documents, compactness and stratum can influence the maintainability of the application and the reusability of information.

The next evaluation will be both quantitative and qualitative, and its aim will be to measure the effort involved in developing a hypermedia application using both Microcosm and the Web.
 
 

REFERENCES
 

[Adams & Jr, 97] Adams, W. J., Curtis A. Carver Jr. (1997) "The Effects of Structure on Hypertext Design", Proceedings of ED-MEDIA’97, Calgary, Canada, June.

[Basili et al., 94] Basili, V., G. Caldiera and D. Rombach (1994) "The Goal Question Metric Approach", Encyclopedia of Software Engineering, Wiley.

[Basili & Rombach, 88] Basili, V. R. and H. D. Rombach (1988) "Towards a Comprehensive Framework for Reuse: A Reuse-Enabling Software Evolution Environment", Technical Report CS-TR-2158, Dept. of Computer Science, University of Maryland, College Park, MD 20742, December.

[Berners-Lee et al., 94] Berners-Lee, T., R. Cailliau, A. Luotonen, H. Frystyk Nielsen, and A. Secret (1994) "The World Wide Web", Communications of the ACM, 37/8:76-82, August.

[Botafogo et al., 92] Botafogo, Rodrigo A., Ehud Rivlin, and Ben Shneiderman (1992) "Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics", ACM TOIS, 10/2:143-179.

[Briand et al., 96] Briand, L., C. Bunse, J. Daly, and C. Differding (1996) "An experimental comparison of the maintainability of OO and structured design documents", Proceedings of EASE, March.

[Briand et al., 97] Briand, L., P. Devanbu, M. Melo (1997) "An Investigation into Coupling Measures for C++", in Proceedings of ICSE’97, Boston, MA, USA, pp. 412-421.

[Calvi & DeBra, 97] Calvi, Licia & Paul DeBra (1997) "Using Dynamic Hypertext to Create Multi-Purpose Textbooks", Proceedings of ED-MEDIA’97, Calgary, Canada, June.

[Daly, 96] Daly, J. (1996) "Replication and a Multi-Method Approach to Empirical Software Engineering Research", PhD thesis, Department of Computer Science, University of Strathclyde, Glasgow.

[Davis et al., 92] Davis, Hugh, Wendy Hall, Ian Heath, Gary Hill, and Rob Wilkins (1992) "Towards an Integrated Information Environment With Open Hypermedia Systems", Proceedings of the ACM Conference on Hypertext, Milan, Italy, pp. 181-190.

[Fenton et al., 94] Fenton, Norman, Shari Lawrence Pfleeger, and Robert L. Glass (1994) "Science and Substance: A Challenge to Software Engineers", IEEE Software, July, pp. 86-95.

[Fenton & Pfleeger, 96] Fenton, Norman E., and Shari Lawrence Pfleeger (1996) Software Metrics: A Rigorous & Practical Approach, Second Edition, PWS Publishing Company and International Thomson Computer Press.

[Garzotto et al., 91] Garzotto, Franca, Paolo Paolini, and Daniel Schwabe (1991) "HDM - A Model for the Design of Hypertext Applications", Proceedings of Hypertext’91, ACM Press, San Antonio, Texas, December, pp. 313-328.

[Garzotto et al., 93] Garzotto, Franca, Paolo Paolini, and Daniel Schwabe (1993) "HDM - A Model-Based Approach to Hypertext Application Design", ACM Transactions on Information Systems, 11/1:1-26.

[Garzotto et al., 94] Garzotto, Franca, Luca Mainetti, and Paolo Paolini (1994) "Analysing the Quality of Hypermedia Applications: A Design-Oriented Framework", Workshop on hypermedia design and development, Edinburgh, September 18.

[Garzotto et al., 95] Garzotto, Franca, Luca Mainetti, and Paolo Paolini (1995) "Hypermedia Design, Analysis, and Evaluation Issues", Communications of the ACM, Special Issue on Hypermedia Design, August.

[Glass, 94] Glass, Robert L. (1994) "The Software-Research Crisis", IEEE Software, November, pp: 42-47.

[Harrison et al., 95] Harrison, R., L. G. Samaraweera, M. R. Dobie, and P. H. Lewis (1995) "Estimating the quality of functional programs: an empirical investigation", Inf. Softw. Technol., 37/12: 701-707.

[Hatzimanikatis et al., 95] Hatzimanikatis, A. E., C. T. Tsalidis, and D. Christodoulakis (1995) "Measuring the Readability and Maintainability of Hyperdocuments", J. of Software Maintenance, Research and Practice, 7:77-90.

[Hill et al., 95] Hill, Gary, Wendy Hall, D. De Roure, and L. Carr (1995) "Applying Open Hypertext Principles to the WWW", in Proceedings of the International Workshop on Hypermedia Design '95, Montpellier, France.

[Kitchenham, 93] Kitchenham, Barbara (1993) "DESMET METHODOLOGY: Guidelines for Evaluation Method Selection", DESMET Project Deliverable D2.3.1, The National Computing Centre Ltd, October 1993.

[Kitchenham, 96] Kitchenham, Barbara Ann (1996) "Evaluating Software Engineering Methods and Tool, Part 1: The Evaluation Context and Evaluation Methods", Software Engineering Notes, 21/1:11-15, January.

[McCall et al., 77] McCall, J.A., P. K. Richards, and G. F. Walters (1977) "Factors in Software Quality", RADC TR-77-369, 1977.

[McDonell, 91] MacDonell, S. G. (1991) "Rigor in Software Complexity Measurement Experimentation", in J. Systems Software, 16:141-149.

[Mendes, 97] Mendes, M. Emilia X. (1997) "SHAPE - Southampton Hypermedia Authoring Paradigm for Education", transfer thesis from MPhil to PhD, Department of Electronics and Computer Science, University of Southampton, UK.

[Mendes & Hall, 97a] Mendes, M. Emilia X. and Wendy Hall (1997) "An empirical study of hypermedia authoring for education", in Proceedings of the CAL97 Conference, Exeter, UK.

[Mendes & Hall, 97b] Mendes, M. Emilia X. and Wendy Hall (1997) "The SHAPE of Hypermedia Authoring for Education", to be published in Proceedings of ED-MEDIA & ED-TELECOM 97, Calgary, Canada.

[Preece et al., 94] Preece, Jenny, Yvonne Rogers, Helen Sharp, David Benyon, Simon Holland, and Tom Carey (1994) Human-Computer Interaction, Addison-Wesley Publ.

[Rivlin et al., 94] Rivlin, Ehud, Rodrigo Botafogo, and Ben Shneiderman (1994) "Navigating in Hyperspace: designing a structure-based toolbox", Communications of the ACM, 37(2): 87-96.

[Yamada et al., 95] Yamada, Shoji, Jung-Kook Hong, and Shigeharu Sugita (1995) "Development and Evaluation of Hypermedia for Museum Education: Validation of Metrics", ACM Transactions on Computer-Human Interaction, 2(4): 284-307, December.