Discovery simulations and the assessment of intuitive knowledge

Janine Swaak¹, Ton de Jong², & Wouter van Joolingen

¹Telematica Instituut, P.O. Box 589, 7500 AN Enschede, the Netherlands

²University of Twente


The present work takes a thorough look at the relations between the features of discovery simulations (it is argued that discovery simulations are ‘rich’, have relatively low transparency, and require active involvement of learners), the learning processes they elicit (it is explained how discovery simulations are suited to support data-driven, partly implicit learning), the knowledge that results (it is reasoned that the data-driven, partly implicit elements of discovery lead to intuitive knowledge), and the methods used to measure this knowledge. To complement this conceptual investigation, a series of five experimental studies is described. In all five studies, learners were pre- and post-tested with several theoretically grounded knowledge measures. Central to this set of tests was a test designed to measure intuitive knowledge. One conclusion of these experimental studies is that assignments contribute most clearly to the instructional effectiveness of simulations. Another is that the ‘intuitive knowledge tests’ seem able to measure the results of learning with discovery simulations. A third is that it is not completely clear whether these tests actually measure intuitive knowledge.

This contribution is about learning with discovery simulations. More precisely, the current work investigates the instructional effectiveness of discovery simulations and examines what learners do and do not gain from interacting with discovery environments.

Learning in discovery environments, such as simulation environments, is believed to lead to knowledge that is qualitatively different from knowledge acquired in more traditional instructional situations. In traditional instruction, learning usually involves the explicit transfer of a body of knowledge, with an emphasis on the analytical aspects of the domain and on reproduction of the knowledge taught. The learning process invoked by discovery environments may be less explicit, and the resulting knowledge may have a less explicit character. Reviews of studies with discovery simulations do not show a clear and consistent picture of their results (see also de Jong & van Joolingen, 1998). Thomas and Hooper (1991), for example, classified and analysed simulation studies. Their conclusion most relevant to the present work is that "the effects of simulations are not revealed by tests of knowledge (…)" (p. 479). They further conclude that simulations can best serve as "experiencing programs" which give students the opportunity "to gain an intuitive understanding of the learning goal" (p. 499). Thomas and Hooper do not, however, indicate what they mean by ‘tests of knowledge’ and ‘experiencing programs’, why and how they think an intuitive understanding is gained, or how this intuitive understanding can be assessed. In other words, there appears to be a mismatch between the context of learning (e.g., discovery simulations), the learning that takes place, and the qualities of the knowledge acquired. Moreover, little attention has been given to methods for assessing these qualities of knowledge.

The objective of the present work is to take a closer look at the relations between the features of discovery simulations, the learning processes elicited, the knowledge that results, and the methods used to measure this acquired knowledge.

To perform this investigation, a series of five experimental studies was carried out. In all five studies, learners were pre- and post-tested with several theoretically grounded knowledge measures. These tests were developed after a thorough examination of the nature of learning with discovery simulations and the properties of the results of discovery. Central to the set of tests was a test designed to measure intuitive knowledge.


Features of discovery simulations

In the current studies, discovery simulations were developed with the SMISLE and SimQuest authoring systems. The core of each instructional environment in the current work is a simulation of a physics topic: collisions between two particles, harmonic oscillations (a mass-spring system), or electrical circuits. The goal of the learners is to discover relations between variables of the simulated domain. In the simulations, learners can manipulate variables and perceive the consequences of their manipulations in dynamic outputs. The simulation environments are further enriched with instructional measures, including explanations, assignments, and model progression. In the studies, ‘explanations’ describe a variable in the domain or give an equation used in the domain. ‘Assignments’ are small exercises that aim to help students structure their learning and point them to important phenomena in the domain. For example, some assignments ask the student to investigate the relation between two variables of the domain by manipulating them in the simulation. Other assignments require students to predict a situation based on the relation between two variables in the domain. ‘Model progression’ means that the domain is offered in small successive steps, with new variables added to the model in each step. This instructional measure also provides structure for the learning process (see White & Frederiksen, 1990).

Apart from the (physics) topics and instructional support that may be specific to the simulations of the current work, generic features of simulations can be discerned. This work maintains that discovery simulations have three generally valid characteristics.

1) Richness: First, it is postulated that these types of learning environments can be described as ‘rich’ environments. In a rich environment:

  1. a high amount of information can be extracted by the learner,
  2. this information can be obtained in several ways,
  3. the information is usually displayed in more than one representation; a dynamic, graphical representation of the output is generally present alongside animations and numerical outputs. This latter component can more specifically be described as ‘perceptual richness’.

2) Low transparency: A second characteristic of simulation-based environments is their relatively low transparency (compared with textbooks or hypertexts). The less transparent the discovery environment, the less the learner has a ‘direct view’ of the variables and relations, and consequently the more information has to be inferred or extrapolated. The transparency of a simulation environment can be increased only by adding instructional measures such as explanations (see above).

3) Active interaction: The third characteristic of the learning context of this work involves the interactive aspect of the discovery simulations. The learning session entails an interaction with a simulation-based environment. Learners are not supposed to passively absorb information on the domain from the computer screen, but rather they are expected to perform several different actions (i.e., do experiments) to make up their own ‘meaningful’ learning session. Thus, learning with the discovery environment may accurately be described as an active ‘experience’.

Across studies, experimental conditions were created that differed with respect to the kind and amount of instructional measures available and the physics topic simulated. All of these discovery simulations are assumed to share the general features of simulations described above. The next section argues how these features affect the discovery behaviour of learners. It then explains how aspects of discovery learning, in turn, affect the types of knowledge students acquire.


Processes of discovery learning and intuitive knowledge

Bruner (1961) already stressed the differences between discovery, which he took to include "all forms of obtaining knowledge for oneself by the use of one’s own mind" (p. 22), and expository teaching, in which the teacher presents knowledge to a student recipient (p. 23). Bruner does not restrict "discovery to the act of finding out something that before was unknown to mankind" (p. 22). He goes on to say that discovery is "in its essence a matter of rearranging or transforming evidence in such a way that one is enabled to go beyond the evidence so reassembled (…)". For discovery to be meaningful, the processes that make up the empirical cycle (see e.g., de Groot, 1961) should take place. Through processes such as collecting and classifying facts, stating hypotheses, making predictions, and interpreting the outputs of experiments, learners infer knowledge from the information given. This is essential, because in ‘discovery situations’ a coherent knowledge base is not directly available and knowledge has to be inferred. While inference processes are believed to play a role in all learning, they are an indispensable feature of discovery learning. The central role of inference distinguishes discovery from forms of learning in which reception, assimilation, and reproduction of knowledge are considered pivotal.

Research into discovery learning is not new. A number of researchers have studied discovery learning processes in the context of computer simulations (e.g., Friedler, Nachmias, & Linn, 1990; Glaser, Schauble, Raghavan, & Zeitz, 1992; Schauble, Klopfer, & Raghavan, 1991; see de Jong & van Joolingen, 1998, for an overview), and substantial research is available on scientific reasoning and discovery skills in general (e.g., de Groot, 1961; Lawson, 1985). One conclusion from these studies is that two approaches to discovery learning can be distinguished: a top-down, or concept-driven, approach and a bottom-up, or data-driven, approach. In the concept-driven approach, (prior) knowledge plays a central role. In the data-driven approach, features of the environment (e.g., a simulation interface) are of central importance.

Some scientists go one step further and couple the relative merit of (prior) conceptions versus features of the environment to two ways of processing information: as soon as features of the environment gain in importance, it might be advantageous to pursue implicit learning alongside explicit or reflective learning. For example, Norman (1993) reasons that some environments lead one toward an ‘experiential’ mode of learning, while in other contexts learning can be characterised as ‘reflective’. The experiential mode of learning is "one of perceptual processing (…) pattern-driven or event-driven (…)" (p. 26). The reflective mode of learning is "that of concepts, of planning and reconsideration" (p. 25). Norman argues that "rich, dynamic (…) environments (…) lead one toward the experiential learning mode" (p. 25). More specifically, Rieber (1996) explains that external representations such as animations may trigger implicit learning: whereas animations may represent information and relationships resembling natural phenomena, they usually do not make this information and these relationships explicit. Rieber hypothesises that without special effort (on the side of the learner or in the instruction), learning from animation may be implicit (p. 7).

The learning modes described above also seem to relate to the transparency of the environment in which learners perform a task. Reber, Kassin, Lewis, and Cantor (1980) argue that for complex tasks of low salience (i.e., aspects of the problem are hidden), an implicit learning mode is best suited, while relatively simple tasks of high salience are best performed in an explicit way (see also Svendsen, 1991, for an interpretation of salience in human-computer interaction).

In this work it is argued that the rich, low-transparency, interactive simulation environment is suited to support the data-driven, partly implicit processes of discovery. In this context, the data-driven processes can more specifically be described as action- and perception-driven processes. Discovery is always a combination of concept-driven and action- and perception-driven processes. It is argued that it is especially the action- and perception-driven elements of discovery, which are partly implicit, that lead to ‘intuitive’ knowledge.

At this point it is useful to explain that knowledge can be characterised by its amount (i.e., the amount of knowledge acquired while working with the simulation), its type (e.g., conceptual or operational knowledge), and its qualities (de Jong & Ferguson-Hessler, 1996). Learning does not only, and not necessarily, lead to the accretion of knowledge; it can also ‘tune’ knowledge, thereby changing its qualities (e.g., Anderson, 1987; Norman, 1993). Quality, as used here, does not refer to the correctness of knowledge but to other characteristics, such as intuitivity (also referred to as compiledness or automation), level of organisation, and level of abstraction (e.g., Anderson, 1987; de Jong & Ferguson-Hessler, 1996). Within this terminology, this work focuses on the intuitive quality of conceptual knowledge. The intuitive quality (or compiledness) of knowledge concerns how the knowledge is accessed in memory and can usually be contrasted with the declarativeness or discreteness of knowledge (Brown, 1993). Knowledge in a declarative format is easy to make explicit, whereas compiled or intuitive knowledge is hard to verbalise.

In the next section a more detailed description of intuitive knowledge and its acquisition is given.


Discovery simulations and intuitive knowledge

Although serious efforts to assess intuitive knowledge are scarce, research on interacting with complex simulation systems (e.g., Berry & Broadbent, 1988; Broadbent et al., 1986; Hayes & Broadbent, 1988; Leutner, 1993), complemented with literature on intuitive knowledge (e.g., de Groot, 1986; Fischbein, 1987; Westcott, 1968; Polanyi, 1966), yields at least five fairly stable notions about the intuitive quality of knowledge.

The first is that the intuitive quality of knowledge is acquired only after using knowledge in perceptually rich, dynamic situations. It is postulated that if knowledge is used in rich contexts, implicit learning processes are elicited which lead to intuitive knowledge. This idea agrees with Fischbein’s perspective on the acquisition of intuitions. He states that they "can never be produced by mere verbal learning (…)" but that they "only can be attained as an effect of direct, experiential involvement of the subject in a practical or mental activity" (Fischbein, 1987, p. 95).

A second notion is that intuitive knowledge is difficult to verbalise. A rather important hypothesis is that in the interaction with a simulation environment learners are invited to follow an implicit learning mode, which leads to knowledge that is hard to verbalise (for related views on implicit learning and the resulting knowledge, see Berry & Broadbent, 1984, 1988, 1990; Reber, e.g., 1989, 1993). In a similar vein, Fischbein (1987) states that intuitions are implicit. They are based on complex selection and inference processes, which are, to a large extent, believed to be unconscious to the individual.

A third feature of intuitive knowledge is the importance of perception. Though visualisation is critical, Fischbein states that an intuition is not just a perception but more like a theory. He calls an intuition "the analog of perception at the symbolic level." According to Fischbein, the visualisation may or may not be mediated by an external representation. Fischbein moreover notes that "what one cannot imagine visually is difficult to realise mentally" (p. 103). The importance of perception is underscored by de Groot (1986, p. 74), who writes that "Intuition as a function is in many respects akin to visual perception. (…) Intuitive processing appears to be based on abilities akin to pattern recognition and pattern understanding."

The role of anticipation makes up the fourth notion related to intuitive knowledge (e.g., Fischbein, 1987, p. 61). With regard to anticipation, de Groot (1986, p. 71) refers to Jung’s description of intuition as "was etwas werden kann" (i.e., "what something could become") and states that "this anticipatory element is always present". He explains that "Intuitive expectations anticipate what will or may happen; intuitive judgements or evaluations anticipate the outcome of a more complete argument (…)."

The fifth notion is that knowledge with an intuitive quality is accessed in memory differently from knowledge without this quality. It is speculated that this differential access exists alongside differences in verbalisation. It is hypothesised that the action- and perception-driven elements in learning ‘tune’ the knowledge and give it an intuitive quality. Though intuitive knowledge is hard to verbalise (and as a consequence is sometimes labelled ‘inert’ in explicit tasks), access to knowledge with an intuitive quality is assumed to be ‘smoother’ than access to knowledge with a more declarative quality. In other words, the intuitive quality makes access to the knowledge in memory more efficient.

To recapitulate: active experience is indispensable for the acquisition of intuitive knowledge, and low verbalisability, perception, quickness, and anticipation are the most frequently cited observations relating to the intuitive quality of knowledge.

A certain coherence can be observed between the characteristics of discovery simulations, implicit learning, and the acquisition and features of intuitive knowledge (see Figure 1 for an overview). Nevertheless, several questions remain unanswered. So far there is no agreement on the exact nature of the learning processes involved in the acquisition of intuitive knowledge. Even less is clear about the precise representation of intuitive knowledge.

    Characteristics of discovery simulations | Discovery learning | Acquisition of intuitive knowledge | Features of intuitive knowledge
    Perceptually ‘rich’ | | In rich, dynamic environments | Perception, visualisation
    Low transparency | Partly implicit | Extrapolation, inference partly unconscious | Hard to verbalise
    Active experience involved | | Direct experiential involvement; mere verbal learning not sufficient | Quickly available
    | | | ‘Quick perception of anticipated situations’

    Figure 1. Overview of postulated relations between characteristics of simulation-based discovery environments, discovery learning, the acquisition of intuitive knowledge, and features of intuitive knowledge.

However, most researchers agree that, whatever the exact nature of the acquisition processes and whatever the precise representation of intuitive conceptual knowledge, the processes involved in the manifestation of the intuitive quality of knowledge can be described as ‘the quick perception of anticipated situations’.

At this point, the work has discussed why and how features of discovery environments trigger discovery learning that differs from expository teaching, and that may as a result lead to knowledge with different qualities. The main construct of the current work, intuitive knowledge, has also been described. It is generally acknowledged that defining constructs is different from measuring them (e.g., Haladyna, 1994). Translating theories of learning into test-development procedures, and validating constructs, is a recognised challenge (Glaser, 1990; Nichols & Sugrue, 1997; Pellegrino, 1992; Snow, 1989, 1990; Snow & Lohman, 1989, 1993).

In the first phase of the current work, the construct ‘intuitive knowledge’ was delineated. In the next phase, a test format to measure intuitive knowledge was developed, and empirical data were gathered following an experimental design. In this work, then, a theory-driven or "construct-centered" approach to assessment (Nichols & Sugrue, 1997) is taken.


The what-if test and the explicit knowledge tests

Based on the analysis of intuitive knowledge as characterised by a ‘quick perception of anticipated situations’, the test format developed for the present work is presented by taking the words of this definition in turn:

‘Quick’: Response times to the items were included as an important indicator of the degree to which conceptual knowledge has an intuitive quality. It is assumed that the intuitive quality of knowledge reflects more efficient access to that knowledge.

‘Perception’: In the item format, perception is central, in contrast to the emphasis many other ‘traditional’ tests place on verbalisation. The items therefore use pictures and graphical or diagrammatic representations, accompanied by the minimal necessary textual information.

‘Anticipated’: It is argued that anticipation is important for intuitive knowledge. The items consist of situations (see ‘Situations’) in which the values of variables are given. A value is then changed, and the new situation has to be predicted or anticipated.

‘Situations’: The items consist of a question and possible responses. The question describes a situation along with a change in that situation. The response alternatives describe possible predicted situations. In other words, an item contains a situation, an action, and possible post-situations (or a condition, an action, and possible predictions). The condition part is described by variables that are given values. In the action part, the value of one variable is changed, and in the prediction part, possible new values of one of the variables of the condition part are displayed. Situations constitute states of a simulated domain and are always made up of several variables.
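The condition, action, and prediction parts of a what-if item can be made concrete as a small data structure. The following Python sketch is illustrative only: the class `WhatIfItem`, its field names, and the mass-spring example item are hypothetical and are not taken from the SMISLE or SimQuest item format. It encodes the constraints described above: one variable of the situation is changed, and the prediction concerns a variable of that same situation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WhatIfItem:
    """Hypothetical encoding of one what-if item (condition, action, predictions)."""

    condition: Dict[str, str]   # the situation: variables with their given values
    action: Dict[str, str]      # exactly one variable with its changed value
    predicted_variable: str     # variable of the situation whose new value is asked
    alternatives: List[str]     # possible post-situations (three in the studies)
    correct_index: int          # position of the correct alternative

    def __post_init__(self) -> None:
        # The action part changes the value of exactly one variable ...
        if len(self.action) != 1:
            raise ValueError("the action part must change exactly one variable")
        (changed,) = self.action
        if changed not in self.condition:
            raise ValueError("the changed variable must belong to the situation")
        # ... and the prediction part concerns a variable of the same situation.
        if self.predicted_variable not in self.condition:
            raise ValueError("the predicted variable must belong to the situation")
        if not 0 <= self.correct_index < len(self.alternatives):
            raise ValueError("correct_index does not point to an alternative")

    def is_correct(self, chosen_index: int) -> bool:
        return chosen_index == self.correct_index


# Hypothetical mass-spring item: doubling the mass lengthens the period,
# since T = 2*pi*sqrt(m/k).
item = WhatIfItem(
    condition={"mass": "m", "spring constant": "k", "period": "T"},
    action={"mass": "2m"},
    predicted_variable="period",
    alternatives=["shorter than T", "equal to T", "longer than T"],
    correct_index=2,
)
```

Validating the structure when an item is constructed keeps malformed items (for instance, one that changes two variables at once) out of a test bank.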

The items, in which the ‘quick perception of anticipated situations’ is applied, are said to have a what-if format. Figure 2 displays two exemplary what-if items. In the following, the what-if format is elaborated upon, and the test procedure is explained.

    Figure 2. Exemplary what-if items.

An important aspect of the procedure is that learners are asked not only to give a correct response but also to do so as quickly as possible. The items have a multiple-choice format with three response alternatives, and the task is computer-administered. Learners cannot go back to items they have already answered. The moment learners click with the mouse on the alternative of their choice, the item disappears from the screen and the next item pops up. Latency therefore corresponds to the time each item is displayed on the screen, i.e., the time (in seconds) learners need to read and respond to the item. Latency is used as a measure of the extent to which the items are answered in an intuitive way: quicker (correct) answers are assumed to reflect intuitive knowledge better than slower (correct) answers.
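The latency and correctness bookkeeping described above can be sketched as follows. This is a hypothetical Python illustration, not the software used in the studies: the `Response` record, its timestamp fields, and the `summarise` function are assumptions. Latency is taken as the time between an item appearing and the learner's click, and because learners cannot return to earlier items, each item is expected to occur exactly once in a learner's record.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Response:
    item_id: int
    shown_at: float    # seconds since test start when the item appeared
    clicked_at: float  # seconds since test start when an alternative was clicked
    correct: bool

    @property
    def latency(self) -> float:
        # Time the item was on screen, i.e. reading plus responding.
        return self.clicked_at - self.shown_at


def summarise(responses: List[Response]) -> Dict[str, float]:
    """Percentage correct and mean latency for one learner on one test."""
    if not responses:
        raise ValueError("no responses recorded")
    ids = [r.item_id for r in responses]
    if len(set(ids)) != len(ids):
        # Learners cannot go back, so each item should be answered once.
        raise ValueError("an item was answered more than once")
    return {
        "percent_correct": 100 * sum(r.correct for r in responses) / len(responses),
        "mean_latency": mean(r.latency for r in responses),
    }


log = [
    Response(1, 0.0, 4.0, True),
    Response(2, 4.0, 10.0, False),
    Response(3, 10.0, 12.0, True),
]
summary = summarise(log)
```

The text does not say whether mean latency was computed over all items or over correctly answered items only; the sketch uses all items.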

Learners are pre- and post-tested with parallel what-if tests. The parallel versions are constructed so that a one-to-one mapping exists between the what-if pre-test and post-test items (i.e., item 1 of the pre-test corresponds to item 1 of the post-test, item 2 of the pre-test to item 2 of the post-test, and so on). Parallel items cover the same content and are of similar difficulty.
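Because of the one-to-one mapping, pre- and post-test responses can be compared item pair by item pair. The sketch below is a hypothetical illustration (the function name `paired_item_changes` and the (correct, latency) tuple layout are assumptions). It relies on the stated property that parallel items cover the same content and are of similar difficulty; without it, item-level differences would be hard to interpret.

```python
from typing import Dict, List, Tuple

# (correct, latency in seconds) for one item
ItemResult = Tuple[bool, float]


def paired_item_changes(pre: List[ItemResult], post: List[ItemResult]) -> List[Dict[str, float]]:
    """Compare item i of the pre-test with its parallel item i of the post-test."""
    if len(pre) != len(post):
        raise ValueError("parallel tests must contain the same number of items")
    changes = []
    for i, ((c_pre, t_pre), (c_post, t_post)) in enumerate(zip(pre, post), start=1):
        changes.append({
            "item": i,
            # +1: wrong before, right after; -1: the reverse; 0: unchanged
            "correct_change": int(c_post) - int(c_pre),
            # positive values mean the parallel item was answered faster afterwards
            "time_gain": t_pre - t_post,
        })
    return changes


pre_test = [(False, 9.0), (True, 6.0)]
post_test = [(True, 5.0), (True, 4.5)]
changes = paired_item_changes(pre_test, post_test)
```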

Several types of tests were used across the five studies, but in every study learners completed a definitional pre- and post-test in addition to the what-if pre- and post-test. The definitional tests concerned declarative conceptual knowledge. In other words, their objective was to measure discrete items of knowledge and declarative information, i.e., knowledge that is neither connected nor principled.

In two of the five studies, learners were additionally post-tested with hypotheses lists. On the hypotheses list, which was presented only after learners had worked with the environment, learners were confronted with pairs of variables present in the simulation. For each pair, they had to state a relation between the variables that they thought was valid. Like the what-if tests, the hypotheses list targets the organisational level of relations between specified variables, and the level of detail of the two tests can be considered identical. However, the two measures differ in the demand they place on learners’ verbal skills, in the explicitness required, and in format (selected vs. constructed response, computer-administered vs. paper and pencil).


The set-up and predictions of the studies

The main objective of the current work was to investigate the instructional effectiveness of discovery simulations. Interrelated with this objective was the goal to test the hypothesis that discovery simulations trigger the acquisition of intuitive knowledge. The intuitive knowledge, in turn, was to be measured with tests designed following the what-if format.

A first general prediction across the studies was that interacting with discovery simulations would result in gains in intuitive knowledge, as measured with the what-if tests, and not (so much) in increased explicit knowledge. The gain in intuitive knowledge was expected to show as higher what-if correctness scores and shorter what-if item response times. A second general prediction was that discovery simulations with more instructional measures would better support discovery learning than simulations with fewer instructional measures, and would lead to higher what-if test scores. It was tentatively predicted that assignments would have a positive impact on the acquisition of intuitive knowledge, and that explanations would relate to higher explicit knowledge scores but not to higher what-if scores.

The studies investigated both convergent and discriminant validity and used different types of measures to assess the postulated constructs. Besides gathering data on acquired knowledge, all of the actions learners made while interacting with the simulation were registered. This provided data on the use of the simulation, and the use of the supportive measures that were present.

A series of five experimental studies was carried out. In the first four studies (Collisions 95, Oscillations 95, Oscillations 96, and Circuit 97), simulation environments were compared that differed in the type and amount of support added. In the fifth study (Collisions 98), a simulation was compared with a hypertext environment. Table 1 gives an overview of the studies.

    Table 1. Overview of the studies

    Study | Level of education | Type of education | No. of what-if items | No. of factual items | Remarks
    Collisions 95 | first-year university | biology & computer science | | |
    Oscillations 95 | first-year university | social sciences | | | students not familiar with the domain of oscillations
    Oscillations 96 | first-year university | technical science, physics | | | the simulation used has more explanations and feedback on assignments than the one used in Oscillations 95; students had completed an introductory course on dynamics
    Circuit 97 | middle vocational | technical training | | |
    Collisions 98 | | | | | 57 of 112 worked with the simulation


In the following the results of the five studies are compared. From the last study, only the results of the simulation condition are included.


The results of the studies

In the five studies, what-if pre- and post-tests were administered, and gains in both correctness and item response time were measured. Definitional knowledge tests were also administered and their correctness scores collected. Figures 3, 4, and 5 give an overview of the results of the what-if tests and the definitional tests. To allow sensible comparisons, percentages instead of raw scores are used. For Collisions 98, only the results of the simulation condition (and not the hypertext condition) are included in the overviews.

In all studies a significant gain in what-if correctness scores was found. Moreover, in terms of effect size, this gain was substantial in all but one study (Circuit 97).

    Figure 3. Percentage of correct items on the what-if pre-test and what-if post-test.

In addition, Figure 4 shows substantial gains (i.e., decreases) in what-if item response times for all studies except Oscillations 95. Furthermore, in all studies except Collisions 98, the effect size of either the what-if correctness gain or the what-if time gain was larger than the effect size of the definitional test gain (see Figure 5).
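The text reports effect sizes for pre-to-post gains but does not state which formula was used. As one common choice, the sketch below computes the mean gain divided by the pooled standard deviation of the pre- and post-test scores; the function name and this formula are assumptions for illustration. For response times a gain is a decrease, so the sign is flipped when lower values are better.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def gain_effect_size(pre: Sequence[float], post: Sequence[float],
                     lower_is_better: bool = False) -> float:
    """Mean pre-to-post gain divided by the pooled SD of pre and post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired scores for at least two learners")
    gain = mean(post) - mean(pre)
    if lower_is_better:
        # For response times, a gain is a decrease.
        gain = -gain
    pooled_sd = sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return gain / pooled_sd


# Percentages correct and mean item times (seconds) for three hypothetical learners
d_correct = gain_effect_size([40, 50, 60], [60, 70, 80])
d_time = gain_effect_size([12.0, 10.0, 8.0], [9.0, 7.0, 5.0], lower_is_better=True)
```

Other definitions, such as dividing by the standard deviation of the individual gain scores, give different values for the same data, so effect sizes can only be compared across studies when the same formula is used.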

With respect to what-if item response times, it should be noted that no standards were available for ‘quick’ or ‘slow’ responses, or for ‘small’ or ‘large’ time gains. While the what-if format was used in all of the studies, features of the specific items and of the particular students resulted in different average response times. A reference gain could have been obtained by having a control group of students complete the pre- and post-tests without working with a discovery simulation.

    Figure 4. Average response times on the what-if pre-test and what-if post-test items.

The what-if pre-test correctness scores in Figure 3 underscore the view that harmonic oscillations are a difficult topic for the learners. In both Oscillations 95 and Oscillations 96, the pre-test scores are at chance level (33% with three response alternatives). In Oscillations 95, this low score was not surprising, as the subjects were not familiar with the domain of oscillatory motion and none of them had a major in physics. Oscillations 96, however, included first-year physics students who had just finished an introductory course on dynamics. Although these students scored 67% correct on the definitional test, the course had apparently not prepared them for the what-if test. In the other studies the what-if pre-test scores were above chance level, and in all of them except Collisions 98 the what-if pre-test scores were lower than the definitional pre-test scores.
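With three response alternatives, guessing yields an expected score of one third of the items. Whether an observed score lies above chance can be checked with a binomial tail probability. The sketch assumes that a guessing learner picks each alternative with equal probability, independently across items; this assumption and the function name `p_at_least` are illustrative and not taken from the studies.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 1 / 3) -> float:
    """Probability of k or more correct answers out of n items by pure guessing."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


chance_percentage = 100 / 3  # expected percentage correct with three alternatives
```

For example, `p_at_least(15, 30)` gives the probability that a guessing learner answers at least half of a 30-item test correctly; 30 is an arbitrary example length.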

    Figure 5. Percentage correct items on the definitional pre-test and definitional post-test.

Also, the relation between correctness and item response time of the what-if test was investigated. The main measure consisted of the correlation between what-if post-test correctness scores and what-if post-test time scores computed across items, within subjects (see Table 2).
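The correctness by time correlation computed across items within subjects can be sketched as a point-biserial correlation (a Pearson correlation with a 0/1 variable) for each learner. The text does not say how the per-learner correlations were combined; averaging via Fisher's z transformation, used below, is an assumption, as are the function names and the data layout. A learner who answered all items correctly (or all incorrectly) has no variance in correctness, so no correlation can be computed and that learner is skipped.

```python
from math import atanh, sqrt, tanh
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined when either variable is constant
    return sxy / sqrt(sxx * syy)


def within_subject_r(data: Dict[str, List[Tuple[bool, float]]]) -> Dict[str, float]:
    """Per learner: correlation of item correctness (0/1) with item latency."""
    result = {}
    for learner, items in data.items():
        r = pearson([float(c) for c, _ in items], [t for _, t in items])
        if r is not None:
            result[learner] = r
    return result


def average_r(rs: Sequence[float]) -> float:
    """Average correlations through Fisher's z, clipping r = +/-1 to keep z finite."""
    zs = [atanh(max(min(r, 0.999999), -0.999999)) for r in rs]
    return tanh(mean(zs))


post_test = {
    "a": [(True, 2.0), (True, 3.0), (False, 7.0), (False, 8.0)],
    "b": [(True, 4.0), (True, 5.0), (True, 6.0)],  # all correct: skipped
    "c": [(False, 2.0), (True, 8.0)],
}
rs = within_subject_r(post_test)
```

In this sign convention a positive correlation means that slower answers were more often correct, which is the trade-off pattern; a negative correlation means that quicker answers were more often correct, the pattern expected if quick responses reflect intuitive knowledge.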

    Table 2. Overview of what-if post-test correctness x what-if post-test time correlations. For Collisions 98 the results of the simulation condition are taken; for the other studies, the results across conditions are presented.

    Collisions 95 | Oscillations 95 | Oscillations 96 | Circuit 97 | Collisions 98

    ** p < .01, * p < .05

Table 2 shows that in Circuit 97 there was a trade-off between correctness and time on the what-if post-test items. This trade-off indicates that quicker answers had a higher chance of being incorrect or, conversely, that slower answers had a higher chance of being correct. The first interpretation may reflect guessing; the second suggests that the what-if items were answered in a reflective or thoughtful manner. A trade-off is difficult to reconcile with the conception of intuitive knowledge postulated in this work, since latency was used as a measure of the extent to which items were answered in an intuitive way, with quicker (correct) answers assumed to reflect intuitive knowledge better than slower (correct) answers. A trade-off between correctness and time was also found in the simulation condition of Collisions 98. No such trade-off was found in the other studies.

To investigate whether the what-if tests measure a different type of knowledge from explicit knowledge, the what-if correctness and time scores were correlated with scores on the definitional knowledge tests and the hypotheses lists. The main correlations are given in Table 3.

Table 3. Overview of what-if post-test correlations with the explicit knowledge measures in the five studies. For Collisions 98 the results of the simulation condition are taken; for the other studies the results across conditions are presented.

    Correlations: what-if post-test correctness x definitional test; what-if post-test correctness x hypotheses lists
    Studies: Collisions 95, Oscillations 95, Oscillations 96, Circuit 97, Collisions 98

    ** p < .01, * p < .05

In all studies, the correctness scores on the what-if tests correlated substantially with the scores on the definitional tests. No consensus exists on when two measures should be considered interchangeable, although two measures that correlate below .75 are usually not treated as interchangeable. Furthermore, part of the correlation can be explained by the identical format of the two tests (i.e., shared variance due to a common test format rather than a common construct). Still, the correlations between the definitional and what-if correctness scores in Collisions 95 and Collisions 98 raise the question of whether the tests in these studies measured sufficiently different constructs.

In Oscillations 95 and Oscillations 96, the what-if post-test correctness scores were also correlated with the hypotheses list scores. No significant correlations were found between the what-if scores and the number and precision of correct hypotheses. This low correlation may indicate that the tests measured different constructs, which would be in line with the hypothesis that intuitive knowledge is acquired directly, that is, without a declarative stage. However, the variance of the hypotheses list scores was low, so the low correlation may instead be a result of this low variance rather than of a difference in the constructs measured.

In an exploratory analysis, scores on the what-if test were correlated with information from the log files. The correlations between the what-if tests and the use of assignments, the use of explanations, and the number of simulation runs yielded one pattern that was consistent across the studies: the more assignments learners performed, the quicker their what-if response times. The other correlations were not consistent across the studies.
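One way to operationalize "consistent across the studies" is to require that a log-file measure correlates with what-if response time in the same direction in every study that reports it. The sketch below does only that; the data layout, the measure names, and the purely sign-based criterion (which ignores significance) are assumptions for illustration, not the analysis used in the studies.

```python
def consistent_patterns(per_study):
    """per_study maps a study name to {log-file measure: r}, where r is the
    correlation of that measure with what-if response time. Returns the
    measures whose r has the same non-zero sign in every study reporting
    them, labelled "positive" or "negative"."""
    measures = set().union(*(d.keys() for d in per_study.values()))
    consistent = {}
    for measure in measures:
        rs = [d[measure] for d in per_study.values() if measure in d]
        if all(r < 0 for r in rs):
            consistent[measure] = "negative"
        elif all(r > 0 for r in rs):
            consistent[measure] = "positive"
    return consistent
```

Under this criterion, the reported finding corresponds to a hypothetical `"assignments"` measure being labelled `"negative"`: more assignments go with shorter response times in every study.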



One conclusion of these experimental studies is that assignments contribute most clearly to the instructional effectiveness of simulations. In other words, instructional support that enhances the rich, interactive, low-transparency character of simulations appears to work best. Another conclusion is that the what-if tests seem able to measure the results of learning with discovery simulations. A third conclusion is that it is not entirely clear whether the what-if tests measure intuitive knowledge or ‘quick perceptions of anticipated situations’. It should be noted, however, that no coherent theory of the acquisition of intuitive knowledge is available, let alone clear criteria for distinguishing intuitive from explicit knowledge or implicit from explicit learning. Demonstrating the existence of intuitive knowledge is therefore hard. Nevertheless, the effort is worthwhile: the idea of intuitive knowledge is appealing, and there is overwhelming evidence for implicit learning and intuitive knowledge in everyday life. The instructional field, however, has to wait until instructional settings become as ‘rich’ as real life and require the same activity and spontaneous involvement.

This work started by referring to a review study by Thomas and Hooper (1991), who concluded that the effects of simulations are not revealed by "tests of knowledge (....)" (p. 479), and that simulations can best give students the opportunity "to gain an intuitive understanding of the learning goal" (p. 499). They may have been right. It should be clear by now, however, that unless a more theory-driven, construct-centered stance is taken, research results will remain difficult to compare.

New conceptions of the psychological structures and processes involved in learning should go hand in hand with research on approaches to assessment. Together, they might support the validation and use of these concepts in applied instructional settings. Unless this approach is taken, assessment of learning outcomes will be superficial and may severely under-represent human cognitive abilities.