Supplemental paper for Journal of Memory and Language, Vol 43(3), Oct. 2000.
Grounding Symbols and Computing Meaning:
A Supplement to Glenberg & Robertson (2000)
David A. Robertson
Arthur M. Glenberg
and
Members of the Honors Seminar
in Cognitive Psychology (1997/98)
University of Wisconsin-Madison
Arthur M. Glenberg
University of Wisconsin-Madison
This paper is a supplement to Glenberg & Robertson (2000), which concluded that a high-dimensional theory of meaning (Landauer & Dumais, 1997) was unable to distinguish between meaningful and nonsense descriptions of novel situations, whereas people found the discrimination trivially easy. Herein we report several experiments that replicate and extend these results. In particular, we demonstrate that people quickly determine the meaning of novel descriptions when they are sensible, rather than having to engage in complex problem-solving, and that processing time, as reflected in reading times, bears a lawful relationship to metacognitive judgments of sensibility. Finally, we present further evidence that the Landauer & Dumais model fares no better when the materials are figurative rather than literal.
Keywords: Meaning, language, embodiment, computational models, Latent Semantic Analysis, Hyperspace Analogue to Language, metaphor, figurative language
Note: This work benefited substantially from contributions of the students in the Honors Seminar in Cognitive Psychology (1997/98) who helped with the design of the experiments, construction of the stimuli, and data collection. Those students were Christopher Amadon, Brianna Benjamin, Jennifer Dolland, Jeanette Hegyi, Katherine Kortenkamp, Erik Kraft, Nathan Pruitt, Dana Scherr, Sara Steinberg, and Brad Thiel. We are also thankful for contributions from Joshua Alexander, Michael Kaschak, and Adam Sadler. This work was supported in part by a grant to Arthur Glenberg from the University of Wisconsin-Madison Graduate School Research Committee, Project 990288, and a National Institute of Mental Health (T32-MH18931) predoctoral traineeship to David Robertson.
Grounding Symbols and Computing Meaning: A Supplement to Glenberg & Robertson (2000)
Many psychological theories are based on the assumption that meaning and thought arise from the manipulation of abstract symbols, in much the same way as in digital computers. This approach has been formalized in two computer-based models that attempt to derive and represent meaning from statistical analyses of patterns of language use in large corpora: the Hyperspace Analogue to Language (HAL; Burgess & Lund, 1997) and the Latent Semantic Analysis model (LSA; Landauer & Dumais, 1997). Glenberg & Robertson (2000) investigated the adequacy of these high-dimensional theories both analytically and empirically and contrasted these theories of meaning with a theory that does not make use of abstract, arbitrary symbols.
This paper is a supplement to Glenberg & Robertson (2000), and includes the stimuli for the experiments reported in Glenberg & Robertson (2000) along with additional experiments. The structure of this paper is as follows. First we present two reading-time experiments designed to demonstrate that the reported ease with which participants could understand novel uses of conventional materials and linguistic innovations is reflected in processing speed as well as in off-line assessments. We assess the degree to which LSA computations can account for human performance and judgments. Then in Experiment 3 we assess LSA's ability to discern the meaning of figurative language.
Experiment 1
Consider the following scenario and its three continuations (Afforded & Related, Afforded, and Non-Afforded).
Scenario: Phil was trying to get a barbecue going early in the morning for a tailgater. He got dizzy from blowing on the coals, but they still weren't burning well.
Afforded & Related: Phil grabbed a bellows and used it to fan the fire.
Afforded: Phil grabbed a map and used it to fan the fire.
Non-Afforded: Phil grabbed a rock and used it to fan the fire.
The Afforded & Related sentence with "bellows" makes sense, as does the sentence with "map." But why doesn't "rock" work? Note that all three continuations are grammatical. All three follow traditional selectional restrictions (e.g., the actor must be an animate noun). All three are easy to break into propositions. Nonetheless, when asked to rate the sensibility of the sentences on a scale of 1-7 (Glenberg & Robertson, 2000), people rated the Afforded sentence (mean rating = 4.6) to be almost as sensible as the Afforded & Related sentence (6.3), whereas the Non-Afforded sentence was clearly unacceptable (1.2). Furthermore, as demonstrated in Glenberg & Robertson (2000, Experiments 1 and 2), LSA cosines failed to predict the sensibility judgments.
One might propose that "world knowledge" is used to discriminate between sensible continuations and nonsense continuations. In some way that must be correct, but the sort of world knowledge that we propose in our theories of cognition is not sufficient. That is, often world knowledge is conceptualized as pre-stored propositions or facts such as "bellows are used to fan fires," "maps are used to find your way," and "rocks are heavy." Here are three reasons why this sort of world knowledge does not help for these examples. First, in making up the scenarios, the experimenters intentionally tried to come up with novel situations in which people would not have had relevant specific experience (world knowledge). For example, in one scenario, a character uses an upright vacuum cleaner as a coat rack, and in a second scenario a character stands on a tractor to be able to reach the top of a wall he is painting. Have you ever done those things? It is unlikely that you or anyone else has; nonetheless, the sentences seem sensible.
Second, we were able to demonstrate using the LSA procedure (Landauer & Dumais, 1997) that the important concepts in the Afforded sentence ("map" and "fire") are just as unrelated as are the important concepts in the Non-Afforded sentence ("rock" and "fire"). Across the scenarios used in the experiment, the average LSA cosine was .24 for the Afforded & Related condition, indicating that "fire" and "bellows" tend to appear in similar contexts. However, the average LSA cosines were only .06 for the Afforded and .06 for the Non-Afforded conditions, indicating that words such as "map," "rock," and "fire" tend to appear in orthogonal contexts.
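As a reminder of what a cosine between two word vectors expresses, the following is a minimal sketch. The three-dimensional vectors are made-up toy values, not actual LSA vectors (which typically have a few hundred dimensions), and the function name `cosine` is our own choice for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (NOT real LSA vectors).
fire = [1.0, 0.0, 0.0]
similar_context = [0.8, 0.6, 0.0]      # partly overlaps with "fire"
orthogonal_context = [0.0, 0.0, 1.0]   # shares no dimension with "fire"

print(cosine(fire, similar_context))
print(cosine(fire, orthogonal_context))
```

A cosine near zero, as for the second pair, is what is meant by words appearing in orthogonal contexts; larger positive values indicate more similar contexts.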
But are the meanings generated in Experiment 1 of Glenberg & Robertson (2000) due to usual reading and comprehension strategies, or are they a special sort of problem-solving engaged when people face unusual sentences? For example, it may be that the sensibility of the Related sentence, "Phil grabbed a bellows and used it to fan the fire," is immediately obvious. In contrast, the sensibility of the Afforded sentence, "Phil grabbed a map and used it to fan the fire," may need to be worked out on the basis of propositional world knowledge. Thus, a reader might reason, "A map is printed on sheets of paper. Paper is sometimes flexible and may be folded into many shapes, but can be held to be somewhat rigid. One shape is the shape of a fan. Things in the shape of a fan can be used to fan a fire." So it is possible that the sensibility judgments produced by subjects in Glenberg & Robertson (2000, Experiments 1 & 2) reflected effortful, complex problem-solving. This experiment was designed to determine if that was the case.
In this experiment, people read the scenarios from Experiment 1 of Glenberg & Robertson (2000) one sentence at a time from a computer screen. The context-setting sentences were followed by just one of the critical sentences, and which critical sentence appeared was counterbalanced over participants. The time to read the critical sentence was measured. If people need to engage in an inferential process in order to understand the Afforded sentences, then the Afforded sentences should be read more slowly than the Related sentences. On the other hand, if understanding even ordinary sentences requires a meshing of affordances, then time to read the Afforded and Related sentences should be comparable. Both alternatives predict slow reading time for the Non-afforded critical sentences.
Method
Participants. The 35 participants were students enrolled in Introductory Psychology classes at the University of Wisconsin-Madison. They received extra credit in exchange for their participation.
Materials and design. The materials were the 18 scenarios used in Experiment 1 that were slightly modified. At the end of each scenario we added a "continuity" sentence that helped to bring the narrative to a close even when the Non-afforded sentence had been read. In addition, after the presentation of a scenario, a question appeared, and the participant was instructed to type a brief answer to the question. These questions were intended to encourage relatively careful reading. Examples of the continuity sentences and the questions are given in Table 5. Finally, we constructed two scenarios that were used for practice.
Table 1: Two Example Scenarios for Experiment 1
Setting: Marissa forgot to bring her pillow on her camping trip.
Afforded: As a substitute for her pillow, she filled up an old sweater with leaves.
Non-afforded: As a substitute for her pillow, she filled up an old sweater with water.
Related: As a substitute for her pillow, she filled up an old sweater with clothes.
Setting: Mike was freezing while walking up State Street into a brisk wind.
He knew that he had to get his face covered pretty soon or he would get frostbite.
Unfortunately, he didn't have enough money to buy a scarf.
Afforded: Being clever, he walked into a store and bought a newspaper to cover his face.
Non-afforded: Being clever, he walked into a store and bought a matchbook to cover his face.
Related: Being clever, he walked into a store and bought a ski-mask to cover his face.
Note: Central concepts are italicized; distinguishing concepts are in boldface.
Each participant read six scenarios in each of the three conditions (Afforded, Non-afforded, and Related). Over the participants, the 18 scenarios were presented in the three conditions approximately equally often. Participants were instructed to press the space bar on the computer keyboard to advance from sentence to sentence. The time between presentation of the critical sentence and the press of the space bar was used as the dependent variable.
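One way such a design could be realized is a rotating (Latin-square) assignment of scenarios to conditions. The sketch below is an assumption for illustration only; the actual assignment procedure is not described here, and the names `CONDITIONS`, `N_SCENARIOS`, and `assign_conditions` are hypothetical.

```python
CONDITIONS = ["Afforded", "Non-afforded", "Related"]
N_SCENARIOS = 18


def assign_conditions(participant_index):
    """Map each scenario index to a condition by rotating over participants."""
    return {
        scenario: CONDITIONS[(scenario + participant_index) % len(CONDITIONS)]
        for scenario in range(N_SCENARIOS)
    }


# Each participant sees six scenarios per condition; across any three
# consecutive participants, each scenario appears once in every condition.
for p in range(3):
    assignment = assign_conditions(p)
    counts = {c: list(assignment.values()).count(c) for c in CONDITIONS}
    print(p, counts)
```

Under a rotation of this kind, a number of participants that is not a multiple of three would leave the scenario-by-condition counts only approximately equal.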
Results
The mean times (in seconds) to read the critical sentences were 3.68, 4.19, and 3.57 for the Afforded, Non-afforded, and Related sentences, respectively. The effect of condition was significant, F1(2, 68) = 6.65, MSE = .58. The mean for the Non-afforded sentences was significantly greater than the other means, t1(35) = 3.52. However, the difference between the means for the Afforded and Related sentences was not significant, t1(35) = .66.
We performed several analyses to determine what might be accounting for the effect of condition. These analyses were performed on mean reading times computed for each sentence (rather than each participant) so that we could track contributions of conditions, frequency of the distinguishing concepts, and LSA values. First, the conditions were significantly different, F2(2, 17) = 19.57, MSE = .27. The mean for the Non-afforded sentences was significantly greater than the other means, t2(17) = 4.06, but the means for the Afforded and Related sentences did not differ, t2(17) = .70. The next analysis determined if this effect might be confounded with three other variables: number of syllables in the distinguishing concepts, log word frequency of the distinguishing concepts, and LSA cosine between the distinguishing concepts and the central concept. First, we removed between-scenario variability using a regression analysis. Second, variability associated with number of syllables, log word frequency, and LSA cosines was also removed. Third, an ANOVA was conducted on the residuals. The effect of conditions was still significant, F2(2, 51) = 7.92, MSE = .18. Also, the mean of the Non-afforded sentences was significantly greater than the other means, t2(51) = 2.85, whereas the difference between the means for the Afforded and Related sentences was not significant, t2(51) = .98.
The final analysis asked if LSA cosine values could account for reading times within the experimental conditions. We computed the correlations between reading time and LSA values separately for the Afforded, Non-afforded, and Related sentences. In addition, we computed the correlations between reading time and the Envisioning ratings from Experiment 1 of Glenberg & Robertson (2000). These data are presented in Table 2. Note that LSA was never a significant predictor of reading time, and sometimes the direction of the correlation was reversed. In contrast, the Envisioning ratings from Experiment 1 of Glenberg & Robertson (2000) predicted the reading times in this experiment for both the Afforded and the Related sentences. Envisioning ratings failed to predict the reading times for the Non-afforded sentences, that is, those sentences which, from the participants' point of view, were nonsense. The failure to obtain significant correlations between LSA values and the reading times cannot be due to lack of variability in the LSA values (see Glenberg & Robertson, 2000, Experiment 1), nor can it be due to lack of variability in the reading times, given the correlations with the Envisioning ratings.
Table 2: Correlations of sentence reading times with LSA cosines and Envisioning ratings (from Glenberg & Robertson, 2000, Expt. 1). Correlations are Pearson r.
Note that reading times were correlated with envisioning ratings, but not with LSA cosines.
| Condition | LSA Sentence-to-Setting | LSA Central to Distinguishing | Envisioning Ratings |
|---|---|---|---|
| Afforded | .02 | .06 | -.44* |
| Non-Afforded | .25 | -.38 | .20 |
| Related | .14 | -.25 | -.44* |
(*p = .07.)
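For readers who wish to check how a Pearson correlation translates into a significance test, the following is a minimal sketch using the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2). The value n = 18 is an assumption (one reading-time mean per scenario in each condition), and the function name `t_from_r` is our own.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


n_sentences = 18  # assumption: 18 scenarios contribute one mean each per condition
t = t_from_r(-0.44, n_sentences)
print(round(t, 2), "df =", n_sentences - 2)
```

Under that assumption, r = -.44 corresponds to a t of about -1.96 on 16 degrees of freedom, which lies between the two-tailed critical values for p = .10 and p = .05 and so appears consistent with the marginal p reported in the table note.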
Discussion
Participants read the Afforded sentences in about the same amount of time as they read the Related sentences. Thus, there is no reason to believe that participants were engaged in any unusual reading strategies or inferential activities when engaging with the Afforded sentences. Of course, that fits well with our intuitions: Reading about a person using a sweater stuffed with leaves as a pillow, or a newspaper to cover his face, does not seem unusual or strained. In contrast, participants read the Non-afforded sentences much more slowly. It is likely that when faced with these sentences, participants did have to struggle to find a meaning, and judging from Glenberg & Robertson (2000, Experiments 1 & 2), they were not always successful in this struggle.
A second result was that the LSA cosines did not significantly predict reading times within each condition. In contrast, the Envisioning ratings from Glenberg & Robertson (2000, Experiment 1) were significant predictors when the sentences made sense. An easy inference is that a person attempts to simulate (cf. Barsalou, 2000) the meshing of affordances described by a sentence. If that simulation is successful, then the reader understands the sentence because it can be envisioned. If the simulation is not successful, then the reader tries to reinterpret the words and phrases, or simply concludes that the sentence is incomprehensible.
Experiment 2a
A hallmark of language is the principle of productivity. Speakers can form novel utterances and listeners can generally understand utterances they have never heard before. One intriguing aspect of language use is the formation of novel uses of words, such as when nouns are used as verbs, known as "denominal verbs." Denominal verbs are plentiful; Clark & Clark (1979) cataloged more than 1300 denominal verbs. Additionally, they noted that new denominal verbs are often made up on the spot to serve a particular language function. Glenberg & Robertson (2000, Experiment 3) presented denominal verbs in context to participants and obtained paraphrases for the denominal verbs as well as grammaticality and sensibility judgments of the sentences containing the denominal verbs. One third of the denominal verbs used in the experiment were conventional denominal verbs, with which all fluent speakers should be familiar. Another third of the denominal verbs were innovations that are not part of standard English. The remaining third of the denominal verbs were conventional denominal verbs used in an innovative manner (e.g., to bottle in "We were stoned and bottled by the spectators as we marched down the street"; Clark & Clark, 1979).
Table 3: Example Stimuli for Expt 2 (and Glenberg & Robertson, 2000, Expt 3). LSA Sentence-to-Setting cosines, followed by mean paraphrase accuracy, are in parentheses.
Conventional Verb (slimed), Afforded
Kenny sat in the tree house and patiently waited. He clutched the jar of green ooze in his hand, and watched the approaching school bus move closer to his house. The teenage girl stepped off and walked towards the tree house unaware of the little boy above her taking the cap off the jar. Kenny waited until she was directly beneath him, and an evil grin spread across his face. Then, Kenny slimed his sister. (.21, .99)
Semi-innovative Verb (booked), Afforded
Lori loved her new table, until she noticed that everything she placed on it slid off to the left. The left back leg was lower than all the others. She could not imagine how to fix the slant. Then she spotted a pile of hard covered books in the corner. She booked the leg. (.61, .96)
Semi-innovative Verb (booked), Non-afforded
Lori was having a really bad day. She could not find her textbook and she was late for class. Frantically, she ran over to the table where there was a pile of books. On the way, she banged her leg on the chair. She booked the leg. (.62, .13)
Innovative Verb (magazined), Afforded
Sebastian was perusing the latest issue of Newsweek when he was disturbed by a most annoying buzzing noise. He looked around the room to determine the source of this disturbance, and saw that a fly was patrolling the vicinity. Its incessant buzzing was making Sebastian insane. He had no choice but to terminate with extreme prejudice. So, he rolled up his Newsweek and waited patiently. When the fly came to rest on the coffee table in front of Sebastian, he recognized his opportunity. He magazined it. (.45, .96)
Innovative Verb (magazined), Non-afforded
Sebastian was perusing the latest issue of Newsweek. He became disturbed as he read an article about rising rates of home invasions in his vicinity. Sebastian decided to follow the advice of a security expert quoted in the magazine by purchasing a home security alarm. The salesman at the electronics store thought Sebastian was insane when he insisted on having the alarm installed that very day, but agreed when Sebastian threatened to terminate the sale. The alarm woke Sebastian when it began buzzing one evening. He recognized his opportunity. He magazined it. (.42, .32)
Glenberg & Robertson (2000, Experiment 3) presented the denominal verbs to readers in contexts. Conventional denominal verbs were always presented in contexts that supported the conventional interpretation. The innovative and semi-innovative denominal verbs were presented in one of two types of contexts. Afforded contexts suggested a goal that could be accomplished by meshing the affordances of the object named by the denominal verb with affordances of actions and other objects in the situation being described. Non-afforded contexts were written to include many of the same words as the Afforded context but did not suggest goals that were compatible with the affordances of the objects.
The texts were rated by participants for grammaticality and sensibility. Participants also wrote paraphrases of the sentences containing denominal verbs. Paraphrases were scored on whether they matched the intended meaning of the denominal verb. The results were that denominal verbs presented in afforded contexts were judged as sensible, whereas denominal verbs presented in non-afforded contexts were judged as nonsensical. Additionally, paraphrases from afforded contexts showed strong agreement with the intended meaning, whereas paraphrases from non-afforded contexts were rarely in accordance with the intended meaning of the denominal verb. However, LSA cosines of the materials failed to distinguish between afforded and non-afforded contexts. Moreover, the LSA cosines for afforded contexts were negatively, rather than positively, correlated with people's sensibility judgments.
Here we present data from a reading time experiment using the same materials as Experiment 3 in Glenberg & Robertson (2000). The motivation is to assess whether there is a large processing time increase for people to comprehend innovative use of denominal verbs. If there were a large increase in reading times, that could indicate that people have to invoke special processing strategies to make sense of innovative words.
Method
Participants. The 38 participants were students enrolled in Introductory Psychology classes at the University of Wisconsin-Madison. They received extra credit in exchange for their participation.
Procedure. Materials and design were as in Glenberg & Robertson (2000) Experiment 3. Stimuli were presented to subjects sentence-by-sentence in the center of a computer screen. Participants pressed a key to advance the screen and were instructed to read at their normal pace. At the end of each text, participants were asked to make sensibility judgments as in Glenberg & Robertson (2000) Experiment 3. Reading times were recorded.
Data Processing. Because sentence lengths varied for the critical sentences involving the three denominal verb types, it was necessary to account for this in the statistical analyses. So instead of basing the analysis on participants' mean reading times for materials in the conditions, we computed length-adjusted reading times as follows. Sentence lengths were mean-centered across conditions and were used as the predictor for a least-squares line for each condition on a within-subjects basis. The intercept is then an estimated reading time for that subject in that experimental condition for a sentence of average length in the experiment. These intercepts were used as length-adjusted reading times and were subjected to analysis of variance.
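The length adjustment just described can be sketched as follows for a single participant in a single condition. The data values are hypothetical, the unit of sentence length (here, words) and the millisecond scale are assumptions, and the function name `length_adjusted_rt` is ours; this is an illustration of the intercept computation, not the original analysis script.

```python
def length_adjusted_rt(lengths, rts, grand_mean_length):
    """Intercept of the least-squares line of reading time on mean-centered length.

    Because lengths are centered on the grand mean across conditions, the
    intercept estimates the reading time for a sentence of average length.
    """
    x = [length - grand_mean_length for length in lengths]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(rts) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rts))
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Hypothetical data for one participant in one condition (not from the experiment).
lengths = [4, 6, 8]              # sentence lengths (assumed unit: words)
rts = [2000.0, 2400.0, 2800.0]   # reading times (assumed unit: ms)
grand_mean_length = 5.0          # mean length across all conditions

print(length_adjusted_rt(lengths, rts, grand_mean_length))
```

In this toy case reading time rises by 200 per unit of length from 2000 at length 4, so the estimate at the average length of 5 is 2200. One such intercept per participant per condition would then enter the analysis of variance.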
Results and Discussion
In contrast to predictions from LSA cosines, length-adjusted mean reading times showed differences depending on context [F(4, 148)=28.53, MSE=1033382]. Additionally, sensibility ratings replicated the findings of Glenberg & Robertson (2000). Furthermore, sensibility ratings were related to reading times.
Innovative denominal verbs and semi-innovative denominal verbs were read much more quickly when presented in afforded contexts (Fs = 40.62 and 29.09, respectively). In fact, semi-innovative verbs in the afforded contexts were not reliably slower than conventional denominal verbs, F(1, 37) = 1.90. Innovative denominal verbs were read somewhat more slowly than conventional denominal verbs even in the afforded context [F(1, 37) = 6.64, p < .02], although this effect was smaller than the effect of context manipulation [F(1, 37) = 4.87, p < .03].
Table 4: Mean sensibility ratings and length-adjusted reading times from Experiment 2 (estimated standard error of the mean in parentheses). Sensibility ratings were made on a scale of 1 (virtual nonsense) to 7 (completely sensible).
| Condition | Sensibility Ratings | Length-adjusted Reading Times |
|---|---|---|
| Conventional | 6.44 (.08) | 2563 (117) |
| Non-afforded Semi-innovative | 2.59 (.20) | 4142 (225) |
| Afforded Semi-innovative | 4.87 (.19) | 2884 (178) |
| Non-afforded Innovative | 2.63 (.18) | 4650 (325) |
| Afforded Innovative | 5.54 (.20) | 3164 (183) |
Sensibility judgments replicated the findings of Glenberg & Robertson (2000). Sentences with denominal verbs were rated as much more sensible in afforded contexts than in non-afforded contexts [F(1, 37) = 617.33]. However, the LSA cosines are equivalent for the afforded and non-afforded contexts.
Sensibility ratings were related to reading times. We computed within-subject correlations between reading times and participants' subsequent sensibility ratings. The average within-subject correlation was -.25, which is moderate, but significantly different from zero (t = -5.59, p < .001). The average amount of time taken by subjects to make their sensibility ratings was marginally negatively related to their ratings (-.09, t = -2.16, p = .05). Thus, metacognitive judgments of sensibility are related to on-line measures of processing speed.
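The within-subject correlation analysis can be sketched as follows: a Pearson correlation is computed for each participant, and the resulting set of correlations is tested against zero with a one-sample t test. The data below are hypothetical and far smaller than the real data set, and the helper names `pearson_r` and `one_sample_t` are our own; we assume, without confirmation from the text, that a simple one-sample t test on the raw correlations was used.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(values):
    """t statistic testing whether the mean of values differs from zero."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


# Hypothetical per-participant data: (reading times, sensibility ratings).
participants = [
    ([2500, 4200, 3000, 4600], [6, 2, 5, 3]),
    ([2700, 3900, 3100, 4400], [7, 3, 4, 2]),
    ([2600, 4100, 2900, 4800], [5, 4, 6, 1]),
]

rs = [pearson_r(rt, rating) for rt, rating in participants]
print([round(r, 2) for r in rs], round(one_sample_t(rs), 2))
```

A negative mean correlation, as in this toy example, corresponds to longer reading times going with lower sensibility ratings.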
Experiment 2b
The results of Experiment 3 in Glenberg & Robertson (2000) do not indicate whether the Afforded context facilitates interpretation of the Innovative denominal verbs or whether the Non-afforded context confuses the participants. Experiment 2b was conducted to decide between these alternatives. The participants followed the same instructions as for Experiment 3 in Glenberg & Robertson (2000), but the 18 critical sentences were presented without any context. According to the Indexical Hypothesis (Glenberg & Robertson, 1999), the Afforded context allows interpretation of the innovative denominal verbs. Hence, when the sentences are presented in a No-context condition (in Experiment 2b) performance (e.g., Sensibility ratings) should drop. There should be little difference between the Non-afforded context condition (in Glenberg & Robertson, 2000, Expt 3) and the No-context condition (in Experiment 2b): The affordances necessary for interpretation are missing in both cases.
Method
Participants. The 18 participants were students enrolled in Introductory Psychology classes at the University of Wisconsin-Madison. They received extra credit in exchange for their participation.
Materials. The 18 critical sentences were identical to those used in Glenberg & Robertson (2000). The instructions to the participants were identical, except that references to "the last sentence" were deleted.
Results and Discussion
The data of main interest are presented in Table 5. Consider first the data for the Conventional verbs. Because the meaning of these verbs is known and we used the verbs in their standard sense, interpretation should not depend greatly on appearing in context. In fact, the data in the No-context condition (Experiment 2b) are similar to the data from the Afforded condition in Glenberg & Robertson (2000) Experiment 3.
Table 5
Sensibility Ratings, Grammaticality Ratings, and Paraphrase Scores for Experiment 2b and Glenberg & Robertson (2000, Expt. 3). Standard Errors are in parentheses.
| Condition | Sensibility Rating | Grammaticality Rating | Paraphrase Accuracy |
|---|---|---|---|
| Conventional verbs | | | |
| Afforded | 5.67 (.12) | 5.22 (.18) | .99 (.01) |
| No Context | 5.23 (.20) | 5.33 (.26) | .77 (.05) |
| Semi-innovative verbs | | | |
| Afforded | 3.78 (.27) | 4.18 (.26) | .96 (.02) |
| Non-Afforded | 2.29 (.21) | 4.06 (.25) | .13 (.03) |
| No Context | 2.26 (.15) | 4.44 (.48) | .21 (.03) |
| Innovative verbs | | | |
| Afforded | 4.12 (.24) | 4.18 (.24) | .96 (.02) |
| Non-Afforded | 2.06 (.16) | 3.92 (.27) | .32 (.03) |
| No Context | 2.69 (.19) | 4.42 (.39) | .37 (.05) |
Next, consider the data for the Semi-innovative and Innovative verbs. According to the Indexical Hypothesis, the meaning of innovative denominal verbs is determined by meshing the affordances of the base noun with other affordances and the action-based goals to produce a coherent conceptualization. If the other affordances and the action-based goals are not specified (as in the No-context condition), performance should suffer. As predicted by the hypothesis, performance in the No-context condition is far below performance in the Afforded condition for Sensibility ratings and for the paraphrases for both the Semi-innovative and the Innovative verbs. In contrast, there are much smaller differences between the No-context condition and the Non-afforded condition. Hence, we can conclude that the Afforded context facilitates interpretation of innovative verbs, as opposed to the Non-afforded context interfering with interpretation.
Experiment 3
The stimuli in Glenberg & Robertson (2000) all referred to relatively concrete situations and objects. Burgess and Lund (1997) have claimed that one of the advantages of HAL and other high-dimensional theories of meaning is that they ground abstract concepts in exactly the way they ground concrete concepts: in the language stream. Might the theories fare better if the experimental stimuli used more abstract language, or were about more abstract concepts? To answer these questions we conducted an experiment using proverbs. We began with 14 proverbs taken from Gibbs, Strom, and Spivey-Knowlton (1997), who had chosen proverbs that were highly familiar. We added the proverb "A stitch in time saves nine" for a total of 15 proverbs. For each proverb we wrote a Paraphrase sentence (modeled on the "Figurative Definitions" given in Gibbs et al.'s Table 3), an Opposite sentence (that is, a sentence whose meaning we judged to be opposite to the meaning of the proverb), and an Abstract Literal sentence (based loosely on the "Literal Alternatives" in Gibbs et al.'s Table 2). The Abstract Literal sentence was meant to convey the same meaning as a literal interpretation of the proverb, but using more abstract concepts than those used in the proverb. Two examples of the sets of sentences are presented in Table 6.
Table 6: Example Stimulus Set for Experiment 3 (LSA Sentence-to-Proverb Cosines are in Parentheses)
Proverb: Scratch my back and I'll scratch yours.
Paraphrase: If you do me a favor I will do you a favor in return. (.42)
Opposite: If you are nice to me I will take advantage of you. (.46)
Abstract literal: If you alleviate my skin irritation I will do the same for you. (.43)
Proverb: Don't throw the baby out with the bath water.
Paraphrase: In your haste to complete a task, don't forget the main goal. (.36)
Opposite: Don't worry about the main goal while working on the process. (.37)
Abstract Literal: Don't toss the contained object out with the container. (.36)
We wrote the three types of sentences to minimize differences in the LSA cosines comparing each sentence with the proverb. Thus, from the LSA point of view, the similarities between the sentences and their proverbs were equated. The question of interest was whether people would also judge the similarity of the sentences to the proverbs as equivalent.
Method
Participants. The 27 participants were students enrolled in Introductory Psychology classes at the University of Wisconsin-Madison. They received extra credit in exchange for their participation.
Materials. As described above, we wrote a Paraphrase sentence, an Opposite sentence, and an Abstract Literal sentence for each of the 15 proverbs. For each proverb, we determined the LSA cosine between the proverb and each of the three critical sentences. The mean LSA cosines for the Paraphrase, Opposite, and Abstract Literal sentences were .20, .21, and .21, respectively. The differences were not significant, F < 1, MSE = .001.
Three forms were constructed. On the first form, the proverbs were typed in an arbitrary order, and following each proverb were the three sentences. Across the 15 proverbs on the form, the three conditions appeared approximately equally often as the first, second, and the third sentences. The order of the proverbs was the same on all of the forms, but order of the three sentences following each proverb was counterbalanced over the three forms.
The participants were instructed to "First, read the proverb and get a good sense of what it means. For example, if a friend said the proverb to you, what would your friend be trying to communicate? Then, read each of the statements following the proverb and judge how well the statement conveys the same meaning as the proverb." Participants rated the similarity using a scale from 1 (no similarity) to 7 (exact similarity).
Results
The mean similarity ratings were 6.34, 1.63, and 2.78 for the Paraphrase, Opposite, and Abstract Literal sentences, respectively. These means were significantly different, F1(2, 52) = 215.15, MSE = .65, F2(2, 28) = 123.57, MSE = .73. People have no difficulty discriminating the similarities of the sentence types, although LSA suggests there are no differences in similarity.
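The F1 and F2 statistics treat participants and items (proverbs), respectively, as the random factor. The sketch below shows a one-way repeated-measures F computed from a table with one row per participant (or item) and one column per condition. The function name and the toy ratings are hypothetical, and the code is not the analysis software used for the reported values.

```python
def repeated_measures_f(scores):
    """One-way repeated-measures ANOVA.

    scores: list of rows, one per participant (F1) or item (F2),
            each row holding one mean rating per condition.
    Returns (F, df_effect, df_error, MSE).
    """
    n = len(scores)          # number of participants or items
    k = len(scores[0])       # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    unit_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_unit = k * sum((m - grand) ** 2 for m in unit_means)
    ss_error = ss_total - ss_cond - ss_unit  # condition-by-unit residual

    df_effect = k - 1
    df_error = (n - 1) * (k - 1)
    mse = ss_error / df_error
    f = (ss_cond / df_effect) / mse
    return f, df_effect, df_error, mse

# Toy ratings for three raters and three conditions (illustration only).
toy = [[1, 2, 3], [2, 4, 3], [3, 3, 6]]
f, df_effect, df_error, mse = repeated_measures_f(toy)
```

The degrees of freedom follow directly from the design: 27 participants and 3 conditions give (3 - 1) and (27 - 1)(3 - 1) = 52 for F1, and 15 proverbs give (15 - 1)(3 - 1) = 28 for F2, matching the values reported above.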
We also correlated the LSA cosines with the average ratings. For the Paraphrase, Opposite, and Abstract Literal conditions, the correlations were .22, .05, and -.37, respectively. None of the correlations was significant. The failure to find significant correlations was not due to a lack of variability in the cosines. In fact, the LSA cosines were highly correlated among themselves, ranging from .97 (correlation between the Opposite and Abstract Literal cosines across the 15 proverbs) to .98 (correlation between the Paraphrase and Abstract Literal cosines). Similarly, the ratings were sensibly intercorrelated: the correlation between the ratings for the Paraphrase and Opposite conditions was -.66, and the Abstract Literal ratings were essentially uncorrelated with the Paraphrase ratings (r = -.06). Thus, there is stable variability in both the LSA cosines and the ratings. It is simply the case that the LSA cosines do not predict the human ratings.
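The item-level correlations can be illustrated with a short sketch that computes a Pearson correlation across proverbs and converts it to a t statistic with n - 2 degrees of freedom. The function names are hypothetical, and no item-level data from the experiment are reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# The largest correlation reported above, with the 15 proverbs as cases.
t_abstract_literal = t_for_r(-0.37, 15)
```

For r = -.37 and 15 proverbs, the resulting t is about -1.44 in absolute terms well short of the conventional two-tailed .05 critical value of roughly 2.16 for 13 degrees of freedom, which is consistent with the nonsignificant correlations reported above.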
Discussion
Although LSA could not discriminate between the various conditions, people had no difficulty doing so. Furthermore, within each condition, the LSA cosines (supposedly a measure of similarity of meaning) were uncorrelated with the judged similarity of meaning. To the extent that the proverbs and their paraphrases are describing abstract situations (e.g., in your haste to complete a task, don't forget the main goal), we have evidence that high-dimensional theories do not adequately ground abstractions.
The results also present a challenge to embodied theories of meaning and the Indexical Hypothesis (Glenberg & Robertson, 1999). Namely, the set of actions consistent with a literal reading of the proverb is more similar to the set of actions consistent with the Abstract Literal sentence than with the Paraphrase sentence. Nonetheless, the Paraphrase sentence was judged to have the more similar meaning. We think that the results can be explained, but the explanation requires some speculation. First, we propose that proverbs are often used to ground abstract ideas; that is what makes proverbs valuable. For example, the relatively abstract concept "in your haste to complete a task, don't forget the main goal" is difficult to represent in an action-based framework because there are few prototypical actions associated with "complete a task" and "main goal." Thus, the abstract concept is difficult to understand. As an aid to understanding this abstract concept, we have been taught a way to ground it in a concrete situation, "Don't throw the baby out with the bath water." This proposal is similar to Lakoff's (1987) analysis of language describing abstract concepts. For example, according to Lakoff we understand the abstract concept "love" in terms of more concrete, grounded metaphors such as "love is a journey," and thinking in terms of this metaphor is what accounts for language about love such as, "They are at the beginning of their relationship," or "Their relationship is rocky [like a rough road]." As another example, Lakoff suggests that the abstract logical concept "p or not-p" is grounded in the concrete understanding of containers, so that something is either inside (p) or not-inside (not-p) the container.
Second, this proposal requires that we be able to recognize a proverb as a proverb so that it can be treated non-literally. This requirement is easily met given the frequency of proverbs and their pithy, easy-to-remember phrasing. In fact, Giora and Fein (1998) have demonstrated that familiar figurative language may be processed both literally and more figuratively early in the derivation of meaning, and then the contextually correct interpretation is selected. Also, an unfamiliar proverb-like metaphor can be interpreted in one way or in the opposite way (Keysar & Bly, 1995).
Third, the proposal requires that we be able to compare an embodied representation produced in understanding the proverb with the representation of the current situation to know that the proverb is apt. Barsalou (in press) describes in detail how one set of perceptual symbols can be compared to another.
General Discussion
The results of these experiments are obvious to humans (to the point that reviewers suggested they could be omitted from the main paper). To LSA, however, the same results are hopelessly opaque. Where LSA fails to discern the sensible from the nonsensical, our subjects effortlessly read the afforded passages and stumbled over the non-afforded passages.
For a full theoretical treatment of why LSA failed, and why any theory similar to LSA is doomed to fail, please see the parent paper (Glenberg & Robertson, 2000). Understanding figurative language and innovations in language is an intrinsic part of language capacity and should be included in theories of language processing. Moreover, as demonstrated in Experiment 1, people can make sense of situations in which common objects are used in novel ways. Intelligence and meaning are thus based not so much on what has been said before (the basis of HAL and LSA) as on what can be done with the objects being described (more specifically, the affordances of those objects).
References
Barsalou, L. (in press). Perceptual symbol systems. Behavioral and Brain Sciences.
Burgess, C., & Lund, K. (1997). Modelling parsing constraints with high-dimensional context space. Language and Cognitive Processes, 12, 177-210.
Clark, E., & Clark, H. H. (1979). When nouns surface as verbs. Language, 55, 767-811.
Gibbs, R. W., Strom, L. K., & Spivey-Knowlton, M. J. (1997). Conceptual metaphors in mental imagery for proverbs. Journal of Mental Imagery, 21, 83-110.
Giora, R., & Fein, O. (1998, July). Familiar and less familiar ironies: The graded salience hypothesis. Paper presented at the eighth annual meeting of the Society for Text and Discourse, Madison, WI.
Glenberg, A. M., & Robertson, D. A. (1999). Indexical understanding of instructions. Discourse Processes.
Glenberg, A. M., & Robertson, D. A. (2000). Symbol grounding and meaning: A comparison of high-dimensional and embodied theories of meaning. Journal of Memory and Language.
Keysar, B., & Bly, B. (1995). Intuitions of the transparency of idioms: Can one keep a secret by spilling the beans? Journal of Memory and Language, 34, 89-109.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Appendix A: Materials for Experiment 1
Appendix B: Materials for Experiment 2a and 2b