COMPETENCE AND CONFIDENCE:
SELF-ASSESSMENT OF ACCURACY ON COGNITIVE TASKS
Dr. Seth F. Winterholler
© Seth Winterholler 2006
This dissertation is a cognitive study that examines students' self-assessed accuracy on difficult and easy questions and correlates those judgments with the proportion of correct answers. It provides support for the claim that individuals, while generally overconfident in their judgments of accuracy, become more overconfident as the questions become more difficult.
To my daughters, Jade and Savannah, who made the biggest sacrifices in enduring my absences and supporting my passion for psychology. Thank you for your patience and support.
Seth F. Winterholler
I would like to acknowledge my parents, for supporting me through the discouraging setbacks that occurred during this project, especially my father, who passed away just prior to the publication.
Table of Contents
List of Tables
List of Figures
CHAPTER 1: INTRODUCTION
Introduction to the Problem
Background of the Study
Statement of the Problem
Purpose of the Study
Significance of the Study
Research Questions
Definition of Terms
Nature of the Study
CHAPTER 2: LITERATURE REVIEW
Introduction to the Literature Review
Theoretical Orientation for the Paper
Review of Research Literature and Methodological Literature
Synthesis of Research Findings
CHAPTER 3: METHODOLOGY
Purposes of the Study
Research Design
Target Population
Selection of Participants
Extraneous Variables Affecting Data
Data Collection
Data Analysis
Expected Findings
CHAPTER 4: DATA COLLECTION AND ANALYSIS
CHAPTER 5: RESULTS, CONCLUSIONS, AND RECOMMENDATIONS
Further Research
Perceptions and Reflections about the Process
APPENDIX A. INFORMED CONSENT FORM
APPENDIX B. CUMULATIVE CONFIDENCE SCORES
APPENDIX C. CONFIDENCE RESPONSE SHEET
APPENDIX D. COMPLETED TEST INSTRUMENT
List of Tables
Table 1: Scatterplot of Confidence Scores
List of Figures
Figure 1: T-test: Paired Two Sample for Means
CHAPTER 1: INTRODUCTION
Background of the Study
Interest in an individual's ability to evaluate the accuracy of his or her decisions, or metacognition, dates back more than 100 years (Fullerton & Cattell, 1892). Research into the area of confidence arises from studies in experimental cognitive psychology (Harvey, 1997). Studies show that individuals are generally overconfident: their self-assessed confidence exceeds the proportion of questions they actually answer correctly (Lichtenstein et al., 1982; Trafimow & Sniezek, 1994; Wright, 1982; Baranski & Petrusic, 1999; Koriat, 1998). However, if an individual were highly aware of his or her own ability, the assumption is that confidence levels would closely track performance.
Three theories suggest why individuals are generally overconfident. One states that overconfidence is a trait. This is the cognitive bias/heuristics model (Kahneman & Tversky, 1996; Griffin & Tversky, 1992; Kahneman, Slovic, & Tversky, 1982). Under this theory, individuals are simply uniformly overconfident in all decisions.
Another suggests that individuals make choices and then look to confirm their answers from environmental cues. This is the Probabilistic Mental Model (PMM) (Gigerenzer, Hoffrage, & Kleinbolting, 1991).
A final theory suggests that an individual creates a bias during the memory retrieval process (Koriat et al., 1980). In this model, the more an individual struggles with the recall process, the less confident he or she becomes. These processes are not necessarily mutually exclusive (Sniezek et al., 1990).
While individuals may have a tendency to be more confident than accurate, the question remains: is there any relationship between confidence and the difficulty of the questions themselves? Is an individual's degree of overconfidence consistent across questions, or does the accuracy of confidence levels vary disproportionately between easy and difficult questions?
Statement of the Problem
It would appear that the capacity to accurately assess the likelihood that one's judgments are correct improves one's ability to make successful life choices and achieve good outcomes. Logically, overconfidence could lead to poor decisions that are not based in fact, while underconfidence could lead to not fulfilling one's potential in a number of life areas. While a good deal of research has been conducted regarding the tendency for individuals to be overconfident in the accuracy of their judgments, little research has been conducted to determine whether this phenomenon is uniform across easy and difficult questions or whether there is a tendency toward higher or lower confidence levels on easy versus difficult questions.
Purpose of the Study
The purpose of this study was to investigate the relationship between question difficulty and the accuracy of self-assessed confidence ratings, and to discuss possible causes, advantages, and disadvantages. The answer to this question has ramifications for how we view the accuracy of an individual's interpretation of his or her ability on cognitive tasks.
If respondents are uniformly overconfident on difficult and easy questions alike, the cognitive bias/heuristics model is more strongly supported. Conversely, if respondents demonstrate a higher degree of overconfidence on either easy or difficult questions, the PMM and retrieval models are more aptly supported.
The knowledge that individuals are more accurate in their confidence levels on easy or on difficult questions, or are uniformly overconfident, is useful to a variety of fields. There is keen interest in the field of psychology in knowing whether participants make accurate metacognitive judgments. To know this, one must assess whether strong and weak reports of knowing, respectively, tend to be paired with accurate and inaccurate performance on a primary task. This assessment is a constant feature of human metacognition research (e.g., Hart, 1966; Metcalfe & Shimamura, 1994; Nelson, 1992; Nelson & Narens, 1980). Continuing along these lines of research, this study will explore whether individuals are more accurate in their assessment of easy versus difficult questions.
In the field of business, it is important to know whether individuals are more likely to be overconfident in erroneous decisions as question difficulty increases. In educational settings, it may be significant to know that individuals are more overconfident on difficult questions, which could lead to the development of study or data retrieval strategies that bring confidence levels more accurately in line with competence levels.
Significance of the Study
Individuals make judgments on a daily basis based on their confidence in the accuracy of their mental data, or cognitions. Individuals with overconfidence may optimistically believe that their judgments are accurate when they are not, thus jeopardizing the outcome of actions based on those judgments. Individuals who are underconfident may tend to 'stall' and not act, because they lack confidence in the accuracy of their judgments.
There is limited research in metacognition concerning the effects of easy and difficult questions on confidence judgments. The majority of studies focus on supporting the idea that individuals are overconfident in their confidence judgments, and to a lesser extent have attempted to explain a rationale for this general tendency toward overconfidence.
The patterning of overconfidence sheds light on why individuals are overconfident and will support existing or possibly new theory; equally important, a pattern, or lack of pattern, in confidence judgments would be significant as stand-alone knowledge that can be applied to educational, cognitive, psychological, and business settings.
While abundant data exist to support the claim that individuals are generally overconfident, there is a paucity of data on what would seem the natural question: are individuals more accurate in their confidence levels at easy or at difficult levels of questioning?
The research question for this study is: Will overconfidence increase as questions become more difficult and performance accuracy decreases?
Correspondingly, the research hypothesis is, H1: Overconfidence increases as questions become more difficult and performance accuracy decreases.
The null hypothesis (H0) is: Overconfidence does not increase as questions become more difficult and performance accuracy decreases. In this study, .05 is the rejection level, or significance level. Should the probability obtained under H0 be less than or equal to .05, the null hypothesis will be rejected.
Definition of Terms
The following is a list of operational definitions and terms that are crucial to understanding this study.
- Competence: The state or quality of being correct, right or accurate.
- Confidence: The quality or state of being certain of a judgment.
- Difficulty value: “Measure of discriminating power of a test item in terms of the specified group who answer the item correctly” (Wolman, 1973, p. 400).
- Difficult Questions: Those questions that are hard to deal with, manage, or overcome. These questions are operationally defined, by difficulty value, as those questions most often answered incorrectly by the group. This study will utilize an item analysis of difficult questions and compare the accuracy of confidence on easy versus difficult questions.
- Easy Questions: Those questions requiring little thought, effort, or reflection. These questions are operationally defined, by difficulty value, as those questions most often answered correctly by the group. This study will utilize an item analysis of easy questions.
- Metacognition: Thinking about thinking, or cognition about cognition (Shields et al., 2005).
- Overconfidence: A belief that exceeds an accurate estimate of the correctness of a given answer. In this study, overconfidence is represented statistically by a positive difference between the mean self-assessed probability of correctness and the mean proportion of correct answers (see the sketch following this list).
- Underconfidence: A belief that is lower than an accurate estimate of the correctness of a given answer. In this study, underconfidence is represented statistically by a negative difference between the mean self-assessed probability of correctness and the mean proportion of correct answers.
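To make these two operational definitions concrete, the following is a minimal computational sketch, assuming hypothetical data and a simple 0-to-1 scale for both confidence and correctness; it is an illustration, not part of the study's instrumentation.

```python
# Minimal sketch of the operational definitions above (hypothetical data).
# A positive bias indicates overconfidence; a negative bias indicates underconfidence.

def confidence_bias(confidence_ratings, correct_flags):
    """Mean self-assessed probability of correctness minus mean proportion correct."""
    mean_confidence = sum(confidence_ratings) / len(confidence_ratings)
    mean_accuracy = sum(correct_flags) / len(correct_flags)
    return mean_confidence - mean_accuracy

# Example: confidence reported on a 0.2-1.0 scale, answers scored 1 (correct) or 0 (incorrect).
ratings = [0.8, 1.0, 0.6, 0.8, 0.4]
correct = [1, 1, 0, 0, 0]

bias = confidence_bias(ratings, correct)
label = "overconfident" if bias > 0 else "underconfident" if bias < 0 else "well calibrated"
print(label, round(bias, 2))  # overconfident 0.32
```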
This study did not attempt to manipulate the causes of confidence or overconfidence in individuals. It therefore cannot statistically determine causation of overconfidence or underconfidence in the accuracy of cognitive tasks, merely the existence of a correlation, followed by a discussion of possible causes.
Individuals self-report their believed accuracy on exam questions. As such, the results may not be entirely accurate, because an individual's self-report is not always accurate and some individuals may project distorted images of themselves in self-reports (Cohen & Swerdlik, 1999).
Additionally, the examination was given only in English, to college students enrolled in Psychology 101. As a consequence of the convenience sample utilized, all respondents had the financial and cognitive wherewithal to attend college. The study obtained no data on race, age, or culture.
Due to these limitations, cross-cultural and age-related generalizations may be limited as a result of the population sampling used.
An assumption of this study is that confidence, or competence, is generally trait oriented, not state oriented. Research supports that intelligence is commonly a stable quality (Dougherty & Haith, 1997; Lamp & Krohn, 1990), although emotional states may have some impact on recall ability (Kelemen & Creeley, 2001). Similarly, confidence research supports the stability of confidence accuracy regardless of the impact of emotional states on ability (Kelemen & Creeley, 2001; Dunlosky & Nelson, 1992, 1994, 1997; Kelemen, 2000; Kelemen & Weaver, 1997).
A second assumption is that awareness of confidence/competence testing does not increase or decrease the accuracy of self-reporting. For this assumption there is no remedy, because at some level individuals must respond to report confidence levels and thus generate data that then can be analyzed. Without a response, there is no data and no study.
A third assumption is that the data are not skewed by the voluntary nature of the survey. Specifically, the assumption is that individuals who volunteer for studies such as this one are not more or less accurate in the self-assessment of their ability as a by-product of a personality trait linked to volunteering.
Nature of the Study, or Theoretical/Conceptual Framework
This is a cognitive study involving confidence and competence on multiple-choice questions. While individuals are generally overconfident in their confidence judgments, the question remains whether overconfidence increases as questions become more difficult and performance accuracy decreases.
The study will determine whether individuals are uniformly overconfident on easy and difficult questions, and then review possible theories as to why they do or do not exhibit uniform confidence levels. The study will then review current theory, including the cognitive bias/heuristics model, the PMM, and the retrieval bias model, to consider which of these models the data appear to support.
CHAPTER 2: LITERATURE REVIEW
Introduction to the Literature Review
Overconfidence is the inaccurate belief that one's judgments are correct beyond the actual level of accuracy, and most studies suggest that individuals are overconfident in judging their accuracy (Lichtenstein, Fischhoff, & Phillips, 1982; Koriat, Lichtenstein, & Fischhoff, 1980; Trafimow & Sniezek, 1994; Wright, 1982; Schraw & Dennison, 1994; Schraw, 1997). The phenomenon of overconfidence is generally accepted, and has been termed by some the "illusion of knowing" (Glenberg, Wilkinson, & Epstein, 1982).
Organization. This research examines whether individuals are more accurate in their self-assessed accuracy judgments on easy or on difficult questions. The following literature review discusses theory behind how individuals make confidence judgments, reviews research literature and methodological literature pertinent to confidence judgments, synthesizes previous findings, and critiques some of the research methods used in previous studies.
Search terms. In developing the literature review the author used the following keyword searches: cognitive monitoring, confidence accuracy, confidence judgments, decision making, judgment calibration, judgments of learning, metamemory, metacognition, overconfidence, self-assessment, state-dependent memory, unskilled and unaware. The databases were drawn from Capella University Library (ebrary), EBSCOhost, ERIC Database, Psychology Journals Database, The Tennessee Electronic Library, Infotrac, Infotrac Expanded Academic ASAP, Infotrac Reference Center Gold, Infotrac Professional Collection, National Newspaper Index, ProQuest Psychology Journals, E.G. Fisher Public Library, and Gale Virtual Library.
Theoretical Orientation for the Paper
Three theories suggest why individuals are generally overconfident. The first suggests that individuals mistranslate their subjective beliefs into numbers; they are overconfident as a trait. This is known as the cognitive bias, or heuristics, model (Kahneman & Tversky, 1996; Griffin & Tversky, 1992; Kahneman, Slovic, & Tversky, 1982). Under this theory, individuals are uniformly overconfident in decisions. According to this premise, individuals have general cognitive biases in their confidence level in any given situation, which they then use to mediate intuitive guesses.
The second theory suggests that individuals make choices and then search for additional clues to confirm their answer. This is known as the Probabilistic Mental Model (Gigerenzer, Hoffrage, & Kleinbolting, 1991). The Probabilistic Mental Model (PMM) suggests that environmental knowledge provides additional cues, each with certain cue validity. This cue validity provides a probability of accuracy, and the basis of confidence in the PMM.
The third theory suggests that an individual creates a bias during the memory retrieval process. The individual makes a choice and creates a confidence level associated with that choice based on the ease of retrieval of information (Koriat et al., 1980). In this model, the more an individual struggles with the recall process, the less confident he or she becomes in the correctness of the recalled data.
These processes are not necessarily mutually exclusive (Sniezek et al., 1990). Often an individual integrates the processes when generating a level of confidence or overconfidence for an answer.
Stankov and associates developed findings that suggest a confidence factor in individuals that is independent and mediates the accuracy of metacognitive judgments in self-assessment (Crawford & Stankov, 1996a, 1996b; Kleitman & Stankov, 2001; Stankov, 1998, 1999, 2000). Similarly, Schraw and associates developed research findings suggesting that confidence ratings across ability measures relate more closely to one another than to accuracy scores on the same measures (Schraw, 1994, 1997; Schraw & Dennison, 1994; Schraw & Roedel, 1994).
Review of Research Literature and Methodological Literature
Review of research on the topic. Metacognition concerns one's ability to monitor one's own cognition; it is taken to demonstrate an individual's awareness of the processes of mind. Researchers link metacognitive states to subjective self-awareness because uncertainty and doubts are subjective and personal. Metacognition is taken to be one of the individual's most sophisticated cognitive capacities (Koriat, 1993; Metcalfe & Shimamura, 1994). Confidence has been the subject of a variety of studies within the field of psychology (e.g., Roberts, Stankov, Pallier, & Dolph, 1997).
The study of confidence judgments plays an important role in psychological studies. The subject influences areas within psychology that include judgment and decision making, social psychology, educational psychology, and cognitive psychology, and has created interrelated research in cognitive neuropsychology (Shimamura & Squire, 1988) and abnormal psychology (Pappas et al., 1992). Two groups of confidence judgments that have drawn great attention from the field of psychology are those associated with predictions about the future retrieval of events, called judgments of learning (JOLs), and those associated with assessments about past retrieval, called retrospective confidence judgments (RCJs). When making a JOL, one assesses the likelihood of being able to remember, at some later point, a particular item, concept, response, or stimulus when cued with a related stimulus. When making an RCJ, one assesses the likelihood that what has just been recalled is indeed correct.
The conceptual distinction between JOLs and RCJs is well delineated. JOLs are thought to connect to thought processes that enable one to predict future memory performance. RCJs, conversely, are used to assess the accuracy of past retrieval and are thought to come from processes associated more directly with retrieval (Dougherty, 2001; Kelley & Lindsay, 1993), although some research supports that they reflect a similar process (Dougherty et al., 2005).
People often react to feelings of knowing and not knowing by pausing, thinking, and seeking hints or information, or by reflecting on the strength of internal cues. These mechanisms form the basis of the understanding of metacognition and uncertainty monitoring (Brown, Bransford, Ferrara, & Campione, 1983; Brown, 1991; Dunlosky & Nelson, 1992; Flavell, 1979; Hart, 1965; Koriat, 1993; Metcalfe & Shimamura, 1994; Nelson, 1992; Smith, Brown, & Balfour, 1991).
Research suggests that confidence levels are affected by environmental test bias (Reynolds & Brown, 1984), genetic factors (Jensen, 1985), and the location of the answer on the survey in multiple-choice questionnaires (Higgman & Gerrard, 2005; Bar-Hillel & Attali, 2002). Confidence also has a correlation to the capacity to reflect on one's answers (Cutler & Wolfe, 1989), auditory and visual factors (Horn & Stankov, 1982), and tactile factors (Roberts, Stankov, Pallier & Dolph, 1997). Higgman and Gerrard (2005) developed research revealing cognitive errors produced by speeded responding, which caused confusability among alternative answers in the study. Respondents who were aware of a greater time constraint for answering were less accurate in their metacognitive judgments.
Caffeine, a substance associated with elevated mood, has an effect on memory in human participants, but not on metacognition (Kelemen & Creeley, 2003). This research found no difference in self-reported confidence assessments between the participant group and the placebo group. The research suggests that while memory may be state-dependent, metacognition is not.
Duration of exposure can impact confidence levels (Memon, Hope & Bull, 2003). In short-exposure conditions, witnesses who made a correct identification of individuals involved in a simulated crime were more confident than those who made incorrect identifications. However, in longer-exposure conditions, the confidence ratings of accurate and inaccurate witnesses did not differ; theoretically, witness confidence increased when individuals were led to believe that they had had a longer exposure to the perpetrator and reasoned that they should therefore be able to identify the individual more easily (Wells & Bradfield, 1998).
Research (Pallier, 2003) also supports a mild difference in confidence ratings by gender. While the accuracy scores for males and females remain the same, males are slightly more overconfident than females.
Confidence also has a correlation to age (Baranski & Petrusic, 1999). Adults age 60 to 80 are more likely to have errors in confidence accuracy, which is theoretically associated with a reduction in cognitive resources. It is postulated these individuals rely more heavily on an automatic process (heuristic) than they do a conscious effort (Jacoby, 1999; Mandler, 1980; Searcy et al., 1999; Searcy et al., 2000).
Not all individuals are overconfident. Depressed individuals appear to have more accurate confidence levels than non-depressed individuals (Stone, Dodrill & Johnson, 2001; Alloy and Abramson, 1979), suggesting a subjective belief (depression) bias. Depressed individuals demonstrate underconfidence in their metacognitive judgments (Stone, Dodrill & Johnson, 2001). These authors suggest that the absence of depression is associated with a distorted view of the world, whereas depressed individuals are both more accurate and more realistic. They suggest that realism can lead to depression. They also suggest that depressed individuals have more accurate insights into the world, and that doing away with their cognitive biases is counterproductive.
Research in metacognition has also compared humans' and monkeys' ability to make confidence judgments (Shields, Smith, & Washburn, 1997; J. D. Smith et al., 1995; J. D. Smith, Shields, Allendoerfer, & Washburn, 1998; J. D. Smith, Shields, Schull, & Washburn, 1997; J. D. Smith, Shields, & Washburn, 2003a, 2003b). In these studies researchers provided a range of controlled trial difficulties along with an uncertainty response. Animals produced a data pattern that was identical to the pattern humans produced when instructed to use the uncertainty response to cope with uncertainty.
Studies (Koriat et al., 1980; Koku & Qureshi, 2004) have reduced overconfidence by asking individuals to describe why their answers may not be correct. The authors suggest that individuals may be overconfident because they focus on positive rather than negative evidence. The results suggest that individuals are overly positive and tend to focus on positive reasons why the answer given may be correct rather than reasons that their answers may be inaccurate (Koku & Qureshi, 2004).
In a particularly fascinating study by Kruger and Dunning (1999), the authors suggest that overconfidence occurs to some degree because individuals unskilled in metacognition have two problems: they make decisions that are wrong, and because they are incompetent in metacognition, they do not have the ability to recognize that they are wrong. These researchers concluded in four studies that the participants who scored in the bottom quartile on tests of humor, grammar, and logic made gross overestimates of self-assessed confidence as compared to ability. While test scores placed these respondents in the 12th percentile, the respondents self-assessed their scores to be in the 62nd percentile. The authors concluded that this miscalibration was linked to deficits in metacognitive ability, or the skill to distinguish correct answers from wrong ones. The authors argue that the skills that ground competence in a domain are the same skills required to assess that competence, in oneself or in others. Their contention is that individuals who score poorly on these tests and overrate their ability lack the ability for metacognition. The researchers then conducted a study supporting the claim that improving the skills of the respondents improved their metacognitive capability; this improvement in skills assisted respondents in recognizing the confines of their ability.
Review of methodological literature. In the research by Kruger and Dunning, Unskilled and unaware of it (1999), the authors used quantitative analysis with a non-experimental design in the first study. The respondent group was composed of 65 Cornell University undergraduates from a variety of courses in psychology who earned extra credit for their participation. Students were given a 30-item questionnaire of jokes, and professional comedians had rated the jokes as being funny or not. Participants rated each joke on the same 11-point scale used by the comedians. Afterward, participants compared their "ability to recognize what's funny" with that of the average Cornell student by providing a percentile ranking. In this and in all subsequent studies, the percentile rankings could range from 0 (the very bottom) to 50 (exactly average) to 99 (the very top).
Each participant received a percentile rank based on the extent to which his or her joke ratings correlated with the ratings provided by the professionals (with higher correlations corresponding to better performance). On average, participants put their ability to recognize what is funny in the 66th percentile, which exceeded the actual mean percentile (50, by definition) by 16 percentile points.
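The scoring step just described can be illustrated with a brief sketch. The data, variable names, and the exact ranking rule here are assumptions for illustration only; they are not Kruger and Dunning's actual materials or code.

```python
# Hypothetical sketch of the scoring step described above: each participant's joke
# ratings are correlated with the professionals' ratings, and those correlations
# are converted to percentile ranks within the sample.
import numpy as np
from scipy import stats

expert_ratings = np.array([3, 7, 9, 2, 5, 8, 6, 4])          # assumed 11-point-scale ratings
participant_ratings = np.array([
    [4, 6, 9, 1, 5, 7, 6, 3],   # participant 1
    [9, 2, 3, 8, 4, 1, 5, 7],   # participant 2
    [3, 7, 8, 2, 6, 8, 5, 4],   # participant 3
])

# Correlation of each participant's ratings with the experts' (higher = better performance).
corrs = np.array([stats.pearsonr(row, expert_ratings)[0] for row in participant_ratings])

# Percentile rank of each participant within the sample, on a 0-99-style scale.
percentiles = 100.0 * (stats.rankdata(corrs) - 1) / len(corrs)
print(dict(zip(["P1", "P2", "P3"], np.round(percentiles, 1))))
```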
The authors used a paired t test to analyze the data from the study. The data were developed into mean percentages, and the analysis was used to show whether there was a significant difference between sample means. The methodology is somewhat similar to that of the current study on competence and confidence.
In their second study, participants were 45 Cornell University undergraduates from an introduction to psychology course. Respondents earned extra credit for their participation in the study. Data from one participant (who would have been participant 46) were eliminated because the respondent failed to complete the data set.
In this study respondents were told that the research studied logical reasoning skills. Participants completed a 20-item logical reasoning test that was created using questions taken from a Law School Admissions Test (LSAT) test guide. After taking the test, respondents made three self-assessments regarding their ability and test performance. First, they compared their “general logical reasoning ability” with that of other students from their psychology class by giving their percentile ranking. Second, they estimated how their score on the test would compare with that of their classmates, again on a percentile scale. Finally, they estimated how many test questions (out of 20) they thought they had answered correctly. The order in which these questions were asked was counterbalanced in this and in all subsequent studies.
The order in which specific questions were asked did not affect any of the results in this or in any of the studies.
Participants in this study overestimated their logical reasoning ability relative to their peers. On average, participants placed themselves in the 66th percentile among students from their class, which was significantly higher than the actual mean of 50, one-sample t(44) = 8.13, p < .0001. Participants also overestimated their percentile rank on the test, M percentile = 61, one-sample t(44) = 4.70, p < .0001. Participants did not, however, overestimate how many questions they answered correctly, M = 13.3 (perceived) vs. 12.9 (actual), t < 1. As in Study 1, perceptions of ability were positively related to actual ability, although in this case, not to a significant degree. The correlations between actual ability and the three perceived ability and performance measures ranged from .05 to .19, all ns.
Participants in the bottom quartile (n = 11) overestimated their logical reasoning ability and test performance to the greatest extent. Although these individuals scored at the 12th percentile on average, they nevertheless believed that their general logical reasoning ability fell at the 68th percentile and their score on the test fell at the 62nd percentile. Their estimates not only exceeded their actual percentile scores, ts(10) = 17.2 and 11.0, respectively, ps < .0001, but exceeded the 50th percentile as well, ts(10) = 4.93 and 2.31, respectively, ps < .05. Thus, participants in the bottom quartile not only overestimated themselves but believed that they were above average. Similarly, they thought they had answered 14.2 problems correctly on average, compared with the actual mean score of 9.6, t(10) = 7.66, p < .0001.
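The one-sample t tests reported throughout these studies compare a group's mean perceived percentile against the chance benchmark of the 50th percentile. A minimal sketch of that comparison follows, using assumed data rather than the published values.

```python
# Hypothetical sketch of the one-sample t test used above: perceived percentile
# rankings are compared against the 50th-percentile benchmark (the mean by definition).
from scipy import stats

perceived_percentile = [68, 72, 55, 61, 70, 66, 59, 74, 63, 67, 71]  # assumed data
t_stat, p_value = stats.ttest_1samp(perceived_percentile, popmean=50)
print(f"t({len(perceived_percentile) - 1}) = {t_stat:.2f}, p = {p_value:.4f}")
# A p value at or below .05 would indicate the group mean reliably differs from 50.
```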
In the third study, research was conducted in two stages. The first stage was a reproduction of the first two studies using the domain of grammar. In Study 3, participants were asked to complete a test assessing their knowledge of American Standard Written English. They were also asked to rate their overall ability to recognize accurate grammar, how their test performance compared with that of their peers, and how many items they had answered correctly on the test.
The participants were 84 Cornell University undergraduates who received extra credit toward their course grade for taking part in the study. The basic procedure was similar to that of Study 2. Participants judged whether the underlined portion of a sentence was grammatically correct or should be changed to one of four different rewordings displayed.
After completing the test, participants compared their general ability to identify grammatically correct standard English with that of other students from their class on the same percentile scale used in the previous studies. Similar to study 2, participants estimated the percentile rank of their test performance among their student peers, as well as the number of individual test items they had answered correctly.
The research in the third study suggests that participants scoring in the bottom quartile grossly overestimated their ability relative to their peers. Whereas bottom-quartile participants (n = 17) scored in the 10th percentile on average, they estimated their grammar ability and performance on the test to be in the 67th and 61st percentiles, respectively, ts(16) = 13.68 and 15.75, ps < .0001. Bottom-quartile participants also overestimated their raw score on the test by 3.7 points, M = 12.9 (perceived) versus 9.2 (actual), t(16) = 5.79, p < .0001.
As in previous studies, participants falling in other quartiles overestimated their ability and performance much less than did those in the bottom quartile. However, those in the top quartile again underestimated themselves. Whereas their test performance fell in the 89th percentile among their peers, they rated their ability to be in the 72nd percentile and their test performance in the 70th percentile, ts(18) = −4.73 and −5.08, respectively, ps < .0001. Top-quartile participants did not, however, underestimate their raw score on the test, M = 16.9 (perceived) versus 16.4 (actual), t(18) = 1.37, ns.
In study 4 participants included 140 Cornell University undergraduates from a human development course. Participants earned extra credit toward their course grades for participating. Data from 4 additional participants were deleted because they failed to complete the data measures.
Participants completed the study in groups of 4 to 20 respondents. On arriving at the laboratory, participants were told that they would be given a test of logical reasoning as part of a study of logic. The test contained ten problems from the Wason selection task. Each problem described four cards (e.g., A, 7, B, and 4) and a rule about the cards. Participants then were instructed to indicate which card or cards must be turned over in order to test the rule.
Similar to the previous studies, after taking the test, respondents were asked to rate their logical reasoning skills and performance on the test relative to their classmates on a percentile scale. They also gave a self-assessment of the number of problems they had solved correctly.
Next, a random selection of 70 participants was given a short logical-reasoning training packet. This packet described techniques for testing the veracity of logical syllogisms such as the Wason selection task. The remaining 70 participants completed an unrelated filler task that took about the same amount of time (10 min) as the training packet.
Afterward, participants in both conditions completed a metacognition task in which they reviewed their own tests and indicated which problems they thought they had answered accurately and which inaccurately. Participants then re-estimated the total number of problems they had answered correctly and compared themselves with their peers in terms of their general logical reasoning ability and their test performance.
Prior to training, participants displayed a pattern of results strikingly similar to that of the previous three studies. First, participants overall overestimated their logical reasoning ability (M percentile = 64) and test performance (M percentile = 61) relative to their peers, paired ts(139) = 5.88 and 4.53, respectively, ps < .0001. Participants also overestimated their raw score on the test, M = 6.6 (perceived) versus 4.9 (actual), t(139) = 5.95, p < .0001. As before, perceptions of raw test score, percentile ability, and percentile test score correlated positively with actual test performance, rs(138) = .50, .38, and .40, respectively, ps < .0001.
Once again, individuals scoring in the bottom quartile (n = 37) were oblivious to their poor performance. Although their score on the test put them in the 13th percentile, they estimated their logical reasoning ability to be in the 55th percentile and their performance on the test to be in the 53rd percentile. Although neither of these estimates were significantly greater than 50, t(36) = 1.49 and 0.81, they were considerably greater than their actual percentile ranking, ts(36) > 10, ps < .0001. Participants in the bottom quartile also overestimated their raw score on the test. On average, they thought they had answered 5.5 problems correctly. They actually answered an average of 0.3 problems correctly, t(36) = 10.75, p < .0001.
The level of overestimation decreased with each step up the quartile ladder. As in the previous studies, participants in the top quartile underestimated their ability. Whereas their actual performance put them in the 90th percentile, they thought their general logical reasoning ability fell in the 76th percentile and their performance on the test in the 79th percentile, ts(27) < −3.00, ps < .001. Top-quartile participants also underestimated their raw score on the test (by just over 1 point), but one must keep in mind they all received perfect scores.
Confidence judgments by humans and rhesus monkeys (Shields, Smith, Guttmannova, & Washburn, 2005) used ninety-two undergraduates from the University at Buffalo, State University of New York, who participated in a single, 45-min session. Participation in the study was mandatory and fulfilled a course requirement.
In this study, an IBM-compatible computer generated stimuli on a color monitor. The participants used a standard analog joystick to move a cursor to indicate their responses and confidence ratings.
Each trial consisted of a primary discrimination response and a secondary confidence rating. The task was a density discrimination test in which participants judged whether a 200- x 100-pixel box on the screen was sparsely or densely filled with illuminated pixels. If the box contained fewer or more than 1,164 pixels, it was defined as a sparse or dense stimulus, respectively. The respondent then moved the cursor to the letter S or to the letter D. Thirty density levels between 864 and 1,152 pixels (Levels 1-30) were designated as sparse stimuli, and 30 density levels between 1,176 and 1,569 pixels (Levels 32-61) were designated as dense stimuli. The participants made discrimination responses by moving the cursor to the D or S but received no feedback.
One second after the discrimination response, the unselected response icon (S or D) disappeared and two colored bars appeared under the selected icon. If the D was selected, then pink and blue bars appeared below the D to the left and right, respectively. If the S was selected, then pink and blue bars appeared below the S to the right and left, respectively. The participants used the pink and blue bars to express high and low confidence in the discrimination response they had just given. After correct responses, the pink bar earned the participants 2 points, signaled by two computer-generated whooping sounds and the addition of 2 points to the respondent's score. The blue bar earned 1 point, signaled by one whooping sound and the addition of 1 point to the respondent's score. After incorrect discrimination responses, the pink bar cost participants 2 points and two 10-second timeout penalties. The blue bar cost participants 1 point and one 6-second timeout penalty. Timeouts were accompanied by a low buzzing sound. The screen then cleared, and a new trial began after the reward or timeout.
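The stimulus classification and the point/timeout contingencies just described can be summarized in a short sketch. This is a simplified, assumed rendering of the task logic for illustration, not the original experiment software.

```python
# Sketch of the trial logic described above (assumed simplification):
# pink bar = high confidence, blue bar = low confidence.

def classify_stimulus(illuminated_pixels: int) -> str:
    """Sparse if fewer than 1,164 illuminated pixels, dense if more."""
    return "sparse" if illuminated_pixels < 1164 else "dense"

def score_confidence_bar(correct: bool, bar: str):
    """Return (points, timeout_seconds) for the chosen confidence bar."""
    if bar == "pink":                                # high confidence
        return (2, 0) if correct else (-2, 20)       # two 10-s timeouts after an error
    if bar == "blue":                                # low confidence
        return (1, 0) if correct else (-1, 6)
    raise ValueError("bar must be 'pink' or 'blue'")

print(classify_stimulus(900))                 # sparse
print(score_confidence_bar(True, "pink"))     # (2, 0): 2 points, two whooping sounds
print(score_confidence_bar(False, "blue"))    # (-1, 6): lose 1 point, 6-s timeout
```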
The study compared accuracy and confidence scores for the 92 undergraduate students with the response patterns of two rhesus monkeys. The monkeys were given food incentives for correct answers. The results suggested that one monkey responded similarly to humans and one monkey responded with the highest-risk strategy possible.
The research used a t test to statistically analyze the similarity in data patterns for self-assessed confidence ratings between humans and rhesus monkeys.
Depressive cognition: A test of depressive realism versus negativity using general knowledge questions (Stone, Dodrill, & Johnson, 2001) examines the claim that depressed individuals are more realistic than non-depressed individuals. The study tested participants in an initial screening to sort out would-be participants who were either not depressed or fell into a category that would include anxious individuals. This was done using the MASQ test (Watson & Clark, 1991). Participants included males and females, but in unknown proportions. All subjects participated as part of a course requirement. The research was able to identify 19 individuals who were depressed in each of three semesters, and they constituted the participant group. The study used a between-subjects design in which participants were asked to answer 40 multiple-choice questions on facts derived from a standard almanac. Participants were not allowed to advance unless they answered the question at hand. Each question had two possible answers, so participants rated their confidence between 50% and 100%. This study conducted all tests with two-tailed independent samples t tests.
Additional research by Lin-Agler, Moore, & Zabrucky (2004), Effects of personality on metacognitive self-assessments, uses the t test to analyze the relationship between personality and metacognition.
The study by Higgman and Gerrard (2005), which suggests respondents are less accurate in their metacognitive judgments when placed under a strong time constraint, used a pretest-posttest design and ANOVA for statistical analysis.
The study by Dougherty et al. (2001), Using the past to predict the future, similarly uses an ANOVA design to examine multiple variables within the research parameters.
Interestingly enough, research (Smith, 1998) has been conducted on self-assessed confidence judgments using qualitative methods in a verbal format rather than the traditional quantitative methods and numerical format.
Critique of Research Methods.
Confidence judgments by humans and rhesus monkeys (Shields, Smith, Guttmannova, & Washburn, 2005) used ninety-two undergraduates from the University at Buffalo, State University of New York, who participated in a single, 45-min session. Participation in the study was mandatory to fulfill a course requirement, which presents a problem in that respondents' participation was forced. Additionally, there is a methodological weakness with the study: only extremely limited generalizations can be made from the work, because only two monkeys were used and only one monkey responded similarly to the human respondents.
Depressive cognition: A test of depressive realism versus negativity using general knowledge questions (Stone, Dodrill, & Johnson, 2001) has an assortment of weaknesses. The first is that the small sample groups, coupled with a t test, give the study limited power. The participants were mandatory participants, and the study forced respondents to take part in psychological testing, in which they were clinically diagnosed, as part of that mandatory participation. At that point, 19 or more students were identified as depressed and used for further mandatory study. Certainly, the study has questionable ethical integrity. Academically, the study has additional weaknesses. The study does support that depressed participants are less overconfident than non-depressed participants. The problem that arises is that it is common knowledge that individuals are generally overconfident. If one were to reverse the logic of the study, the flaw becomes relatively clear: if individuals were generally underconfident, and we found a study group of manic individuals, would one not expect that by virtue of their manic grandiosity they might tend to be more confident? This study does just that, suggesting that there is an advantage to depression/dysphoria.
The study by Higgman and Gerrard (2005), which suggests that respondents are less accurate in their metacognitive judgments when placed under a time constraint, uses ANOVA for statistical analysis. For the study that they pursued it is quite appropriate, because they sought to analyze five different aspects of confidence judgments. There are four cell frequencies: A = unchanged, initially correct items; B = unchanged, initially incorrect items; C = changed, initially correct items; and D = changed, initially incorrect items. Because changing an initially incorrect answer could result in either a correct or an incorrect second response on any test involving more than two options, "e" is also an option for comparison. This value is a subset of D, and represents the number of initially incorrect items that are correct after a change.
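To make the five cells concrete, the brief sketch below tallies them from each item's initial correctness, whether the answer was changed, and final correctness. The function and data are hypothetical illustrations, not the authors' analysis code.

```python
# Hypothetical sketch of the cell frequencies described above.

def tally_cells(items):
    """items: list of (initially_correct, changed, finally_correct) booleans."""
    cells = {"A": 0, "B": 0, "C": 0, "D": 0, "e": 0}
    for initially_correct, changed, finally_correct in items:
        if not changed:
            cells["A" if initially_correct else "B"] += 1
        else:
            cells["C" if initially_correct else "D"] += 1
            if not initially_correct and finally_correct:
                cells["e"] += 1        # subset of D: a wrong answer changed to a right one
    return cells

example = [(True, False, True), (False, True, True), (False, True, False), (True, True, False)]
print(tally_cells(example))  # {'A': 1, 'B': 0, 'C': 1, 'D': 2, 'e': 1}
```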
One of the weaknesses of the design is the relatively small respondent group, which varies from 23 to 25 individuals in each of the groups. With these small numbers, the power of the study suffers and thus generalizability is impacted. This is especially true when one takes into account that the study uses only Canadian graduate students at the University of Montreal.
The study by Dougherty et al. (2001) similarly uses the ANOVA format, but was able to recruit more respondents. There were 60 participants in the first research project and 184 respondents recruited for the second research project. The larger respondent group increases the power of the study substantially; however, the study does suffer somewhat because all of the respondents in both groups were undergraduate students enrolled in introductory psychology at the University of Maryland, and their participation was mandatory, fulfilling course requirements.
In the research by Koku and Qureshi, Overconfidence and the performance of business students on examinations (2004), the suggestion to focus on the negative, namely "describing the reasons that the respondents' answers may not be correct," makes a not-so-subliminal suggestion to respondents that the answers they give are not correct. In this study, that suggestion may have played an active role in biasing the results the respondents gave.
In Psychodiagnostic Confidence (Smith, 1998), the author uses qualitative analysis and has respondents verbally describe their confidence levels as they develop a confidence level in a diagnosis for a client. An argument can be made for this methodology. One advantage is that individuals prefer to communicate through description rather than numerically because they are more accustomed to using description (Menz, Druzdzel & Mazur, 1991). People also often misunderstand numerical probabilities and the statistical interpretation of probabilities (Brun & Teigen, 1998). Beyond this, people are more familiar with the rules of language than they are with statistics and probability; based on that familiarity, they may be more accurate in their descriptions than with numeric information (Zimmer, 1983). Still, the coding methods used in this study have some weakness due to the subjectivity of the two researchers who coded the data: interrater reliability is not strong, and there were several discrepancies between the coders as to where the coded data "fit." The power of the study also suffers due to the small sample size (36 respondents). The small sample size is related to the phenomenal amount of effort required to code hours of voice-recorded data from the verbiage of 36 respondents on 64 dimensions.
In the Kruger and Dunning (1999) study, Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments, the authors suggest that overconfidence occurs to some degree because individuals unskilled in metacognition both make decisions that are wrong and, because they are incompetent in metacognition, lack the ability to recognize that they are wrong. The outcomes of the first study show a distinct overestimation in self-confidence scores among respondents in the lowest quartile. Outcomes also show a distinct underestimation in self-confidence scores in the highest quartile. The comparison the authors avoid making is that the lowest quartile group's ability (percentage of actually correct scores) scored at the 62nd percentile and the highest quartile group's ability rated at the 86th percentile, a difference of 24 percentile points between the high and low groups; however, both groups estimated their scores to be at about the 68th percentile. In the second study, with the mean of scores being at 50, participants generally estimated their performance between the 60th and 75th percentiles; high scorers demonstrated underconfidence and low scorers demonstrated gross overconfidence. In the third study, estimates of perceived ability ran similarly to the first two studies, at the 65th to 71st percentiles, and in the fourth study self-assessed accuracy runs from the 55th to the 76th percentile, while actual accuracy for the low group is at the 13th percentile and for the high group at the 90th.
Clearly, Kruger and Dunning's data for self-assessed confidence scores in all four studies relate more closely to one another than to the respondents' actual test scores. In these studies the self-assessed test scores are much more closely calibrated to a general figure around the 68th percentile, with individuals who score poorly being overconfident and high test scorers being underconfident, suggesting a general cognitive bias.
With this in mind, it is difficult to support the authors' conclusion that "We have shown that respondents who lack the knowledge to perform well are often unaware of the fact. We attribute this lack of knowledge to incompetence in metacognitive skill. The same incompetence that leads them to make wrong choices also deprives them of the know-how necessary to recognize competence, be it their own or anyone else's." Actually, the "incompetent" respondents were more closely calibrated for accuracy in the first study than those who made the "right" choices. The research appears to support the cognitive bias model, or possibly the heuristic model.
Synthesis of Research Findings
A possible weakness of the Koku and Qureshi (2004) research develops from the suggestion of reducing overconfidence by asking individuals to describe reasons that the respondent’s answers may not be correct. This possibly creates a bias within respondents that their answers were wrong.
In the Kruger and Dunning (1999) study, both the "incompetent" respondents in the lowest quartile group and the high-scoring individuals in the highest quartile group gave their self-assessed accuracy at roughly the 68th percentile. In their other studies, self-assessed confidence levels also correlated more closely with one another than with the performance of the high or low quartile groups. This suggests that individuals are somewhat uniformly overconfident.
Stankov has considerable findings that suggest a confidence trait in individuals that is self-regulating and mediates the accuracy of metacognitive judgments in self-assessment (Crawford & Stankov, 1996a, 1996b; Kleitman & Stankov, 2001; Stankov, 1998, 1999, 2000). Schraw and associates also developed research findings supporting the view that confidence ratings across ability measures relate more closely to one another than to accuracy scores on the same measures (Schraw, 1994, 1997; Schraw & Dennison, 1994; Schraw & Roedel, 1994).
Many research designs in metacognitive studies utilize some version of a t test for statistical analysis. The t test is used because many of the studies are correlational designs and the researcher wants to determine the strength of the difference, if any, between one condition and another. The other primary statistical tool used for analysis of data in confidence judgments is the ANOVA, which is used to compare more than two variables within the design. Qualitative designs are rarely used in studies of this type, because the types of data tend to lend themselves more closely to quantitative methods.
The research by Kruger and Dunning (1999) supports the view that individuals may use a cognitive bias model, which can then be compensated for by strategies that assist the individual by giving them more cues as to whether they were overconfident or underconfident. They additionally generated data suggesting, at least in one study, that confidence levels for those they deemed "incompetent," the lowest quartile group, and for the high-scoring individuals in the highest quartile group were both self-assessed at about the 68th percentile. This suggests that individuals are uniformly overconfident.
Stankov (Crawford & Stankov, 1996a, 1996b; Kleitman & Stankov, 2001; Stankov, 1998, 1999, 2000) and Schraw (Schraw, 1994, 1997; Schraw & Dennison, 1994; Schraw & Roedel, 1994) both have extensive research that supports confidence as being a relatively stable trait in individuals.
The research by Stone, Dodrill, and Johnson (2001) suggests that a depressed state may cause respondents to be less overconfident and more accurate. This may also support a heuristics or cognitive bias model that is then mediated to some degree by a lack of confidence related to depression.
Many of the studies on metacognition and confidence judgments use an ANOVA design because they compare multiple types of data, while other studies use a t test when they look for a difference in means between two factors. The t test is a simpler statistical tool than the ANOVA: it conveys whether the difference between the means of two groups is significant. The ANOVA, on the other hand, is useful for comparing multiple types of data.
Qualitative studies in metacognition are relatively rare, although there is some support for the use of qualitative design based on the subjective nature of confidence judgments and the greater fluency that individuals have with language over numeric data and statistical analysis. In the research used in Psychodiagnostic Confidence (Smith, 1998), the methods used in the qualitative analysis suffer somewhat from methodological difficulties that are generally associated with qualitative analysis, such as interrater reliability, the subjective nature of the data, and relatively small numbers of respondents. The advantage to the qualitative method is the depth of the description used in the data of the study, and that respondents are able to describe in their own words the events and process related to the development of their confidence judgment.
While individuals may have a tendency to be more confident than accurate, the question remains: is there any relationship between confidence and the difficulty of the questions themselves? Is an individual's degree of overconfidence consistent across questions, or does the accuracy of confidence levels vary disproportionately between easy and difficult questions?
CHAPTER 3: METHODOLOGY
Purposes of the Study
The purpose of this study was to determine if overconfidence increases as questions become more difficult. It was an examination of the difference in accuracy between individuals' self-assessed confidence levels on easy versus more difficult questions.
This was a quantitative research design of non-experimental type (Leedy, 1997). The objective was to study the phenomenon of the accuracy of self-report confidence levels as they exist in individuals, in regard to easy and difficult questions. The objective was not to manipulate confidence levels of individuals, or create changes in confidence levels, but only to record confidence levels as they exist.
This research was designed to discover whether there was a difference in the accuracy of participants' self-reported confidence levels between easy and difficult questions.
Participants in this study responded to a multiple-choice examination (Appendix D). The examination was a test written by the research assistant (who was the professor of the course) regarding information contained in a required lecture. The test was not a required part of the course, nor was participation in the research. Participation in the test had an ancillary advantage for the professor of the course in determining whether students were retaining the material of the lecture. Participants answered a series of questions on a questionnaire-style test and, on a separate sheet of paper, included their self-assessed subjective probability that they had chosen the correct answer (Appendix C). This was their measure of confidence. Examinations were then graded for the accuracy, or correctness, of the answers. Confidence levels were then developed into a mean and correlated with the mean accuracy of the responses. Respondents were allowed to indicate the lowest possible confidence level of 20% (the chance probability with five possible choices) and to increase their rating at 20% intervals up to the maximum of 100%.
Scores on the examination were used to measure performance. The research developed a difficulty value, or "measure of discriminating power of a test item in terms of the specified group who answer the item correctly" (Wolman, 1973, p. 400). From these data, "easy" and "difficult" questions were determined. Scores of confidence ratings on easy questions were then analyzed in comparison to scores of confidence ratings on difficult questions, to identify any possible differences in scores and determine whether there was any difference in mean confidence ratings and accuracy between easy and difficult questions.
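The item analysis just described can be sketched briefly. The response matrix, the cutoff of two items per category, and the variable names below are assumptions made purely for illustration; the study itself used the five easiest and five most difficult items.

```python
# Hypothetical sketch of the item analysis described above: the difficulty value of
# each item is the proportion of the group answering it correctly; the highest
# values identify "easy" items and the lowest values identify "difficult" items.
import numpy as np

# Assumed response matrix: rows = respondents, columns = items (1 = correct, 0 = incorrect).
responses = np.array([
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
])

difficulty_value = responses.mean(axis=0)    # proportion correct per item
order = np.argsort(difficulty_value)

n = 2                                        # two items per category here for brevity
difficult_items = order[:n]                  # lowest proportion correct
easy_items = order[-n:]                      # highest proportion correct
print("difficulty values:", np.round(difficulty_value, 2))
print("difficult items:", difficult_items, "easy items:", easy_items)
```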
The sample population consisted of 98 college students from the College of Southern Idaho. Of this sample group, 70 individuals, or 71%, chose to participate and generate usable data. While there were no parameters or assumptions placed upon the survey group as to race, sex, ethnic origin, or economic background, all individuals responded in English, and by consequence of the convenience sample utilized, all respondents had the financial and cognitive wherewithal to attend college.
Selection of Participants
A convenience sample of college students at the College of Southern Idaho was utilized. The convenience sample of psychology students was developed in cooperation with Dr. R. Simonson, Assistant Professor of Psychology at the College of Southern Idaho, who acted as the research assistant. The sample consisted of students in his psychology classes, with participation on a voluntary basis. The voluntary nature of participation was explained to the students verbally, and they additionally signed a paper copy of the informed consent form (see Appendix A) confirming that they understood the voluntary nature of participation.
This was a correlational study of a non-experimental type. The experimenter did not attempt to manipulate confidence scores within individuals, but merely recorded the relationship that exists. The variables were the accuracy scores and the confidence ratings, both of which are discrete.
Extraneous Variables Affecting Data
Confidence may be state-related rather than trait-related; if so, the data would be scattered. The sample population may not represent the general population, even within the designated age parameters. The sample population may not have reported accurately, skewing or scattering the data.
The test that was utilized was related to a lecture on cognition, the lecture being a scheduled part of the introductory psychology course that the participants were enrolled in. Participants marked answers on a multiple-choice questionnaire, and marked confidence levels on a separate form (Appendix C).
All potential participants were informed of the voluntary nature of the study. The research assistant (course instructor) explained the voluntary nature of the study, distributed the informed consent forms, and described the process of including a confidence score. Respondents were then administered the test. They completed and returned the test questionnaire with a self-assessment of accuracy on individual items. Questions were classified as easy or difficult using the previously described difficulty value. The five easiest and five most difficult questions served as the easy and difficult categories. The number of correct answers on the questionnaire was compared to the self-assessed confidence level in the answers given, and self-assessed accuracy on easy questions was compared with self-assessed accuracy on difficult questions.
Students were given a cognitive test via questionnaire/examination. At the end of the testing procedure, the instructor/research assistant collected the questionnaires. The researcher graded the exams, which were then reviewed for the correctness of each individual's confidence judgments. Participant names were not included on the response sheet, keeping participant identity anonymous to the researcher.
This research was a correlational study. The Student t test for paired samples was utilized to determine the strength of the difference, if any, in the accuracy of participants' confidence levels between easy and difficult questions. The paired t test is generally used when measurements are taken from the same subject, and it is the standard test for matched pairs of data. Measures of absolute accuracy, in which students use a percentage scale to indicate their confidence level, have been shown to yield normal distributions under testing conditions with items of moderate difficulty, whereas gamma (a measure of relative accuracy in comparison to other test questions) yielded skewed distributions under the same conditions (Nietfeld, Enders, & Schraw, 2002). In the current research, the matched pairs of data are confidence levels on easy and difficult questions as answered by the same student.
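A minimal sketch of such a paired comparison, assuming the per-item mean confidence values have already been computed (the numbers below are placeholders for illustration only and do not reproduce the study's Excel output), might look like the following:

```python
# Hedged sketch of a Student t test for paired samples; the two lists are
# placeholder per-item mean confidence values, not the study's data set.
from scipy import stats

easy_confidence = [71, 79, 66, 61, 61]   # illustrative mean confidence on five easy items
hard_confidence = [60, 58, 50, 49, 48]   # illustrative mean confidence on five difficult items

t_stat, p_value = stats.ttest_rel(easy_confidence, hard_confidence)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```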
The power of the t test utilized in this research is greatly improved as a result of using a related sample. As such, for a power of .80 with a significance level of .05 and a medium effect size (y = .50), only 32 participants would be required. With the participation of approximately 100 students, the study could achieve a power of .80 with a significance level of .05 and a smaller effect size (y = .28).
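A minimal power check along these lines, assuming the standard one-sample-on-differences formulation of the paired t test (and noting that the exact sample sizes depend on one- versus two-tailed assumptions and rounding, so the output may differ slightly from the figures quoted above), could be written with statsmodels:

```python
# Power calculation sketch for a paired t test; statsmodels treats the paired
# design as a one-sample test on the difference scores (two-tailed by default).
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
n_medium = analysis.solve_power(effect_size=0.50, alpha=0.05, power=0.80)
n_small = analysis.solve_power(effect_size=0.28, alpha=0.05, power=0.80)
print(f"n required for d = .50: {n_medium:.1f}")
print(f"n required for d = .28: {n_small:.1f}")
```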
The purpose of this study was to determine what differences may exist in self-reported confidence levels in relation to accuracy on cognitive tasks. Generally speaking, individuals tend to be overconfident in their predictions; however, the accuracy of these predictions may change depending on the difficulty of the questions themselves, or levels of overconfidence may remain relatively constant between easy and difficult questions. The expected outcome was that when individuals are more accurate they know they are more accurate, and that when they are less accurate they are also less accurate at knowing how inaccurate they are. With this in mind, the expected finding of this study was that as personal accuracy decreases, overconfidence increases.
CHAPTER 4: DATA COLLECTION AND ANALYSIS
The study was able to reject the null hypothesis and accept the hypothesis that overconfidence increases as questions become more difficult and performance accuracy decreases.
Of the projected possible group of 98 students enrolled in Psychology 101 at the College of Southern Idaho, 71 students volunteered to participate as part of the convenience sample. These students were in attendance on the day of the testing and chose by informed consent to participate. The students were instructed not to place their names on the test, which served the research in maintaining confidentiality of participants. Of the participants choosing to take part in the study, one individual did not respond with any confidence scores; that participant produced no data and was eliminated from the total, leaving 70 participants who generated usable data. The study generated data from 70 of 98 possible students/respondents, or 71.4% of the whole population.
Testing took place in the Psychology 101 classroom at the College of Southern Idaho. Volunteers were informed of the nature of the study, that they were part of a convenience sample, and what a convenience sample is. They filled out informed consent forms (Appendix A), a copy of which they retained for their records. Additionally, the research assistant verbally explained the nature of informed consent. All participants understood that participation was voluntary, that they would receive no incentive for participation, and, conversely, that there would be no consequence for not participating. There was no time restriction on taking the test or filling out confidence scores.
Participants filled out a fifteen item multiple-choice test. Questions were determined to be easy or difficult by difficulty value: “Measure of discriminating power of a test item in terms of the specified group who answer the item correctly” (Wolman, 1973, p. 400). The study used the 5 easiest and 5 most difficult questions of the total 15 questions. At this point 1050 items were used to determine five easy and five difficult questions (70 participants X 15 questions).
A mean of accurate answers (MAA) was developed for each of the five easy and five difficult questions by determining the question numbers that students most often answered incorrectly (difficult) and correctly (easy), and then developing the mean of accurate answers (MAA) and the mean of self-assessed confidence (MSAC) for those items. The mean easy and difficult data are an amalgam of answers on the easy questions (70 participants X 5 questions) and difficult questions (70 participants X 5 questions), or 700 items.
The self-assessed confidence (SAC) scores (70 participants X 10 scores = 700 items) were developed into mean percentages (MSAC) for each question (5 easy and 5 difficult). The accurate answers that the participants gave were likewise developed into a mean of accurate answers (MAA). The mean percentage of accurate answers on the 5 easy and 5 difficult questions was then subtracted from the mean of self-assessed confidence (MSAC) scores to develop a difference in means (DM) between self-assessed confidence and accurate answers.
(MSAC – MAA = DM)
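As an illustrative sketch of this computation, assuming the confidence and accuracy data are arranged as participants-by-items arrays (all names and values below are placeholders rather than the study's records), the per-item MSAC, MAA, and DM could be derived as follows:

```python
# Sketch of the MSAC, MAA, and DM (MSAC - MAA) computation described above.
# The arrays are randomly generated placeholders, not the study's data.
import numpy as np

rng = np.random.default_rng(1)
confidence = rng.choice([20, 40, 60, 80, 100], size=(70, 10))  # self-assessed confidence (%)
correct = rng.integers(0, 2, size=(70, 10))                    # 1 = correct answer, 0 = incorrect

msac = confidence.mean(axis=0)        # mean self-assessed confidence per item
maa = 100 * correct.mean(axis=0)      # mean accuracy per item, as a percentage
dm = msac - maa                       # positive = overconfidence, negative = underconfidence
print(np.round(dm, 1))
```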
The difference in means (DM) percentages represent the participants' accuracy in the confidence judgments they made. A scatterplot (Table 1) was completed on the amalgamated participants' difference in means (DM) for the confidence judgments on the five easy and five difficult questions. The scatterplot places the numeric data into two-dimensional space. Microsoft Excel was utilized for developing the scatterplot (and later correlational data) to see whether there was a relationship between the difference of means on easy questions and on difficult questions. The Y-axis represents the percentage of accuracy: zero on the Y-axis represents 100% accuracy, in other words a 0% difference between the mean of self-assessed confidence (MSAC) and the mean of accurate answers (MAA). Positive scores show overconfident mean values and negative scores show underconfident mean values. Blue represents the percentage of difference in mean accuracy on difficult questions, and pink represents the percentage of difference in mean accuracy on easy questions.
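A rough equivalent of this Excel scatterplot could be produced with matplotlib; the DM values below are taken from the cumulative scores in Appendix B, while the layout details are assumptions:

```python
# Sketch of the DM scatterplot: confidence minus accuracy, per item,
# for easy (pink) versus difficult (blue) questions (values from Appendix B).
import matplotlib.pyplot as plt

dm_easy = [4, 19, 2, -3, -6]          # % over/underconfidence on the five easy items
dm_hard = [58, 44, 33, 42, 44]        # % over/underconfidence on the five difficult items

plt.scatter(range(1, 6), dm_easy, color="pink", label="Easy questions")
plt.scatter(range(1, 6), dm_hard, color="blue", label="Difficult questions")
plt.axhline(0, linewidth=0.8)         # 0 = confidence exactly matches accuracy
plt.xlabel("Item")
plt.ylabel("Difference in means (MSAC - MAA), %")
plt.legend()
plt.show()
```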
Looking at the scatterplot (Table 1), the means are clearly delineated. There is a distinct difference in means (DM) between the difficult questions shown in blue, and easy questions shown in pink. The participants, when answering questions that are easy, tend to have better calibration to accuracy in their confidence judgments. When they approach easy questions where they are more accurate, their confidence judgments are also relatively accurate. The second trend that is noticeable is that when the participants approach difficult questions, where they are less accurate, their calibration for accurate self-assessment is also significantly less accurate and distinctly overconfident. The raw data in Appendix B, cumulative confidence scores, additionally supports this trend.
This research used a Student t test for paired samples. Previous research supports the use of the t test as well as the ANOVA for statistical analysis of similar metacognitive studies. An ANOVA is useful when comparing multiple data types; had this study, for instance, looked at factors such as gender, age, or ethnicity in relation to respondents' accuracy of confidence judgments on easy and difficult questions, an ANOVA would have been more appropriate. In this study the hypothesis asks only whether there is a difference in respondents' confidence accuracy between easy and difficult questions, so a Student t test for paired samples was used to determine the strength of the difference in overconfidence between easy and difficult questions (Figure 1).
Figure 1. t-Test: Paired Two Sample for Means (Microsoft Excel output reporting the hypothesized mean difference and the one-tail and two-tail t critical values).
The Student t test for paired samples was chosen because the data were quantitative in nature and the hypothesis looks for differences, if any, between the difference in means for easy and difficult questions. The variables were dependent, meaning that the study did not seek to manipulate confidence ratings but rather to discover confidence levels as they exist in individuals. The variables were categorical, as participants did not have an infinite, continuous spectrum of confidence; they scored their confidence on a scale from 20% to 100%. The samples for difficult and easy questions were taken from the same individuals, so each participant in the study contributed two sets of confidence scores, one for the 5 easy and one for the 5 difficult questions.
In this study there were 4 degrees of freedom, given that there were 5 paired sets of data. The t statistic was 27.326 and the two-tail t critical value was 2.777 (27.326 > 2.777) at the .05 significance level. Statistically, this demonstrates a relatively large effect.
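As a quick check of the critical value cited above, assuming a two-tailed test with 4 degrees of freedom at alpha = .05, SciPy gives approximately the same figure:

```python
# Two-tailed critical t value for df = 4 at alpha = .05.
from scipy import stats

t_crit = stats.t.ppf(1 - 0.05 / 2, df=4)
print(round(t_crit, 3))   # approximately 2.776
```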
Data from the study were able to support findings as follows:
1. Individuals are generally somewhat overconfident in self-assessed accuracy on both easy and difficult questions.
2. There is a correlation between self-assessed confidence scores and the difficulty of the questions answered.
3. Individuals appear to be relatively well calibrated for self-assessed accuracy on easy questions.
4. Individuals appear to have relatively poor calibration for self-assessed accuracy on difficult questions.
5. Overconfidence increases when questions become more difficult and individual performance accuracy decreases.
CHAPTER 5: RESULTS, CONCLUSIONS AND RECOMMENDATIONS
The present data strongly suggest that while individuals may have a general tendency to be more confident than accurate, there is a relationship between overconfidence and the difficulty of the questions themselves. The relationship suggested by this study supports the conclusion that as questions become more difficult, individuals become more overconfident. As a tangential matter, it appears individuals generally know that they are less accurate, but have poor calibration as to how inaccurate they are.
The study was able to reject the null hypothesis and accept the hypothesis: Overconfidence increases as questions become more difficult and performance accuracy decreases.
Judging by the raw cumulative confidence score totals (Appendix B), respondents were not unilaterally overconfident. Total raw scores for easy questions were 22,850 and for difficult questions 19,100. These numbers seem to support the PMM and retrieval models, which state that each cue for memory retrieval has a certain cue validity that provides the individual with a confidence rating based on the ease of retrieval; overconfidence is thus a byproduct of the encoding and retrieval stages of memory. Respondents in this study knew the difficult questions were more difficult. However, overall confidence levels do not show as much distinction as one might expect. Respondents' confidence levels on difficult and easy questions are relatively similar, which lends support to the cognitive bias and heuristics model, which states that errors in confidence judgments occur because of general cognitive biases or heuristics. The models for confidence judgments likely have varying degrees of interplay with each other in determining an individual's confidence level.
The totals of the raw data scores suggest that on the difficult questions the participants generally knew that the questions were more difficult and that the possibility for error was greater; likewise, on the easy questions the participants knew they had a greater likelihood that their answers were correct, as seen in the confidence raw score totals. On the difficult questions the confidence ratings given by participants are decidedly lower than on the easy questions. However, the difference is not as substantial as one might imagine: the difference between the raw confidence score total of the five easiest questions and that of the five most difficult questions is only 14.6% (67.6% – 53% = 14.6%), while the difference between the actual score totals of the five easiest and five most difficult questions is 53.6% (64.4% – 10.8% = 53.6%). Certainly there is a statistical difference in confidence scores on easy and difficult questions, at 14.6%, showing that individuals know when the questions are more difficult, but it is not a strong difference in real-world terms, and not nearly as strong as the difference between confidence levels and the accuracy of the respondents' scores at 53.6%.
This finding was somewhat of a surprise to the researcher, and it supports the findings by Schraw and associates establishing that confidence ratings across measures relate more closely to one another than they do to accuracy scores on those same measures (Schraw, 1994, 1997; Schraw & Dennison, 1994; Schraw & Roedel, 1994). It is also similar to the research by Kruger and Dunning (1999), in which self-assessed confidence levels for accurate and inaccurate response groups were both near 68% (in this study, 67.6% for easy questions and 53% for difficult questions). Such a relationship lends support to the heuristics model for self-assessed accuracy. Respondents in this study showed a high degree of what would appear to be cognitive bias, with confidence levels far more similar between easy and difficult questions than anticipated.
Still, participants did not perceive confidence levels as being unilaterally the same on difficult and easy questions. They did seem to know generally, but not always, that difficult questions are indeed more difficult.
The t test supports the hypothesis. Respondents' calibration is decidedly overconfident on the difficult questions. Respondents in this study were fairly attuned to knowing when they were accurate and to what degree, but they were generally overconfident, and to a high degree, when questions were more difficult. These data are useful to the field of cognitive psychology, and specifically to metacognition, as stand-alone knowledge that individuals are grossly overconfident on difficult questions. The data suggesting miscalibration on difficult items may be useful in educational fields as well, suggesting that individuals may need to utilize additional or different strategies to assess retrieved data on difficult questions.
Beyond the simple correlation that overconfidence increases on more difficult questions, this research leads to the next logical question. Why are individuals so poorly calibrated on difficult questions? One of the reasons that overconfidence may have presented so highly in the difficult questions is suggested by the heuristics model, which states that individuals are unilaterally overconfident. Respondents were not unilaterally overconfident in this study. However, the raw cumulative scores of confidence levels on difficult questions at 53% are closer to the raw cumulative scores of confidence levels for easy questions at 67.6%, than the raw cumulative scores of confidence levels for difficult questions at 53% are to the actual cumulative scores of the difficult questions at 10.8% (Appendix B).
The extremely low accuracy of individuals' answers on the difficult questions suggests another possibility for why individuals are so poorly calibrated in confidence judgments on difficult questions. All of the accuracy scores on the difficult questions were lower than the lowest statistically probable chance level, which is 20%. This suggests that on the difficult questions the respondents could have guessed randomly and done better.
Theoretically, with respondents' item accuracy varying from 4% up to 17%, there may be another variable at play. Possibly the respondents thought that they had some glimmer of knowledge, which led them toward a particular answer, and that glimmer of knowledge was wrong more often than a random guess would have been. This is one area for further research.
Additional research may center on how to improve the accuracy of self-assessment in cases of difficult questions or methods for respondents to develop more refined study or data retrieval and assessment techniques.
Other research might look at possible weaknesses of this design and incorporate known research variables. The location of the answer on the survey can affect confidence levels in multiple-choice questionnaires (Bar-Hillel & Attali, 2002). Research could be designed to determine whether those effects had an impact on the data of the present study. Confidence also has a correlation to age (Baranski & Petrusic, 1999). With that in mind, additional research might incorporate a probability sample, which would more aptly represent the general population and whose data might be more generalizable than the convenience sample used in this study. However, it is important to keep in mind that individuals generally become more overconfident with age (Jacoby, 1999; Mandler, 1980; Searcy et al., 1999; Searcy et al., 2000), and thus the college students who participated likely represent lower confidence ratings than an older population might.
Much of the current research analyzed and included in the literature review (Chapter 2) is limited to smaller convenience sample groups. Any study able to incorporate a probability sample with a large respondent group would have greater generalizability than much of the current research.
A study which compared narrative descriptions (qualitative data) to percentage ratings (quantitative data) from respondents might prove interesting. This would be valuable as a balancing measure, since available research on confidence judgments is almost entirely quantitative in nature.
Research could be developed using extremely easy questions to determine whether respondents continue to be overconfident on easy questions where respondents' accuracy levels are 90-100%. The research would examine whether respondents were well calibrated for self-assessment of accuracy at this level, or whether self-assessed confidence scores on extremely easy questions remained in the 60-70% range, which would additionally support the heuristics model.
In looking at future research, one should consider that there are retrospective confidence judgments (RCJ) and judgments of learning (JOL) that possibly involve different encoding and retrieval processes. These processes might include auditory, tactile, olfactory, and visual types of data. Under retrieval bias models (Koriat et al., 1980), the retrieval of these types of data may be affected by the nature of the data itself. There is sparse information as to the possible impact the type of data has on confidence.
Research could investigate how to bring an individual's self-assessed accuracy scores more closely in line with the individual's actual accuracy. Participants' responses in the current study on confidence and competence were not at identical levels for easy and difficult questions. This study supports the idea that self-assessed accuracy has at least some skill factor, and research into self-assessed accuracy could look at fine-tuning this skill. One possible way to fine-tune this skill, as with most skills, would be to have participants practice it and receive feedback so they might adjust responses accordingly. Participants would self-assess their confidence levels; researchers would compare participants' performance accuracy levels to their self-assessed scores and give participants feedback as to how accurate they were in this task. I would theorize that while individuals make confidence judgments all day long, they do not often analyze their precise confidence level, or, for that matter, receive feedback specific to their confidence levels as they relate to their individual accuracy.
While this research supports the conclusion that overconfidence increases as questions become more difficult and performance accuracy decreases, other research may focus on whether overconfidence is necessarily a negative trait. In weighing whether one should bring confidence levels more closely in line with an individual's accuracy, one tackles a bevy of social and moral debates regarding the benefits and the emotional, psychological, and psychosocial costs.
Overconfidence in one's accuracy may have a variety of pitfalls, but in discussing future research needs there is the thought that overconfidence may not always have negative connotations. Other research in confidence accuracy might look at advantages that act as reinforcers supporting overconfidence, especially in embarrassing or difficult situations, leadership positions, or unknown situations where there may be a high possibility for error or a high consequence for error. The famed military leader General George S. Patton once stated, "A good commander will never express an opinion. You can be wrong, but never be in doubt when you speak. A commander knows!" (Cohen, 1999, pg. 43). Perhaps Patton had some psychological insight. In certain situations, overconfidence may have a variety of very distinct advantages socially, financially, and as a leadership need. To this end, one could develop research to isolate positions in which overconfidence is viewed as an asset, an expectation, or a needed quality. Such a study could be qualitative, quantitative, or mixed methods in design.
Perceptions and Reflections about the Process
The expected finding that individuals are less accurate in their metacognitions as questions become more difficult was supported by the data of the research, which was no surprise. What was somewhat startling was the extent to which participants in this study were overconfident on difficult questions, at 44.2% overconfidence. Conversely, it was startling that as a group the participants were so accurate in their self-assessed confidence scores on easy questions, at 3.2% overconfidence.
Another aspect that was surprising was that the confidence ratings as related to ability on the difficult questions were associated more closely to the confidence levels of the easy questions than they were to the accuracy scores of the difficult questions. It supported the heuristics model. As I reflected on the results, I realized that I had included the heuristics model in the literature review more as background information that, at the time, I had felt rounded out the literature review. Moreover, I thought the heuristics model would not be well supported.
Finally, I had an odd bias related to the term miscalibration. Overconfidence is a miscalibration of an individual’s self-assessed accuracy. It is a belief that exceeds an accurate estimate of the correctness of a given answer. By definition, this is true. However, as I completed the research I realized that I had begun the research with a thinking error. It was the notion that the miscalibration of being overconfident necessarily has negative connotations for the individual. I believed that miscalibration is an error in calibration, and an error can only have negative ramifications. That assumption isn’t necessarily true. There are a multitude of situations in which individual overconfidence is most probably an asset financially, socially and emotionally. It was only as I was looking at areas for further research that I realized there are various benefits to overconfidence.
References

Alloy, L. B., & Abramson, L. Y. (1979). Judgment of contingency in depressed and
nondepressed students: Sadder but wiser? Journal of Experimental Psychology: General, 108, 441-485.
Axelrod, A., (1999). Patton on leadership: Strategic lessons for corporate warfare. Prentice
Hall, Paramus, NJ.
Bar-Hillel, M., & Attali, Y. (2002). "The delicate art of key balancing." Or: When
randomization is too important to be trusted to chance. The American Statistician, 56(4), 299-305.
Baranski, J. V., & Petrusic, W. M. (1999). Realism of confidence in sensory
discrimination. Perception and Psychophysics, 61, 1369-1383.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning,
remembering, and understanding. In P. H. Mussen (Series Ed.) & J. H. Flavell & E. M. Markman (Vol. Eds.), Handbook of child psychology: Vol. 3. Cognitive development (4th ed., pp. 77-166). New York: Wiley.
Brown, A. S. (1991). A review of the tip-of-the-tongue experience. Psychological Bulletin, 109,
Brun, W., & Teigen, K. H. (1988). Verbal probabilities: Ambiguous, context-dependent, or both?
Organizational Behavior and Human Decision Processes, 41, 390-404.
Cohen, R. J., & Swerdlik, M. E. (1999). Psychological testing and assessment (4th ed.).
Mountain View, CA: Mayfield.
Crawford, J., & Stankov, L. (1996a). Age differences in the realism of confidence
judgments: A calibration study using tests of fluid and crystallized intelligence. Learning and Individual Differences, 6, 84-103.
Crawford, J., & Stankov, L. (1996b). Confidence judgments in studies of individual differences.
Personality and Individual Differences, 6, 971-986.
Cutler, B., & Wolfe, R. (1989). Self-monitoring and the association between confidence
and accuracy. Journal of Research in Personality, 23, 410-420.
Dougherty, T.M. & Haith, M.M. (1997). Infant expectations and reaction time as predictors of
childhood speed of processing and IQ. Developmental Psychology, 33, 146-155.
Dougherty, M. R., Scheck, P., Nelson, T. O., & Narens, L. (2005). Using the past to predict the
future. Memory and Cognition, 33, 1096-1116.
Dougherty, M. R. (2001). Integration of the ecological and error models of overconfidence using
a multiple-trace memory model. Journal of Experimental Psychology: General, 130, 579-599.
Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning
(JOL) and the delayed-JOL effect. Memory & Cognition, 20, 374-380.
Dunlosky, J., & Nelson, T. O. (1994). Does the sensitivity of judgments of learning (JOLs) to the
effects of various study activities depend on when the JOLs occur? Journal of Memory and Language, 33, 545-565.
Dunlosky, J., & Nelson, T. O. (1997). Similarity between the cue for judgments of learning
(JOL) and the cue for test is not the primary determinant of JOL accuracy. Journal of Memory and Language, 36, 34-49.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-
developmental inquiry. American Psychologist, 34, 906-911.
Fullerton, G. S., & Cattell, J. M. (1892). On the perception of small differences.
Philadelphia, PA: University of Pennsylvania Philosophy Series, No. 2.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond heuristics and
biases. In W. Stroebe & M. Hewstone (Eds.), European review of social psychology (Vol. 2, pp. 83-115). Chichester, UK: Wiley.
Gigerenzer, G., Hoffrage, U., & Kleinbolting, H. (1991). Probabilistic mental models: A
Brunswikian theory of confidence. Psychological Review, 98, 506-528.
Glenberg, A. M., Wilkinson, A. C., & Epstein, W. (1982). The illusion of knowing: Failure in the
self-assessment of comprehension. Memory and Cognition, 10, 579-602.
Gleitman, Fridlund, & Reisberg (2004). Psychology (6th ed.).
Griffin, D., & Tversky, A. (1992). The weighing of evidence and the determinants of
confidence. Cognitive Psychology, 24, 411--435.
Hart, J. T. (1965). Memory and the feeling-of-knowing experience. Journal of Educational
Psychology, 56, 208-216.
Harvey, N. (1997). Confidence in judgment. Trends in cognitive science. 1, 78-82.
Higham, P. A., & Gerrard, C. (2005). Not all errors are created equal: Metacognition and changing
answers on multiple-choice tests. Canadian Journal of Experimental Psychology, 59, 28-35.
Horn, J. L., & Stankov, L. (1982). Auditory and visual factors of intelligence.
Intelligence, 6(2), 165-185.
Jacoby, L. L. (1999). Ironic effects of repetition: Measuring age-related differences in memory.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 3-22.
Jensen, A. R. (1985). The nature of Black-White difference on various psychometric
tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193-263.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgments under uncertainty: Heuristics
and biases. Cambridge, UK: Cambridge University Press.
Kelemen, W. L. (2000). Metamemory cues and monitoring accuracy: Judging what you know
and what you will know. Journal of Educational Psychology 92, 800-810.
Kelemen, W. L., & Creeley, C. E. (2001). Caffeine (4 mg/kg) influences sustained attention and
delayed free recall but not memory predictions. Human Psychopharmacology: Clinical and Experimental, 16, 309-319.
Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced metamemory at delays: Why do
judgments of learning improve over time? Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1394-1409.
Kelley, C. M., & Lindsay, D. S. (1993). Remembering mistaken for knowing: Ease of retrieval as
a basis for confidence in answers to general knowledge questions. Journal of Memory and Language, 32, 1-24.
Kleitman, S., & Stankov, L. (2001). Ecological and person-oriented aspects of metacognitive
processes in test-taking. Journal of Applied Cognitive Psychology, 15, 321-341.
Koku, P., & Qureshi, A. (2004). Overconfidence and the performance of business students on
examinations. Journal of Education for Business, 79, 217-225.
Koriat, A. (1993). How do we know that we know? The accessibility model of the feeling of
knowing. Psychological Review, 100, 609-639.
Koriat, A. (1998). Illusions of knowing: The link between knowledge and metaknowledge. In
V.Y. Yzerbyt, G. Lories, and B. Dardene (Eds.) Metacognition: Cognitive and social dimensions (pp. 16-34). London: Sage.
Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of
Experimental Psychology: Human Learning and Memory, 6, 107-118.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing
one's own incompetence lead to inflated self-assessments. Journal of Personality and
Social Psychology, 77(6), 1121-1134.
Lamp, R.E. & Krohn, E.J. (1990). Stability of the Stanford-Binet fourth edition and K-ABC for
young black and white children from low income families. Journal of Psychoeducational
Assessment, 8, 139-149.
Leedy, P. (1997). Practical research: Planning and design. (pp. 189-228) Saddle River, N.J.:
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The
state of the art to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 306-334). Cambridge, England: Cambridge University Press.
Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review,
Memon, A., Hope, L., & Bull, R. (2003). Exposure duration on eyewitness accuracy and
confidence. British Journal of Psychology, 94, 339-355.
Merz, J.F., Druzdel, M.J. & Mazur, D.J. (1991). Verbal expressions of probability in informed
consent litigation. Medical Decision Making. 11, 253-281.
Metcalfe, J., & Shimamura, A. (1994). Metacognition: Knowing about knowing. Cambridge,
MA: Bradford Books.
Nietfeld, J. L., Enders, C. K., & Schraw, G. (2002). A Monte Carlo comparison of two
measures of monitoring accuracy. Educational and Psychological Measurement.
Nelson, T. O. (Ed.). (1992). Metacognition: Core readings. Toronto: Allyn & Bacon.
Nelson, T. O., & Narens, L. (1980). A new technique for investigating the feeling of
knowing. Acta Psychologica, 46, 69-80.
Pallier, G. (2003). Gender differences in self-assessment of accuracy on cognitive tasks. Sex
Roles: A Journal of Research, 12, 265-287.
Pappas, B. A., Sunderland, T., Weingartner, H. M., Vitiello, B., Martinson, H., & Putnam, K.
(1992). Alzheimer's disease and feeling-of-knowing for knowledge and episodic memory. Journals of Gerontology: Psychological Sciences, 47, P159-P164.
Reynolds, C., & Brown, R. (1984). Perspectives on bias in mental testing. New York:
Roberts, R. D., Stankov, L., Pallier, G., & Dolph, B. (1997). Charting the cognitive
sphere: Tactile-kinesthetic performance within the structure of intelligence. Intelligence, 23, 133-155.
Schraw, G., & Roedel, T. D. (1994). Test difficulty and judgment bias. Memory and Cognition,
Schraw, G. (1997). The effect of generalized metacognitive knowledge on test
performance and confidence judgments. The Journal of Experimental Education, 65, 135-146.
Schraw, G., & Dennison, R. (1994). Assessing metacognitive awareness. Contemporary
Educational Psychology, 19, 460-475.
Searcy, J. H., Bartlett, J. C., & Memon, A. (1999). Age differences in accuracy and choosing in
eyewitness identification and face recognition. Memory and Cognition, 27, 538-552.
Searcy, J. H., Bartlett, J. C., & Memon, A. (2000). Relationship of availability, line-up
conditions and individual differences to false identification by young and older eyewitnesses. Legal and Criminological Psychology, 5, 219-236.
Shields, W., Smith, D., Guttmannova, K., & Washburn, D. (2005). Confidence judgments
in humans and rhesus monkeys. The Journal of General Psychology, 132(2), 165-187.
Shimamura, A. P., & Squire, L. R. (1988). Long-term memory in amnesia: Cued recall,
recognition memory, and confidence ratings. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 763-770.
Smith, D.J. (1998). Psychodiagnostic confidence. Department of Educational and Counseling
Psychology, McGill University, Montreal.
Smith, S. M., Brown, J. M., & Balfour, S. P. (1991). TOTimals: A controlled experimental
method for studying tip-of-the-tongue states. Bulletin of the Psychonomic Society, 29, 445-447.
Smith, J. D., Schull, J., Strote, J., McGee, K., Egnor, R., & Erb, L. (1995). The uncertain
response in the bottlenosed dolphin (Tursiops truncatus). Journal of Experimental Psychology: General, 124, 391-408.
Smith, J. D., Shields, W. E., Allendoerfer, K. R., & Washburn, D. A. (1998). Memory
monitoring by animals and humans. Journal of Experimental Psychology: General 127, 227-250.
Smith, J. D., Shields, W. E., Schull, J., & Washburn, D. A. (1997). The uncertain response in
humans and animals. Cognition, 62, 75-97.
Smith, J. D., Shields, W. E., & Washburn, D. A. (2003a). The comparative psychology of
uncertainty monitoring and metacognition. The Behavioral and Brain Sciences, 26, 317-339.
Smith, J. D., Shields, W. E., & Washburn, D. A. (2003b). Inaugurating a new area of
comparative cognition research. The Behavioral and Brain Sciences, 26, 358-373.
Sniezek, J. A., Paese, P. W., & Switzer, F. S. (1990). The effect of choosing on
confidence in choice. Organizational Behavior and Human Decision Processes, 46, 264-282
Stankov, L. (1998). Calibration curves, scatterplots and the distinction between general
knowledge and perceptual tests. Learning and Individual Differences, 8, 28-51.
Stankov, L. (1999). Mining on the "no man's land" between intelligence and personality. In P. L.
Ackerman, P. C. Kyllonen, & R. D. Roberts (Eds.), Learning and individual differences: Process, trait, and content determinants (pp. 314-337). Washington, DC: American Psychological Association.
Stankov, L. (2000). Complexity, metacognition, and fluid intelligence. Intelligence, 28, 121-143.
Stone, E., Dodrill, C., & Johnson, N. (2001) Depressive cognition: A test of depressive
realism versus negativity using general knowledge questions. The Journal of Psychology, 135, 583-603.
Trafimow, D., & Sniezek, J. (1994). Perceived performing and its effect on confidence.
Organizational Behavior and Human Decision Processes, 57, 290-302.
Watson, C. S., Kellogg, S. C., Kawanishi, D. T., & Lucas, P. A. (1973). The uncertain response
in detection-oriented psychophysics. Journal of Experimental Psychology, 99, 180-185.
Wells, G. L., & Bradfield, A. L. (1998). 'Good, you identified the suspect': Feedback to
eyewitnesses distorts their reports of the witnessing experience. Journal of Applied Psychology, 83, 360-376.
Wolman, B. B. (1973). Dictionary of behavioral science. New York: Van Nostrand Reinhold.
Wright, G. (1982). Changes in realism of probability assessments as a function of
question type. Acta Psychologica, 52, 165-174.
Appendix A: INFORMED CONSENT FORM
Researcher: Seth Winterholler
Participants are requested to volunteer for a study of test taking self-assessment. The title of this study is “Competence and Confidence: A Self-Assessment of Accuracy on Cognitive Tasks.” You have volunteered to participate as part of a convenience sample of students enrolled in Psychology 101. A convenience sample chooses the individuals that are easiest to reach, and by virtue of attending this course you are relatively easy to reach.
After answering the questions on the keypunch form, please include, on the separate sheet of paper, your self-assessed probability that you have chosen the correct answer. Please indicate the lowest possible confidence level at 20% (given the statistical probability with five possible choices) and increase in 20% intervals up to the maximum of 100% (20, 40, 60, 80, 100). For example, for a question such as "The War of 1812 began in:", ANSWER: 1812 (100%). The length of time it will take to complete the accuracy assessment and the test is estimated to be 5 minutes.
The researcher will protect the anonymity of all participants, and this data will not be released. Data will be saved in the researcher's safe for 7 years and then destroyed. There is no risk in participating in this research; all participation is voluntary; withdrawing from the research has no consequences; and participation has no benefit other than adding to the general academic pool of knowledge. If you choose to participate, please print your name at the bottom of this form, then sign and date in the spaces provided. The second copy of this form is for your records.
I want to thank you in advance for your participation.
Signature & Date__________________________________________________
Appendix B: CUMULATIVE CONFIDENCE SCORES
Easy questions
Cumulative confidence   Actual % correct   Confidence %   % OC/UC
4980                    67                 71             4
4890                    60                 79             19
4590                    64                 66             2
4270                    64                 61             -3
4120                    67                 61             -6
Totals: 22,850          64.4               67.6           3.2

Difficult questions
Cumulative confidence   Actual % correct   Confidence %   % OC/UC
4760                    12                 60             58
4060                    14                 58             44
3530                    17                 50             33
3400                    7                  49             42
3350                    4                  48             44
Totals: 19,100          10.8               53             44.2
Appendix C: CONFIDENCE RESPONSE SHEET
Confidence response sheet utilized with research by Seth Winterholler on:
Competence and Confidence: A Self-Assessment of Accuracy on Cognitive Tasks
Given the possibility of five choices, please mark your level of confidence in the accuracy of your response as 100%, 80%, 60%, 40%, or 20% for each question answered. 100% means you are certain of the correctness of your answer, and 20% means that you have no idea as to the answer and guessed.
Appendix D: COMPLETED TEST INSTRUMENT