CatTrax Diagnostics Version 1.43
For further info, please contact: Philip Callahan at firstname.lastname@example.org
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. In accord with that license, you are free to: Share — copy and redistribute the material in any medium or format; Adapt — remix, transform, and build upon the material.
The intent of the CatTrax diagnostic software is to go beyond simple test correctness scores and provide insight into test interpretation for improving the quality of instruction, identifying areas of curricular confusion, and identifying individuals who might be experiencing confusion. CatTrax may be operationally defined as having three components: 1. data input, 2. analysis, and 3. results interpretation.
The data input component requires that you either create or import the data that comprises a test answer key and each person's responses to each test item. A data file can be created using a spreadsheet. Alternatively, data can be imported as a comma-delimited (CSV) file, Scantron file, or Moodle output file.
The results interpretation component provides a table of participants and items, raw test scores, caution index, standard deviation, standard error of measurement, degree of difficulty, discrimination index, and item analysis distribution of responses. This information can be saved using your browser's save web page option, e.g. Firefox shows this option as Save Page As. Additionally, a click-to-download link to a comma-delimited (CSV) file appears at the end of the CatTrax results. The CSV file contains a copy of your input data, regardless of the source, reformatted into CSV, and the CatTrax results data, also formatted into CSV.
Creating a CSV Data File. A spreadsheet program, such as Excel, can be used to create the input data file. The test answer key and the responses to each test item for each participant in the class are entered into the input data file. Once completed, the data file is simply saved as a Windows comma-delimited (CSV) file.
Spreadsheet data is arranged in rows and columns. With the exception of the first row of the spreadsheet, which contains the answer key to the test, each row of the spreadsheet contains a participant's test record. CatTrax reserves the first three columns of the participant's record for participant-specific information. The first column is reserved for a participant identifier (Id). This Id can be any alphanumeric combination. The second and third columns of the spreadsheet can be used for any participant-related information, such as participant last name and first name, or participant full name and instructor name. The fourth and subsequent columns are reserved for test item responses such that each column contains the participant's response to that item. Should a participant not respond to a test item, the spreadsheet cell may be left blank or a period (.) can be placed in the cell to serve as a visual marker. The first row of the spreadsheet is reserved for the answer key to the test. CatTrax uses this answer key to score the participant responses. Recalling that the first three columns are reserved, the answer key begins in column four. This puts your answer key in perfect column alignment with the participants' test item responses in the rows that follow. Note that the answer key entries can be anything from a single character, A, to a full sentence. Ensure that the participant responses follow the same character formatting, realizing that "The cat's meow" is not the same as "the cats meow" when the analysis occurs. Finally, save your spreadsheet as a Windows CSV formatted file using the Save As option and selecting .CSV file.
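As an illustration of the layout described above, the following Python sketch writes a small, hypothetical input file (the Ids, names, and answers are invented for the example; CatTrax itself does not require Python, and the same layout can be produced directly in Excel):

```python
import csv

# A hypothetical four-item test with three participants, laid out as
# described above: row 1 holds the answer key (items begin in column 4;
# the first three cells of the key row are just placeholders), and each
# following row holds one participant's Id, two name fields, and responses.
answer_key = ["KEY", "", "", "B", "A", "D", "C"]
participants = [
    ["34", "Fresh", "Fish", "B", "A", "D", "C"],   # all items correct
    ["24", "Elbow", "Grease", "B", "C", "D", "."], # item 4 omitted
    ["14", "Deputy", "Dog", "A", "A", "D", "C"],   # item 1 incorrect
]

with open("cattrax_input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(answer_key)
    writer.writerows(participants)
```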
Unscored Multiple-Choice Objective Test. CatTrax will score multiple-choice objective tests and provide analysis. Analysis and scoring are accomplished by entering the answer key to the test items, and then the participants' unscored responses to the test items. For example, a four-item multiple-choice objective test with three participants might appear as the following:
|Unscored Objective Test|
|Sample four item multiple-choice objective test with three participants|
Scored Objective Test or Checklist Oriented Performance Test. This format can be used when test items have been previously scored dichotomously (correct or incorrect). CatTrax is able to perform a number of analyses including identifying response inconsistencies within the class. This test format is also applicable when using a checklist (rubric) where the items comprising the checklist are treated dichotomously, as either correct or incorrect.
The convention is to use a one (1) to indicate the item response was correct and a zero (0) to indicate it was incorrect. Participant test item responses would therefore be coded with either zeros for incorrect responses or ones for correct responses. A period (.) can be used to indicate an omitted response. If a one is used to mark the correct response, then the answer key is simply a string of ones comprising all of the test items. For example, a four-item objective test for three participants might appear as the following:
|Scored Objective Test|
|Sample four item scored objective test or checklist performance test with three participants|
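The scoring logic for this dichotomous format can be sketched in a few lines of Python (the Ids and responses are hypothetical; this simply mirrors how an answer key of ones counts correct responses):

```python
# The answer key for a four-item pre-scored test is a string of ones; each
# participant's 0/1 responses are compared cell-by-cell against it.  A
# period (.) marks an omission and scores as incorrect.
key = ["1", "1", "1", "1"]
responses = {
    "34": ["1", "1", "1", "0"],
    "24": ["1", ".", "1", "1"],
}

scores = {pid: sum(r == k for r, k in zip(resp, key))
          for pid, resp in responses.items()}
print(scores)   # {'34': 3, '24': 3}
```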
Omitting an Item in the Answer Key. This situation could occur
when, upon reviewing the results, you decide you might want to omit a test item because it was
poorly written or problematic. Placing a period (.) in the item's response field in the answer key will omit the item from analysis but retain the existing item numbering. For example, if you decide you want to omit item 2, placing a period in that field will omit the item from further analysis but show the results numbered as 1, 3, 4, etc., allowing the original item numbering to be retained.
|Omitting an Item|
|Sample four item test showing item 2 as omitted|
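The omission rule can be sketched in Python as follows; a period in the key drops the item from scoring while the original item numbers are preserved (the key and responses are hypothetical):

```python
# A "." in the answer key excludes that item from analysis; the remaining
# items keep their original 1-based numbering in the results.
key = ["B", ".", "D", "C"]            # item 2 omitted from the key
responses = ["B", "A", "D", "A"]

scored = [(i + 1, r == k) for i, (k, r) in enumerate(zip(key, responses))
          if k != "."]
print(scored)   # [(1, True), (3, True), (4, False)]
```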
Scored Essay Test. One way to increase the reliability (reduce subjectivity) of the essay test is to outline key concepts for each test question before scoring the test. By itemizing key concepts in the essay outline, a number of elements or test items specifically addressing each key concept can be formed into a scoring rubric. For example, you could score an essay response based on: 1. correctly answers the central question asked; 2. provides supporting evidence; 3. presents information in an organized manner; and 4. uses appropriate grammar/spelling constructs. The rubric then permits one to more objectively score key essay elements for each participant as either correct (1) or incorrect (0). Or, if desired, a period (.) can be used to indicate an omitted response. Developing this rubric as a spreadsheet allows the spreadsheet to be used as an input data file for CatTrax. For example, after outlining an essay question, imagine four key concepts emerge. These key concepts become four test items. When scoring a participant's essay, the ordered four-item check-sheet is used as an aid to record, on the spreadsheet, a score as to whether the participant correctly responded, incorrectly responded, or omitted the response to each of the four key concepts.
The answer key's function, in this instance, is to specifically
identify the character used to mark a correct response on the test. The convention is
to use a series of ones (1) corresponding to the items in the rubric to form the
answer key. Subsequent participant test item responses would then be coded with zeros for incorrect responses, ones for correct responses, and a period for an omission. For example,
a four-item test (assume the essay encompasses four key concepts) for three participants
might appear as the following:
|Sample four item scored essay test with three participants|
Scored Essay Test with Partial Credit. By itemizing the key concepts you wish to evaluate into an essay outline, these concepts become test items that in turn become a scoring rubric for the essay test. The rubric can be extended to address situations where, for example, partial credit might be more appropriate than simply scoring as correct or incorrect. For example, a rubric could permit each participant's response to be scored as correct (A), partial credit (C), or incorrect (0). And, if desired, a period (.) could be used to indicate an omitted response. Thus, you score essay responses based on multiple characteristics and, within each characteristic, at multiple levels (A-correct, C-partial credit, 0-incorrect). Developing this rubric as a spreadsheet allows the spreadsheet to be used as an input data file for CatTrax. CatTrax will accept as many levels of partial credit as you wish to develop. While this example uses an essay test, rubrics can be used in such situations as portfolios, surveys, or other evaluative tools having sliding response scores.
|Sample four item scored essay test with partial credit and three participants|
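A partial-credit rubric tally might be sketched as follows; the level-to-weight mapping here is purely illustrative and is not CatTrax's internal scoring:

```python
# Map each rubric level to a numeric weight (weights are an assumption for
# illustration): A = full credit, C = partial credit, 0 = incorrect,
# "." = omitted.  Totals sum the weights across a participant's items.
weights = {"A": 1.0, "C": 0.5, "0": 0.0, ".": 0.0}

responses = {"34": ["A", "A", "C", "0"],
             "24": ["C", "A", ".", "A"]}
totals = {pid: sum(weights[r] for r in resp)
          for pid, resp in responses.items()}
print(totals)   # {'34': 2.5, '24': 2.5}
```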
Using a Scantron Data File. Scantron can produce a data file of participant data, but does not
produce an answer key in the data file. Given this constraint, CatTrax will read the Scantron data file, but will
also require a separate answer key file to run in conjunction with the Scantron data file. A spreadsheet program,
such as Excel, can be used to create the answer key file. Spreadsheet data is arranged in rows and columns
forming cells. The first row of the spreadsheet will contain the answer key to the test. The answer key
begins in the first column, corresponding to
the first item on the test, and subsequent items follow in order, one per cell, in a horizontal manner such
that each cell corresponds to a test item. Be sure to use the same conventions that the Scantron data file uses, e.g. ABCDE or 12345, to indicate the correct responses in the key, as the format might differ from the conventions you used in writing the exam. Finally, save your spreadsheet as a CSV formatted file, with a file name that will remind you of the associated Scantron data file name, using the Save As option and selecting .CSV file. For example, a six-item answer key might appear as the following:
|Six item answer key for Scantron|
Using a Moodle File. Moodle, or Modular Object-Oriented Dynamic Learning Environment, is a free open-source e-learning server-based environment. Moodle is also known as a Course Management System, Learning Management System, or Virtual Learning Environment. Because Moodle is often used as a course management system, it has a testing module within it that permits participants to test online. Further, these testing results can be easily exported from Moodle as test summary results in CSV file format. As a function of the export, the participants' responses and correct answers can, and should, be included in the exported file. Hence, most of the work involves assuring that the participants have indeed tested online and then assuring that you have a test results export file that includes the participants' responses and the correct responses. CatTrax will read the Moodle test summary file and provide the analysis in a format that displays participant name and id, as well as Moodle's full item responses when performing item analysis.
Analysis requires selecting the data source, selecting desired results, and submitting.
1. Select the source of your input data as either CSV (default), Moodle, or Scantron. Select the location of your data file using the Data File Browse; browse to the physical location where your data file resides, e.g. your computer desktop or a folder. Remember, if you intend to use Scantron data, you MUST select the location of your data file using the Data File Browse AND select the location of your answer key file using the Key File - Scantron Only Browse.
|Date||File||Participants||Items||Test Mean||Test Standard Deviation|
CatTrax Test Summary. The summary statistics include the date the analysis occurred, the input data file name, the number of participants and items, the test mean score in number of items, and the test standard deviation (SD) in number of items. The standard deviation measures the extent to which the scores of the class tend to spread or cluster about the average or mean of the class. Tests with small standard deviations show the class is responding similarly or in a homogeneous manner. Conversely, large standard deviations show greater diversity or more heterogeneous responses.
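The mean and standard deviation reported in the summary can be reproduced with a short Python sketch (the scores below are hypothetical):

```python
from statistics import mean, pstdev

# Hypothetical raw scores for a class of eleven; the SD here describes the
# whole class, so the population form (pstdev) is used.
scores = [19, 16, 15, 15, 13, 13, 12, 12, 9, 9, 8]
print(round(mean(scores), 2), round(pstdev(scores), 2))   # 12.82 3.19
```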
|24||Elbow||Grease||is responding inconsistently to the test items when compared to similarly ranked participants|
|39||Tom||Cat||is responding inconsistently to the test items when compared to similarly ranked participants|
|17||Roscoe||Rabbit||is in the lower 15% of test scores and responding inconsistently when compared to similarly ranked participants|
|9||Anny||Albatross||is in the lower 15% ranking of test scores|
|19||Buster||Bilgewater||is in the lower 15% of test scores and responding inconsistently when compared to similarly ranked participants|
Take Notice Participants. The Take Notice Participants listing provides a quick reference for identifying problematic participant-related issues. Participants who score in the lower 15% of the class, or, from a statistical perspective, roughly one standard deviation below the mean score, are identified. Additionally, participants who exhibit a Caution Index (CI) value exceeding 30 are identified as responding inconsistently when compared to similarly ranked participants. The caution index, established by Sato, Harnish and Linn, assigns a numeric value based on the observed versus the expected consistency of test item responses. These cautionary participants display patterns of test responses that are atypical of their group receiving similar scores. A high CI occurs when, for example, a higher ranking participant answers easier items incorrectly; this is inconsistent with expectations as we would expect this participant to have answered these items correctly. Or, a high CI could occur when a lower ranking participant answers the hardest items correctly; again, this is inconsistent with expectations as we would expect this participant to have answered the hardest items incorrectly. Unusual participant response patterns and a resulting high caution index may be caused by guessing, confusion, unusual instructional or experiential history, copying, high anxiety, or carelessness. Questioning these low ranking and cautionary participants about potential confusion may improve their potential for success in the class -- this is an opportunity to address a potentially at-risk learner at an early stage when remediation is viable.
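For readers who want the mechanics, the following Python sketch implements one common formulation of Sato's caution index (observed versus ideal Guttman covariance with the item totals). The 0-100 scaling is an assumption inferred from the "exceeding 30" threshold used in the report, and the data are hypothetical:

```python
# One common formulation of Sato's caution index for a single participant.
# u is the participant's 0/1 response vector; n[j] is the number of
# participants who answered item j correctly.  The ideal (Guttman) pattern
# answers the r easiest items correctly.  The 0-100 scaling is assumed from
# the "exceeding 30" threshold; perfect and zero scores are undefined here.
def caution_index(u, n):
    r = sum(u)
    n_bar = sum(n) / len(n)
    observed = sum(ui * nj for ui, nj in zip(u, n)) - r * n_bar
    ideal = sum(sorted(n, reverse=True)[:r]) - r * n_bar
    return 100 * (1 - observed / ideal)

# A participant who misses the third-easiest item but answers a harder one:
print(round(caution_index([1, 1, 0, 1, 0], [10, 9, 6, 4, 2]), 2))   # 31.25
```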
|5||is confusing or poorly written|
|17||is confusing and not distinguishing between higher and lower ranking participants|
|9||is confusing or poorly written|
|12||is confusing or poorly written|
|6||is confusing and not distinguishing between higher and lower ranking participants|
|10||is difficult with more than half of the group responding incorrectly|
|15||is difficult with more than half of the group responding incorrectly|
|4||is difficult with more than half of the group responding incorrectly|
|7||is difficult with more than half of the group responding incorrectly|
|14||is difficult with more than half of the group responding incorrectly|
|13||is difficult, confusing, and not distinguishing between higher and lower ranking participants|
|20||is difficult with more than half of the group responding incorrectly|
Take Notice Test Items. The Take Notice Test Items listing provides a quick reference for identifying problematic test-item-related issues. Test items identified as having a Caution Index (CI) exceeding 30 display atypical response patterns when, for example, an easier item is answered incorrectly by higher ranking participants. These items are identified as confusing or poorly written. High CI values occurring within item responses may be the result of a poorly written test item; of ethnic, experiential, gender, or instructional bias; or of a mismatch between instructional practices and content that results in learner confusion. These cautionary items deserve special attention when reviewing the test items with the group. The discrimination index (Disc) can be used to determine how well a test item differentiates between high-scoring and low-scoring participants. A negative discrimination index shows the item was answered correctly more often by the low-scoring participants than by the high-scoring participants. These items are identified as not distinguishing between higher and lower ranking participants and should be examined to determine if the item is poorly written or if there is real confusion within the group regarding the learning concept the test item is attempting to measure. Additionally, test items that are particularly difficult are identified as difficult, with more than half of the group responding incorrectly. Difficult items are not necessarily bad items, and there are times when it is important to introduce items that are challenging. So, these items are identified to provide recognition and to flag the possible need for group review.
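The difficulty and split-half discrimination calculations can be sketched in Python as follows (each record pairs a participant's total score with their 0/1 result on one item; the data are hypothetical):

```python
# Difficulty is the proportion of the group answering the item correctly.
# The discrimination index is the upper-half correct count minus the
# lower-half correct count, divided by the half size (split at the median
# rank, as in split-half analysis).
def difficulty_and_discrimination(records):
    records = sorted(records, key=lambda r: r[0], reverse=True)
    half = len(records) // 2
    high, low = records[:half], records[len(records) - half:]
    p = sum(c for _, c in records) / len(records)
    disc = (sum(c for _, c in high) - sum(c for _, c in low)) / half
    return p, disc

records = [(19, 1), (16, 1), (15, 1), (13, 1), (12, 0), (9, 0)]
p, disc = difficulty_and_discrimination(records)
print(round(p, 2), round(disc, 2))   # 0.67 0.67
```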
|Id||Name||Name||CI||Tot||SEM||Range||Z||Item Responses (easiest to hardest)|
|34||Fresh||Fish||0||19||1||18 - 20||1.94||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||0||1|
|24||Elbow||Grease||33||16||1.8||14 - 18||1||1||1||1||1||1||1||1||0||1||0||1||1||0||1||1||1||1||1||0||1|
|14||Deputy||Dog||20||15||2||13 - 17||0.68||1||1||1||1||1||1||1||1||1||1||0||0||1||0||1||0||1||0||1||1|
|36||Ferocious||Feline||12||15||2||13 - 17||0.68||1||1||1||1||1||1||1||1||1||1||1||0||0||0||1||1||0||1||1||0|
|3||Candy||Apple||18||13||2.2||11 - 15||0.06||1||1||1||1||1||1||1||0||1||0||1||1||0||1||0||1||1||0||0||0|
|39||Tom||Cat||33||13||2.2||11 - 15||0.06||1||1||1||0||0||1||1||0||1||1||1||1||1||1||1||0||0||1||0||0|
|40||Ground||Squirrel||14||12||2.3||10 - 14||-0.26||1||1||1||1||0||1||1||1||1||1||1||1||0||0||0||0||0||0||1||0|
|29||Waldo||Whale||20||12||2.3||10 - 14||-0.26||1||1||1||1||1||1||0||1||1||0||1||0||1||0||0||0||1||0||0||1|
|17||Roscoe||Rabbit||32||9||2.3||7 - 11||-1.2||1||0||1||1||1||0||1||1||0||0||0||0||1||1||0||0||0||0||1||0|
|9||Anny||Albatross||10||9||2.3||7 - 11||-1.2||1||1||1||1||1||0||1||1||.||1||0||0||1||.||.||.||0||0||.||.|
|19||Buster||Bilgewater||42||8||2.3||6 - 10||-1.51||0||1||0||0||1||1||0||1||0||1||0||1||0||0||0||1||0||1||0||0|
Participant and Item Table. In the context of the table, the participant summary information appears to the left and the test item information to the right of the participant information. Participants are rank ordered from highest scorer to lowest scorer on the test (Tot). Similarly, the test item information ranks the items from easiest to hardest.
For example, the first participant, with the identification code (ID) of 34, correctly answered 19 items (Tot) and was the highest ranking participant on this test. This participant's score of 19 is the measured score, but the participant's true ability likely falls within the RANGE of 18 to 20, as determined by adding the participant's SEM of 1 to, and subtracting it from, the measured score of 19. The standard error of measurement (SEM) is the standard deviation of the "errors" of measuring the test score for an individual. Thus, the SEM provides a range and serves as a reminder that a score should not be interpreted as an absolute. The Z or linear z-score converts the participants' scores to a standard deviation format whereby positive z-scores are those above the mean, a zero (0) z-score is the mean, and negative z-scores are those falling below the mean. The participant Caution Index (CI) provides a measure of the consistency of the participant's item responses. Since the participant's CI of zero (0) is below the 30 used as a threshold, the individual appears to be responding consistently within the context of this group. We can see this consistency by looking at the participant's item responses. A one (1) indicates the item was answered correctly, a zero (0) and red cell indicates an item was answered incorrectly, and a period (.) and red cell indicates the participant did not answer an item. Thus, participant 34 correctly answered all but item 13, one of the hardest items on the test. This response is consistent with the item difficulties and the participant's rank in the class.
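The RANGE and Z calculations described above can be sketched in Python; the SEM is commonly computed as SD * sqrt(1 - reliability), so a reliability estimate (e.g. KR-20) is treated as a given input here, and the numbers are hypothetical:

```python
from math import sqrt

# The true-score band is the measured score plus/minus the SEM, and the
# z-score standardizes the score against the class mean and SD.  The
# reliability value is an assumed input for this sketch.
def score_band(score, class_mean, class_sd, reliability):
    sem = class_sd * sqrt(1 - reliability)
    z = (score - class_mean) / class_sd
    return (score - sem, score + sem), z

band, z = score_band(19, 12.8, 3.2, 0.9)
print(tuple(round(b, 1) for b in band), round(z, 2))   # (18.0, 20.0) 1.94
```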
The item summary information shows that item 11 was the easiest with 10 of the 11 participants in the class answering the item correctly (Tot). Item 11 shows a difficulty (Diff) of .91 or 91 percent of the class responding correctly. Item 20 was the most difficult item on the test with only 4 participants or 36 percent of the class responding correctly. Thus, the easiest items appear to the left with more difficult items appearing to the right.
A caution index is developed for both the items and participants (CI). The caution index, established by Sato, Harnish and Linn, assigns a numeric value based on the observed versus the expected consistency of test item responses. Participants 24, 39, 17, and 19 display a CI exceeding 30, the cutoff value, and are highlighted in red. These cautionary participants display patterns of responses that are atypical of their group receiving similar scores. A high CI occurs when, for example, a higher ranking participant answers easier items incorrectly; this is inconsistent with expectations as we would expect this participant to have answered these items correctly. Or, a high CI could occur when a lower ranking participant answers the hardest items correctly; again, this is inconsistent with expectations as we would expect this participant to have answered the hardest items incorrectly. Thus, to best interpret the CI, the participants' item responses must be viewed in the context of the table. For example, scanning the responses of participant 19 suggests that this person is likely guessing, based on the inconsistency of correctness that ranges from easy to hard items. Unusual participant response patterns and a resulting high caution index may be caused by guessing, confusion, unusual instructional or experiential history, copying, high anxiety, or carelessness. Questioning cautionary participants about potential confusion may improve the participants' potential for success in the class -- this is an opportunity to address a potentially at-risk learner at an early stage when remediation is viable.
Items 5, 17, 9, 12, 6, and 13, display a CI exceeding 30. These cautionary items display atypical response patterns when, for example, an easier item is answered incorrectly by higher ranking participants. High CI values (exceeding 30) occurring within item responses may be the result of a poorly written test item, ethnic, experiential, gender, or instructional bias or where a potential mismatch between instructional practices and content occurs and results in learner confusion. These cautionary items deserve special attention when reviewing the test items with the group.
The discrimination index (Disc) can be used to determine how well a test item differentiates between high-scoring and low-scoring participants. The higher the positive value of the discrimination index the better the test item discriminates between high- scoring and low-scoring participants. A discrimination index of zero shows the item does not discriminate between high-scoring and low-scoring participants. And, a negative value shows the item was answered correctly more often by the low- scoring participants than by the high-scoring participants. Items showing negative values should be examined to determine if the item is poorly written or if there is real confusion within the group regarding the learning concept the test item is attempting to measure.
Item Analysis. CatTrax provides two types of item analysis. Although the statistics are performed in the same manner for each type of item analysis, the physical grouping of the participants changes. Split-half item analysis divides the testing group into two groups. Split-half is particularly useful when the group size is less than thirty. In the following example, showing item 11 on the test, the testing group of 11 participants has been divided into two groups, with six participants in the higher group and five in the lower group. Reference the Participant and Item table to see where a faint line bisects the list of participants and thus describes the boundary of the higher and lower ranking groups. The middle group is used when dividing the test group has resulted in several tied scores occurring at the division. These tied scores will then appear as the middle group and not be used in the item analysis calculations. Item 11 shows that all six participants in the higher group selected 3, the correct (*keyed) response, and four of the lower group selected the correct response, with one participant selecting response 1. This was an easy test item, with a Difficulty of 0.91 or 91 percent of the group scoring correctly, and it only weakly distinguishes between the high and low ranking groups, with a Discrimination value of 0.2. This scoring is consistent within the context of the group as the CI is 0.
|Diff: 0.91 Disc: 0.2 CI: 0|
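The split-half tally behind this display can be sketched in Python (the numbers are taken from the item 11 example above; the function layout is illustrative, not CatTrax's internal code):

```python
from collections import Counter

# Count how often each response option was chosen by the higher- and
# lower-ranking halves of the class, marking the keyed (correct) option
# with an asterisk as the report does.
def response_distribution(high_responses, low_responses, keyed):
    high, low = Counter(high_responses), Counter(low_responses)
    table = {}
    for opt in sorted(set(high) | set(low)):
        label = opt + " *keyed" if opt == keyed else opt
        table[label] = (high[opt], low[opt])
    return table

# Item 11: all six high scorers chose the keyed option 3; four low scorers
# chose 3 and one chose 1.
print(response_distribution(["3"] * 6, ["3"] * 4 + ["1"], keyed="3"))
```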
Split-thirds item analysis divides the testing group into three groups. Split-thirds is applicable when the test group is larger than thirty. CatTrax attempts to divide the test group into three smaller, equal-sized groups consisting of high scoring, middle, and low scoring participants. But, should the borders describing these divisions show tied scores on either side of a border, then the participants with tied scores will appear in the middle scoring group so that each group's scores are distinct from the other two groups. In the following example, showing item 19 on the test, the testing group of 33 participants has been divided into three groups. Split-thirds item analysis would be reflected in the Participant and Item table, where two faint lines would divide the list of participants to describe the boundaries of the higher, middle, and lower ranking groups. Item 19 shows that all 6 participants in the higher group selected 1, the correct (*keyed) response; 15 participants in the middle group selected the correct response, with three participants selecting the distractor response 4; and six of the lower group selected the correct response, with three participants selecting response 3. This was a relatively easy test item, with a Difficulty of 0.82 or 82 percent of the group scoring correctly, but it does distinguish between the high and low ranking groups, with a Discrimination value of 0.33. This scoring is consistent within the context of the group as the CI caution index of 22 is less than the 30 cautionary cutoff.
|Diff: 0.82 Disc: 0.33 CI: 22|
Participant and Item Table. In the context of the table, the participant summary information appears to the left and the test item information to the right of the participant information. Participants are sorted alphabetically using the second bolded column, or name field. The columns displaying light green refer to the participant's test score and show the total score (Tot) correct, with the participant's true ability likely falling within the approximate range determined by adding and subtracting the participant's SEM. The test item information appearing to the right of the participant information is listed from first to last item. The test key appears as the first row in the item table, followed by the examinees' responses in the following rows. A gray cell indicates a correct response, with the participant's item response appearing within the cell. A red cell indicates an incorrect response, with the participant's item response appearing within the cell. A caution index is developed for both the items and participants (CI). The caution index assigns a numeric value based on the observed versus the expected consistency of test item responses. A CI exceeding 30, the cutoff value, is highlighted in red.
|KEY|
|34||Fresh||Fish||0||19||1||18 - 20||1.99||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||1||0|
|24||Elbow||Grease||30||16||1.8||14 - 18||0.98||1||1||1||1||1||1||1||0||1||1||0||1||1||0||1||1||1||1||1||0|
|36||Ferocious||Feline||20||15||2||13 - 17||0.64||1||1||1||1||1||1||1||1||1||0||1||1||0||0||1||1||0||0||1||1|
|14||Deputy||Dog||16||15||2||13 - 17||0.64||1||1||1||1||1||1||1||1||1||1||1||0||0||1||1||0||1||0||0||1|
|3||Candy||Apple||21||14||2.1||12 - 16||0.31||1||1||1||1||1||1||1||0||1||1||0||1||1||0||0||1||1||1||0||0|
|39||Tom||Cat||38||13||2.2||11 - 15||-0.03||1||1||1||0||0||1||1||0||1||0||1||1||1||1||1||0||0||1||1||0|
|29||Waldo||Whale||15||12||2.3||10 - 14||-0.37||1||1||1||1||1||1||0||1||1||1||0||1||0||1||0||0||1||0||0||0|
|40||Ground||Squirrel||18||12||2.3||10 - 14||-0.37||1||1||1||1||0||1||1||1||1||0||1||1||1||0||0||0||0||0||0||1|
|17||Roscoe||Rabbit||35||10||2.3||8 - 12||-1.04||1||0||1||1||1||0||1||1||0||1||0||0||0||1||0||0||0||1||0||1|
|19||Buster||Bilgewater||47||9||2.3||7 - 11||-1.38||0||1||0||0||1||1||0||1||0||1||1||0||1||0||0||1||0||0||1||0|
|9||Anny||Albatross||11||9||2.3||7 - 11||-1.38||1||1||1||1||1||0||1||1||.||.||1||0||0||1||.||.||0||.||0||.|
Participant and Item Tables for Multiple Answer Keys. Adding additional answer keys to the primary answer key produces additional participant and item tables. Each successive table incorporates the prior answer keys in evaluating participant responses. For example, having two answer keys results in the first answer key being compared to the participant responses and production of the test results. When the next answer key is encountered, it is combined with the preceding key such that participant responses are compared across both keys. This means a participant getting a correct response under the first key is sustained as a correctly responding participant under the succeeding keys. Thus, the diagnostics always show cumulative results, allowing better identification of extreme results.
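The cumulative behavior across multiple keys can be sketched as follows (the keys and responses are hypothetical): a response counts as correct if it matches the corresponding entry in any key processed so far.

```python
# Two hypothetical answer keys for a three-item test; each successive table
# scores a response correct if ANY key seen so far matches it.
keys = [["B", "A", "D"], ["B", "C", "D"]]   # primary key, then a second key
responses = ["B", "C", "A"]

cumulative = []
for k in range(1, len(keys) + 1):
    correct = [any(r == key[i] for key in keys[:k])
               for i, r in enumerate(responses)]
    cumulative.append(sum(correct))
print(cumulative)   # score under key 1 alone, then under keys 1+2 combined
```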
CatTrax creates a CSV file that can be saved to your desktop. A link, "Get your cattrax input data and results as a CSV file", appears at the end of the results. Clicking the link will provide a prompt to open the file. Selecting Save File will save the file on your computer.
The CSV file contains your input data, regardless of the source, reformatted into CSV, and the CatTrax results data, also formatted into CSV. The CSV format allows a spreadsheet program to easily access the data for further analysis, graphing, and manipulation without the need for conversion. By providing the reformatted input data in CSV format, this data can be manipulated and resaved as a new CSV data file that may be reanalyzed by CatTrax directly from your desktop. Data manipulation might include doing "what-if" item deletions or combining multiple classes that took the same test.
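A "what-if" item deletion of the kind mentioned above might be sketched in Python as follows (the data and file name are hypothetical; items start in column 4, i.e. index 3, per the input layout):

```python
import csv

# Drop one item column from a CatTrax-style data set and resave it as a new
# CSV that could be fed back to CatTrax for reanalysis.
rows = [["KEY", "", "", "B", "A", "D", "C"],
        ["34", "Fresh", "Fish", "B", "A", "D", "C"],
        ["24", "Elbow", "Grease", "B", "C", "D", "."]]

DROP_ITEM = 2                     # 1-based item number to delete
col = 3 + (DROP_ITEM - 1)         # items begin at column index 3
trimmed = [r[:col] + r[col + 1:] for r in rows]

with open("cattrax_whatif.csv", "w", newline="") as f:
    csv.writer(f).writerows(trimmed)
print(trimmed[0])   # ['KEY', '', '', 'B', 'D', 'C']
```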