We will be administering curriculum-based assessments via Qualtrics to measure learning gains in high school students, pre-instruction vs. post-instruction. In the past, we have tried a number of strategies to match pre- and post-data with a minimum of human-hours spent hunting down missing or incorrectly entered matches, but we have not found a satisfactory way to force validated matching. We typically have to match anywhere from a few hundred to about a thousand pre- and post-assessments, so as you can imagine, doing this manually is extremely time intensive. Here are the things we have tried so far:
- We have assigned ID numbers to students that they needed to remember for their post-test. Results = not great - many forgotten numbers, numbers entered incorrectly, students entering made-up numbers, etc.
- We have assigned ID numbers based on patterns (first two letters of the teacher’s last name, class period as a two-digit number, and school-based student ID number). Results = also not great - many issues with things as small as inconsistent capitalization, spacing, mismatches, etc.
- We have tried creating ID numbers that are only numeric and six digits long so that we can at least force that response type. They consist of a two-digit teacher ID assigned by the researchers and given to the teacher, a two-digit class period (e.g., 1st period = 01), and a two-digit student number assigned by the teacher. Results = still a lot of mismatched data.
- We tried a forced-validation version of #3 to keep students from making things up or entering information incorrectly. Students enter their two-digit teacher ID, then enter it again to validate against their first entry; we do the same for their two-digit class period and their two-digit student ID. Results = slightly better than attempts 1, 2, and 3, because entering each piece twice, one section at a time, forces students to put in something they can actually remember. However, some students are still making things up (like entering 01, 01, 01 across the board). Our rules require numbers only: teacher IDs must be two digits between 01 and 49 (we won’t assign a number higher than 49 for any single study), class periods must be between 00 and 09 (no school we’ve come across yet has had more than 9 class periods), and student IDs must be between 01 and 60 (we haven’t had a class period with more than 60 students in it yet). A sketch of these range rules as a single pattern follows this list.
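For reference, the range rules from attempt #4 collapse into a single pattern. Here is a minimal Python sketch; I believe the same regex could be dropped into a Qualtrics custom-validation "matches regex" rule if your plan supports it, and everything outside the pattern itself is illustrative:

```python
import re

# Ranges from our scheme: teacher 01-49, period 00-09, student 01-60.
ID_PATTERN = re.compile(r"(0[1-9]|[1-4][0-9])(0[0-9])(0[1-9]|[1-5][0-9]|60)")

def is_plausible_id(raw: str) -> bool:
    """Return True if a 6-digit ID fits the teacher/period/student ranges."""
    return ID_PATTERN.fullmatch(raw.strip()) is not None

# Examples:
# is_plausible_id("030204")  -> True  (teacher 03, period 02, student 04)
# is_plausible_id("010101")  -> True  (valid format, but possibly made up)
# is_plausible_id("990101")  -> False (teacher 99 out of range)
```

Note that format validation alone cannot catch a fabricated-but-valid ID like 010101, which is exactly the failure mode we keep running into.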
What we would love is some form of forced validation across the pre-test and post-test that keeps the data anonymous to us, the researchers, but requires a match with a previously entered pre-test value before a student can take the post-test. We cannot have access to names, email addresses, or other PII of high school students. In the standardized assessment world, test tickets are generated from PII and assigned to students so that their test scores can be matched to their ID information; in the research world, we cannot do that.
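For concreteness, here is one way this might work, assuming Qualtrics' Web Service survey-flow element (which, as I understand it, can call an external URL and store returned JSON fields as embedded data): a tiny lookup service that holds only the anonymous pre-test IDs, with a Branch element ending the post-test when the returned flag isn't 1. This is a sketch, not something we've built; the endpoint, the file name, and Flask itself are all placeholders:

```python
# Sketch of a lookup service holding only anonymous 6-digit IDs (no PII).
# Flow idea: a Qualtrics Web Service element sends the entered ID here,
# saves the returned "found" flag as embedded data, and a Branch element
# ends the post-test when found != 1.
from flask import Flask, request, jsonify

app = Flask(__name__)

# pretest_ids.csv is a hypothetical one-column export of pre-test IDs.
with open("pretest_ids.csv") as f:
    PRETEST_IDS = {line.strip() for line in f if line.strip()}

@app.route("/check")
def check():
    entered = request.args.get("id", "").strip()
    return jsonify({"found": int(entered in PRETEST_IDS)})

if __name__ == "__main__":
    app.run()
```

A similar result might be possible without hosting anything, by uploading the pre-test IDs as a Qualtrics contact list and gating the post-test with an Authenticator element keyed on the ID field, though I haven't verified that this works with an arbitrary non-PII field.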
I’m interested in ideas for how we can more easily and accurately match pre- and post-assessment data among hundreds of test takers using some sort of forced validation. Any ideas?
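For scale, the downstream reconciliation we currently grind through by hand is essentially the merge below (a pandas sketch; the column names are hypothetical placeholders for whatever the Qualtrics export uses). Scripting this step helps, but it still leaves piles of pre-only and post-only records to chase down manually, which is why we want the match forced at entry time:

```python
import pandas as pd

# dtype=str preserves leading zeros in the 6-digit IDs.
pre = pd.read_csv("pretest_export.csv", dtype={"student_id": str})
post = pd.read_csv("posttest_export.csv", dtype={"student_id": str})

# Exact-match join on the anonymous ID.
matched = pre.merge(post, on="student_id", suffixes=("_pre", "_post"))

# The residue that currently has to be resolved by hand.
unmatched_pre = pre[~pre["student_id"].isin(post["student_id"])]
unmatched_post = post[~post["student_id"].isin(pre["student_id"])]

print(f"{len(matched)} matched, {len(unmatched_pre)} pre-only, "
      f"{len(unmatched_post)} post-only")
```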