Making Up Tests
It’s harder to make up tests than to take them
|
| [ Recent photo ] |
Ken Regan has been busy these last few days working on making a final exam, giving the exam, and now grading the exam.
Today Ken and I want to talk about tests.
I also have a test for you. You can jump right to our test of knowledge. Do not, please, use any search tools, especially Google.
Test Theory
Ken recently made up a final exam. We both have had to make countless tests over the years. I was never trained in how to make a good test. Nor how to make a test at all. I am still puzzled about how to do it.
Avi Wigderson once told me that Michael Rabin only asked questions on his exams that he had stated already in lectures. Is there a theory of what makes a proper test? I do not know any.
Suppose that before the exam we lectured and the students learned and
: Here
are true statements and
are false statements. A rote type question might be:
Is the statement
true or false?
Here would be in either
or
. This type of question is purely a memory problem.
A more difficult test would have questions like:
Is
true or false
Here would be equivalent to some
that is in
or in
. The equivalence between
and
would require only the application of a few simple logical rules. This is much harder for students. In the limit we could have
and
far apart, even could have it an open problem if
and
are equivalent.
Our Test
No looking at Google please.
Question 1: We all know that Dick Karp created the P=NP question. What is Dick’s middle name?
- Mark
- Manning
- Mathew
- Richard
Question 2: This year is the fifty-first anniversary of STOC. Where was the first one held?
- Marina del Rey, CA
- Massapequa, NY
- Boston, MA
- Chicago, IL
Question 3: Which of these did not happen in 1969?
- The first automatic teller machine in the United States is installed.
- The $500 bills are officially removed from circulation.
- The first The Limited store opens, in San Francisco.
- The New York Mets win the World Series.
Question 4: The first STOC conference program committee included:
- No women.
- A person named Mike.
- A person named Pat.
- All the above.
Question 5: How do you tell if you are a “theoretical computer scientist”?
- You wear flip-flops in the winter.
- You regularly attend STOC.
- You wear glasses.
- You cannot program a computer.
Question 6: “Cooking” a chess problem means:
-
Showing it is in a family of NP-complete problems on
boards.
- Showing it has two or more solutions (or no solutions).
- Showing it cannot be solved by Steve Cook.
- Showing that it cannot be solved by the best-move strategy.
Question 7: The other theory conference is called FOCS. Which of these is true about this conference:
- The name was selected by a person named Edward.
- It has never had parallel sessions.
- It was originally called Symposium on Switching Circuit Theory and Logical Design.
- The artwork for the proceedings cover is by an artist named Smith, who never published in the conference.
Question 8: What do the STOC conferences have in common with last night’s final episode of Game of Thrones?
- Both had flying horses and whistling pigs.
- No dragons were harmed during either.
- Both have left many big questions unanswered.
- Both are explained by the “Prisoner’s Dilemma” game solution.
Question 9: STOC has been held on each of these islands except:
- Long Island, NY.
- Puerto Rico.
- Crete.
- Vancouver Island.
Question 10: What term appears in the titles of three award-winning STOC/FOCS papers since 2016?
- Quantum.
- Quadratic/subquadratic.
- Quadtree.
- Quasi/quasipolynomial.
Open Problems
Answers: Note 1a means question 1 and answer 1 and so on. This is a wordpress issue.



I’m a bit surprised by Question 5. I know many people I consider as Theoretical Computer Scientists who do not regularly attend STOC, especially from this small part of the world that one can define as the complementary set of the US!
Dear B.:
It is hard to make up tests. You are right that the answer is not clear. Perhaps we could change it o has submitted papers to STOC? Or some other test?
Best
I can tell you how the LSAT does it incredibly well: they take a bunch of crap questions and analyze the answers statistically to find the best questions I have ever seen. One of these days I’ll explain how I found that out.
Dear rickfleischer:
The LSAT…is that a kind of SAT problem? Logic satisfaction? Just kidding. If you can test the questions then you should do better. Would love to know how you know this.
Best
You could have fixed your wordpress issue by fixing your problems.
Dear L:
Yes. I hoped that HTML of wordpress would support the list type I needed. Then to change it was just too much. Hope the key still worked for you.
Did you get a good score?
Best
Harold Shapiro had a rule for questions. The answer was either “0”, or “1”, or on the blackboard. Loved it when the student answers 2? He would blow up magnificently.
As a complexity theory blog I was sure you were going to go the other way and note that under appropriate hardness assumptions it’s easier to pose questions than to answer them. Or at least I would guess this to be the case but I imagine one would need to explicitly diagonalize.
Practically speaking. A large part of the difficulty in making up tests is our ambivalence about what a test should actually measure. If we were comfortable creating tests to measure pure mastery of the subject (e.g. ability to apply what they have learned in future domains) or to create a test purely to measure student effort in learning the material it would probably be easier to make such tests. However, as it is I suspect that lots of the difficulty we face in designing tests is a result of our inclination not to want to directly face these kinds of questions and make specific judgements. Therefore, rather than just asking x% questions on the material that reflect likely future applications and y% that reflect course effort we try and design questions that let us avoid explicitly making such judgements.
Dear Peter Gerdes:
A thoughtful comment. From a complexity point of view it is actually hard to make up questions. For factoring we think random examples N=pq say are all about the same in difficulty provided one is a bit careful about selecting p and q primes. But there is no proof that this is true. For SAT for example it is not easy to generate hard examples. The whole theory of transitions for random SAT comes into play here.
What I really meant was how do you ask questions that have good properties. You would like to have students not all get low score or too high a score. You would like the test to actually test something. I could go on.
Best
My first Oxford Maths Entrance paper (1964) broke the mould for the style of entrance papers, and this was the only reason I got offered a place. The style of question totally suited me as it did not favour those who had read the right books.
15 scenarios were each followed by 3 sentences. One had to prove the sentence correct or demonstrate that it was false, usually by counterexample. One’s style of answer was I guess direct evidence of the quality of one’s thinking. Amazingly, the applicant-average was ‘six correct’ – way lower than I expected.
I may be biased [and probably is: ed.] but I recommend this style of question.
Conjecture: Geometry Qs of the form ‘Prove that …’ can more easily be answered by assuming what is to be proved and seeing what that tells you. Then you can say “I should have known that anyway” an reverse-engineer a proof.