Walter Roberson, frequent contributor to the MATLAB newsgroup, posed the following question to me to use as the starter idea of a blog.
Q: Suppose there is a multiple-choice quiz, and for each question, one of the responses scores 0 points, one scores 3 points, one scores 5 points, one scores 8 points, and one scores 10 points. If the quiz has 4 questions, and assuming that each taker answers all of the questions, then which totals per taker are not possible? For example, it would not be possible to finish the quiz with a total score of 2.
If the quiz had 7 questions?
Can you generalize the code so that the number of questions can be varied by varying a single assignment?
Contents
Initialize Test Parameters
Instead of Walter's numbers, I am choosing four scores and seven test questions for this demonstration.
N = 7; Points = [0 3 5 8];
Walter's Solution
Walter also gave me an elegant solution to show. He places the point values into a cell array, and then replicates that cell by the number of test questions.
GridVals = repmat({Points},N,1);Next Walter grids the Points in N-dimensions, taking advantage of comma-separated lists for function arguments.
[Grid{1:N}] = ndgrid(GridVals{:});Spread out the cell array along the N+1-dimension and sum the values along that dimension to get all possible PointTotals.
PointTotals = sum(cat(N+1,Grid{:}),N+1);Finally, bin the PointTotals and remove bins that don't contain zeros. These remaining bins correspond to scores that are impossible to attain.
PointHist = hist(PointTotals(:),max(Points)*N+1); ImpossibleScores = find(~PointHist)-1;
The correct answer is
ImpossibleScores
ImpossibleScores =
1 2 4 7 49 52 54 55
Loren's Solution
My first solution takes advantage of nchoosek to calculate all possible combinations. After summing each one, I look for the unique values, and then remove those from the list of all possible scores.
allvals = repmat(Points,1,N); C = nchoosek(allvals,N); scores = unique(sum(C,2)); nonScores1 = 1:(max(Points)*N); nonScores1(scores(2:end)) = [];
Loren's Modified Solution
nchoosek can be slow when the number of elements in each combination is large, and I have definitely run out of memory using nchoosek as well sometimes. To offset that, I modified my solution to select unique combinations when possible to reduce the extra computation time (and memory). I won't recompute the first two lines of the algorithm here.
allvals = repmat(Points,1,N); C = nchoosek(allvals,N);
Next, winnow out the duplicate rows before proceeding.
lCpre = length(C);
C = unique(C,'rows');
lCpost = length(C);I reduced the number of cases from lCpre to lcPost, a substantial savings in this case.
[lCpre lCpost]
ans =
1184040 16384
Make scores unique as well.
scores = sum(C,2); scores = unique(scores);
Finally, get rid of the valid scores.
nonScores2 = 1:(max(Points)*N); nonScores2(scores(2:end)) = [];
Bobby's Solution
Bobby Cheng came up with a very different solution which works only if the minimum score is zero. Otherwise, the algorithm would need to be modified.
f = @(a)unique(bsxfun(@plus,a(:),Points)); allPossibleScores = Points; % with only one question for i = 2:N allPossibleScores = f(allPossibleScores); end nonScores3 = setdiff(N*min(Points):N*max(Points),allPossibleScores);
Lucio's First Solution
Lucio Cetto came along next and contributed two solutions. The first one is similar to Bobby's, but Lucio propogates logical values instead of doubles (with integer values).
Initialize the outputs so the possible test scores are set to true for the initial point values for the first question.
a = false(max(Points),1); a(nonzeros(Points)) = true;
Iterate over the number of questions remaining, and augment the score array.
for i = 1:N-1 a(bsxfun(@plus,find(a),Points)) = true; end
The locations of the false values in a are the test scores that no test achieves.
nonScores4 = find(~a)';
Lucio's Second Solution
I will leave this one as an exercise for you to figure out. It is very dense, but works quite well.
maxscorep1 = N*max(Points)+1; A = toeplitz(sparse(1,1,1,1,maxscorep1),sparse(1,Points+1,1,1,maxscorep1)); A = A^N; nonScores5 = find(A(1,:)==0)-1;
Compare Answers
Let's now compare all the methods to be sure they get the same answer.
allequal = ... isequal(ImpossibleScores, nonScores1, nonScores2, ... nonScores3, nonScores4, nonScores5)
allequal =
1
Other Methods?
I didn't time the various methods, but trust me, the ones I contributed are the slowest and the most memory-intensive. Walter's is fully vectorized as well as mine, but can be a bit intensive in memory (but still nothing like mine, as far as I can tell!). Bobby's and Lucio's are the most efficient memory-wise, since they iterate over test questions, and toss out impossible combinations as they go. Do you have other techniques to share? Please do so here.
Get
the MATLAB code
Published with MATLAB® 7.7



Here is the generating function approach:
>> p([0 3 5 8] + 1) = 1; >> q = 1; >> for k=1:7 q = conv(q,p); end >> find(~q) - 1 ans = 1 2 4 7 49 52 54 55Nice one, Tom!
here’s a more prosaic approach
a=zeros(57,1); p=[0 3 5 8]; for i=0:((4^7)-1) c=dec2base(i,4,7); j=sum(p(double(c)-47))+1; a(j)=a(j)+1; end find(a==0)-1I haven’t timed this either, but here’s the obligatory recursive version:
function validgrades = uniquegrades(numQuestionsRemaining, currentValues, pointValues) % UNIQUEGRADES Generate the possible grades for a test % % validgrades = uniquegrades(numQuestionsRemaining, currentValues, % pointValues) calculates the grades it is possible to achieve given that % there are numQuestionsRemaining questions left to score, that the list of % grades that can be obtained by the previous questions is stored in % currentValues, and that the point values for each question are stored in % pointValues. % % Example: % If there are 4 questions on a test and with each question you can % score either 0 points (for an incorrect answer) or 2 points (for a % correct answer), the possible grades you can score on the test are: % % vg = uniquegrades(4, 0, [0 2]) % % To compute the grades it is _not_ possible to achieve, you can use: % % cannotAchieve = setdiff(min(vg):max(vg), vg) if numQuestionsRemaining == 0 validgrades = unique(currentValues(:)).'; else thisQuestionPoints = bsxfun(@plus, currentValues(:), pointValues(:).'); validgrades = uniquegrades(numQuestionsRemaining-1, ... thisQuestionPoints, pointValues); endWalter’s solution is O(K^N), where K is the number of choices per question and N is the number of questions. With a slight modification, we can get a O(N^K) solution, a substantial improvement if N is assumed to vary widely.
Let k1 = score for choice 1, k2 = score for choice 2, etc, and a1 = # of times choice 1 was chosen, a2 = # of times choice 2 was chosen, etc.
Essentially, we want to all possible values of k1*a1 + k2*a2 + k3*a3 + k4*a4 that satisfy the constraint a1 + a2 + a3 + a4 = NThe constraint can be enumerated and evaluated using NDGRID, as in Walter’s solution. Note that the NDGRID output has dimension K (the # of choices per question) rather than N (the number of questions).
% setup Points = [0 3 5 8]; K = numel(Points); N = 7; % solve [A{1:K}] = ndgrid(0:N); combinations = cell2mat(cellfun(@(x)x(:),A,'unif',false)); indices = find(sum(combinations,2) == N); possibleSums = unique(combinations(indices,:) * Points'); maximumValue = max(Points)*N; setdiff(0:maximumValue,possibleSums)Answers for various N:
N=7 1 2 4 7 49 52 54 55 N=15 1 2 4 7 113 116 118 119 N=30 1 2 4 7 233 236 238 239 N=40 1 2 4 7 313 316 318 319I got out-of-memory errors for N=60. It would be interesting to see where the various algorithms break down. With Lucio’s second solution, I went up to N=100 without much of a problem, and I could probably go much higher.
Here are time results for the generating function solution for N = 100 to 1000,by 100’s, using a slightly bigger score set. The ‘/5′ operation after convolution is to avoid overflow in q for N > 400. The number of convolutions could be cut from N to log N in the usual way, which might speed things up.
result = {}; for N = 100:100:1000 tic; p = []; p([0 5 11 17 23 29] + 1) = 1; q = 1; for k=1:N q = conv(p,q)/5; end; result{end+1} = find(~q) - 1; toc; end Elapsed time is 0.042640 seconds. Elapsed time is 0.157314 seconds. Elapsed time is 0.292813 seconds. Elapsed time is 0.500169 seconds. Elapsed time is 0.852585 seconds. Elapsed time is 1.383100 seconds. Elapsed time is 2.101764 seconds. Elapsed time is 2.867327 seconds. Elapsed time is 3.846593 seconds. Elapsed time is 4.823939 seconds.The problem with any method that uses exhaustive searches via ndgrid is the method will always fail for large problems. In these cases a recursive methodology is often better. (Really, this is just a variation on my partitions code on the FEX to find the partitions of an integer.) Consider the feasibleTestScores function.
function Feasible = feasibleTestScores(scores,problemscores,nprobs) % uses recursive methodology to identify which test scores are achievable % % scores - a list of scores to test for feasibility. % All elements of scores must be non-negative integers % % problemscores - the possible (all integers) scores for each % problem in the test. % % nprobs - the number of problems in the overall test % % Feasible = logical vector, true where a score was feasible scoresize = size(scores); scores = scores(:); problemscores = sort(problemscores(:),1,'descend'); % maximum possible score to worry about maxscore = max(scores); if maxscore<min(problemscores) Feasible = false(scoresize); return elseif (maxscore == 0) && ismember(problemscores,0) Feasible = true(scoresize); return elseif ((nprobs*problemscores(1)) < min(scores)) Feasible = false(scoresize); return elseif (nprobs==0) Feasible = true(scores==0); return end % check each potential number of scores for % the largest (first element of) problemscores. % Note that we don't need to worry about % problemscores(1) being zero here. n = min(nprobs,floor(maxscore/problemscores(1))); Feasible = false(size(scores)); for i = 0:n Feasible = Feasible | ((i*problemscores(1)) == scores); k = ((i*problemscores(1)) < scores); if any(k) Feasible(k) = Feasible(k) | feasibleTestScores( ... scores(k)-i*problemscores(1),problemscores(2:end), ... nprobs-i); end end Feasible = reshape(Feasible,scoresize);Test it out:
s = 0:20; b = feasibleTestScores(s,[0 3 5 10],2);s(b) ans = 0 3 5 6 8 10 13 15 20Thus, for only two problems, we can achieve 9 of the potential scores 0:20.
What happens when we have a test with 50 problems? Do you want to generate a grid with 4^50 elements in it? NO!!! How long does the recursive solution take?
Of the potential scores tested, how many were possible?
So, could someone explain what the toeplitz function represents in this situation?
–DA
Daniel:
The key point of my solution is to use matrix exponentiation as the propagation algorithm, for this we need to start with an adjacency matrix. A non-zero entry in the (i,j) row-column indicates that the total cumulative score can change from (i-1) to (j-1) after considering the n-th question. Note that we need to make the score=index+1 because we need to have a score of zero also.
If you had the Bioinformatics Toolbox you could have propagated using “Breadth First Search”:
Here is another solution. It is based on conversion of the numbers with different bases. Unfotunatley, I don’t think it scales well for larger problems. Anyway, here it goes
numQuests = 7;
numAnswrs = 4;
Points = [0 3 5 8];
base = numQuests + 1;
minNum = numQuests;
maxNum = numQuests * base^(numAnswrs-1);
charShft = 48;
allAnswrs = …
arrayfun(@(x) double(dec2base(x,base, numAnswrs)) - charShft, …
(minNum : maxNum).’, ‘UniformOutput’, false);
indx = cellfun(@(x) sum(x) == numQuests, allAnswrs);
Feasible = unique(cat(1, allAnswrs{indx}) * Points(:));
notFeasible = setdiff(min(Points)*numQuests : max(Points)*numQuests, Feasible);
I’ve just noticed that my solution can be greatly improved by replacing this line
allAnswrs = …
arrayfun(@(x) double(dec2base(x,base, numAnswrs)) - charShft, …
(minNum : maxNum).’, ‘UniformOutput’, false);
with
allAnswrs = …
arrayfun(@(x) double(dec2base(x,base, numAnswrs)) - charShft, …
(minNum : numQuests : maxNum).’, ‘UniformOutput’, false);
A couple of observations got me to thinking about this question in general. One is that for a certain number of questions N , usually less than 6, some Points vectors return an impossible-scores vector which is the same
length (L) for any M > N. The second observation is that, once this N is found, the last (L - k) elements in the impossible-scores vector are multiples of the largest value in Points added to those same (L - k) values given for N. Thus once this N is found, the problem reduces to a simple multiplication and addition. This led me to add to Lucio’s first solution specifically for large N. These additions don’t seem to add too much time when there is no savings to be had (e.g. Points = [0 2], or small number of questions), but when it does work there can be a tremendous gain in speed, for M > N.
for example:
Points = [0 3 5 8];
tic, lucios(10000,Points),toc
tic,matts(10000,Points),toc
Even for N = 100, the augmented solution is 20 times faster on my machine.
Anyway, interesting problem!
Ilya Rozenfeld,
Your solution will fail for N>=10 because of this:
dec2base(10,11,4)
ans =
000A
We would like 000(10) but of course that is not possible.
Interesting problem.
Tom, you could also try q(q>0)=1 instead of the /5 step (avoids having q(1) go to zero).
Here’s a one-liner solution that works ok for small problems:
>> find(round(ifft(fft([full(sparse(1,Points+1,1,1,max(Points)+1)),zeros(1,max(Points)*(N-1))]).^N))==0)-1
ans =
1 2 4 7 49 52 54 55