Loren on the Art of MATLAB

December 2nd, 2008

Possible Test Scores

Walter Roberson, frequent contributor to the MATLAB newsgroup, posed the following question to me to use as the starter idea of a blog.

Q: Suppose there is a multiple-choice quiz, and for each question, one of the responses scores 0 points, one scores 3 points, one scores 5 points, one scores 8 points, and one scores 10 points. If the quiz has 4 questions, and assuming that each taker answers all of the questions, then which totals per taker are not possible? For example, it would not be possible to finish the quiz with a total score of 2.

If the quiz had 7 questions?

Can you generalize the code so that the number of questions can be varied by varying a single assignment?

Contents

Initialize Test Parameters

Instead of Walter's numbers, I am choosing four scores and seven test questions for this demonstration.

N = 7;
Points = [0 3 5 8];

Walter's Solution

Walter also gave me an elegant solution to show. He places the point values into a cell array, and then replicates that cell by the number of test questions.

GridVals = repmat({Points},N,1);

Next Walter grids the Points in N-dimensions, taking advantage of comma-separated lists for function arguments.

[Grid{1:N}] = ndgrid(GridVals{:});

Spread out the cell array along the N+1-dimension and sum the values along that dimension to get all possible PointTotals.

PointTotals = sum(cat(N+1,Grid{:}),N+1);

Finally, bin the PointTotals and remove bins that don't contain zeros. These remaining bins correspond to scores that are impossible to attain.

PointHist = hist(PointTotals(:),max(Points)*N+1);
ImpossibleScores = find(~PointHist)-1;

The correct answer is

ImpossibleScores
ImpossibleScores =
     1     2     4     7    49    52    54    55

Loren's Solution

My first solution takes advantage of nchoosek to calculate all possible combinations. After summing each one, I look for the unique values, and then remove those from the list of all possible scores.

allvals = repmat(Points,1,N);
C = nchoosek(allvals,N);
scores = unique(sum(C,2));
nonScores1 = 1:(max(Points)*N);
nonScores1(scores(2:end)) = [];

Loren's Modified Solution

nchoosek can be slow when the number of elements in each combination is large, and I have definitely run out of memory using nchoosek as well sometimes. To offset that, I modified my solution to select unique combinations when possible to reduce the extra computation time (and memory). I won't recompute the first two lines of the algorithm here.

 allvals = repmat(Points,1,N);
 C = nchoosek(allvals,N);

Next, winnow out the duplicate rows before proceeding.

lCpre = length(C);
C = unique(C,'rows');
lCpost = length(C);

I reduced the number of cases from lCpre to lcPost, a substantial savings in this case.

[lCpre lCpost]
ans =
     1184040       16384

Make scores unique as well.

scores = sum(C,2);
scores = unique(scores);

Finally, get rid of the valid scores.

nonScores2 = 1:(max(Points)*N);
nonScores2(scores(2:end)) = [];

Bobby's Solution

Bobby Cheng came up with a very different solution which works only if the minimum score is zero. Otherwise, the algorithm would need to be modified.

f = @(a)unique(bsxfun(@plus,a(:),Points));
allPossibleScores = Points; % with only one question
for i = 2:N
    allPossibleScores = f(allPossibleScores);
end
nonScores3 = setdiff(N*min(Points):N*max(Points),allPossibleScores);

Lucio's First Solution

Lucio Cetto came along next and contributed two solutions. The first one is similar to Bobby's, but Lucio propogates logical values instead of doubles (with integer values).

Initialize the outputs so the possible test scores are set to true for the initial point values for the first question.

a = false(max(Points),1);
a(nonzeros(Points)) = true;

Iterate over the number of questions remaining, and augment the score array.

for i = 1:N-1
    a(bsxfun(@plus,find(a),Points)) = true;
end

The locations of the false values in a are the test scores that no test achieves.

nonScores4 = find(~a)';

Lucio's Second Solution

I will leave this one as an exercise for you to figure out. It is very dense, but works quite well.

maxscorep1 = N*max(Points)+1;
A = toeplitz(sparse(1,1,1,1,maxscorep1),sparse(1,Points+1,1,1,maxscorep1));
A = A^N;
nonScores5 = find(A(1,:)==0)-1;

Compare Answers

Let's now compare all the methods to be sure they get the same answer.

allequal = ...
    isequal(ImpossibleScores, nonScores1, nonScores2, ...
    nonScores3, nonScores4, nonScores5)
allequal =
     1

Other Methods?

I didn't time the various methods, but trust me, the ones I contributed are the slowest and the most memory-intensive. Walter's is fully vectorized as well as mine, but can be a bit intensive in memory (but still nothing like mine, as far as I can tell!). Bobby's and Lucio's are the most efficient memory-wise, since they iterate over test questions, and toss out impossible combinations as they go. Do you have other techniques to share? Please do so here.


Get the MATLAB code

Published with MATLAB® 7.7

14 Responses to “Possible Test Scores”

  1. Tom Richardson replied on :

    Here is the generating function approach:

    >> p([0 3 5 8] + 1) = 1;
    >> q = 1;
    >> for k=1:7 q = conv(q,p); end
    >> find(~q) - 1
    ans =
         1     2     4     7    49    52    54    55
    
  2. Loren replied on :

    Nice one, Tom!

  3. mark replied on :

    here’s a more prosaic approach

    a=zeros(57,1);
    
    p=[0 3 5 8];
    
    for i=0:((4^7)-1)
        c=dec2base(i,4,7);
        j=sum(p(double(c)-47))+1;
        a(j)=a(j)+1;
    end 
    
    find(a==0)-1
    
  4. Steve L replied on :

    I haven’t timed this either, but here’s the obligatory recursive version:

    function validgrades = uniquegrades(numQuestionsRemaining, currentValues, pointValues)
    % UNIQUEGRADES Generate the possible grades for a test
    %
    % validgrades = uniquegrades(numQuestionsRemaining, currentValues,
    % pointValues) calculates the grades it is possible to achieve given that
    % there are numQuestionsRemaining questions left to score, that the list of
    % grades that can be obtained by the previous questions is stored in
    % currentValues, and that the point values for each question are stored in
    % pointValues.
    %
    % Example:
    %     If there are 4 questions on a test and with each question you can
    %     score either 0 points (for an incorrect answer) or 2 points (for a
    %     correct answer), the possible grades you can score on the test are:
    %
    %         vg = uniquegrades(4, 0, [0 2])
    %
    %     To compute the grades it is _not_ possible to achieve, you can use:
    %
    %         cannotAchieve = setdiff(min(vg):max(vg), vg)
    
    if numQuestionsRemaining == 0
        validgrades = unique(currentValues(:)).';
    else
        thisQuestionPoints = bsxfun(@plus, currentValues(:), pointValues(:).');
        validgrades = uniquegrades(numQuestionsRemaining-1, ...
            thisQuestionPoints, pointValues);
    end
    
  5. Gautam Vallabha replied on :

    Walter’s solution is O(K^N), where K is the number of choices per question and N is the number of questions. With a slight modification, we can get a O(N^K) solution, a substantial improvement if N is assumed to vary widely.

    Let k1 = score for choice 1, k2 = score for choice 2, etc, and a1 = # of times choice 1 was chosen, a2 = # of times choice 2 was chosen, etc.

     Essentially, we want to all possible values of
       k1*a1 + k2*a2 + k3*a3 + k4*a4
     that satisfy the constraint
        a1 + a2 + a3 + a4 = N
    

    The constraint can be enumerated and evaluated using NDGRID, as in Walter’s solution. Note that the NDGRID output has dimension K (the # of choices per question) rather than N (the number of questions).

    % setup
    Points = [0 3 5 8];
    K = numel(Points);
    N = 7;
    % solve
    [A{1:K}] = ndgrid(0:N);
    combinations = cell2mat(cellfun(@(x)x(:),A,'unif',false));
    indices = find(sum(combinations,2) == N);
    possibleSums = unique(combinations(indices,:) * Points');
    maximumValue = max(Points)*N;
    setdiff(0:maximumValue,possibleSums)
    

    Answers for various N:

    N=7
         1     2     4     7    49    52    54    55
    N=15
         1     2     4     7   113   116   118   119
    N=30
         1     2     4     7   233   236   238   239
    N=40
         1     2     4     7   313   316   318   319
    

    I got out-of-memory errors for N=60. It would be interesting to see where the various algorithms break down. With Lucio’s second solution, I went up to N=100 without much of a problem, and I could probably go much higher.

  6. Tom Richardson replied on :

    Here are time results for the generating function solution for N = 100 to 1000,by 100’s, using a slightly bigger score set. The ‘/5′ operation after convolution is to avoid overflow in q for N > 400. The number of convolutions could be cut from N to log N in the usual way, which might speed things up.

    result = {};
    for N = 100:100:1000
      tic;
      p = [];
      p([0 5 11 17 23 29] + 1) = 1;
      q = 1;
      for k=1:N
        q = conv(p,q)/5;
      end;
      result{end+1} = find(~q) - 1;
      toc;
    end
    Elapsed time is 0.042640 seconds.
    Elapsed time is 0.157314 seconds.
    Elapsed time is 0.292813 seconds.
    Elapsed time is 0.500169 seconds.
    Elapsed time is 0.852585 seconds.
    Elapsed time is 1.383100 seconds.
    Elapsed time is 2.101764 seconds.
    Elapsed time is 2.867327 seconds.
    Elapsed time is 3.846593 seconds.
    Elapsed time is 4.823939 seconds.
    
  7. John DErrico replied on :

    The problem with any method that uses exhaustive searches via ndgrid is the method will always fail for large problems. In these cases a recursive methodology is often better. (Really, this is just a variation on my partitions code on the FEX to find the partitions of an integer.) Consider the feasibleTestScores function.

    function Feasible = feasibleTestScores(scores,problemscores,nprobs)
    % uses recursive methodology to identify which test scores are achievable
    %
    % scores - a list of scores to test for feasibility.
    %          All elements of scores must be non-negative integers
    %
    % problemscores - the possible (all integers) scores for each
    %          problem in the test.
    %
    % nprobs - the number of problems in the overall test
    %
    % Feasible = logical vector, true where a score was feasible
    
    scoresize = size(scores);
    scores = scores(:);
    
    problemscores = sort(problemscores(:),1,'descend');
    
    % maximum possible score to worry about
    maxscore = max(scores);
    
    if maxscore<min(problemscores)
      Feasible = false(scoresize);
      return
    elseif (maxscore == 0) && ismember(problemscores,0)
      Feasible = true(scoresize);
      return
    elseif ((nprobs*problemscores(1)) < min(scores))
      Feasible = false(scoresize);
      return
    elseif (nprobs==0)
      Feasible = true(scores==0);
      return
    end
    
    % check each potential number of scores for
    % the largest (first element of) problemscores.
    % Note that we don't need to worry about
    % problemscores(1) being zero here.
    n = min(nprobs,floor(maxscore/problemscores(1)));
    Feasible = false(size(scores));
    for i = 0:n
      Feasible = Feasible | ((i*problemscores(1)) == scores);
      k = ((i*problemscores(1)) < scores);
      if any(k)
        Feasible(k) = Feasible(k) | feasibleTestScores( ...
          scores(k)-i*problemscores(1),problemscores(2:end), ...
          nprobs-i);
      end
    end
    Feasible = reshape(Feasible,scoresize);
    

    Test it out:

    s = 0:20;
    b = feasibleTestScores(s,[0 3 5 10],2);s(b)
    ans =
         0     3     5     6     8    10    13    15    20
    

    Thus, for only two problems, we can achieve 9 of the potential scores 0:20.

    What happens when we have a test with 50 problems? Do you want to generate a grid with 4^50 elements in it? NO!!! How long does the recursive solution take?

    tic,b = feasibleTestScores(0:500,[0 3 5 10],50);toc
    Elapsed time is 1.282112 seconds.
    

    Of the potential scores tested, how many were possible?

    sum(b)
    ans =
       485
    
  8. Daniel Armyr replied on :

    So, could someone explain what the toeplitz function represents in this situation?

    –DA

  9. Lucio replied on :

    Daniel:

    The key point of my solution is to use matrix exponentiation as the propagation algorithm, for this we need to start with an adjacency matrix. A non-zero entry in the (i,j) row-column indicates that the total cumulative score can change from (i-1) to (j-1) after considering the n-th question. Note that we need to make the score=index+1 because we need to have a score of zero also.

    If you had the Bioinformatics Toolbox you could have propagated using “Breadth First Search”:

    maxscorep1 = N*max(Points)+1;
    A = toeplitz(sparse(1,1,1,1,maxscorep1),sparse(1,Points+1,1,1,maxscorep1));
    setdiff(1:maxscorep1,graphtraverse(A,1,’Method’,'BFS’,'Depth’,N))-1

    I used the TOEPLITZ function to create the required adjacency matrix. On purpose I used sparse indices, so the Toeplitz matrix is also sparse, as the particular adjacency matrix has many zeros, this allows to scale up the problem. A full matrix would have looked:

    A = toeplitz([1;zeros(maxscorep1-1,1)],[1 0 0 1 0 1 0 0 1 zeros(1,maxscorep1-9)])
    

    I need to admit that the solution with CONV previously posted is very elegant, perhaps it is possible to express a convolution in terms of a toeplitz matrix.

    Lucio

  10. Ilya Rozenfeld replied on :

    Here is another solution. It is based on conversion of the numbers with different bases. Unfotunatley, I don’t think it scales well for larger problems. Anyway, here it goes

    numQuests = 7;
    numAnswrs = 4;
    Points = [0 3 5 8];

    base = numQuests + 1;
    minNum = numQuests;
    maxNum = numQuests * base^(numAnswrs-1);
    charShft = 48;

    allAnswrs = …
    arrayfun(@(x) double(dec2base(x,base, numAnswrs)) - charShft, …
    (minNum : maxNum).’, ‘UniformOutput’, false);

    indx = cellfun(@(x) sum(x) == numQuests, allAnswrs);

    Feasible = unique(cat(1, allAnswrs{indx}) * Points(:));
    notFeasible = setdiff(min(Points)*numQuests : max(Points)*numQuests, Feasible);

  11. Ilya Rozenfeld replied on :

    I’ve just noticed that my solution can be greatly improved by replacing this line

    allAnswrs = …
    arrayfun(@(x) double(dec2base(x,base, numAnswrs)) - charShft, …
    (minNum : maxNum).’, ‘UniformOutput’, false);

    with

    allAnswrs = …
    arrayfun(@(x) double(dec2base(x,base, numAnswrs)) - charShft, …
    (minNum : numQuests : maxNum).’, ‘UniformOutput’, false);

  12. matt fig replied on :

    A couple of observations got me to thinking about this question in general. One is that for a certain number of questions N , usually less than 6, some Points vectors return an impossible-scores vector which is the same
    length (L) for any M > N. The second observation is that, once this N is found, the last (L - k) elements in the impossible-scores vector are multiples of the largest value in Points added to those same (L - k) values given for N. Thus once this N is found, the problem reduces to a simple multiplication and addition. This led me to add to Lucio’s first solution specifically for large N. These additions don’t seem to add too much time when there is no savings to be had (e.g. Points = [0 2], or small number of questions), but when it does work there can be a tremendous gain in speed, for M > N.

    for example:

    Points = [0 3 5 8];

    tic, lucios(10000,Points),toc

    tic,matts(10000,Points),toc

    Even for N = 100, the augmented solution is 20 times faster on my machine.
    Anyway, interesting problem!

     
    
    function nonScores4 = matts(N,Points)
    % Assumes Points is sorted and has only unique values.  This is an
    % extention of Lucio's solution.
    
    a = false(max(Points),1);
    a(nonzeros(Points)) = true;
    t1 = [1 NaN];% These two are used in the augmented code below.
    cnt = 1;%
    
    for i = 1:N-1
        a(bsxfun(@plus,find(a),Points)) = true;
        t2 = t1;  %  Start augmented code.
        t1 = find(~a)'; % t1,t2 Successive vectors for successive #'s of q's.
        if ~isempty(t1) && ~isempty(t2) && numel(t1)==numel(t2) && t2(end)==(t1(end)-Points(end))
            cnt = cnt + 1;
            if cnt == 3
                vct = t1~=t2;
                t2(vct) = t2(vct) + (N-i)* Points(end);
                nonScores4 = t2;
                return % We can skip the rest of the iterations.
            end
        end  % End augmented code.
    end
    
    nonScores4 = find(~a)';
    
     
  13. matt fig replied on :

    Ilya Rozenfeld,

    Your solution will fail for N>=10 because of this:

    dec2base(10,11,4)

    ans =

    000A

    We would like 000(10) but of course that is not possible.

  14. Ben replied on :

    Interesting problem.

    Tom, you could also try q(q>0)=1 instead of the /5 step (avoids having q(1) go to zero).

    Here’s a one-liner solution that works ok for small problems:

    >> find(round(ifft(fft([full(sparse(1,Points+1,1,1,max(Points)+1)),zeros(1,max(Points)*(N-1))]).^N))==0)-1

    ans =

    1 2 4 7 49 52 54 55

Leave a Reply

Wrap code fragments inside <pre> tags, like this:

<pre class="code">
a = magic(3);
sum(a)
</pre>

If you have a "<" character in your code, either follow it with a space or replace it with "&lt;" (including the semicolon).


Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

  • Loren: Hi Ghazanfar- Please look at the help for the function hist (and perhaps histc). They will do what you want...
  • Ghazanfar Ali: Hi Miss Loren I am in need of an algo to count cosecutive duplicate values in a one dimensional...
  • OysterEngineer: Clearly this is a complex topic & these pair of blogs show that The MathWorks are masters of the...
  • Tim Davis: I’ve often been puzzled about how sub-matrix-assignmen t works for full and sparse arrays: clear A =...
  • Loren: Paul- There *are* issues depending on the sizes of ii and jj. And it’s a bit complicated, but really...
  • Loren: Bob- You don’t say what happens when you run your code. Can you please explain more. It looks like you...
  • Loren: Kishore- It is not clear to me what you are trying to actually achieve. If you want to concatenate the 4...
  • Kishore: sorry, in the previous code mat2cell(c,[19 121],[19 134],[19 84],[19 107])
  • Kishore: Hi Loren, Why does the following not work? data_classwise = [19x121 double] [19x134 double] [19x84 double]...
  • Paul Jackson: Loren, Are there any aspects of empty matrices that may be tricky when they are used as indices into...

These postings are the author's and don't necessarily represent the opinions of The MathWorks.