MATLAB Spoken Here

Scoring in Cody

Posted by Ned Gulley

As Helen wrote last week, Cody is a new MATLAB-based game that’s available on MATLAB Central.

We’ve been happily surprised at the amount of activity on Cody. As of Monday morning, 700 people have created 130 problems and provided more than 17,000 correct answers (not to mention a good many incorrect ones).

We’ve gotten a lot of enthusiastic feedback about Cody, which we’re using to build the next set of features. Naturally, some of that feedback came in the form of complaints, and the loudest of these complaints went something like this (I’m paraphrasing here):

“Your scoring system is terrible. Cody promotes bad programming practice by encouraging people to write short, cryptic programs.”

What’s this person talking about? Does he have a point?

Let’s consider how scoring works on Cody. Imagine that you are the official Cody Scorer. People are bringing you dozens of solutions every minute, and they’re all crowding around you asking “Is this one good? How is this one?” It doesn’t take you very long to tell which answers are correct. But you’d like to provide more information than that. You’d like to rank the correct answers somehow. How should you quickly and consistently rank all these answers? Subjective measures like elegance and readability, while appealing, are clearly off the table. If we had an Elegantometer, we would use it. But we don’t. Should we use cyclomatic complexity? Or the number of Code Analyzer messages? What metric would you use?

We decided on code size as the least bad option. It has a lot going for it. It’s simple, objective, consistent, and granular. By “granular,” we mean that we get a reasonably smooth distribution of code sizes for any given problem, as opposed to two or three giant, uniform clusters.

Now we have another complication. If we agree that code size is worth measuring, how shall we measure it? Lines of code? Bad idea. People will cram everything onto one line. Character count? No, because we don’t want to punish people for adding comments and giving their variables nice descriptive names. Again, the least bad option we came up with is to measure the size of the code after it’s been pre-chewed and made ready for digestion by the MATLAB interpreter. We count the number of nodes in the parse tree. How? There’s already a Cody problem on this topic. And if you want to try out the “official” scoring code yourself, you can find it on the File Exchange.
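To see why comments and long variable names cost nothing under a node-count metric, here is a minimal sketch that counts parse-tree nodes for two equivalent statements. It leans on the undocumented mtree class that comes up in the comments below. The string-input form and the count method are assumptions about that class, which may change between releases, and this is not the official scoring code.

```matlab
% Sketch only: mtree is undocumented, and count() is assumed here to
% return the number of nodes in the parse tree.

% The same statement twice: once terse, once with a comment and
% descriptive variable names.
terse   = 'y = x + 1;';
verbose = sprintf('%s\n%s', '% add one to the input', ...
                  'result = inputValue + 1;');

nTerse   = count(mtree(terse));
nVerbose = count(mtree(verbose));

fprintf('terse: %d nodes, verbose: %d nodes\n', nTerse, nVerbose);
```

If those assumptions hold, the two counts should come out the same even though the second version has several times as many characters.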

Any metric can be gamed. And we know that short code isn’t always better code, as you can see by looking at any obfuscated programming contest. But code is like medicine. If you can use less of it and still be healthy, you should. Which is to say, other things being equal (say, readability), shorter code generally is better. And size is certainly a useful dimension to refer to when reasoning about large numbers of programs. Here is a picture of 287 solutions to the Nearest Numbers problem.

The crucial point here is that Cody is just a game.

Over the last week we have learned that this game is good at motivating people to solve real problems, and that even as they scramble to come up with “winning” solutions in Cody, they are learning valuable programming practices. And most of all, people seem to be enjoying themselves.

So while some people are telling us that the scoring is bad, others tell us that they are learning a lot. What do you think?

28 Comments

Cody is a great game.

It’s true that the scoring system might promote bad programming practice on MATLAB (e.g. pre-allocating costs points), but what the heck?

Like some forms of literature (Georges Perec comes to mind), Cody is about programming with a constraint. It does not matter that much what the constraint is, and when playing this game, we should not worry about other constraints, e.g. pre-allocations.

Maybe you could do a Cody level 2: once you have solved a problem in Cody 1, you can propose a solution in Cody 2, where the constraint is execution time. The execution time would be computed using Steve’s timeit function.

Best
jy
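As a rough sketch of what an execution-time comparison could look like, the function file below times two hypothetical solutions to the same toy problem (the squares of 1 to n) with timeit, which is built into recent MATLAB releases. The function names and the toy problem are made up for illustration. One solution grows its result inside the loop, the other spends an extra statement on pre-allocation.

```matlab
function [tGrow, tPrealloc] = compareTiming(n)
% Time two hypothetical solutions to the same toy problem with timeit.
    tGrow     = timeit(@() squaresGrowing(n));
    tPrealloc = timeit(@() squaresPreallocated(n));
    fprintf('growing: %.3g s, pre-allocated: %.3g s\n', tGrow, tPrealloc);
end

function y = squaresGrowing(n)
% Shorter version: the result array grows on every iteration.
    y = [];
    for k = 1:n
        y(end+1) = k^2;
    end
end

function y = squaresPreallocated(n)
% One extra statement: allocate the result before the loop.
    y = zeros(1, n);
    for k = 1:n
        y(k) = k^2;
    end
end
```

Calling compareTiming(1e5) prints both timings. A size-based score would favor the first version, while a time-based score could rank them differently, depending on the release and the machine.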

That’s a great idea. We definitely want to add other metrics (like execution time) in the future. Then the author of the problem will be able to decide on the constraint they like best.

Fantastic news! Can’t wait to see it!

You can have zillions of constraint ideas, like:

every line shorter than 10 characters
maximum of 3 pairs of brackets (limit number of function calls)
digits forbidden

Good luck, and again: fantastic game! I am learning so much doing it!

I really like Cody. I don’t know how plausible this is, but would it be possible to make a similar game for Simulink?

I’ve had some fun with Cody and I think it’s interesting – and I’ve learnt some things from other people’s solutions. I can see that the scoring system is a reasonable compromise, and although it can get a bit silly if you go for the absolute minimum size ('011'-'0' instead of [0 1 1] for example), on the whole it’s good to push towards concise code. It encourages you to explore the full capabilities of different functions, which can be worthwhile.

One suggestion: I’d like to see richer test sets. Some of them are a bit minimal, and some solvers produce answers which are simply lookup tables to get the test cases right. I can’t see the point of doing this, but it’s disappointing when you look at what should be a really neat answer and it’s actually just a few if-else clauses. In some cases, where there’s a reference solution, you could have dynamically generated test data, but even without this you could vary things more – for example there are only 4×4 boards as tests in the Conway’s game of life problem.
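For readers who have not met the '011'-'0' trick mentioned above, here is a short illustration of why it works. Subtracting one character array from another operates on the underlying character codes and returns doubles. The variable names are only for illustration.

```matlab
% Subtracting characters operates on their numeric codes.
% '0' has code 48 and '1' has code 49.
digitsFromChars = '011' - '0';   % [48 49 49] - 48
digitsLiteral   = [0 1 1];

disp(digitsFromChars)
disp(isequal(digitsFromChars, digitsLiteral))   % displays 1 (true)
```

The two arrays are identical, so the only thing the string version buys is a smaller score.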

It’s a fair point. One of the things we’ve tried to do is make it easy to update the test suite on the fly, so when you see a weak test suite, please leave a comment and hopefully the author will come back and strengthen the tests. I know I’ve done this several times. With a rich enough test suite, any hard-coded answer is certain to be longer than a proper answer. Anyone who still enjoys making look-up tables after that has a strange hobby.

I like Cody, and since it doesn’t penalize for comments and descriptive variable names, I think it’s a good way to score. A second score for execution time would also be nice to see.

I am also interested in a Simulink Cody, although I understand that it would be much more labor intensive to create such a thing. I have several scripts written in Matlab that I cannot find a way to execute in Simulink (or embedded Matlab), so anything to help hone my Simulink skills is appreciated!

I amused myself for a few minutes doing this, but am a little frustrated by the policy of hiding the best scoring solution. How am I going to learn something new with that policy in place? If I knew a better solution, I would do it… Cody isn’t as valuable as it could be. It was almost a good idea to allow people to learn from others’ (possibly better) code. But no… Weird.

Steve’s right. Your frustration is understandable, Sean. But one of the things we found in our pre-launch testing is that if it’s easy to see the best current answer, people tend to give up quickly. So we make it a little hard, but not impossible, to see all the answers to a question. The price is to correctly answer another question.

And if that sounds like a technique designed to keep you playing, that’s because it is.

Ned:

I’m enjoying Cody – thanks for making it available to help me get my ‘contest fix’ when the contest isn’t running. A question and a concern though:
1. Is there something other than 10 points for a solution and 15 points for a submission that contributes to the total player score? Many of the leading players have total scores that are significantly more than they should be with just those 2 factors (e.g. #1 right now has a total of 3805 with 251 Solved, 26 Created. Using the 10/15 numbers, the total should be 2900).
2. Have you noticed the quirk in the mtree method of code sizes regarding string to num conversions? A lot of the leading solutions are using something like str2num('[1 2 3]'), which mtree evaluates as fewer nodes than the equivalent [1 2 3]. Another similar trick is doing something like '1 2 3' - '0'. It seems to me it would be pretty easy to add a small penalty in the code sizing code to compensate for these types of tricks.
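The arithmetic behind the first question can be checked in a few lines, using only the numbers quoted above (10 points per problem solved, 15 per problem created). The variable names are illustrative.

```matlab
% Numbers quoted in the comment above.
solved        = 251;
created       = 26;
reportedScore = 3805;

expectedScore = 10*solved + 15*created;        % 2510 + 390 = 2900
unexplained   = reportedScore - expectedScore; % 3805 - 2900 = 905

fprintf('expected %d, reported %d, unexplained %d\n', ...
        expectedScore, reportedScore, unexplained);
```

So 905 of the leader’s points are not accounted for by those two factors alone.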

@Alan, regarding your str2num observation, yes, it’s true that these are games some people play to shorten their answers. We can protect against some of that, but not all of it. There will always be hacks on whatever scoring system you come up with. But I hope we will be able to stop that kind of behavior from being the norm.

Thanks for the responses. Would it be possible to have an official Cody Blog or message thread linked to from the main Cody page like you do with the Contest? It would help you disseminate more general information and get feedback in a centralized place, instead of being buried in various other blogs (where the Cody messages will eventually be pushed way far down on the stack.)

Dear Ned,

I’m not a Cody fan and I’m one of the people who claimed that Cody promotes bad programming practice. See:
http://www.mathworks.com/matlabcentral/answers/27340-what-do-you-think-of-cody-new-service-for-matlab-central

Of course, Cody is a game. Therefore it should not be taken too seriously. As with every game, some unique, reproducible and comprehensible rules are required. I agree that any metric for computer code will have drawbacks.

But Cody is hosted on the MathWorks server and it concerns Matlab. Therefore I expected the metric to favor fundamental Matlab styles, e.g. numeric matrices, pre-allocation, toolboxes. Instead, strings processed by EVAL (hidden inside STR2NUM) are preferred, pre-allocation gets a penalty, and the toolboxes are not available.
You wrote “shorter code generally is better”. The value of this sentence critically depends on the definition of “generally”. A missing pre-allocation is a knock-out criterion in real-world M-code, even if the code is shorter. Although “str2num('[1 2 3]')” is valid Matlab syntax, I feel that “1:3” or “[1,2,3]” is smarter, is _more_ Matlabish.
On one hand it could be helpful to also consider the code size of the called subfunctions – then STR2NUM would get a greater penalty than HORZCAT for the “[1,2,3]” example. On the other hand I’d prefer the dull old metrics of runtime and memory consumption. While the first is easy to measure, the latter is hard to define uniquely. But a short enhancement of the underlying mxAlloc functions could be used to measure the memory usage.

Some questions are really a helpful challenge, e.g. I’ve learned something from the bullseye matrix, http://www.mathworks.com/matlabcentral/cody/problems/18-bullseye-matrix . Others, like the sum of integers from 1 to 2^n, are less impressive, because the code size of the solutions is nearly inversely proportional to the efficiency, http://www.mathworks.com/matlabcentral/cody/problems/189-sum-all-integers-from-1-to-2-n .

But Cody is a game. Nobody said that a game must reward efficient solutions only. E.g. roulette is not efficient (for the players), and in chess it would be more efficient if the king could move like the queen. The rules rule. And therefore Cody is fine and creates joy.
I prefer fast code, and in consequence I have decided to play something else and do my Matlab experiments at Answers or the File Exchange.

I find it a little insulting to suggest “And if you want to see the ‘official’ scoring code for yourself, you can find it on the File Exchange” when the code appears to be ultimately wrapped up in a proprietary mex file.

The file on the FEX calls the undocumented mtree M-file. I am okay with that. The problem is that most of the work in mtree appears to be done by mtreemex, for which the source code does not appear to be available.

The MATLAB parser, which is doing all the work here, isn’t written in MATLAB code and is not available for inspection. I see your point that the language is misleading. I should have said “if you want to try running the official score code for yourself.” I’ll make the change.

I’ve been having a blast with Cody for the last couple of weeks. I’ve even learned quite a bit that has improved my real work.

That said, it’s disappointing to unlock the solutions to a problem, hoping to glean a nugget or two, and find pages full of regexp, eval, gallery, etc. as the lowest points. I’d really like to be able to filter these “solutions” out of my view.

Ned, I am just joining Cody and it’s showing me all of the great facets of the MATLAB community. One concern I have is that the spirit of the game is being continually violated. I’m not arguing for full vs. sparse code, or tricky vs. straight-forward.

I have noticed, however, that while use of EVAL is blocked [1], very many problems have a leading score of 12, which is the node depth of the following program:

function ans = cody(n)
feval(@eval,'anythingAtAllOfPossiblyGreatComplexityOrNodeDepth(n);');
end

Being as you have ruled out the use of EVAL in solutions, would it not be consistent and easy to also rule out FEVAL or @EVAL? I feel like these are small measures to preserve those aspects of the metric that you are interested in, say, a legitimate measure of code size.

[1] “Error: You may not use the command EVAL in your code”

Thanks for the note David! I agree it’s a problem when people go to absurd lengths to shrink their code size. We are planning to do as you suggest, outlawing FEVALs that evaluate EVALs. Of course there are plenty of other clever ways to game the system, but at least we can get some of the more obvious ones.

That’s great news, Ned. I think that it will really help the essential greatness of this project flourish. Question: what will happen to those solutions that have already been submitted that use FEVALs to evaluate EVALs? If it’s possible, I think it’d be nice to allow people to submit the contents of their EVAL (or have that be submitted automatically) as opposed to, say, wiping the record, which is another option.

Also, is there a way to get email notifications of comments on MathWorks Blogs?

You’ll be glad to know that we have indeed shut down STR2FUNC. No new entries can use it. Having said that, we haven’t pulled down all the old solutions that use STR2FUNC, but once they get rescored, they will fail.

It’s a cat and mouse game. There are plenty of hacks out there, but we’d like to remove some of the most obvious and painful ones.

These postings are the author's and don't necessarily represent the opinions of MathWorks.