Steve on Image Processing with MATLAB

Image processing concepts, algorithms, and MATLAB

Chess and a little text file manipulation

Here's an image of a chess position:

And that's about as close to image processing as today's blog post will come. Because this post is really about text processing.

It seems like a lot of computational tasks in engineering and science involve manipulating data in text files. This weekend I had such a task, although I must admit that it had nothing to do with engineering or science. Even so, I thought the task would be a good illustration of some basic text processing techniques.

I have a text file, tactics.pgn, that is a database of chess tactics puzzles. Here are lines 2,759 through 2,784 from the file.

[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Diagram 211"]
[Black ""]
[Result "*"]
[EventDate "2006.05.20"]
[FEN "2b1R3/ppk4p/8/2q2p2/2Br2n1/2QP2N1/P4PPP/6K1 w - - 0 1"]
[SetUp "1"]
[SourceDate "2011.02.22"]
1.Rxc8+ Kxc8 2.Be6+ Kd8 3.Qxc5 *
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Diagram 212"]
[Black ""]
[Result "*"]
[FEN "3rr1k1/p1p2ppp/Q2b1q2/8/3Np3/4P3/PP3PPP/R1B2RK1 b - - 0 1"]
[SetUp "1"]
[SourceDate "2011.02.22"]
1...Bxh2+ 2.Kxh2 Qxa6 *

These lines store two chess positions. The first is the position I showed above, with White to move. Can you figure out how White wins by capturing the bishop on c8 with his rook? (If you're interested in how the position is encoded in the text, see the Wikipedia article on Forsyth-Edwards Notation, or FEN.)
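For the curious, here is a minimal sketch of how the first FEN field can be expanded into an 8-by-8 character array. It is not part of the shuffling procedure, and the variable names (board, ranks, side_to_move) are made up for illustration. In FEN, ranks are separated by slashes and listed from rank 8 down to rank 1, and a digit stands for that many consecutive empty squares.

```matlab
% Expand the piece-placement field of a FEN string into an 8-by-8 char array.
fen = '2b1R3/ppk4p/8/2q2p2/2Br2n1/2QP2N1/P4PPP/6K1 w - - 0 1';

fields = strsplit(fen, ' ');          % field 1: pieces, field 2: side to move
ranks  = strsplit(fields{1}, '/');    % rank 8 comes first, rank 1 last

board = repmat('.', 8, 8);            % '.' marks an empty square
for r = 1:8
    col = 1;
    for ch = ranks{r}                 % loop over the characters of one rank
        if ch >= '1' && ch <= '8'
            col = col + (ch - '0');   % a digit skips that many empty squares
        else
            board(r, col) = ch;       % uppercase = White, lowercase = Black
            col = col + 1;
        end
    end
end

side_to_move = fields{2};             % 'w' for White, 'b' for Black
disp(board)
```

The first row of board corresponds to rank 8, so the black bishop on c8 appears as the lowercase b in the third column of the first row.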

I wanted to shuffle the positions in this file randomly. None of the chess software programs that I have can do this, so I decided to tackle it with MATLAB. (Fun thing to do on a Saturday morning, right?)

OK, so here's the basic procedure:

  1. Read the entire file into MATLAB.
  2. Split the data into chunks, one chunk for each position.
  3. Rearrange the chunks randomly.
  4. Write out the rearranged chunks to a new text file.

There are a lot of different ways to do this. Here's what I came up with.

First, read in the entire file. The function fileread is just the ticket.

characters = fileread('tactics.pgn');
size(characters)
ans =

           1      106394

You can see that there are about 106,000 characters in the file. Let's split the data into lines using strsplit.

lines = strsplit(characters,'\n')';
size(lines)
ans =

        4838           1
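One detail worth knowing: by default, strsplit collapses consecutive delimiters into one, so blank lines in the file do not show up as empty cells. The sketch below uses a made-up four-line sample (not the real file) to show the difference. CollapseDelimiters is a documented strsplit option; the variable names are just for illustration.

```matlab
% A made-up sample with one blank line between the tags and the moves.
sample = sprintf('[Event "?"]\n[Site "?"]\n\n1.e4 *');

% Default behavior: the doubled newline counts as a single delimiter.
collapsed = strsplit(sample, '\n');

% Keep every delimiter, so the blank line survives as an empty cell.
kept = strsplit(sample, '\n', 'CollapseDelimiters', false);

fprintf('collapsed: %d lines, kept: %d lines\n', numel(collapsed), numel(kept));
```

If the blank lines matter for the output file, the second form is the one to reach for.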

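There are about 4,800 lines of text. But how many positions are there? I'm going to find the starting line of each position by searching for the string "[Event " at the beginning of a line. It's time for regexp. Note the backslash in the pattern: an unescaped [ would start a character class instead of matching a literal bracket.

idx = regexp(lines,'^\[Event ');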
idx(1:15)
ans = 

    [1]
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    [1]
    []
    []
    []
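Because [ is special in regular expressions, literal text like this needs escaping. If you would rather not escape by hand, regexptranslate can do it. This is a small sketch on made-up sample lines, with the 'once' option added so that each cell holds at most one index:

```matlab
% Made-up sample lines standing in for the real file contents.
lines = {'[Event "?"]'; '[Site "?"]'; '1.Rxc8+ Kxc8 *'; '[Event "?"]'};

% Let MATLAB escape the special characters, then anchor at the line start.
pattern = ['^' regexptranslate('escape', '[Event ')];

% With 'once', each cell is either a single start index or empty.
idx = regexp(lines, pattern, 'once');
is_start = ~cellfun(@isempty, idx);
disp(is_start')
```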

This shows us that this string is found twice in the first 15 lines of the file. idx is a cell array, so I'll use cellfun and find to identify all the lines that contain the matching string. Each of these lines is the start of an entry for one chess position.

first_lines = find(~cellfun(@isempty,idx));
first_lines(1:3)
ans =

     1
    12
    23

So there are positions starting on lines 1, 12, and 23.
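Since the search text is a fixed prefix, a regular expression is not strictly needed. As an alternative sketch (not the approach used above), strncmp compares the first n characters of every cell and returns a logical array directly, so there is no cell array to unpack. The sample lines are made up.

```matlab
% Made-up sample lines standing in for the real file contents.
lines = {'[Event "?"]'; '[Site "?"]'; '1.Rxc8+ Kxc8 *'; ...
         '[Event "?"]'; '1...Bxh2+ 2.Kxh2 Qxa6 *'};

prefix = '[Event ';

% strncmp accepts a cell array and returns one logical value per cell.
first_lines = find(strncmp(lines, prefix, length(prefix)));
disp(first_lines')
```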

Next, I'll make a cell array such that each cell contains all the lines for one position. To make the for-loop work, I'm going to add an "extra" value to the first_lines vector that points to a nonexistent line just past the end of the file.

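first_lines(end+1) = length(lines) + 1;        % sentinel: one past the last line
positions = cell(1,length(first_lines)-1);     % preallocate one cell per position
for k = 1:length(first_lines)-1
    positions{k} = lines(first_lines(k):first_lines(k+1)-1);
end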
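The same grouping can also be done without a loop. The sketch below is one possible alternative, assuming the very first line of the file starts a position (otherwise the chunk lengths would not add up to the number of lines). It uses diff to turn the starting lines into chunk lengths and mat2cell to cut the cell array; the sample data and the name chunk_lengths are made up.

```matlab
% Made-up sample: two positions, three lines and two lines long.
lines = {'[Event "?"]'; '[Site "?"]'; '1.Rxc8+ Kxc8 *'; ...
         '[Event "?"]'; '1...Bxh2+ 2.Kxh2 Qxa6 *'};
first_lines = find(strncmp(lines, '[Event ', 7));

% Distance between consecutive starts, with a sentinel one past the end.
chunk_lengths = diff([first_lines; numel(lines) + 1])';

% Cut the column of lines into one cell per position.
positions = mat2cell(lines, chunk_lengths, 1)';
disp(size(positions))
```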

Let's take a look at what we have now.

size(positions)
ans =

     1   421

There are 421 positions in the file. For example:

positions{205}
ans = 

    '[Event "?"]'
    '[Site "?"]'
    '[Date "????.??.??"]'
    '[Round "?"]'
    '[White "Diagram 211"]'
    '[Black "Bain"]'
    '[Result "*"]'
    '[EventDate "2006.05.20"]'
    '[FEN "2b1R3/ppk4p/8/2q2p2/2Br2n1/2QP2N1/P4PPP/6K1 w - - 0 1"]'
    '[SetUp "1"]'
    '[SourceDate "2011.02.22"]'
    '1.Rxc8+ Kxc8 2.Be6+ Kd8 3.Qxc5 *'

Again, this is the position shown at the top of this post.

Getting near the end, now. It's time to rearrange the positions. Before I do that, though, I'll reseed the random number generator with rng shuffle, which picks a seed based on the current time. By default, MATLAB starts every session with the same generator state, so without this step, repeating these steps in a new MATLAB session would give me the same "random" order. After reseeding, a quick call to randperm randomly rearranges the positions.

rng shuffle
shuffled_positions = positions(randperm(length(positions)));
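If you want the opposite behavior, a shuffle you can reproduce later, you can pass a fixed seed to rng instead. Here is a minimal sketch with made-up data; the seed value 0 is arbitrary.

```matlab
% Made-up stand-ins for the real positions.
positions = {'A', 'B', 'C', 'D', 'E'};

rng(0);                                   % fixed seed: same order every run
order = randperm(numel(positions));
shuffled_positions = positions(order);

rng(0);                                   % reseeding reproduces the permutation
order_again = randperm(numel(positions));

disp(isequal(order, order_again))
```

Either way, randperm returns each index exactly once, so no position is lost or duplicated.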

We've arrived at the last step: writing out the shuffled positions to a new file.

fid = fopen('shuffled_tactics.pgn','w');
for k = 1:length(shuffled_positions)
    position = shuffled_positions{k};
    for p = 1:length(position)
        fprintf(fid,'%s\n',position{p});
    end
end
fclose(fid);
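A slightly more defensive variant checks that fopen succeeded and then reads the file back to confirm that every position made it to disk. This is only a sketch: the data is made up, and it writes to a temporary file from tempname rather than to shuffled_tactics.pgn.

```matlab
% Made-up miniature data standing in for shuffled_positions.
shuffled_positions = { ...
    {'[Event "?"]'; '1...Bxh2+ 2.Kxh2 Qxa6 *'}, ...
    {'[Event "?"]'; '1.Rxc8+ Kxc8 2.Be6+ Kd8 3.Qxc5 *'}};

out_name = [tempname '.pgn'];          % temporary file, not the real output

% fopen returns -1 on failure, along with a message explaining why.
[fid, msg] = fopen(out_name, 'w');
if fid == -1
    error('Could not open %s: %s', out_name, msg);
end
for k = 1:length(shuffled_positions)
    position = shuffled_positions{k};
    for p = 1:length(position)
        fprintf(fid, '%s\n', position{p});
    end
end
fclose(fid);

% Read the file back and count the position headers.
written = strsplit(fileread(out_name), '\n');
n_written = nnz(strncmp(written, '[Event ', 7));
fprintf('%d positions written\n', n_written);
delete(out_name);                      % clean up the temporary file
```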

And that's it!

Reading text, manipulating it in some useful way, and writing the results back out -- a common computing task accomplished using several basic MATLAB functions.




Published with MATLAB® R2013b