Darkness 2011-04-13 12:00:00 UTC
 
Twilight 2011-04-14 12:00:00 UTC
 
Daylight 2011-04-15 12:00:00 UTC
 
Finish 2011-04-20 12:00:00 UTC

Crossword - Rules

Arrow_down  Download Sample Code

Crossword Puzzle Contest

  • Rules in Japanese
  •  

    The Problem

    small crossword

    You are the new crossword puzzle intern for a major math software newspaper, The New York Multiply ("All the News That's Bits and Ints"). Your editor has given you a list of words to use in this week's puzzle. He knows you can't possibly use all the words on his list, so he gave each word a weight and told you to "do the best you can". You happen to know that he gave the same challenge to another intern, so you'd better do a good job if you want to keep your position. Maybe some software would be helpful here...

    The Details

    Note that these rules talk about words for the sake of explanation, but in practice the "words" will be vectors of integers.

    Restated, the challenge goes like this: Given a list of acceptable words (a "dictionary") and a grid size, return a grid that maximizes the use of high-value words while minimizing the use of bogus words (those that do not appear in the dictionary).

    The scoring software looks at your grid and generates a list of all the words that appear in it by scanning the rows from left to right and the columns from top to bottom. Each word that comes off the puzzle grid is either removed from the dictionary (if it is an acceptable word) or added to the bogus word list (if it is not acceptable). In this way, if a word appears only once in the dictionary, then any subsequent time it appears in the grid it is treated as an illegal word. For example, suppose your dictionary contains the words HELP, SINE, MAP, MATH, and PLOT with the weights as shown on the left below. You might construct the grid shown on the right.

    The scoring algorithm is going to read three words going across the rows: MAP, HELP, and MATH. Going down we read not only the approved word PLOT, but also two illegal words: MH and AE. The word SINE, while acceptable, didn't make it onto the board.

    You pay for all the unused words left in your dictionary, so a lower score is better. Once a word is used from the list, it is removed from the list of acceptable words. Since SINE is unused and has a weight of 5, that becomes your cost. But we're not done yet. You still have to pay a penalty for all your illegal words. The weight associated with these illegal words changes with each puzzle. Let's say that in this case the penalty is 3. Once this penalty is taken into account, your total cost is now the sum of your dictionary, 5, plus the sum of your bogus words, 2*3 = 6, for a total of 11.

    It's easy to see now that adding the word MAP on the top row was a bad idea. It cost more (6 penalty points) than it was worth (4 dictionary points). On the other hand, if the penalty per word had been 1 instead of 3, then MAP would have been a smart play. Better still would have been this grid, which achieves a total cost of 4, since MATH and PLOT are the only two words left in the dictionary, and no illegal words are manufactured.

    Other Considerations

    1. Indicate spaces with zeros
    2. Legal words always have a length of at least two. Put another way, the dictionary will never contain any one-letter words.
    3. Each word in the dictionary may be used no more than the number of times it appears in the dictionary. For example, a word that appears exactly once in the dictionary is legal exactly once in the grid. Thereafter it is considered bogus each time it appears.
    4. Words do not need to be connected to a single contiguous group (that is, they don't have to exhibit Scrabble-style connectivity).
    5. A single letter dangling by itself pays a penalty as two illegal words, one across and one down.
    6. ASCII characters and English words are used here for illustration, but the actual test suite will contain vectors of positive integers. These vectors will be unconstrained by the limits of either ASCII characters or the English language.

    By the Numbers

    For complete clarity, in actual practice the above problem would look more like this.

    >> words
        [1x4 double]    [1x4 double]    [1x3 double]    [1x4 double]    [1x4 double]
    
    >> words{1}
        72    69    76    80
    >> words{2}
        83    73    78    69
    >> words{3}
        77    65    80
    >> words{4}
        77    65    84    72
    >> words{5}
        80    76    79    84
    
    >> penalty
         3
    
    >> weights
        10     5     4     2     2
    
    And the grid you return might look like this.
        77    65    80     0
        72    69    76    80
         0     0    79     0
        77    65    84    72
    

    Syntax

    You will write the function solver.m which will be called by the test harness. It must have this signature:

    board = solver(words, weights, n, penalty)
    
    • words is a cell array of acceptable words. In practice these will be numbers.
    • weights is a vector the same size as words.
    • n gives the size of the output grid.
    • penalty is the cost you pay for each illegal word in your grid.
    • board should be an n-by-n matrix of numbers. Unused squares should be set to zero.

    Results

    For each puzzle, the following score will be calculated:

    result = sum(Unused Word Score) + penalty * (Number of illegal words)
    

    Scoring

    The overall score for your entry is a combination of several factors. The first two in this list are the most important. The other two have a milder effect.
    1. Your average result across all the problems
    2. How fast your code runs
    3. The cyclomatic complexity of your code (more information below)
    4. The node count of your code (more information below)

    Since each of these is to be minimized, the lowest overall score at the end of the contest wins.

    Cyclomatic Complexity

    Cyclomatic complexity, also known as McCabe complexity, is a measure of the number of independent paths through a program's source code. Typically, as this number gets higher, it becomes more difficult to understand what's happening in a program. This makes it harder to test, modify, and refactor.

    Since a file can contain multiple functions, the complexity for any given file is defined as the MAXIMUM complexity of any functions contained in it. A good practice is to keep the complexity for each function below 10, so for this contest your overall score will increase according to the complexity in excess of 10. So there is no complexity penalty for submissions in which all functions have a complexity of 10 or less.

    You can measure the cyclomatic (or McCabe) complexity of any function in MATLAB using the "cyc" switch for mlint. Try this, for example:

    >> mlint -cyc magic.m
    

    Node Count

    This is a rough measure of how long your code is, but it will not penalize you for comments and variable name length.

    t = mtree(filename,'-file');
    length(t.nodesize)
    

    Time and Size Limits

    Your entry will time out and be disqualified if it takes more than 180 seconds (three minutes). To keep the queue moving smoothly, you are limited to submitting no more than five files every 15 minutes, for an average of three minutes per file. If we find you are creating multiple accounts in order to get around this limitation, we may disqualify all your entries.

    The code is limited in size by the database architecture. The column in our MySQL database that stores the M-code is of type text, which is limited of 65535 characters. Submissions longer than this limit will fail.

    Developing Your Entry

    The files you need to get started on the contest are included in a ZIP-file in a ZIP-file available on the MATLAB Central File Exchange. If you download and uncompress this ZIP-file, you will have all the files you need to get started. You will be writing the file solver.m using the signature described in the Syntax section above. To test this function with the test suite in the ZIP-file, use runcontest.m.

    >> runcontest
    

    About the Contest

    The contest is divided into three segments. Most of the week will run as usual, with free sharing of code, but the first two days of the contest will hide some of the information about each entry.

    • Darkness. During the first day of the contest, you can't see the code or scores for any of the entries.
    • Twilight. During the second day of the contest, you can see scores but no code.
    • Daylight. For the remainder of the contest you can see scores and code for all entries.

    Starting and ending times are based on noon in Natick, Massachusetts, but the web pages will show all times in Coordinated Universal Time (UTC). For the exact timing of these three phases, refer to the box at the top right corner of the page.

    Collaboration and Editing Existing Entries

    Once an entry has been submitted, it cannot be changed. However, any entry can be viewed, edited, and resubmitted as a new entry. You are free to view and copy any entry in the queue. If your modification of an existing entry improves its score, then you are the "author" for the purpose of determining the winners of this contest. We encourage you to examine and optimize existing entries.

    We also encourage you to discuss your solutions and strategies with others. You can do this by posting to the thread that we've started in our newsreader.

    Fine Print

    The allowable functions are those contained in the basic MATLAB package available in $MATLAB/toolbox/matlab, where $MATLAB is the root MATLAB directory. Functions from other toolboxes will not be available. Entries will be tested against the latest version of MATLAB.

    The following are prohibited:

    • MEX-files
    • Java commands or object creation
    • eval, feval, inline, function handles, etc.
    • Shell escape such as !, dos, unix
    • Handle Graphics commands
    • ActiveX commands
    • File I/O commands
    • Debugging commands
    • Printing commands
    • Simulink commands
    • Benchmark commands such as tic, toc, flops, clock, pause
    • error, clear, persistent

    Hacking

    Entries that compromise the contest machinery are not allowed. Out of consideration for everyone participating in the contest, we ask that you not abuse the system.

    Extraction of puzzles in the test suite by manipulating the score, runtime, or error conditions is also forbidden. Tuning the entry to the contest test suite via multiple entries is permitted, but we ask that you not overwhelm the queue.

    FAQ

    Check our FAQ for answers to frequently asked questions about the contest.

    About named visibility periods

    Contests are divided into segments where some or all of the scores and code may be hidden for some users. Here are the segments for this contest:

    • Darkness - You can't see the code or scores for any of the entries.
    • Twilight - You can see scores but no code.
    • Daylight - You can see scores and code for all entries.
    • Finish - Contest end time.