Automating the Creation of Cryptic Crosswords with Alteryx

by James Charnley

Background

I normally have a favourite puzzle on the go, and in 2023 that’s been the cryptic crossword. I once thought they were too difficult, but I’ve finally started to get the hang of them with some help from a wizard/friend/colleague, Finn.

As is seemingly becoming tradition with everything I do these days, I began to think of ways I could integrate Alteryx with my new crosswording hobby. Since unfortunately there is no crossword clue solving tool currently available, I instead began to plan how I could use Alteryx to create the crosswords that people could (in theory) solve.


The Plan

  • Build a workflow that creates a crossword grid, which follows the basic rules of grid building.

  • Find a way to fit random words from the dictionary into the grid.

  • Find a way to automate the assignment of clues for those given words.

While this wasn’t the most extensive plan ever created, it did quickly provide me with a starting point for what felt like a pretty intimidating project.


The Grid

If you haven’t done many crosswords or had some wild reason to look up the grid creation process, you might not realise that there are guidelines on what is allowed or not in a crossword grid. Here are the ones I wanted to follow:

  • The pattern must be symmetrical. That is to say, if you turn it upside down it will look the same as it does the right way up.

  • Words must be at least three letters long.

  • Every white square must somehow connect to every other white square. There should be no parts of the crossword that are isolated from the rest of the grid.

  • Words should be separated by at least one row. While this isn’t a strict rule, it follows the Times style and should make assigning words much easier when done randomly.

  • Try to minimise the number of black squares where possible.
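The first three guidelines are easy to state as checks on a grid of 1s (white) and 0s (black). The following is a minimal Python sketch of those checks, not the Alteryx logic itself: the function names are made up for illustration, "symmetrical" is read as 180-degree rotation, and "at least three letters" is read as "no run of exactly two white squares".

```python
from collections import deque


def is_symmetric(grid):
    """True if the grid looks the same after being turned upside down."""
    return grid == [row[::-1] for row in grid[::-1]]


def is_connected(grid):
    """True if every white cell (1) can be reached from every other one."""
    whites = {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == 1
    }
    if not whites:
        return False
    start = next(iter(whites))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for neighbour in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if neighbour in whites and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == whites


def has_two_letter_run(grid):
    """True if any across or down run of white cells is exactly two long."""
    lines = [list(row) for row in grid] + [list(col) for col in zip(*grid)]
    for line in lines:
        runs = "".join(str(value) for value in line).split("0")
        if any(len(run) == 2 for run in runs):
            return True
    return False


# A tiny 5x5 example in the alternating-row style described above.
sample = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
print(is_symmetric(sample), is_connected(sample), has_two_letter_run(sample))
```

Checks like these are handy as a safety net after generation, even if the grid is built row by row so that it should pass them by construction.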

With the guidelines in place, I began to make a thirteen row grid one row at a time, using a boolean 1 or 0 to dictate whether or not a cell should be black or white. The first row has a fairly simple logic:

The following rows need to use the starting rows with some additional logic to randomise the production of the grid while still making sure every white square connects. Perhaps the most difficult complication when making the grid this way is that the seventh row needs to be symmetrical within itself, but also be guaranteed to connect to both the sixth and eighth rows. This means that it’s quite commonly an eleven or thirteen letter word. After unioning all of the rows together, the grid looks something like the following:

While I’ve tried to make that sound as simple as possible, the workflow ended up a bit of a mess:

The following is how a grid looks visualised with a Table tool, with conditional column rules that fill in 0 cells as black (this is a different grid to above):

Once the grid is made, the squares that signify the start of a word need to be tagged and numbered. Essentially, across words start with a cell that has a white square (1) to the right but not the left, and down words start with a cell that has a white square (1) below it but not above.
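As a rough illustration of that tagging rule, here is a short Python sketch. It assumes the grid is a list of rows of 1s and 0s and that numbers are handed out left to right, top to bottom, with one number shared when a cell starts both an across and a down word; the function name is hypothetical.

```python
def number_grid(grid):
    """Return (number, row, col, starts_across, starts_down) for each start cell."""
    rows, cols = len(grid), len(grid[0])

    def white(r, c):
        # Cells outside the grid count as black.
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 1

    starts = []
    number = 0
    for r in range(rows):
        for c in range(cols):
            if not white(r, c):
                continue
            # Across: white to the right but not to the left.
            across = white(r, c + 1) and not white(r, c - 1)
            # Down: white below but not above.
            down = white(r + 1, c) and not white(r - 1, c)
            if across or down:
                number += 1
                starts.append((number, r, c, across, down))
    return starts


sample = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
for entry in number_grid(sample):
    print(entry)
```

On this small grid the top-left cell is tagged as both 1 across and 1 down, which is the same shared-number behaviour you see in a newspaper grid.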

With the correct tagging, our grid now fairly closely resembles something you might see in a newspaper!


Placing Words

I started by downloading a dictionary data set that would provide the word bank for the crossword. The following logic then seemed to make the most sense:

  • Start by placing the longest words in the grid because the more letters are in the word, the fewer words of that length are in the dictionary.

  • If there is a tie, prioritise the word slots that already have some letters filled in by crossing words.


For example, in the above grid, the macro would start with 10 across, placing a random 13-letter word such as mitochondrion:

The process then iterates so that the letters of mitochondrion are in the grid for other relevant words (such as 1 or 2 down in the above grid) and the macro picks the next word.
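The selection and matching logic can be sketched in Python as follows. This is an illustration under assumptions rather than a copy of the macro: each slot is represented as a pattern string with "." for an empty cell, the word list is assumed to be lowercase letters only, and the function names and the tiny word list are invented for the example.

```python
import random
import re


def next_slot(slots):
    """Pick the unfilled slot that is longest, breaking ties by letters already in."""
    open_slots = {name: pattern for name, pattern in slots.items() if "." in pattern}
    if not open_slots:
        return None  # every slot is already filled
    return max(
        open_slots,
        key=lambda name: (
            len(open_slots[name]),
            sum(ch != "." for ch in open_slots[name]),
        ),
    )


def candidates(pattern, words):
    """All words that fit the pattern, e.g. '..m..' matches 'camel'."""
    regex = re.compile(pattern)
    return [word for word in words if regex.fullmatch(word)]


def pick_word(pattern, words):
    """Random matching word, or None if the pattern is a dead end."""
    matches = candidates(pattern, words)
    return random.choice(matches) if matches else None


words = ["camel", "lemon", "tiger", "mitochondrion"]
slots = {"10 across": "." * 13, "1 down": "..m..", "2 down": "....."}
print(next_slot(slots))              # the 13-letter slot goes first
print(candidates("..m..", words))
```

A None from pick_word corresponds to the situation where a letter pattern has no matching word in the dictionary, which is what forces a restart.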

The final word placing iterative macro

Iterating with a new first word

Here is where things start to get complicated and I need to start using the word iterate a lot. What we have so far is an iterative macro that will continue to place words in our grid until either the entire grid has been filled, or it has to fit a letter pattern that doesn’t have any matching words in the dictionary. If the latter is true, then we need to empty the grid and iterate the word placement process, starting with a new first word to try again.

That means we need to wrap our iterative macro in an iterative macro that will check if all the words have been placed, and if not, then iterate.

The final grid completion iterative macro

After some testing with a lot of different grids, I found that either the process could be completed in a few iterations, or the macro wasn’t able to find any word combinations at all. The latter mostly happened when there were too many 11 or 13 letter words in the random grid. After trying and failing to change the grid creation process so that it made fewer long words while still meeting the grid making criteria, my next option was to turn the grid creation workflow into yet another iterative macro.

This would give the word placing macro a few tries to place words into the created grid and, if it fails each time, start again from the beginning by creating an entirely new grid. This led to another scary-looking iterative macro:

The final grid creation iterative macro
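Stripped of the Alteryx tooling, the nesting described in this section boils down to two retry loops around the word placing step. The Python sketch below shows only that control flow; make_grid and try_fill are hypothetical stand-ins for the grid creation workflow and the word placing macro, and the attempt limits are arbitrary.

```python
def fill_with_restarts(make_grid, try_fill, grid_attempts=5, fill_attempts=10):
    """Retry word placement a few times per grid, then give up and build a new grid.

    make_grid: callable returning a fresh random grid.
    try_fill:  callable taking a grid and returning a filled grid, or None if
               it hit a letter pattern with no matching word.
    """
    for _ in range(grid_attempts):          # outer loop: a brand new grid
        grid = make_grid()
        for _ in range(fill_attempts):      # middle loop: a new first word
            filled = try_fill(grid)         # inner step: place words until stuck
            if filled is not None:
                return filled
    return None  # nothing worked within the attempt limits
```

Capping both loops matters because some grids never fill: without a limit on the inner retries, the process could keep trying new first words on a grid that has no solution in the dictionary.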

Creating the Clues

My first thought with clue creation was to find a website that provides answers to crossword clues, which I was allowed to web scrape, and reverse engineer it so I could provide the word and then use a previously used clue. This approach had a couple of problems:

  • I could find no such website that fit these criteria.

  • Even if there was, my crossword often uses some quite unusual words that have never been in a crossword before.

This led me to explore what ended up being a much better option: ChatGPT.

The ChatGPT API can be called from Alteryx with one row per prompt, so I just need to create an output from my final grid which has one row per answer and build the prompt dynamically. I could then build a batch macro like so:

The clues often don’t make complete sense, but they do usually follow the rule of having a synonym of the answer at the beginning or end. For example, the clue it gave on first run for the word ‘Circumstances’, which I built the macro with, was:

Situations following the sun’s path (11)

The clues always feel, to me at least, that they’re really close to being good clues but always just fall a bit short. Regardless, it’s a lot better than any other options, and definitely a fun use case for ChatGPT.
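The "one row per answer, prompt built dynamically" step described earlier in this section could look something like the Python sketch below. The prompt wording, the field names, and the build_prompt function are all assumptions made for illustration; the post does not show the actual prompt, and the API call itself is left out.

```python
def build_prompt(answer):
    """Build one prompt string for one answer (wording is purely illustrative)."""
    return (
        f"Write a cryptic crossword clue for the answer '{answer.upper()}'. "
        f"End the clue with the letter count ({len(answer)})."
    )


# One row per answer, as it might come out of the final grid.
answers = [
    {"number": 10, "direction": "across", "answer": "mitochondrion"},
    {"number": 1, "direction": "down", "answer": "camel"},
]

# Add a prompt column to every row; each row would then be sent as one request.
rows = [{**row, "prompt": build_prompt(row["answer"])} for row in answers]

for row in rows:
    print(row["number"], row["direction"], row["prompt"])
```

Computing the letter count in the workflow and passing it into the prompt, rather than leaving the model to count, is one way to keep the enumeration at the end of each clue consistent with the answer.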

The Final Product

The final workflow therefore starts by leveraging the three nested iterative macros to find a valid crossword, sends the answers into the batch macro for the clues, and uses the reporting palette to produce a solvable printout and/or a solution:

I’m going to leave this blog here at the end of the Alteryx section, but a talented colleague of mine, Patrick Deans, has turned the grid output into a solvable Tableau dashboard that I hope to embed on the website soon. Fingers crossed he’ll write a blog on how he did it!
