I have discussed a number of times the importance of having good wordlists available for assembling crosswords. In fact, if your wordlists are good, the software you’re using can manage them well, and you pick answer words with how to clue them in mind, you will almost always create a good crossword. A poor wordlist and ill-chosen answer words will always result in a substandard one.
I realised this from the start and set about collecting and creating wordlists to make my job a bit easier. My age and IT background gave me a distinct advantage when it came to using software to create crosswords and to massage wordlists. I knew enough of the programming language Visual Basic to write programs to do things most people couldn’t without professional help. I currently have about 50 wordlists of different sorts: some general, some very specific and some topical. In 2001 I came up with a plan to create the ultimate set of wordlists – ones with no redundancy and no wastage, all preselected and approved. This would save me a great deal of grid-filling time and give me an unprecedented level of confidence. The plan was relatively simple:
- Create a set of files, all uniquely identified, from the wordlists I had available from many sources and covering many different areas of language.
- Sort these all together alphabetically into one huge file.
- Write a program to read through the file one word at a time, telling me where it came from, and giving me the option of including it in verified master lists. Selection would depend on the suitability of each word for the various crosswords and word puzzles I was supplying.
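The first two steps above can be sketched in a few lines of code. This is a minimal illustration in Python rather than the Visual Basic the original program was written in; the file names and the tab-separated tag format are my assumptions, not the author's actual design.

```python
from pathlib import Path

def merge_wordlists(sources, out_path):
    """Combine several wordlist files into one master file, tagging each
    entry with the list it came from, then sort the result alphabetically
    so that every instance of a word appears together."""
    entries = []
    for source in sources:
        tag = Path(source).stem  # use the file name as the unique source identifier
        for line in Path(source).read_text().splitlines():
            word = line.strip()
            if word:
                entries.append((word.lower(), tag))
    entries.sort()  # alphabetical; duplicates from different sources sit adjacent
    with open(out_path, "w") as f:
        for word, tag in entries:
            f.write(f"{word}\t{tag}\n")
    return len(entries)
```

With the merged file sorted this way, the review program only has to read it top to bottom, seeing each word alongside every source it appeared in.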
As I was designing the process, I realised I had an opportunity to go much further, as each word would require a significant amount of investigation (lookup) that should be exploited. The plan expanded to include the creation of a set of theme wordlists (for specialist magazines), and the bolstering of clue databases (by adding more clues) that I had been building up for a number of years. Effectively I was sourcing every word and term I could, analysing them, and splitting them into discrete lists. One word could end up in multiple lists or be discarded completely.
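The routing described here – one word ending up in several lists or in none – can be sketched as a simple review loop. This is a hypothetical Python outline, not the author's program: in the real tool the decision was made interactively, whereas here it is a callback, and the category names are invented for illustration.

```python
from collections import defaultdict

def review(entries, decide):
    """Walk merged (word, source_tag) pairs and route each word into
    zero or more named output lists.

    decide(word, tag) returns the list names the word belongs to;
    an empty result discards the word entirely."""
    master_lists = defaultdict(set)
    for word, tag in entries:
        for list_name in decide(word, tag):
            master_lists[list_name].add(word)
    return master_lists

# Toy stand-in for the interactive decision: keep words over six letters,
# filing each under a theme list named after its source.
lists = review(
    [("aardvark", "animals"), ("ablaze", "general")],
    lambda word, tag: ["themed:" + tag] if len(word) > 6 else [],
)
```

The same word fed in from two sources would be offered twice, which is exactly the redundancy the project was meant to eliminate.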
Unlike many of my ideas, which don’t get past the preliminary design stage, this one actually got off the ground, if not all the way to fruition. The master input list, containing multiple instances of every word and term in the English-speaking world, ran to about 500,000 entries. This was daunting, but I considered it a long-term project that I could just chip away at. I started strongly and toiled away at it for months, but one major thing was working against me: I had no way to easily look up each word in all the source references. I had multiple programs open at once and tried to run through them sequentially, which made the processing of each word dreadfully slow.
I eventually abandoned the project after getting to about “ad…”. That doesn’t seem very far, but it was far enough to make me realise I’d been too ambitious. As my paid work increased, there was little time to indulge in pet projects like this.
Here is a design shot of the program I wrote:
My brief thoughts of resurrecting this project thankfully dissipated quickly.