At the time of writing, the game Wordle is taking the world by storm. Quite rightly, it's genius and so is the guy who invented it.
Here's one I did earlier:
Two things to know about me:
- I'm rubbish with word games.
- I like to trick my children into thinking I'm cleverer than I am.
So having struggled a bit with Wordle I wondered if I could write some code to more efficiently (for that read cheat) find Wordle answers. For this I used my trusty Raspberry Pi, Python and the Natural Language ToolKit (NLTK) module. I'd used NLTK previously for some online data science courses so I knew it could give me a "corpus" (so set) of words to play with.
First I installed NLTK for use with Python 3 on the Raspberry Pi using this command: sudo pip3 install nltk
Then I looked at which word corpus NLTK has that I could use. There's some details here and a few places pointed me to the "Brown" corpus as a good place to start. To make this corpus available for NLTK in a Python script I opened a Python3 shell and ran these commands:
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('brown')
- Whether it is in the final word in the exact position you guessed it it, I call these exact matches.
- Whether it is in the final word but not in the exact position you guessed it in. I call these partial matches.
- Whether it is not in the final word. I call these non-matches.
1-Make a prediction (which I then enter into Wordle) based on the data structures2-Input the result of the prediction from Wordle3-Update some data structures
- A tuple of exact matches where the tuple contains the letter and the exact match position. So from the above it would be [('S', 0)].
- A dictionary of partial matches where the key is the letter and the value is a list of positions that the letter is not in. S from the above it would be {'A': [3], 'R': [4]}
- A list of non-matches. So from the above it would be ['U', 'G']
2)Find a word in the corpus that has each of the 5 most common letters. This leads me to use "AROSE".
Full code listing:
#!/usr/bin/env python3
from nltk.corpus import brown
from nltk.corpus import webtext
from nltk.corpus import gutenberg
import sys
#Our main 5 letter word list
five_letters = {} #A dictionary of 5 letter words with frequencies
#Gets a list of 5 letter words
def add_to_list(in_word_list):
for word in in_word_list:
if word.isalpha():
if len(word) == 5:
if word.upper() not in five_letters: #as variety of case of same word could be present
five_letters[word.upper()] = 1
else:
five_letters[word.upper()] += 1
#Compares our 5 letter word corpus with the list of letter frequencies to find the best start word
#The best word has all the highest frequency letters just once
def get_start_words(letter_frequencies, start_pos):
words_found = 0 #Counts when our set of 5 high frequency letters matches a word from our corpus
list_start_pos = start_pos #Used to track where we are in our letter frequency list
word_list = []
#Loop until we've found words from the word corpus or we've exhausted the
while words_found == 0 and list_start_pos < 22: #It's 22 as this will mean we've got to positions 21,22,23,24,25 in the character frequency list
#Loop for each of the words in the corpus
for my_word in five_letters:
letter_count = 0 #Incremented if we find a letter in the word
#Then for each of the five letters identified by the outer while loop
for i in range (list_start_pos, list_start_pos + 5):
my_letter = letter_frequencies[i][0]
if my_letter in my_word:
letter_count += 1 #Count if the letter is in the word
#return word_list
#See if we found all our letters in the word. So if one letter is there twice we should not
if letter_count == 5:
word_list.append(my_word)
words_found += 1
list_start_pos += 1
#Return our word list
return word_list
print ("########Building five letter word list")
add_to_list(brown.words())
print ("Added Brown corpus and word list has {} entries".format(len(five_letters)))
add_to_list(webtext.words())
print ("Added Webtext corpus and word list has {} entries".format(len(five_letters)))
add_to_list(gutenberg.words())
print ("Added Gutenberg corpus and word list has {} entries".format(len(five_letters)))
#Calculate letter frequencies
print ("########Calculating letter frequencies")
freq_dict = {}
for my_word in five_letters:
#Build letter frequencies
for my_letter in my_word:
#See of the letter is a key in the dictionary
if my_letter in freq_dict.keys():
freq_dict[my_letter] += 1 #Increment value
else:
freq_dict[my_letter] = 1 #Add value
#Sort the dictionary to get the highest probability letters and show the user
sorted_values = sorted(freq_dict.items(), key=lambda x:x[1], reverse=True)
print (sorted_values)
#Holds the round number
round_number = 1
#Holds the exact matches. A list of tuples
exact_matches = []
#Holds the partial matches. A dictionary of key is letter, value is list of positions letter is not in
partial_matches = {}
#Holds the non-matched letters. A list
non_matches = []
#Loop for all the rounds
while round_number < 7 and len(exact_matches) < 5:
print ("######################################################")
print ("########This is round number {}".format(round_number))
#Special case for round 1
if round_number == 1:
#Get a list of possible starting words based upon the letter frequencies
print ("########Getting a list of words to start off with")
#Imagine we get 6 wrong starting words! Accoutn for each
for k in range (0,3):
start_words = get_start_words(sorted_values, k)
print ("Where we start at position {} of the letter frequencies the start words are: {}".format(k, start_words))
else:
#Step 1, rule out a bunch of words that have eliminated letters in them
print ("########Assessing data from previous round to make a recommendation")
print ("########First rule out words based on letters found not to exist in the answer")
after_non_match_check = []
for my_word in five_letters:
has_ruled_out = False
for letter in non_matches:
if letter in my_word:
has_ruled_out = True
if not has_ruled_out:
after_non_match_check.append(my_word)
print ("########At the end of this we are down to {} words".format(len(after_non_match_check)))
#Step 2, rule in a set of words based on the matched list
print ("########Second rule in words based on exact matched letters")
after_full_match_check = []
for my_word in after_non_match_check:
has_ruled_out = False
for match_tuple in exact_matches:
if my_word[match_tuple[1]] != match_tuple[0]:
has_ruled_out = True
if not has_ruled_out:
after_full_match_check.append(my_word)
print ("########At the end of this we are down to {} words".format(len(after_full_match_check)))
#print (after_full_match_check)
#Step 3, rule out a set of words where letters are in partial match positions
print ("########Third rule in words based on partial matched letters")
after_partial_match_check = []
for my_word in after_full_match_check:
has_ruled_out = False
for my_partial in partial_matches: #Loop through each dictionary entry
#First simply check the partial is in the word
if my_partial in my_word:
#Now check for the partial positions
for my_partial_pos in partial_matches[my_partial]: #Loop through each item in the partial match
if my_partial == my_word[my_partial_pos]:
has_ruled_out = True
else:
has_ruled_out = True
if not has_ruled_out:
after_partial_match_check.append(my_word)
print ("########At the end of this we are down to {} words. Recommendation:".format(len(after_partial_match_check)))
#Form an ordered list of words based on overall word frequency from the corpus that was built right at the start
suggestion_dict = {}
ordered_suggestion = {}
for word_suggestion in after_partial_match_check:
suggestion_dict[word_suggestion] = five_letters[word_suggestion]
#Order the suggestions
ordered_suggestion = sorted(suggestion_dict.items(), key=lambda x:x[1], reverse=True)
#Tell the user
if len(ordered_suggestion) < 11:
print (ordered_suggestion)
else:
#Print just the first 10
print (list(ordered_suggestion)[:10])
#Get input from the user as to what happened in that round
round_result = input("Enter the result of that round in format A_E,B_P,C_N where _E for exact, _P for partial, _N for non matched:")
#Pull round result apart and process each
result_list = round_result.split(",")
#Loop for each result
letter_pos = 0 #Holds which letter result position we're looking at
for result in result_list:
#Get the entered letter and the result
entered_letter = result[0]
letter_result = result[2]
if letter_result == "E":
print("########Processing an exact match for letter {}".format(entered_letter))
#See if we already have this exact match. If not, add it
letter_found = False
for my_tuple in exact_matches:
if my_tuple[0] == entered_letter and my_tuple[1] == letter_pos: #So we could have double letters so this checks for existence of the letter in the given position
letter_found = True
if not letter_found:
exact_matches.append((entered_letter,letter_pos))
#Update existing partial matches as well, i.e. 1)They can't be in the position of the found letter. Also, if there is already a partial for what is now exact, remove it
if entered_letter in partial_matches:
partial_matches.pop(entered_letter)
else:
for partial in partial_matches:
if letter_pos not in partial_matches[partial]:
partial_matches[partial].append(letter_pos)
elif letter_result == "P":
print("########Processing a partial match for letter {}".format(entered_letter))
if entered_letter in partial_matches:
#Look partial record, seeing if the current position is in the position list. If not, add it
if letter_pos not in partial_matches[entered_letter]:
partial_matches[entered_letter].append(letter_pos)
else:
#Entered letter not in partial matches dictionary. Add it together with the position
partial_list = []
partial_list.append(letter_pos)
#But as this is a new partial we also need to add all existing exact matches which the letter can also not be in that position
for exact in exact_matches:
partial_list.append(exact[1])
#FInal update of the partial list
partial_matches[entered_letter] = partial_list
elif letter_result == "N":
print("########Processing a non match for letter {}".format(entered_letter))
#See if we already have this non-match. If not, add it
if entered_letter not in non_matches:
non_matches.append(entered_letter)
#Update so we get the next letter position
letter_pos +=1
#Show what the structures are at the end of this round
print ("########Exact matches {}".format(exact_matches))
print ("########Partial matches {}".format(partial_matches))
print ("########Non matches {}".format(non_matches))
#End of turn Update for next loop
round_number += 1
#See how we came out of the main loop
if len(exact_matches) == 5:
print ("########You won! Way to go/cheat")
else:
print ("########All rounds completed, you lost!")