Paul's Geek Dad Blog: February 2022

At the time of writing, the game Wordle is taking the world by storm. Quite rightly, it's genius and so is the guy who invented it.

Here's one I did earlier:

Two things to know about me:

I'm rubbish with word games.
I like to trick my children into thinking I'm cleverer than I am.

So having struggled a bit with Wordle I wondered if I could write some code to more efficiently (for that read cheat) find Wordle answers. For this I used my trusty Raspberry Pi, Python and the Natural Language ToolKit (NLTK) module. I'd used NLTK previously for some online data science courses so I knew it could give me a "corpus" (so set) of words to play with.

First I installed NLTK for use with Python 3 on the Raspberry Pi using this command: sudo pip3 install nltk

Then I looked at which word corpus NLTK has that I could use. There's some details here and a few places pointed me to the "Brown" corpus as a good place to start. To make this corpus available for NLTK in a Python script I opened a Python3 shell and ran these commands:

Python 3.7.3 (default, Jan 22 2021, 20:04:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('brown')

So that sets a corpus of words ready to use in a Python script. I won't go into the detail of how Wordle is played but overall you get 6 goes to guess a 5 letter word. With each guess, the game tells you for each letter:

Whether it is in the final word in the exact position you guessed it it, I call these exact matches.
Whether it is in the final word but not in the exact position you guessed it in. I call these partial matches.
Whether it is not in the final word. I call these non-matches.

I then played around with snippets of code to examine words from the corpus and rule them in or out based upon whether they had exact matches in them or non-matches not in them. That led me overall to an algorithm of:

-Load word corpuses (at the time of writing I use Brown, Webtext and Gutenberg)

-Build a dictionary of 5 letter words and their frequency of occurrence in the corpus

-Setup data structures to log exact, partial and non-matches.

-Loop at least six times doing:

1-Make a prediction (which I then enter into Wordle) based on the data structures
2-Input the result of the prediction from Wordle
3-Update some data structures

(Full code is at the end of this post)

Taking step 2 above first, I simply enter the result of each Wordle round in a coded string. So take this result:

I enter this as S_E,U_N,G_N,A_P,R_P. Where _E is for exact, _N is for non-matched and _P is for partial.

Looking at step 3, I update 3 data structures based upon the input the user provides. The data structures are:

A tuple of exact matches where the tuple contains the letter and the exact match position. So from the above it would be [('S', 0)].
A dictionary of partial matches where the key is the letter and the value is a list of positions that the letter is not in. S from the above it would be {'A': [3], 'R': [4]}
A list of non-matches. So from the above it would be ['U', 'G']

If it's an exact match I do two things:

1)Update the tuple of exact matches with the new matched letter/position combination.

2)Update the partial match dictionary as this is a a)new position a partial matched letter can't be in and b)it may require removal of an entry as what was previously partially matched is now fully matched.

If it's a partial match I do two things:

1)Add a new partial (with the letter position) or update the list of positions for an existing partial

2)If it's a new partial, don't just add the position it is in, add the position of all the other exact matches as the letter can't be in this position either.

If it's a non-match I just add the letter to the list of non-matches.

So for game above, the log output showed this:

Enter the result of that round in format A_E,B_P,C_N where _E for exact, _P for partial, _N for non matched:S_E,U_N,G_N,A_P,R_P

########Processing an exact match for letter S

########Processing a non match for letter U

########Processing a non match for letter G

########Processing a partial match for letter A

########Processing a partial match for letter R

########Exact matches [('S', 0)]

########Partial matches {'A': [0, 1, 3], 'R': [1, 2, 4]}

########Non matches ['O', 'E', 'P', 'T', 'U', 'G']

Finally looking at step 1, for round 1 I recommend a initial word based upon the following logic:

1)Do a letter frequency count across all the 5 letter words in the corpus.
2)Find a word in the corpus that has each of the 5 most common letters. This leads me to use "AROSE".

Then for subsequent rounds I do the following:

1)Eliminate any words from the corpus that have the letters in the non-matching list.

2)Eliminate any words from the corpus that don't have the exact matching letters in the exact matching position.

3)Eliminate any words from corpus that don't have the partial matching letters or, if they do, have the partial matching letters in the positions logged that they can't be in.

...which results in a list of words which I augment with the frequency of occurrence in the corpus. I then print this and let the use choose the word to enter next.

The story so far is that I have played 4 games with this code and:

1)I have got the right answer every time, but

2)My 17 year old daughter has got the right answer in fewer guesses 3 times out of 4!

Full code listing:


#!/usr/bin/env python3
from nltk.corpus import brown
from nltk.corpus import webtext
from nltk.corpus import gutenberg
import sys

#Our main 5 letter word list
five_letters = {}    #A dictionary of 5 letter words with frequencies

#Gets a list of 5 letter words
def add_to_list(in_word_list):
  for word in in_word_list:
    if word.isalpha():
      if len(word) == 5:
        if word.upper() not in five_letters:    #as variety of case of same word could be present
          five_letters[word.upper()] = 1
        else:
          five_letters[word.upper()] += 1

#Compares our 5 letter word corpus with the list of letter frequencies to find the best start word
#The best word has all the highest frequency letters just once
def get_start_words(letter_frequencies, start_pos):
  words_found = 0               #Counts when our set of 5 high frequency letters matches a word from our corpus  
  list_start_pos  = start_pos   #Used to track where we are in our letter frequency list
  word_list = []

  #Loop until we've found words from the word corpus or we've exhausted the
  while words_found == 0 and list_start_pos < 22:    #It's 22 as this will mean we've got to positions 21,22,23,24,25 in the character frequency list
    #Loop for each of the words in the corpus
    for my_word in five_letters:
      letter_count = 0     #Incremented if we find a letter in the word
      #Then for each of the five letters identified by the outer while loop
      for i in range (list_start_pos, list_start_pos + 5):
        my_letter = letter_frequencies[i][0]
        if my_letter in my_word:
          letter_count += 1 #Count if the letter is in the word
      #return word_list
      #See if we found all our letters in the word.  So if one letter is there twice we should not 
      if letter_count == 5:
        word_list.append(my_word)
        words_found += 1
    list_start_pos += 1

  #Return our word list
  return word_list


print ("########Building five letter word list")
add_to_list(brown.words())
print ("Added Brown corpus and word list has {} entries".format(len(five_letters)))
add_to_list(webtext.words())
print ("Added Webtext corpus and word list has {} entries".format(len(five_letters)))
add_to_list(gutenberg.words())
print ("Added Gutenberg corpus and word list has {} entries".format(len(five_letters)))

#Calculate letter frequencies
print ("########Calculating letter frequencies")
freq_dict = {}
for my_word in five_letters:
  #Build letter frequencies
  for my_letter in my_word:
    #See of the letter is a key in the dictionary
    if my_letter in freq_dict.keys():
      freq_dict[my_letter] += 1    #Increment value
    else:
      freq_dict[my_letter] = 1     #Add value

#Sort the dictionary to get the highest probability letters and show the user
sorted_values = sorted(freq_dict.items(), key=lambda x:x[1], reverse=True)
print (sorted_values)

#Holds the round number
round_number = 1

#Holds the exact matches.  A list of tuples
exact_matches = []
#Holds the partial matches.  A dictionary of key is letter, value is list of positions letter is not in
partial_matches = {}
#Holds the non-matched letters.  A list
non_matches = []

#Loop for all the rounds
while round_number < 7 and len(exact_matches) < 5:
  print ("######################################################")
  print ("########This is round number {}".format(round_number))
  #Special case for round 1
  if round_number == 1:
    #Get a list of possible starting words based upon the letter frequencies
    print ("########Getting a list of words to start off with")

    #Imagine we get 6 wrong starting words!  Accoutn for each
    for k in range (0,3):
      start_words = get_start_words(sorted_values, k)
      print ("Where we start at position {} of the letter frequencies the start words are: {}".format(k, start_words))
  else:
    #Step 1, rule out a bunch of words that have eliminated letters in them
    print ("########Assessing data from previous round to make a recommendation")
    print ("########First rule out words based on letters found not to exist in the answer")
    after_non_match_check = []
    for my_word in five_letters:
      has_ruled_out = False
      for letter in non_matches:
        if letter in my_word:
          has_ruled_out = True
      if not has_ruled_out:
        after_non_match_check.append(my_word)
    print ("########At the end of this we are down to {} words".format(len(after_non_match_check)))

    #Step 2, rule in a set of words based on the matched list
    print ("########Second rule in words based on exact matched letters")
    after_full_match_check = []
    for my_word in after_non_match_check:
      has_ruled_out = False
      for match_tuple in exact_matches:
        if my_word[match_tuple[1]] != match_tuple[0]:
          has_ruled_out = True
      if not has_ruled_out:
        after_full_match_check.append(my_word)
    print ("########At the end of this we are down to {} words".format(len(after_full_match_check)))
    #print (after_full_match_check)

    #Step 3, rule out a set of words where letters are in partial match positions
    print ("########Third rule in words based on partial matched letters")
    after_partial_match_check = []
    for my_word in after_full_match_check:
      has_ruled_out = False
      for my_partial in partial_matches:  #Loop through each dictionary entry
        #First simply check the partial is in the word
        if my_partial in my_word:
          #Now check for the partial positions
          for my_partial_pos in partial_matches[my_partial]:    #Loop through each item in the partial match
            if my_partial == my_word[my_partial_pos]:
              has_ruled_out = True
        else:
          has_ruled_out = True

      if not has_ruled_out:
        after_partial_match_check.append(my_word)
    print ("########At the end of this we are down to {} words.  Recommendation:".format(len(after_partial_match_check)))

    #Form an ordered list of words based on overall word frequency from the corpus that was built right at the start
    suggestion_dict = {}
    ordered_suggestion = {}
    for word_suggestion in after_partial_match_check:
      suggestion_dict[word_suggestion] = five_letters[word_suggestion]

    #Order the suggestions
    ordered_suggestion = sorted(suggestion_dict.items(), key=lambda x:x[1], reverse=True)
    #Tell the user
    if len(ordered_suggestion) < 11:
      print (ordered_suggestion)
    else:
      #Print just the first 10
      print (list(ordered_suggestion)[:10])

  #Get input from the user as to what happened in that round
  round_result = input("Enter the result of that round in format A_E,B_P,C_N where _E for exact, _P for partial, _N for non matched:")
  #Pull round result apart and process each
  result_list = round_result.split(",")
  #Loop for each result
  letter_pos = 0        #Holds which letter result position we're looking at
  for result in result_list:
    #Get the entered letter and the result
    entered_letter = result[0]
    letter_result = result[2]
    if letter_result == "E":
      print("########Processing an exact match for letter {}".format(entered_letter))
      #See if we already have this exact match.  If not, add it
      letter_found = False
      for my_tuple in exact_matches:
        if my_tuple[0] == entered_letter and my_tuple[1] == letter_pos:    #So we could have double letters so this checks for existence of the letter in the given position
          letter_found = True
      if not letter_found:
        exact_matches.append((entered_letter,letter_pos))
      #Update existing partial matches as well, i.e. 1)They can't be in the position of the found letter.  Also, if there is already a partial for what is now exact, remove it
      if entered_letter in partial_matches:
        partial_matches.pop(entered_letter)
      else:
        for partial in partial_matches:
          if letter_pos not in partial_matches[partial]:
            partial_matches[partial].append(letter_pos)

    elif letter_result == "P":
      print("########Processing a partial match for letter {}".format(entered_letter))
      if entered_letter in partial_matches:
        #Look partial record, seeing if the current position is in the position list.  If not, add it
        if letter_pos not in partial_matches[entered_letter]:
          partial_matches[entered_letter].append(letter_pos)
      else:
        #Entered letter not in partial matches dictionary.  Add it together with the position
        partial_list = []
        partial_list.append(letter_pos)
        #But as this is a new partial we also need to add all existing exact matches which the letter can also not be in that position
        for exact in exact_matches:
          partial_list.append(exact[1])
        #FInal update of the partial list
        partial_matches[entered_letter] = partial_list

    elif letter_result == "N":
      print("########Processing a non match for letter {}".format(entered_letter))
      #See if we already have this non-match.  If not, add it
      if entered_letter not in non_matches:
        non_matches.append(entered_letter)
    #Update so we get the next letter position
    letter_pos +=1

  #Show what the structures are at the end of this round
  print ("########Exact matches {}".format(exact_matches))
  print ("########Partial matches {}".format(partial_matches))
  print ("########Non matches {}".format(non_matches))

  #End of turn Update for next loop
  round_number += 1

#See how we came out of the main loop
if len(exact_matches) == 5:
  print ("########You won!  Way to go/cheat")
else:
  print ("########All rounds completed, you lost!")

Paul's Geek Dad Blog

Friday, 4 February 2022

Cheating at Wordle with Python, NLTK and a Raspberry Pi