Tuesday, August 23, 2022

Wordle: number of words from unique letters

In [1] an optimization model is proposed for the following problem:

Using a list of 5-letter words, try to find a set of 5 words such that each letter is used at most once. 


Data


The list of 5-letter words is from the wordle [2] game. Words with duplicate letters are removed. A partial list is:


----   1035 SET w  words: large list of 5-letter words

abdom,    abend,    abets,    abhor,    abide,    abies,    abyes,    abilo,    abime,    abysm,    abled,    abler,    ables
ablet,    ablow,    abmho,    abner,    abnet,    abode,    abody,    abohm,    aboil,    abord,    abort,    abote,    about
above,    abret,    abrim,    abrin,    abris,    abrus,    absey,    absit,    abstr,    abune,    abuse,    abush,    abuts
acedy,    acerb,    ached,    achen,    acher,    aches,    achor,    acidy,    acids,    acier,    acies,    acyls,    acing
ackey,    acker,    aclys,    acmes,    acned,    acnes,    acoin,    acold,    acone,    acorn,    acost,    acoup,    acred
acres,    acrid,    acryl,    acron,    acrux,    acted,    actin,    acton,    actor,    actos,    actus,    acute,    adcon


The total number of 5-letter words in this list is: 10,175.

For our model, we need to know if character \(c\) is present in word \(w\). For this, we form a 2-dimensional set \(\color{darkblue}{\mathit{inWord}}_{w,c}\). A partial view looks like this:


----   1053 SET inWord  character c is in word w

                a           b           c           d           e           f           g           h           i           j

abdom         YES         YES                     YES
abend         YES         YES                     YES         YES
abets         YES         YES                                 YES
abhor         YES         YES                                                                     YES
abide         YES         YES                     YES         YES                                             YES
abies         YES         YES                                 YES                                             YES


Model


A simple model to find a maximum subset of words that have unique letters can be:


Covering Model
\[\begin{align}\max& \sum_w \color{darkred}x_w \\ & \sum_{w|\color{darkblue}{\mathit{inWord}}(w,c)} \color{darkred}x_w \le 1 && \forall c \\ & \color{darkred}x_w \in \{0,1\} \end{align}\]

Note that this model just finds one solution. This is the reported solution:


----   1082 VARIABLE x.L  select word

brick 1.000,    fldxt 1.000,    nymph 1.000,    quags 1.000,    vejoz 1.000


----   1082 VARIABLE count.L               =        5.000  number of selected words


I am not familiar with all these words. E.g. I had to look up: fldxt (an abbreviation for fluid extract).


This solution is found very quickly. The total solution time is about 0.14 seconds (and needed 0 nodes).


All solutions


An obvious question is: how many sets of 5 words with unique letters do exist? We can use the solution pool for that. This is a technology present in most of the more advanced MIP solvers, to find and report more than one solution. When I do that, we see that the total number of 5-word solutions is 831. The solution time to find all these solutions is about a minute (78 seconds).

An alternative approach would be to add a so-called no-good cut to the problem to forbid the current solution and resolve the model. For a small number of solutions, this is a great strategy [3]. But in this case, the number of solutions is large. We would need to solve 832 MIP problems to conclude there are 831 solutions. The last model would be infeasible: that is the stopping criterion. Each subsequent model is a bit more expensive to solve (not uniformly so but on average). 

An early attempt to solve this problem is documented in [4]:

According to Parker, executing the Parker algorithm on a laptop took about a month.


References


Appendix: GAMS model


$ontext

   
Using a large collection of words, form a subset of 5 words such that
   
each letter is not used more than once.

$offtext


$set inc words.inc
$set url https://raw.githubusercontent.com/aopt/OptimizationModels/main/Wordle/%inc%


*-------------------------------------------------
* 5-letter Words
*-------------------------------------------------

* download file
$if not exist %inc% $call curl -O %url%

set
  w 
'words: large list of 5-letter words' /
$include %inc%
 
/
;
display w;

scalar numWords 'number of words in our collection';
numWords =
card(w);
display numWords;

*-------------------------------------------------
* Derived data
* Characters in words
*-------------------------------------------------

sets
  c 
'characters' /a*z/
  p 
'position' /1*5/
  inWord(w,c)
'character c is in word w'
;

* ord(string,n) = ascii number of string[n]
inWord(w,c) =
sum(p$(ord(w.tl,p.val)=ord(c.tl)), 1);
display inWord;


*-------------------------------------------------
* Optimization Model
*-------------------------------------------------

binary variable x(w) 'select word';
variable count 'number of selected words';

equations
   counting 
'number of different words with unique characters'
   cover(c) 
'number of selected words containing char c'
;

counting..  count =e=
sum(w, x(w));
cover(c).. 
sum(inWord(w,c), x(w)) =l= 1;

model m /all/;
solve m maximizing count using mip;
display x.l,count.l;

* remove if you want to see all solutions
$stop

*-------------------------------------------------
* How many solutions?
*-------------------------------------------------

$onecho > cplex.opt
SolnPoolAGap = 0.5
SolnPoolIntensity = 4
PopulateLim = 10000
SolnPoolPop = 2
solnpoolmerge solutions.gdx
$offecho

m.optfile=1;
option threads=0;

* not strictly needed, but seems to help
count.fx = round(count.l);

Solve m using mip maximizing count ;

set
  index0
'superset of solution indices' /soln_m_p1*soln_m_p10000/
  index(index0)
'used solution indices'
;
parameter
  xall(index0,w)
'all solutions'
;

execute_load "solutions.gdx",index,xall=x;
display xall;

scalar numSolutions 'number of solutions';
numSolutions =
card(index);
display numSolutions;

No comments:

Post a Comment