More English dictionary fun
More English dictionary fun
(OP)
Based on a file that contains 194,000 English words, Mike Lewis mentioned in thread1551-1811770: English dictionary fun
I mentioned a spin off question to his question about all words in the graph
What letters in which arrangement allow to build the most words?
I think the sheer number of 26^9 combinations (as letters can occur more than once and their position matters) cannot easily be brute forced.
So I thought of a milder problem of which 9 letter word inn the dictionary yields most words. There still is the issue of needing to test all paths in all the different ways the graph nodes can be populated.
So I thought I at least probe by choosing random paths of length 9 and random 9 letter words from the dictionary.
There are 784 full-length paths and there are 28608 9-letter-words, which may be reduced by palindromes and removing mirrored or rotated paths, but even without such optimizations means analysis of a magnitude of 10 million problems of that type. Could be tested in about 8 hours. Lets say within a day, when assuming 27 ms for a single problem setting is the average.
It could be fun to output the best solution found so far on the way. It will not include situations where no 9-letter word is found, but there might be more shorter words worth doing without the 9-letter-word category. Another idea is combining the letters of all 4 and 5 letter words to have statistics on which letter composition is most used. You can of course get an orientation from letter frequency analysis, but the number of repeated letters and letter combinations can shift this a bit. Many vowels are good, I guess repeated Es would work better but surely the optimum isn't all vowels.
I mentioned a spin off question to his question about all words in the graph
L - A - R | x | x | C - O - I | x | x | T - A - F
What letters in which arrangement allow to build the most words?
I think the sheer number of 26^9 combinations (as letters can occur more than once and their position matters) cannot easily be brute forced.
So I thought of a milder problem of which 9 letter word inn the dictionary yields most words. There still is the issue of needing to test all paths in all the different ways the graph nodes can be populated.
So I thought I at least probe by choosing random paths of length 9 and random 9 letter words from the dictionary.
There are 784 full-length paths and there are 28608 9-letter-words, which may be reduced by palindromes and removing mirrored or rotated paths, but even without such optimizations means analysis of a magnitude of 10 million problems of that type. Could be tested in about 8 hours. Lets say within a day, when assuming 27 ms for a single problem setting is the average.
It could be fun to output the best solution found so far on the way. It will not include situations where no 9-letter word is found, but there might be more shorter words worth doing without the 9-letter-word category. Another idea is combining the letters of all 4 and 5 letter words to have statistics on which letter composition is most used. You can of course get an orientation from letter frequency analysis, but the number of repeated letters and letter combinations can shift this a bit. Many vowels are good, I guess repeated Es would work better but surely the optimum isn't all vowels.
Chriss
RE: More English dictionary fun
Chriss
RE: More English dictionary fun
CODE -->
Running time (excluding printing the results): 0.73 seconds.
Three longest palindromes:
Spoiler:
DEIFIED, REVIVER, ROTATOR
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
Edit: You see? hen you swap letters and apply all arrangements on the 3x3 nodes graph it's sufficient to test one of the anagram words. palindromes obviously are a special case of that.
Chriss
RE: More English dictionary fun
Not to worry. It was an interesting exercise to find the longest single-word palindromes.
What I would really like is some automatic way of generating multi-word palindromes, but that would be asking too much. (I'm thinking of things like being able before seeing Panama, or something vaguely like that <g>.)
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
Maybe I did not understood the challenge, but the longest palindromes in the english3.txt file are 9 letters long.
Spoiler (one-liner)
Spoiler (longest palindrome)
Feherke.
feherke.github.io
RE: More English dictionary fun
So which letters do you need to put in to get most words out from traversing legal paths in this graph (the rules were visiting neighbor by neighbor, neither skipping over a node (ie the graph does not connect all letters to each other, only the center letter sees all others as its neighbors) nor visiting nodes twice overall and even not to use an o as oo in a word. And more strict rules someone suggested was not allowing paths to cross, eg not going right, down+left ,right, up+left, as the last step crosses the line of the second step. I won't follow that rule, so I will allow crossing, but you can always modify a task as you like to.
Since you can populate the 9 ? nodes with 26 letters you have 26^9 combinations, some that are mirrored or rotated could be removed, but the magnitude is rather that for the arrangement of any letter alone, and then each can be traversed in all the different allowed paths. The arrangement makes a difference, if you have two Os, but they are not neighbor nodes, you can't build words with oo.
So my thought to make that a bit easier is populate the nodes with 9-letter-words and find the optimal one in its optimal path to allow most other words, down to 1-letter-words and find out which 9-letter word in which arrangement yields the most words. And there anagrams make no difference, nor palindromes, sorry. But that's actually the least important about the task, just a possible way to shrink down the number of cases to process.
(Another edit: I think even anagrams make a difference, because you can have new direct neighbors you didn't have before, so that is even a red herring).
Chriss
RE: More English dictionary fun
Not sure if you were being circumspect or not but my favorite Palindrome involves Panama.
A Man, A Plan, A Canal, Panama!
RE: More English dictionary fun
Yes, you did understand the challenge. The two words that you identified are indeed palindromes, and are nine letters long. However, they are both proper nouns, Rightly or wrongly, I am now using a slightly shorter word list (downloaded from the same source) that excludes many proper nouns. Maybe I'll go back to the longer one.
kwbMitel,
Yes, of course I was referring to the famous Panama palindrom, and also to the supposed quote re Napoleon's exile: "Able was I ere saw Elba."
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
That suggests a programming challenge to find the longest alternating consonant/vowel or vowel/consonant word in the dictionary used in this thread. It looks to me to be an easy programming assignment, except that I can't think of a simple way to handle the letter 'y'. One would need some artificial intelligence built in to distinguish 'y' as a consonant from 'y' as a vowel. It might be easiest to have a manual step at the end to examine the output and eliminate 'y' words that don't meet the requirements.
RE: More English dictionary fun
Alternatively, you could simply establish a rule that says that Y is always a consonant.
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
I'll do a bit more testing and tidy up the code before I post it. In the meantime, here is the 15-letter word:
Spoiler:
Unless anyone can find a longer one?
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
CODE -->
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
In all probability you are right that none of the longer words contain a 'y'. However, I suspect your program doesn't conclusively demonstrate this. I don't see any code that allows for the possibility of multiple 'y's in the same word with at least one 'y' used as a consonant and another 'y' used as a vowel.
RE: More English dictionary fun
Agreed. I didn't allow for that possibility. I would need to think about that.
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
And in case anyone is interested, the longest word that contains multiple Ys is: encyclopaedically (which my spell-checker has just rejected).
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
Hidden:
15 letters
cytomegalovirus
monocotyledones
paramyxoviruses
unimaginatively
16 letters
hypocotyledonary
volatilisability
volatilizability
22 letters
honorificabilitudinity
RE: More English dictionary fun
My program treats every 'y' as a wildcard that can be either a consonant or vowel. Because of this I can find words like 'yakety', but I also get a lot of spurious hits that don't meet the alternating consonant/vowel criteria at all. An example is 'waylays', The first 'y' passes the a/y test as a consonant and the y/l test as a vowel. Similarly for the second 'y'. A more sophisticated program could doubtlessly eliminate these spurious hits within the progamming logic, but I'm not sure I'm sufficiently interested to make the effort.
RE: More English dictionary fun
Hidden:
anyway
everyday
everyway
gayety
yakety
yarely
RE: More English dictionary fun
When I started all this (with my first efforts at a Trackword), I was using a 194,000-word dictionary (http://www.gwicks.net/textlists/english3.zip), which I expect is the one that you are using, Karl. But then I switched to a smaller one (84,000 words: http://www.gwicks.net/textlists/engmix.zip). I can't remember exactly why I switched. It might have been because the bigger one had too many ridiculously obscure words.
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
CODE -->
Not all of these are in my "real" dictionary, by the way.
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
aminosalicylic
deliberatively
deliverability
denominatively
hereditability
heterozygosity
manipulatively
pararosaniline
recapitulative
recapitulatory
recoverability
regeneratively
rehabilitative
relocatability
supererogative
supererogatory
tumorigenicity
vaporisability
vaporizability
verisimilitude
vituperatively
RE: More English dictionary fun
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads
RE: More English dictionary fun
I'm afraid my results using the shorter engmix.zip dictionary don't match yours. Your program doesn't seem to find a bunch of legitimate alternating consonant/vowel words that contain a 'y'. (You also seem to have manually edited out words that are minor spelling variants of another.)
My list using engmix.zip:
15 letters
unimaginatively
verisimilitudes
14 letters
manipulatively
recapitulative
recoverability
rehabilitative
supererogatory
verisimilitude
vituperatively
13 letters
demilitarised
demilitarises
demilitarized
demilitarizes
denaturalised
denaturalises
denaturalized
denaturalizes
depoliticised
depoliticises
depoliticized
depoliticizes
gynecological
imaginatively
ineligibility
inevitability
inimitability
mineralogical
monocotyledon
numerological
recapitalised
recapitalises
recapitalized
recapitalizes
recapitulated
recapitulates
rehabilitated
rehabilitates
rehabilitator
taramasalatas
toxicological
unimaginative
RE: More English dictionary fun
I noticed earlier that my list only shows British spellings, not American ones (at a quick glance). Yet your list has both American and British variations, such as "demilitarised" and "demilitarized", which is strange. I'll investigate.
Mike
__________________________________
Mike Lewis (Edinburgh, Scotland)
Visual FoxPro articles, tips and downloads