Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Sunday, January 17, 2016

Determine the difficulty of an english word



I am working on a word-based game. My word database contains around 10,000 English words (sorted alphabetically). I am planning to have 5 difficulty levels in the game: Level 1 shows the easiest words and Level 5 shows the most difficult words, relatively speaking.

I need to divide the 10,000-word list into 5 levels, from the easiest words to the most difficult ones, and I am looking for a program that can do this for me.

Can someone tell me if there is an algorithm or a method to quantitatively measure the difficulty of an English word?

I have been thinking about using "word length" and "word frequency" as factors and coming up with a formula, or something similar, that accomplishes this.
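A minimal sketch of the idea in the question, assuming you already have a corpus count for each word (`freqs` below is a hypothetical dict; any frequency source from the answers would do). The weights are arbitrary placeholders to tune by play-testing. It ranks words by a combined score and cuts the ranked list into five equal-sized levels.

```python
import math

def difficulty_score(word, freq, length_weight=1.0, freq_weight=2.0):
    """Higher score = harder. `freq` is a corpus count (assumed >= 0).
    The weights are placeholders, not tuned values."""
    # Rare words are treated as harder: negative log of the count
    # (+1 so that unseen words get log(1) = 0 instead of an error).
    rarity = -math.log(freq + 1)
    return length_weight * len(word) + freq_weight * rarity

def split_into_levels(words, freqs, levels=5):
    """Sort easiest -> hardest and cut into `levels` roughly equal slices."""
    ranked = sorted(words, key=lambda w: difficulty_score(w, freqs.get(w, 0)))
    size = math.ceil(len(ranked) / levels)
    return [ranked[i * size:(i + 1) * size] for i in range(levels)]
```

For 10,000 words this yields five levels of 2,000 words each; with the default weights a word's frequency counts for more than its length, as several of the answers below suggest.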

Answer by borrible for Determine the difficulty of an english word


Difficulty is a pretty amorphous concept. If you have no clear idea of what you want, perhaps you could take a look at the Porter Stemming Algorithm (see, for example, the original paper). It contains a more advanced idea of 'length', defining words as being of the form [C](VC){m}[V], where C is a block of consonants and V a block of vowels. This definition says a word is an optional C, followed by m VC blocks, and finally an optional V. The m value is this advanced 'length'.
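A minimal sketch of computing Porter's measure m for a single word, following the consonant/vowel definition in Porter's paper (a 'y' counts as a vowel when it follows a consonant, otherwise as a consonant). It only computes m; it is not a stemmer.

```python
def is_consonant(word, i):
    """Porter's definition: a consonant is any letter other than a, e, i, o, u,
    and other than a 'y' preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True

def porter_measure(word):
    """Return m in [C](VC){m}[V]."""
    word = word.lower()
    # Build the C/V pattern, collapsing runs: e.g. "trouble" -> "CVCV".
    pattern = []
    for i in range(len(word)):
        tag = "C" if is_consonant(word, i) else "V"
        if not pattern or pattern[-1] != tag:
            pattern.append(tag)
    # The pattern alternates, so m is simply the number of "VC" pairs.
    return "".join(pattern).count("VC")
```

Porter's own examples come out as expected: "tree" and "by" have m = 0, "trouble" and "ivy" have m = 1, "troubles" and "private" have m = 2.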

Answer by Paolo Falabella for Determine the difficulty of an english word


Depending on the type of game, the definition of "difficult" will change. If your game involves typing quickly (ztype-style...), "difficult" will have a different meaning than in a game where you need to define a word's meaning.

That said, Scrabble has a way to measure how "difficult" a word is, which is also quite easy algorithmically.
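A minimal sketch of Scrabble-style scoring, assuming the standard English-edition tile values. Blanks and board multipliers are ignored; the idea is only that words built from rarer, higher-valued letters score higher.

```python
# Standard English-edition Scrabble tile values.
SCRABBLE_VALUES = {}
for letters, value in [("aeilnorstu", 1), ("dg", 2), ("bcmp", 3),
                       ("fhvwy", 4), ("k", 5), ("jx", 8), ("qz", 10)]:
    for ch in letters:
        SCRABBLE_VALUES[ch] = value

def scrabble_score(word):
    """Sum of tile values; characters that are not a-z count as 0."""
    return sum(SCRABBLE_VALUES.get(ch, 0) for ch in word.lower())
```

For example, "cat" scores 5 and "quiz" scores 22.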

Also you may look into defining "difficult" in terms of your game. You could beta test your game and classify words according to how "difficult" players find them in the context of your own game.

Answer by chamel for Determine the difficulty of an english word


Word length is a good indicator. For word frequency you would need data, as an algorithm obviously cannot determine it by itself. You could also use some sort of scoring like Scrabble does: each letter has a value and the final value is the sum of the letter values. In my opinion it would be easier to find frequency data about each letter in your language.
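A minimal sketch of the letter-value idea, assuming you derive letter frequencies from your own 10,000-word list rather than from published tables. Each letter is worth the negative log of its relative frequency, so rare letters are worth more, and a word's score is the sum over its letters.

```python
import math
from collections import Counter

def letter_values(words):
    """Value of each letter = -log(relative frequency in `words`)."""
    counts = Counter(ch for w in words for ch in w.lower() if ch.isalpha())
    total = sum(counts.values())
    return {ch: -math.log(n / total) for ch, n in counts.items()}

def letter_score(word, values):
    """Sum of letter values. Letters that never occur in the source list
    get 0 here; if you score outside words you may want the maximum instead."""
    return sum(values.get(ch, 0.0) for ch in word.lower())
```

Because the score is a sum, it grows with word length as well as with letter rarity.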

Answer by Matthieu M. for Determine the difficulty of an english word


In his article on spelling correction, Peter Norvig uses a dictionary to count the number of occurrences of each word (and thus determine their frequency).
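A minimal sketch in the same spirit (not Norvig's exact code): lower-case a large plain-text corpus, pull out runs of letters, and count them. Tokenising on `[a-z]+` is a simplification that splits contractions and drops accented letters.

```python
import re
from collections import Counter

def count_words(text):
    """Count occurrences of each lower-cased run of letters a-z."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def relative_frequency(counts, word):
    """Fraction of all counted tokens that are `word` (0.0 for an empty corpus)."""
    total = sum(counts.values())
    return counts[word.lower()] / total if total else 0.0
```

With a real corpus you would call something like `count_words(open("corpus.txt", encoding="utf-8").read())`, where `corpus.txt` is a placeholder name.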

You could use this as a stepping stone :)

Also, frequency should probably influence difficulty more than length does, but you would have to beta-test the game to be sure.

Answer by Martin DeMello for Determine the difficulty of an english word


Get a large corpus of texts (e.g. from the Gutenberg archives), do a straight frequency analysis, and eyeball the results. If they don't look satisfying, weight each text with its Flesch-Kincaid score and run the analysis again - words that show up frequently, but in "difficult" texts will get a score boost, which is what you want.
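A minimal sketch of the weighting step, under one possible interpretation: give each word the mean Flesch-Kincaid grade of the texts it occurs in, so a word that mostly appears in "difficult" texts gets a higher score. The grade formula is the standard Flesch-Kincaid grade level; counting words, sentences and syllables in each text is left to you, and `texts` is a hypothetical list of (tokens, grade) pairs.

```python
from collections import Counter, defaultdict

def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid grade level of a text from its raw counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59

def mean_grade_per_word(texts):
    """texts: iterable of (tokens, grade) pairs.
    Returns each word's average grade over all of its occurrences."""
    grade_sum = defaultdict(float)
    counts = Counter()
    for tokens, grade in texts:
        for tok in tokens:
            grade_sum[tok] += grade
            counts[tok] += 1
    return {w: grade_sum[w] / counts[w] for w in counts}
```

The raw `counts` give the plain frequency pass; the mean grade is the "score boost" signal and can be blended with it.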

If all you have is 10,000 words, though, it will probably be quicker to just do the frequency sorting as a first pass and then tweak the results by hand.

Answer by DNA for Determine the difficulty of an english word


In addition to metrics such as Flesch-Kincaid, you could try an approach based on the Dale-Chall readability formula, using lists of words that are familiar to readers of a particular level of ability.
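A minimal sketch of the (new) Dale-Chall score, assuming `familiar` is a set standing in for the real familiar-word list (about 3,000 words in the revised version, not reproduced here). For the game itself, the simplest use is just the inner test: a word not on the list counts as "difficult".

```python
def dale_chall_score(words, n_sentences, familiar):
    """New Dale-Chall readability score of a passage.
    `words` is the list of tokens, `familiar` a set of lower-case words."""
    if not words or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    n_words = len(words)
    difficult = sum(1 for w in words if w.lower() not in familiar)
    pct_difficult = 100.0 * difficult / n_words
    score = 0.1579 * pct_difficult + 0.0496 * (n_words / n_sentences)
    if pct_difficult > 5:
        score += 3.6365  # adjustment applied when over 5% of words are unfamiliar
    return score
```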

Implementations of many of the readability formulae contain code for estimating the number of syllables in a word, which may also be useful.
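A minimal sketch of the kind of heuristic such implementations use (not taken from any particular one): count groups of consecutive vowels, treating 'y' as a vowel, then drop a trailing silent 'e' unless the word ends in consonant + "le". It is only an estimate; for example it gives "agree" one syllable.

```python
import re

def estimate_syllables(word):
    """Rough syllable count based on vowel groups."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    count = len(re.findall(r"[aeiouy]+", w))
    # A final 'e' is usually silent ("make", "whale") unless it is part of a
    # consonant + "le" ending ("table"), which forms its own syllable.
    if w.endswith("e") and count > 1:
        if not (w.endswith("le") and len(w) > 2 and w[-3] not in "aeiouy"):
            count -= 1
    return max(count, 1)
```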

Answer by Aaron Levitt for Determine the difficulty of an english word


I agree that frequency of use is the most likely metric; there are studies supporting a high correlation between word frequency and difficulty (correct responses on tests, etc.). Check out the English Lexicon Project at http://elexicon.wustl.edu/ for some 70k(?) frequency-rated words.

Answer by BBagi for Determine the difficulty of an english word


I'm not understanding how frequency is being used... if you were to scan a newspaper, I'm sure you would see the word "thoroughly" mentioned much more frequently than the word "bop" or "moo" but that doesn't mean it's an easier word; on the contrary 'thoroughly' is one of the most disgustingly absurd spelling anomalies that gives grade school children nightmares...

Try explaining the subtle difference between slaughter and laughter to a sane human being learning English as a second language.

Answer by AndyM for Determine the difficulty of an english word


Crowd-source the answer.

  • Create an online 'game' that lists 10 words at random.
  • Get the player to drag and drop them into order from easiest to hardest, and tick a box to indicate whether they have ever heard of each word.
  • Apply a ranking algorithm (e.g. Elo) to the result of each experiment.
  • Repeat.

It might even be fun to play, you could get a language proficiency score at the end.
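A minimal sketch of the ranking step using Elo, assuming each player's ordering is turned into pairwise "matches": every word placed later (harder) beats every word placed earlier, so a higher rating means a harder word. K = 16 and the 1500 starting rating are conventional but arbitrary choices, and the "never heard of it" tick is not modelled here.

```python
from itertools import combinations

def expected(r_a, r_b):
    """Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_from_ranking(ratings, ranking, k=16.0, start=1500.0):
    """ranking: words ordered easiest -> hardest by one player.
    Updates `ratings` in place (higher = harder) and returns it."""
    for w in ranking:
        ratings.setdefault(w, start)
    # Compute all changes from the pre-update ratings, then apply them,
    # so the result does not depend on the order the pairs are visited.
    deltas = {w: 0.0 for w in ranking}
    for easier, harder in combinations(ranking, 2):
        gain = k * (1.0 - expected(ratings[harder], ratings[easier]))
        deltas[harder] += gain
        deltas[easier] -= gain
    for w, d in deltas.items():
        ratings[w] += d
    return ratings
```

After many rounds, sorting words by rating gives an ordering that can be cut into the five levels.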

Answer by DiscipleMichael for Determine the difficulty of an english word


I would guess that the grade at which a word is introduced into a typical student's vocabulary is one measure of difficulty. Next would be how many standard rule violations it has, meaning words whose spellings or pronunciations seem to violate the normal set of rules. Finally, the meaning itself can be a tough concept; for example, try explaining "abstract" to someone who has never heard the word.

Answer by gerri for Determine the difficulty of an english word


There are several factors that relate to word difficulty, including age of acquisition, imageability, concreteness, abstractness, number of syllables, and frequency (spoken and written). There are also psycholinguistic databases that let you search for words by at least some of these factors (just do a search for "psycholinguistic database").

Answer by Franck Dernoncourt for Determine the difficulty of an english word


Word frequency is an obvious choice (though of course not perfect). You can download Google n-grams V2, which is licensed under the Creative Commons Attribution 3.0 Unported License.

Format: ngram TAB year TAB match_count TAB page_count TAB volume_count NEWLINE
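A minimal sketch of turning lines in that format into one frequency per word by summing match_count over all years (optionally only recent ones). It reads just the first three tab-separated fields, so it does not depend on how many count columns follow; lower-casing merges capitalised variants, which is an assumption you may not want.

```python
from collections import Counter

def total_match_counts(lines, min_year=None):
    """Sum match_count per n-gram over all years, or over years >= min_year.
    `lines` is any iterable of tab-separated lines, e.g. an open file."""
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        if min_year is not None and year < min_year:
            continue
        totals[ngram.lower()] += match_count
    return totals
```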


The corpora used are described in Lin, Yuri, et al. "Syntactic annotations for the google books ngram corpus." Proceedings of the ACL 2012 system demonstrations. Association for Computational Linguistics, 2012.
