This is a guest post from Dr. Aron Levin, author of The Ultimate Word List – Japanese: 2935 Most Commonly used Kanji (with English translation)
Dear People-who-read-AJATT,
Khatzumoto has allowed me the distinguished honor of writing a guest post. The reason for this post is to PIMP my new book. It’s called “The Ultimate Word List – Japanese: 2935 Most Commonly Used Kanji”. I compiled a huge number of Kanji from online books, news articles, Wikipedia, etc., and sorted them by frequency. Then I hired a bilingual Japanese translator (and K’s friend) to provide the pronunciation of each Kanji (on’yomi and kun’yomi) and an English translation.
I created another 15 books in lots of languages in a very similar fashion. All are sorted by the L2, then translated into English and also transliterated if the language uses a non-Latin alphabet (Greek, Russian, Hebrew, Arabic, Persian, Mandarin, Japanese). All translations/transliterations were done by bilingual native speakers.
These types of frequency lists are useful because the most commonly used words, in any language, represent a large proportion of the words you’ll read and hear. According to my calculations, the 100 most common Kanji represent 36% of the Kanji you’ll read. Over 1/3! In only 100 Kanji! In Spanish, for another example, the first 100 words represent 57% of the words you read. It’s easy to understand the value of learning the really common words.
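For readers who want to check a coverage figure like this against their own reading material, here is a minimal sketch in Python. It is not the method used for the book; the function name `coverage`, the toy sample string, and the choice to treat only the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) as kanji are all assumptions made for illustration.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def coverage(text, top_n):
    """Share of all kanji occurrences in `text` that belong to the
    `top_n` most frequent kanji of that same text."""
    counts = Counter(ch for ch in text if is_kanji(ch))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / total


# Toy sample (hypothetical); a real check needs a large corpus.
sample = "日本語の本を日本で読む日"
print(f"{coverage(sample, 2):.0%}")  # 75%
```

On a large corpus, calling `coverage(text, 100)` gives the share of kanji occurrences covered by the 100 most common kanji of that corpus. The result depends heavily on the source texts.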
How do I use the Hebrew frequency list? I used it as one of my primary entry points into Hebrew. I found Hebrew kind of hard to break into because, even though Hebrew uses an alphabet, you don’t necessarily know how to pronounce a word until you hear it. Thr r n vwls n Hbrw (There are no vowels in Hebrew). With my Hebrew frequency list, I was able to learn the most common words and their pronunciation variants. For example, the 2nd most common word in Hebrew is את. (Don’t forget to read right-to-left) It is pronounced either “et” or “at”. Those two pronunciations are both very important words and have completely different meanings. “At” means ‘you (f.s.)’, and “et” indicates the next word is a direct object.
For another example, take the word רק. The first letter is ר, which is an ‘r’ sound. The second letter is ק, which is a ‘k’ sound. רק is pronounced ‘rak’ and means “only”. The book will tell you all this. It’s useful because otherwise you have no idea if it’s pronounced rak, rek, rik, rok, or ruk. “Can’t I just look that up in a dictionary?”, you ask. YES! Of course. But as a newbie Hebrew-learner, you had no friggin’ idea that רק was a realllly common word. And that’s the point. Your dictionary says it’s just 1 of 40,000 equivalently crazy looking Hebrew words. But the frequency list tells you it’s the 31st most common word in Hebrew. Big difference.
Will memorizing these books make you fluent? HELL NO. They are merely a starting point. I am studying Hebrew on my own and made the Hebrew book first because I wanted it for myself. I found knowing the most common words useful in my self-studies, so I figured people studying other languages might want a frequency list, too. It helped me break into the language and gave me confidence that I was learning important stuff right from the beginning. Studying individual words like this is most beneficial at the start of learning a new language. As you progress, however, sentences are the best. People speak in sentences. But those sentences are made up of words. And certain words are used more than others. Knowing the most commonly used words will only speed up your language learning.
As Khatz would say, get this book if you want eternal happiness and sweet abs.
Good luck with your studies,
Aron
This is EXACTLY what I’ve been looking for!
Thank you Aron.
Is it a vocabulary list or just a characters list?
What Chuck said! Your general idea is spot on but not that original: we have kanji lists organized by grade that are mostly by frequency, and we know what the most common 2000 kanji are (see the ad below for the 2046 kanji poster). Japanese is different from other ‘word’-based languages. What I would like is a kanji vocabulary list: say, take the most common 500 kanji, cross-reference them with the most common words, and give me that in book one. Then bump it up to the 1000 most common kanji and the most common words they are in, etc. That would be a lot more useful than a list of kanji.
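The cross-referencing idea above can be sketched roughly as follows. This is only an illustration: the function `words_for_tier`, the tier sizes, and both toy lists are hypothetical, and the orderings shown are not real frequency data.

```python
def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def words_for_tier(words_by_freq, kanji_by_freq, tier_size):
    """Keep the words (in frequency order) whose kanji all fall within
    the `tier_size` most common kanji. Kana-only words are skipped."""
    allowed = set(kanji_by_freq[:tier_size])
    selected = []
    for word in words_by_freq:
        kanji = [ch for ch in word if is_kanji(ch)]
        if kanji and all(k in allowed for k in kanji):
            selected.append(word)
    return selected


# Toy data, ordered as if by frequency (made up for the example).
kanji_by_freq = ["日", "本", "人", "語", "経", "済"]
words_by_freq = ["日本", "人", "日本語", "経済", "本"]

# "Book one": words writable with the top 3 kanji.
book_one = words_for_tier(words_by_freq, kanji_by_freq, 3)
# "Book two": words unlocked by widening the tier to 6 kanji.
book_two = [w for w in words_for_tier(words_by_freq, kanji_by_freq, 6)
            if w not in book_one]

print(book_one)
print(book_two)
```

With real lists, the tiers would be 500 and 1000 kanji, and each "book" would hold only the words newly unlocked by its tier.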
That actually reminds me of a little set of programs I wrote to help me study. It downloads a few hundred pages from the Yomiuri Shimbun (usually totaling around a megabyte of text once the stories have been parsed out), then it uses the periods at the ends of sentences as delimiters to break up that megabyte into around 100,000 sentences or so. Then it runs my sentence database through Jim Breen’s EDICT dictionary to count what “patterns” are used in my sentences, and it does the same for every sentence from the aggregate webpage. It uses that to figure out how many new “patterns” are in each sentence and organizes the sentences based on that. In addition, it does a similar type of comparison with my kanji database. Only, with the kanji, it counts ‘unfamiliar’ kanji. An unfamiliar kanji is currently defined as one that has either not been ‘activated’ yet or which has a prescribed delay of less than 30 days.
To keep a short story short, my programs spend a few hours downloading webpages, delimiting them into sentences, and cross-referencing that aggregate with my sentence and kanji databases to produce a list that is sorted primarily by the number of new words and secondarily by the number of new kanji and it’s all presented in a nice html page so I can use RikaiChan. : )
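A rough sketch of the ranking step described above, for anyone curious how it might look. This is not the commenter's actual code: the download and HTML steps are left out, a plain set of strings stands in for the EDICT lookup, "unfamiliar" kanji are simply those missing from a `known_kanji` set (the 30-day delay rule is not modeled), and sorting easiest-first is an assumption. All names are hypothetical.

```python
def split_sentences(text):
    # Use the Japanese full stop as the sentence delimiter.
    return [s.strip() + "。" for s in text.split("。") if s.strip()]


def find_words(sentence, dictionary, max_len=6):
    """Greedy longest-match scan for dictionary entries in a sentence
    (a crude stand-in for a real dictionary-based pattern count)."""
    found = set()
    i = 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in dictionary:
                found.add(candidate)
                i += length
                break
        else:
            i += 1  # no entry starts here; move on one character
    return found


def rank_sentences(text, dictionary, known_words, known_kanji):
    """Return (new_word_count, new_kanji_count, sentence) tuples sorted
    by new words first, then by unfamiliar kanji."""
    rows = []
    for s in split_sentences(text):
        new_words = find_words(s, dictionary) - known_words
        kanji = {ch for ch in s if "\u4e00" <= ch <= "\u9fff"}
        rows.append((len(new_words), len(kanji - known_kanji), s))
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows


# Toy inputs (hypothetical).
dictionary = {"猫", "犬", "経済", "好き", "日本"}
known_words = {"猫", "好き"}
known_kanji = set("猫好")
text = "日本の経済。猫が好き。犬が好き。"

ranked = rank_sentences(text, dictionary, known_words, known_kanji)
for new_w, new_k, sentence in ranked:
    print(new_w, new_k, sentence)
```

Sentences with nothing new sort to the top here; reversing the sort, or filtering for exactly one new word, would be an equally reasonable choice.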
Python is cool. C is cool. Japanese is cool. ; )
Feel like releasing that to the public? 😉
I’d need a place to host it and it’d take rearranging some things… it’d take a while for me to get documentation in place. I’d need to get some kind of interface together, etc. It’s currently a bunch of scripts ; )
I actually don’t think the database format would have to be standardized as long as everything was UTF-8 encoded… Yeah. I’ll work on that.
Well it’s just a list of characters, as it seems… And such you can find all throughout the internet … just google for “kanji frequency list”
they differ sometimes depending on the source over which the frequency was counted … but in general it should not be that important since the basic kanji are always in the upper places.
This just seems to be a list of kanji with readings, not vocabulary words. Although some kanji can stand for individual words, that’s not always the case. I think this would be great for Spanish for example though.
My problem with frequency lists, however, has always been that if these words are so common why not just spend the time reading books and immersing in your L2 and you’ll learn them in real context and get a lot of exposure at the same time. In fact you shouldn’t have to worry about the most common words, they will take care of themselves.
Wow, I expect more from the commenters/readers of AJATT. You guys didn’t even look at the book, did you? It has the 2935 most commonly used Kanji, and the words they’re in and their readings. And may I mention the book is without romaji? This word list is made of win.
Thanks Aron. I’ll be checking out the rest of your books.
I did look at the book, and it would be an excellent companion book to RTK… but it’s not that useful to me or anyone else who already has basic reading skills.
Yeah, I’m with ya. I think this is just one of those things that’s better as data you find free on a website. Just for giggles.
This book seems like it would be a very good segue book from RTK into sentences.
“Hm I’m done RTK and some grammar what sentence to pull? According to Aron’s sexy book here it says that so and so is the 34th most common kanji I should pull something with that early on”
*find source of entertainment that has said kanji in it and add to SRS*
Cool post, by the way.
I’ll agree with Kanjius, this isn’t a great use of your money. This looks like just single kanji in isolation, not real words. (If this was Chinese, this would be great.) Real words would build your vocabulary; this will not. RTK already gets you familiar with the most common 2K kanji. From there, you’re probably much better off with something like Smart.fm’s Core2K as it’s free and has decent audio for everything. The other option could be 2001 Kanji Odyssey. These methods would at least teach you readings for real words with sentences.
If you want raw data on kanji word frequency based on Wikipedia you can find it on the RevTK forums.
forum.koohii.com/viewtopic.php?id=5266
This is a kanji/word frequency list. Check the book out, and you’ll find kanji and their words. Use the kanji as Branden says, and you’ll win.
If it is a word frequency list and not a kanji frequency list, you can’t tell from the few pages you can sample on Amazon. From the sample it’s just a bunch of kanji and how to read them when they are alone, no sentences or anything else helpful for context. And even if it does have full words and kanji compounds, I still think your time is better spent on other solutions, as stuff like Core2k is free. Besides, if you actually just start reading, frequency will take care of itself.
I think the author Aron and Tyler haven’t studied Japanese at all or for a very short time so don’t seem to understand the problem. The author Aron has confused words with kanji. The book is just a kanji frequency list with a few example words for each kanji which we can get for free and is the same as lots of books and sets of cards I have on my bookshelf. Nothing bad but not worth 3000 yen.
Personally I think frequency lists are great but we already have plenty of kanji frequency lists. What would have been useful would have been vocabulary and phrase frequency lists in Japanese possibly with some ways to study them.
Yeah, I agree. I think you could do a lot better with $30 than on a frequency list. I honestly would rather pay for two months of AJATT+, a much higher value per dollar. I just think that it should be seen, as you said, as another tool to use, not the “wrong tool”.
Sorry, I’ll stop typing. I don’t know why but it just peeved me with all the negativity. Sometimes negativity is very good.
Thanks for this post. This *definitely* looks like a good book to buy.
— I’m gonna go ahead and disagree with Khatzumoto here —
In my opinion, the best kanji list is a frequency list. Once you know kanji components (No small task, there are lots of them), building the kanji from them becomes elementary. This is the main draw of RTK, imo. After learning them, at that point I’d drop RTK with its English stories and switch to 100% Japanese and kick off the sentences, learning any new kanji as they appear in them. This approximates a frequency list. Having a real frequency list can help fill in holes while swiss cheesing your way through the language, for maximum coverage of what really matters, instead of learning many rare kanji early (Compare Heisig order with frequency order. It’s essentially random with many high frequency kanji a good ways in).
Don’t get me wrong, you will need to learn them all eventually, and regardless of what method you use, you’ll arrive at the same place once you know them. Using a frequency list just makes it so you learn the most useful characters first, so you are better off during the journey, understanding more of what you come across, and therefore learning faster.
— end disagreement with Khatzumoto —
What’s more interesting, though, is the word frequency list (…if it has one. It’s a bit unclear from the post whether the book has a frequency list of words, or is just showing common words for each kanji drawn from the frequency list of kanji. Even if it’s the latter, though, it will more or less approximate a word frequency list). This helps even more with sentences, as any word that’s high on a frequency list but you don’t know is a dead giveaway for a hole you need to fill. This definitely fits the “swiss cheesing” model where you learn the most important bits first and fill in the details as needed. A frequency list tells you unequivocally what the important bits are, and that’s very valuable information.
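The "hole" check described above is easy to sketch, assuming you already have a frequency-ordered word list and a set of words you know (for example, exported from an SRS deck). The function name `find_holes` and the toy data below are hypothetical.

```python
def find_holes(words_by_freq, known_words, limit=10):
    """Return up to `limit` (rank, word) pairs for the highest-ranked
    words that are missing from `known_words`."""
    holes = []
    for rank, word in enumerate(words_by_freq, start=1):
        if word not in known_words:
            holes.append((rank, word))
            if len(holes) == limit:
                break
    return holes


# Toy data; the ordering is made up, not a real frequency list.
words_by_freq = ["の", "に", "は", "経済", "海賊"]
known_words = {"の", "は", "海賊"}

holes = find_holes(words_by_freq, known_words, limit=2)
print(holes)  # [(2, 'に'), (4, '経済')]
```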
I don’t think this is necessarily a bad book. I just think it’s solving a problem that doesn’t exist in Japanese. With most languages, there’s no definite priority-sort for words; there’s no big, glowing sign saying “literacy begins here”. Japanese, for better or worse, does have this: the General-Use Kanji. And Heisig presents a far more complete method of getting there than this does.
If this book handled compounds, or contained example sentences, it would be more useful in the context of AJATT. It can still be used, as Branden pointed out upthread, but I don’t think that’s worth $30. Were I learning any other language this series covers (especially one without Roman script) it’d be quite useful though.
If you want a word frequency list instead of just kanji, then this might help:
www.manythings.org/japanese/words/leeds/
Thank you so much for posting this link!
Thank you. So much.
Thanks Cameron, a good reference tool, and free!
While I fully support any product that will help people learn Japanese, I have to criticize the concept of a frequency list from a standpoint of what I think are “the principles of ajatt.” In brief, I think that what you ought to learn, and in what order, is a hugely complex issue that comes down to much more than statistical word frequency.
For instance, I have a friend who’s learning Japanese basically so he can read One Piece (a comic about pirates). To him, learning the word 海賊 かいぞく [pirate] is probably more important than learning the word 経済 けいざい [economy], despite the latter being more common in Japanese, generally. The reason you should learn a word is not simply because it’s common, but because it is interesting to you. The reason you learn a sentence is not just because it is in Japanese, but because the content and meaning of the sentence is interesting.
One of the great things about AJATT and using native media as a primary study tool, imo, is the opportunity to discover Japanese for yourself, which gives the language intrinsic value and meaning to you. This opportunity cannot be engendered by a set of words and grammar structures spoon-fed to you out of context from a textbook or a teacher that merely is there to study “because it is Japanese.” In other words, the search for words and grammar that have meaning to you, parsing the wheat from the chaff, is an important part of learning how to read. Using and reading real Japanese is not only the end goal, it is also the means to achieving that goal.
In practice, though, the majority of the first several hundred words will be in high usage regardless of the media, and in fact using native media can be quite difficult at the beginning of the sentence phase. (Or maybe I should say, it’s tough to find good media that is appropriate for beginners.) I would guess that a resource such as this could be helpful for people in the under-500-ish sentence range, but I would encourage people to wean themselves off bland lists of words, sentences and characters (I’m lookin’ at you, Heisig) as early as possible and start learning to read things that are fun to you regardless of how unacademic or useless they might seem. (Pokemon, anyone?) It’s about the journey, not the destination!
In brief, I worry that people will use a book like this less as a reference and more as a “heisig for words” learning every one religiously until they reach the end. This may achieve some literacy, but at what cost? The cost of the fun that comes with engaging with native media.
fwiw,
Tommy
I went through about 2,500 words (and example sentences) religiously from a frequency dictionary for Portuguese. And while it certainly wasn’t as much fun as native material, by the time I’d finished I found that all the native material I looked at was so much more accessible and enjoyable. Personally, I find it really hard to resist looking up words I don’t know in native material, which takes the fun out in itself. So, spending a few months cramming the boring high-frequency dictionary sentences was like an investment in increased fun later on… I found the fun-cost you talked about was worth it. And I’m doing the same thing in my next language (French).
(blog pimpin’: franticfrenchmission.blogspot.com/2011/01/what-works-for-learning-another.html)
Is there any way I can have Aron’s email? I am looking for this type of book in Hindi. I see he has many other titles but I am specifically looking for a Hindi book of word frequency lists. Maybe this would not be so useful for Japanese but for any other language it is extremely useful.
Thanks
Hi,
I’m Aron. I don’t have Hindi (yet), but there might be something for it in the future. I have some additional stuff in mind for the future that I’m pretty sure you’ll all love. Plus it might be free. We’ll see.
Aron
Awesome. I guess I will have to wait and see.
Aron, are The Ultimate Word List books still available? Amazon says they’re out of print. I was planning to start with the one for Spanish. I do already have a list of common words that I got for free, but I think the way you present it in your book would be more helpful than working from a plain text document with no translations.
BTW, for the folks who were looking for a Frequency List of Japanese words, Wiktionary has lists of most common words for over 30 languages, including Japanese. See ‘Wiktionary:Frequency_lists’ at en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Oops. I just noticed the Japanese words at Wiktionary are Basic Words, not the most frequently used, so that may or may not be helpful after all. The lists for the other languages are most frequent though. Don’t know why they haven’t compiled one for Japanese yet.
Thanks a bunch for the frequency lists!
To be blunt, this book is a complete waste of time.