JMDict makes the interesting choice of grouping various readings and glosses together in a single entry, where the glosses are organized by part of speech but the readings are not.

This confuses beginners, because when you look up a word you normally want it as a particular part of speech: a noun, a verb, an adjective, or whatever. A great example is the English term “meeting”, which can be both a noun and a verb.

But beginners do not need to be told about every single part of speech a word can take on. For example, beginner ESL students do not learn about gerunds (e.g. “meeting” again) until they are ready. What’s more, since readings are not organized by part of speech, special rules are needed to determine which readings go with which glosses, and it is altogether too confusing.
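To make the "special rules" concrete, here is a minimal sketch of an entry shaped the way this post describes: readings in one flat list, glosses grouped into senses by part of speech, and an optional per-sense restriction naming the readings it applies to. The class names, the `only_readings` field, and the placeholder data are all assumptions made for illustration; they are not the actual JMDict schema (which, as far as I know, expresses this kind of restriction with elements such as `stagr`).

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    pos: str                 # part of speech for this group of glosses
    glosses: list[str]
    # Hypothetical field: readings this sense is restricted to.
    # An empty list means "applies to every reading in the entry".
    only_readings: list[str] = field(default_factory=list)


@dataclass
class Entry:
    readings: list[str]      # flat list, not organized by part of speech
    senses: list[Sense]


def senses_for_reading(entry: Entry, reading: str) -> list[Sense]:
    """Apply the restriction rule to find the senses that belong to one reading."""
    if reading not in entry.readings:
        return []
    return [
        s for s in entry.senses
        if not s.only_readings or reading in s.only_readings
    ]


# Placeholder data, not copied from any real dictionary entry.
entry = Entry(
    readings=["reading-1", "reading-2"],
    senses=[
        Sense(pos="noun", glosses=["gloss A"]),
        Sense(pos="verb", glosses=["gloss B"], only_readings=["reading-2"]),
    ],
)

for s in senses_for_reading(entry, "reading-1"):
    print(s.pos, s.glosses)
```

Even in this toy version, every lookup has to pass through the restriction check before it knows which glosses belong to the reading the user typed.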

Treated as a natural language processing problem, this has no software solution: because the readings are not organized by part of speech, there is no way to know which gloss a user needs when they look up a word. We must therefore conclude that organizing glosses by part of speech underneath readings that are themselves unorganized does not help learners. A software solution would only be possible if the readings (keywords) themselves were organized by part of speech.

Note that this is how most normal dictionaries are organized. In the Oxford Dictionary, homonyms, homographs, synonyms and so forth are not listed within a main entry as equals of the headword, because that would confuse people about the distinct meanings and implications of the various words. That is the job of a thesaurus. Including them in a dictionary entry without further explanation would imply that they share the meaning of the main entry’s glosses. Extra information is therefore needed to keep apart readings and glosses that don’t really go together. In extreme cases, different readings apply separately to different glosses, and one wonders why they are in the same entry at all.

Organizing glosses by reading and part of speech is just simpler. It requires simpler data structures and simpler methods. There is no need for special-case programming, and no need for XML or JSON in the database. Further, it handles the stage of learning that beginners most commonly pass through just as well as any other method.
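As a rough sketch of what "simpler data structures" could look like, the example below stores plain (keyword, part of speech, gloss) rows and indexes them by the keyword and part of speech together. The row layout, the function names, and the gloss wording are assumptions chosen for illustration, reusing the “meeting” example from earlier; the same rows would map directly onto a flat three-column database table.

```python
# Each row is one (keyword, part of speech, gloss) triple.
# The gloss text here is illustrative wording, not quoted from a dictionary.
rows = [
    ("meeting", "noun", "an assembly of people for discussion"),
    ("meeting", "verb", "present participle of 'meet'"),
]


def build_index(rows):
    """Group glosses under a (keyword, part of speech) key."""
    index = {}
    for keyword, pos, gloss in rows:
        index.setdefault((keyword, pos), []).append(gloss)
    return index


def lookup(index, keyword, pos):
    """Return the glosses for one keyword as one part of speech."""
    return index.get((keyword, pos), [])


index = build_index(rows)
print(lookup(index, "meeting", "noun"))
```

A lookup is a single dictionary access with no restriction rules to evaluate, which is the trade-off being argued for here.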

When the beginners are not beginners anymore, perhaps they would be better served by a different kind of presentation. But my own experience as a teacher tells me that this is not much of a concern. A basic keyword-gloss-example method can be extended to cover a vocabulary of thousands of words; it doesn’t expire so much as the student moves on to other ways of learning when they are ready. A project like JMDict_e may be useful at that point, in the context of a full J/E dictionary. I guess in the end that’s what it was designed for, anyway.

By Serena
