So here we are, finally hooking up the entry editor to the flashcard quiz routine. And the most glaring thing is that it no longer makes sense to have people edit their own cards.

The problem is that everything we’ve learned so far points toward a dictionary-like entry for the cards. Yet these entries cannot be actual dictionary entries, because dictionary entries are too complex and wordy for flashcards. So what are we looking at here?

An entry has…

  • Keyword
  • Part-of-speech
  • Gloss
  • Example
  • User Knowledge Stats
  • etc.

What we would ideally like to do is split this into a dictionary-style flashcard entry table and a second user table containing information on the cards the user is studying. We don’t actually need decks; tags would cover any reasonable use case (and would be much more powerful).

So really, what is incompatible with a full dictionary entry? Keyword, POS, gloss and example would be part of a dictionary entry anyway, with the gloss being closer to a thesaurus entry or a word-for-word translation (if such a thing is even possible). There isn’t any real need to keep multiple copies of the same entry on a per-user basis, is there? How many copies of “apple -> ri.n.go” do we really need? The space savings would probably let us just normalize everything. Speed would no longer be a major concern, since the database would be much smaller. Realistically, even with a 0.1% conversion rate we should be able to pay for a server (50,000 users -> 50 paying users at $10/mo. = $500/mo., which buys an absolutely sick server on any reasonable cloud service). Even 5,000 users a day each pulling 5 MB of bandwidth comes to about 750 GB a month, still well below 1 TB, which costs $5/mo. on DigitalOcean. We should be making enough money to warrant a server upgrade by then (or something is drastically wrong with our pricing model).
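The back-of-envelope numbers above are easy to re-check in a few lines. This is just arithmetic on the figures from the paragraph (0.1% conversion, $10/mo., 5 MB per user per day, a 1 TB cap); the 30-day month and the function names are assumptions for illustration.

```python
# Back-of-envelope check of the server-cost reasoning.
# Inputs come from the text; the 30-day month is an assumption.

def monthly_revenue(users: int, conversion: float, price_per_month: float) -> float:
    """Revenue from the fraction of users who pay."""
    paying = round(users * conversion)
    return paying * price_per_month

def monthly_bandwidth_gb(daily_users: int, mb_per_user: float, days: int = 30) -> float:
    """Total monthly transfer in GB (1 GB = 1000 MB)."""
    return daily_users * mb_per_user * days / 1000

revenue = monthly_revenue(50_000, 0.001, 10.0)
bandwidth = monthly_bandwidth_gb(5_000, 5.0)
print(f"revenue: ${revenue:.0f}/mo")        # revenue: $500/mo
print(f"bandwidth: {bandwidth:.0f} GB/mo")  # bandwidth: 750 GB/mo
print("under 1 TB cap:", bandwidth < 1000)  # under 1 TB cap: True
```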

So what would be better is to leverage the user base to flesh out the dictionaries (or not; it wouldn’t matter). We are, after all, selling a product: an easier-to-use kind of learning service. Users don’t want to enter apple/ri.n.go. Who the heck would? It’s JAPANESE. Why force you to type it in? You already have to write it out 500 times. Ahh, so that’s it da ne yo. It’s a product. A service.

So the forward-facing service should really be a dictionary, with the edit function reserved for admins. Okay, then why not just put it on a wiki? Why run a DB? Well, search. From a search standpoint it will probably just be faster to keep things in a table. Then again, this table could be built from wiki entries. This was always a kind of goal I had, kind of like compiling a CSS file from SCSS or LESS. The protocol no longer matters because the program will keep a local record for speed.
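The “compile the wiki into a table” idea can be sketched with Python’s built-in `sqlite3`. The wiki format here is a made-up stand-in (one entry per line, `keyword | pos | gloss | example`); the real source format isn’t specified, so treat the parser as hypothetical. The point is only that the wiki stays the source of truth while search hits a local indexed table.

```python
import sqlite3

# Hypothetical wiki export: one entry per line as "keyword | pos | gloss | example".
# The real wiki format is not specified; this is a stand-in for illustration.
WIKI_SOURCE = """\
apple | n | ri.n.go | I ate an apple.
blue | adj | a.o.i | The sky is blue.
"""

def compile_wiki(source: str) -> sqlite3.Connection:
    """'Compile' wiki entries into a local, indexed table for fast search,
    much like compiling SCSS into CSS: the wiki stays the source of truth."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (keyword TEXT, pos TEXT, gloss TEXT, example TEXT)")
    conn.execute("CREATE INDEX idx_entry_keyword ON entry (keyword)")
    rows = [
        tuple(field.strip() for field in line.split("|"))
        for line in source.splitlines()
        if line.strip()
    ]
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)", rows)
    return conn

conn = compile_wiki(WIKI_SOURCE)
print(conn.execute("SELECT gloss FROM entry WHERE keyword = ?", ("apple",)).fetchone())
# ('ri.n.go',)
```

Rebuilding the table is cheap, so it can simply be regenerated whenever the wiki changes.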

The question of Knowledge Representation

So let’s say there are two tables, one for cards and one for the reference dictionary, and the user has 1000 cards in his flashcard table. How do we represent how well the user knows the language? Running it against the reference dictionary would be useless, because the dictionary is not a literacy target but a comprehensive taxonomy. So we will need target lists. These would be keyword and/or keyword/POS targets, which would match up well with flashcards. The user’s knowledge percentages for each flashcard entry could then be summed and run against a target list (e.g. a JLPT N3 target list) to see how close they have come to that target.
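One way to “sum up knowledge and run it against a target list” is to average per-card knowledge over the target items, counting unstudied items as zero. A minimal sketch; the data and the averaging rule are assumptions, not a settled scoring method.

```python
# Hypothetical data: per-card knowledge (0.0 to 1.0) keyed by (keyword, pos),
# and a target list such as a JLPT N3 vocabulary list. Values are made up.
user_knowledge = {
    ("apple", "n"): 1.0,
    ("blue", "adj"): 0.5,
    ("eat", "v"): 0.75,
}

target_list = [("apple", "n"), ("blue", "adj"), ("run", "v"), ("eat", "v")]

def target_coverage(knowledge: dict, target: list) -> float:
    """Average knowledge over the target items; unstudied items count as 0."""
    if not target:
        return 0.0
    total = sum(knowledge.get(item, 0.0) for item in target)
    return total / len(target)

print(target_coverage(user_knowledge, target_list))  # 0.5625
```

Cards outside the target list (the rest of the user’s 1000) simply don’t contribute, which is the point: the score is relative to the target, not the dictionary.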

So in theory we are looking to simulate or approximate what you would get on a test against that target; otherwise the representation becomes somewhat meaningless. This would require some amount of simulation in which we give test-style questions with target-level material. This really shows the difference between a reference dictionary and flashcard information, which is now really being taken to mean not flashcards but test-like questions. Flashcards would be a component of this, a sort of “staple food” of the process, but not necessarily the main draw. Thus the idea here is not to include the learning material with the reference dictionary or to source its information from it. That isn’t an important goal. What is important is to organize around a standard knowledge unit with respect to a language. Keyword/POS pairs seem to be a very good place to start, with one reservation: there is no way to distinguish secondary meanings.

How often do we really need to look at secondary and tertiary meanings? What about color/n and color/v? Color/v can have more than one meaning: it can mean to color a drawing, or to color an idea (color a conversation). Obviously color/v has a primary meaning of filling something in with a marker or crayon. Only once this meaning has become familiar does the learner need to know the secondary meaning. But what natural-language distinction can be made between them? Both are the word “color”. Both are verbs. In this case both are also transitive, so we cannot point to a vi vs. vt difference in part of speech. “Blue/adj” is another example: blue meaning the color, or blue meaning sad. In these cases there is no reasonable way to distinguish blue/adj/color from blue/adj/sad. If a user sees “Blue (adj.)” or “Color (v.)” he may not know which nuance the answer calls for. So should we add some sort of nuance or “hint” field?

Perhaps.
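The ambiguity is easy to surface mechanically: if cards are keyed only by keyword/POS, any key that covers more than one sense is a card that would need a hint. A small sketch over made-up sample senses; the `sense` labels are illustrative, not a proposed schema.

```python
from collections import defaultdict

# Made-up sense-level entries. The "sense" label is exactly what a
# keyword/POS key alone cannot see.
entries = [
    ("color", "n", "hue"),
    ("color", "v", "fill in a drawing"),
    ("color", "v", "influence (color a conversation)"),
    ("blue", "adj", "the color"),
    ("blue", "adj", "sad"),
]

def ambiguous_keys(rows):
    """Group senses by (keyword, pos) and return keys with more than one sense."""
    groups = defaultdict(list)
    for keyword, pos, sense in rows:
        groups[(keyword, pos)].append(sense)
    return {key: senses for key, senses in groups.items() if len(senses) > 1}

for (keyword, pos), senses in ambiguous_keys(entries).items():
    print(f"{keyword} ({pos}.): {senses}")
# color (v.): ['fill in a drawing', 'influence (color a conversation)']
# blue (adj.): ['the color', 'sad']
```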

If the user is learning, they will first learn the primary meaning. Once they know the primary meaning, if they are then introduced to a secondary meaning, it is entirely reasonable to expect them to be aware that the word has more than one meaning. Therefore they may wish to ask for a hint. The only natural-language clue real life gives us for this kind of problem is context. Consider:

  • He colored the drawing.
  • He colored the conversation.
  • He was blue.  (ambiguous — maybe we are describing a smurf)
  • He was feeling blue.
  • He was painted blue.

In the above examples, context (and sometimes general, topical context) is what we use to determine the meaning of the word. Therefore, we should be able to rely on good keyword sentences to distinguish between entries. The user should be able to ask for a hint to distinguish which particular blue is being talked about. If I saw “He was feeling _______.” or “The fence was painted _______.” I might not be able to tell whether the answer is “blue”, “sad” or “red”, but I would have a context that would let me determine which translation of “blue” was appropriate in my language, or state in the target language the definition of the word as used in context.
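A context hint like “He was feeling _______.” can be generated from a card’s example sentence by blanking the keyword. A minimal sketch using Python’s `re` with whole-word, case-insensitive matching; the function name and blank string are assumptions.

```python
import re

def make_cloze(sentence: str, keyword: str, blank: str = "_______") -> str:
    """Replace whole-word occurrences of the keyword (case-insensitive) with a blank."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return pattern.sub(blank, sentence)

print(make_cloze("He was feeling blue.", "blue"))         # He was feeling _______.
print(make_cloze("The fence was painted blue.", "blue"))  # The fence was painted _______.
```

Inflected forms such as “colored” are not matched by the keyword “color”; handling those would need a lemmatizer, which this sketch deliberately leaves out.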

The worst thing that could be done is keyword/pos/#, i.e. an arbitrary sense number.
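Why a bare sense number is so dangerous can be shown in a few lines. In this sketch (hypothetical senses of color/v), a card that stores a sense number silently changes meaning when an editor inserts a new sense, while a card that stores a meaningful sense label keeps pointing at the same meaning, or fails loudly if that sense is removed.

```python
# Hypothetical senses of color/v, before and after an editor inserts a new
# sense at the front of the list. Labels and definitions are illustrative.
senses_before = [
    {"sense": "fill in", "definition": "to fill in a drawing with color"},
    {"sense": "influence", "definition": "to influence or distort"},
]
senses_after = [
    {"sense": "grow", "definition": "of foliage: to take on color"},
] + senses_before

# One card stores a bare sense number; the other stores a meaningful label.
card_by_number = {"keyword": "color", "pos": "v", "sense_no": 1}
card_by_label = {"keyword": "color", "pos": "v", "sense": "influence"}

def resolve_by_number(card, senses):
    """Look up by position: silently changes meaning if the list is edited."""
    return senses[card["sense_no"]]["definition"]

def resolve_by_label(card, senses):
    """Look up by label: survives reordering, fails loudly if the sense is gone."""
    for entry in senses:
        if entry["sense"] == card["sense"]:
            return entry["definition"]
    raise KeyError(card["sense"])

print(resolve_by_number(card_by_number, senses_before))  # to influence or distort
print(resolve_by_number(card_by_number, senses_after))   # to fill in a drawing with color
print(resolve_by_label(card_by_label, senses_after))     # to influence or distort
```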

But as a native speaker, I know full well that most usages of the word color are not learned as specific cases of part of speech or nuance, but as interpretations based on the grammar (context) as it appears. For example, despite being a native speaker, I would never say “surprise coloured her voice”, nor have I ever heard such a phrase to my recollection. Nevertheless, I understand perfectly well what it means and I can appreciate the nuance or flavor of the text.

How did I understand what it means? By moving from the primary meaning to similar meanings, and taking the only possible meaning of the phrase given that information. In short, this isn’t something you should really learn from a flashcard, or even study on an ongoing or repetitive basis. It’s something you are supposed to just know, based on your familiarity with the grammar in context at the time.

Concept: Indexing by Quantum Meaning

There is the question of how to index flashcards (how to present the face). “Blue” isn’t enough. “Blue/adj” isn’t enough. The student learns “Blue/adj” (color) and “Blue/adj” (feeling). So presenting these “tag words” (or “topic words”, “hint words”, “subject words”) might be a good idea. That particular sense is the actual underlying key (keyword/pos/sense), but the sense could equally well be represented by an example sentence.

An example of why representing dictionary data is bad(tm)

One problem is how to keep all the learning data tables (flashcards, grammar, sentences) properly keyed to the reference dictionary. For example, what happens when we change the meaning of a sense in the dictionary but don’t change any of the flashcards, etc. attached to it? This is a messy problem. One editor cannot be responsible for editing every quantum of knowledge connected to a quantum sense-meaning, and the real issue is that a random number is being given a meaning that cannot be inferred or assumed from the number itself. If only we could get away with using keyword/pos! Well, what if we could? How about a super-gloss called “direct” or “sense”, defined to mean one particular sense? Let’s look at color.

  • ‘the foliage will not colour well if the soil is too rich’ (grow)
  • ‘he hated finger-painting and colouring in pictures’ (filling in)
  • ‘he has coloured the dance with gestures from cabaret and vaudeville’ (influenced)
  • ‘she coloured slightly’ (blushed)
  • ‘rage coloured his pale complexion’ (visceral expression of emotion)
  • ‘surprise coloured her voice’ (revealed by)
  • ‘the experiences had coloured her whole existence’ (I’ve already run out of sensible words that have a hope of being reflectable)
  • ‘witnesses might colour evidence to make a story saleable’ (influence/lie?)
  • ‘this lent colour and credibility to his defence’ (truth, certainty)
  • ‘he felt the waiters could see that in his cashmere tweed jacket he is sailing under false colours’ (appearances)
  • ‘she was only too anxious to get out of the room now that her employer had shown his true colours’ (real thoughts/feelings)
  • ‘under colour of writing the history of the East Frankish kingdom, he has dealt as much with the history of Italy’ (guise, by-way-of)
  • ‘Sylvia had passed her exams with flying colours’ (super-rainbow easy, happy-go-lucky sort)

Ok, so this is not a good idea. As stated, we are not interested in representing dictionary knowledge. Anyway, a lot of the above is really phraseology, e.g. “true colors” and “false colors” versus “colored the …”, which might as well be ‘flashcards’ of their own. There is ample precedent for this in other languages: Japanese has kanji and kana, Chinese has bigrams, idioms, etc.

Allowances / Phasing / Special Part-Of-Speech

We will have to make a decision: primary and secondary meanings only, perhaps differentiated via artificial POS constructs. For example, blue (the color) would be adj-t, while blue (the feeling) would be adj-i (as in “intransitive”), because it doesn’t refer to a physical thing. This is the best idea yet, but it is problematic in that we would then need to teach the student some sort of arbitrary code for which part of speech goes with which word.

Perhaps some kind of allowance/phasing could be used? If a student is only expected to be aware of blue/adj the color, that’s all he needs to worry about; but if he is expected to know both, then either answer he gives would be acceptable? Maybe.
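The allowance/phasing idea could look roughly like this: each keyword/POS carries a schedule of which senses are expected from which phase on, and an answer is accepted if its sense is in the set for the learner’s current phase. A sketch only; the phase numbers, the `PHASES` table, and the assumption that an answer has already been mapped to a sense are all hypothetical.

```python
# Hypothetical phasing table: which senses of a keyword/POS a learner is
# expected to know from each phase onward. Phases and senses are illustrative.
PHASES = {
    ("blue", "adj"): {1: {"the color"}, 3: {"the color", "sad"}},
}

def accepted_senses(keyword: str, pos: str, phase: int) -> set:
    """Senses accepted at the learner's phase: the schedule entry for the
    highest phase that does not exceed it (empty if none applies yet)."""
    schedule = PHASES.get((keyword, pos), {})
    reached = [p for p in schedule if p <= phase]
    return schedule[max(reached)] if reached else set()

def check_answer(keyword: str, pos: str, phase: int, answer_sense: str) -> bool:
    # Assumes the learner's answer has already been mapped to a sense label.
    return answer_sense in accepted_senses(keyword, pos, phase)

print(check_answer("blue", "adj", 1, "sad"))        # False
print(check_answer("blue", "adj", 3, "sad"))        # True
print(check_answer("blue", "adj", 3, "the color"))  # True
```

Whether an unexpected but valid sense at an early phase should count as wrong (as here) or merely as “not yet expected” is a policy choice left open.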

Maybe what really needs to be done is to just move away from flashcards as a central method of knowledge representation?

 

Meditation

My mother bought a yellow _______ at the supermarket:

a) lemon

b) soap

c) pencil

d) banana

–> We’ve got a problem: every option is a plausible answer.

He looked _______.

a) happy

b) green

c) blue

d) sad

–> Another problem of the same type: happy, green, blue and sad all fit.

Face: Blue (adj.). “He looked _____.” (Yet another problem: poor keyword sentences.)

Ultimately we need to figure out how to index the tables so we can provide a good face to students in flashcard mode. That is the main concern. How do we let the student know whether Blue (adj.) is a feeling or a color? “Sense: Feeling”? A context sentence?

There are strong reasons to settle on a single context sentence, and just as many to avoid such a construct and use a context keyword (clarification, subject, or topic word)! This will need to be experimented with.
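Since the choice needs experimenting, one cheap approach is to store both hints on the card and render either face from the same record, so the two variants can be compared with real learners. A sketch; the `Card` fields and the strategy names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Card:
    keyword: str
    pos: str
    topic: str    # context keyword: clarification, subject, or topic word
    context: str  # single context sentence with the keyword already blanked

def render_face(card: Card, strategy: str) -> str:
    """Render a flashcard face using one of two hint strategies."""
    base = f"{card.keyword.capitalize()} ({card.pos}.)"
    if strategy == "topic":
        return f"{base} [{card.topic}]"
    if strategy == "sentence":
        return f'{base} "{card.context}"'
    raise ValueError(f"unknown strategy: {strategy}")

card = Card("blue", "adj", "feeling", "He was feeling _____.")
print(render_face(card, "topic"))     # Blue (adj.) [feeling]
print(render_face(card, "sentence"))  # Blue (adj.) "He was feeling _____."
```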

 

Keyword/Pos/Target

Since we are using target lists (targeted vocabulary) there may be some interest in tagging sense by level. Something like “Blue/Adj/Level1” would mean blue the color, while “Blue/Adj/Level3” might be where the person is expected to know we want a specific secondary or tertiary meaning. This would not solve the issue where the same word is used twice in a target (level).
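The limitation is easy to check: tagging senses by level separates blue the color (level 1) from blue the feeling (level 3), but two senses introduced at the same level still share a key. A sketch over a hypothetical, hand-made sense list.

```python
from collections import Counter

# Hypothetical senses tagged with the target level at which each is introduced.
senses = [
    ("blue", "adj", 1, "the color"),
    ("blue", "adj", 3, "sad"),
    ("color", "v", 3, "influence"),
    ("color", "v", 3, "reveal (surprise colored her voice)"),
]

# Count how many senses share each keyword/POS/level key.
keys = Counter((keyword, pos, level) for keyword, pos, level, _ in senses)
collisions = [key for key, count in keys.items() if count > 1]
print(collisions)  # [('color', 'v', 3)]
```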

Running Theory

The best idea seems to be Keyword/POS/Sense, i.e. /Hint, or possibly /Cloze or /Example. We have to understand that there will be precious few conflicts with this approach anyway: even with a vocabulary of 1000 words, there are not many truly secondary meanings in the sense of how words are used (overlooking a more academic approach of grammatical pigeonholing). When primary and secondary meanings within the same part of speech become a problem, it is probably time to move over to context anyway, since context has already been observed to be the primary difficulty in determining a word’s meaning at that level. So we can hide context sentences from beginning learners, since they don’t need them, and the point at which learners do need them becomes the demarcation line for moving to context.

So then: skip indexing, allow multiple Keyword/POS entries (don’t enforce it as a unique index), and allow access to a clarification hint (subject/topic word or context sentence), especially at upper levels.
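In table terms, the running theory amounts to a plain (non-unique) index on keyword/POS plus an optional hint column that is only shown above some level. A sketch with `sqlite3`; the column names, sample glosses, and `HINT_LEVEL` threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (
    id      INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL,
    pos     TEXT NOT NULL,
    gloss   TEXT NOT NULL,
    hint    TEXT  -- topic word or context sentence; optional
);
-- Plain (non-unique) index: several cards may share keyword/POS.
CREATE INDEX idx_card_keyword_pos ON card (keyword, pos);
""")
conn.executemany(
    "INSERT INTO card (keyword, pos, gloss, hint) VALUES (?, ?, ?, ?)",
    [("blue", "adj", "a.o.i", "color"), ("blue", "adj", "yu.u.u.tsu", "feeling")],
)

HINT_LEVEL = 3  # hypothetical threshold: hints are shown from this level up

def face(card_id: int, learner_level: int) -> str:
    """Build the card face, adding the hint only for upper-level learners."""
    keyword, pos, hint = conn.execute(
        "SELECT keyword, pos, hint FROM card WHERE id = ?", (card_id,)
    ).fetchone()
    text = f"{keyword.capitalize()} ({pos}.)"
    if hint and learner_level >= HINT_LEVEL:
        text += f" [{hint}]"
    return text

print(face(2, 1))  # Blue (adj.)
print(face(2, 3))  # Blue (adj.) [feeling]
```

With `CREATE UNIQUE INDEX` instead, the second insert would fail with `sqlite3.IntegrityError`, which is exactly the constraint the running theory drops.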

Transition into Context-based and Cloze tests

It seems, then, that sentence-based keyword tests are actually more useful for determining secondary and tertiary meanings than straight word-translation flashcards. Once the primary meaning of a word has been learned, the student can pick up subtle vagaries via cloze and in-context sentences. The grey (transition) period will be the time during which the student needs to see the context hint to determine the meaning of the word. At that point a transition toward more context-based testing can be made.
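The transition rule could be expressed as a small per-keyword scheduling decision: plain flashcards until the primary meaning is known, then a switch to cloze/in-context tests once secondary senses are in play and the learner starts needing the context hint. A sketch of one possible rule; the function, its inputs, and the trigger condition are assumptions rather than a tuned scheduler.

```python
def choose_test(primary_known: bool, secondary_senses_due: int, hint_requests: int) -> str:
    """Hypothetical test-style rule for one keyword/POS.

    - primary meaning not learned yet: plain word-translation flashcard
    - secondary senses in play and the learner needs the context hint
      (the grey period): move to a cloze / in-context sentence test
    - otherwise: keep using the flashcard
    """
    if not primary_known:
        return "flashcard"
    if secondary_senses_due > 0 and hint_requests > 0:
        return "cloze"
    return "flashcard"

print(choose_test(False, 1, 5))  # flashcard
print(choose_test(True, 1, 2))   # cloze
print(choose_test(True, 0, 0))   # flashcard
```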

By Serena
