From the previous discussions we have arrived at the following:

  1. Simple “Time Bubbles” (SM-2) is fine, but you can be ~30% more efficient by introducing an inertia quantifier.
  2. You can be a further ~30% more efficient by using a matrix of such quantifiers and tweaking them based on historical data.
  3. The algorithm’s efficiency is tied to the data source; introducing better data can improve the algorithm’s efficiency by 100% or more.

Numbers one and two are no big deal from a technical standpoint, but number three has staggering implications. When you look at how users self-medicate for #3, they do it by adding different kinds of data. For example, they will add audio, or they will add sentence patterns on top of vocabulary. But this is an incomplete solution. I will state this as a #4 in a moment, even though I believe it is a form of #3. But first let me give you a hard example that I am sure any user of Anki, SuperMemo, etc. can relate to.

I was just using Anki to study Chinese. I uploaded a Chinese Character deck, a Vocabulary deck (bi-grams, idioms, etc.) and a sentence deck. All were great decks. But using them efficiently was a pain because Anki doesn’t understand Chinese.

When I learned the sentence “Zhōngguó hěn dà / China is very big”, I had to manually schedule the vocabulary word “Zhōngguó” and the Chinese characters for “Zhōng” and “guó”. What a pain! And then I had to review the vocabulary word “Zhōngguó” and the individual characters “Zhōng” and “guó”. If only Anki knew that these three different kinds of information were related, it would save me a lot of time cross-referencing the words. It would also save me the time I spend adding characters and skipping reviews of vocabulary I already know. I found myself thinking, “If only Anki understood Chinese.”

So here’s #4.

4. If the program understands the language, it can improve efficiency by ~100%.

I am pretty sure I don’t have to review the Chinese character, and the vocabulary word, and the sentence, each as if they were independent pieces of information. I mean, yes, it’s true that I do in fact want to see the sentence, and that a bigram (in Chinese) often has a non-intuitive meaning. But do I really need to mark a sentence “easy” three times in a row, then input a character and vocabulary for it, then go through the grading process for both the vocabulary and the underlying word? If each one needs five “easy” reviews to hit (say) a one-year review interval, I can certainly knock one and possibly two reviews off the other two kinds of information. This alone would be an efficiency boost of roughly 25%.
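To make that arithmetic concrete, here is a minimal sketch that takes the figures above at face value: three kinds of information, five “easy” reviews each, and one or two reviews skipped on each of the two dependent kinds. The function name and parameters are made up for illustration, and “efficiency” is assumed here to mean the fraction of reviews avoided.

```python
def review_savings(kinds: int = 3, reviews_each: int = 5,
                   skipped_per_dependent: int = 1) -> float:
    """Fraction of reviews avoided when every kind except one skips some reviews.

    Assumption: one kind (e.g. the sentence) keeps its full review count,
    and each of the remaining kinds skips `skipped_per_dependent` reviews.
    """
    baseline = kinds * reviews_each                 # 3 * 5 = 15 reviews
    saved = (kinds - 1) * skipped_per_dependent     # 2 or 4 reviews
    return saved / baseline


for skipped in (1, 2):
    share = review_savings(skipped_per_dependent=skipped)
    print(f"skip {skipped} per dependent card: {share:.0%} fewer reviews")
```

Under these assumptions the saving comes out at about 13% when one review is skipped and about 27% when two are, so the 25% figure sits near the optimistic end of the range.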

So I don’t yet know whether the real gain is 100%, 50% or 130%. What I do know is that there is immense promise in this area of research.

Approving K1 for certain languages only

There’s a huge trade-off I haven’t mentioned yet but have been assuming (a brilliant reader may have noticed this by now): this kind of algorithm cannot be defined for a general-purpose flashcard interface. We have been talking as if Chinese characters, bi-grams and idioms were the only way a language is constructed. This approach wouldn’t work so well for Japanese or English, where knowledge is represented in a slightly different way.

From an implementation standpoint, there would be K1/general for general or custom cards, then K1/chinese for targeting Chinese, K1/japanese for targeting Japanese, K1/english for targeting English, and so forth. Each would be set up as a plugin operating on top of a general-purpose interface, and the user would choose an algorithm from a list. If a targeted algorithm were chosen, it would operate exactly as normal except when encountering cards tagged K1/language. On such cards it would call the plugin through a hook; the plugin would request the hooks it needs, and those hooks would be created as we develop the plugins. There could also be special flashcards marked “locked” or “system”, meaning the user cannot edit them; the plugin could then rely on those cards having certain fields, for example to correlate cards in different decks and of different types of knowledge.
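One way the hook arrangement could look is sketched below. This is only an illustration under assumptions: `Card`, `Scheduler`, `register_hook` and `grade` are hypothetical names, not part of Anki or any existing tool, and the general scheduling step is reduced to a log line.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Card:
    card_id: str
    fields: Dict[str, str]
    tags: List[str] = field(default_factory=list)
    locked: bool = False  # "locked"/"system" cards: the user cannot edit them


Hook = Callable[[Card, str], None]


class Scheduler:
    """General-purpose scheduler that defers to a language plugin on tagged cards."""

    def __init__(self, algorithm: str = "K1/general") -> None:
        self.algorithm = algorithm          # chosen by the user from a list
        self.hooks: Dict[str, Hook] = {}
        self.log: List[str] = []

    def register_hook(self, tag: str, hook: Hook) -> None:
        self.hooks[tag] = hook

    def grade(self, card: Card, grade: str) -> None:
        # Normal behaviour always runs (placeholder for the real interval maths).
        self.log.append(f"general scheduling: {card.card_id} graded {grade}")
        # The plugin hook only fires on cards tagged for the chosen algorithm.
        hook = self.hooks.get(self.algorithm)
        if hook is not None and self.algorithm in card.tags:
            hook(card, grade)


scheduler = Scheduler(algorithm="K1/chinese")
scheduler.register_hook(
    "K1/chinese",
    lambda card, grade: scheduler.log.append(f"K1/chinese hook: {card.card_id}"),
)
scheduler.grade(Card("s1", {"pinyin": "Zhōngguó hěn dà"},
                     tags=["K1/chinese"], locked=True), "easy")
scheduler.grade(Card("custom1", {"front": "2 + 2"}), "easy")
```

Here the tagged sentence card triggers both the general scheduling and the plugin hook, while the untagged custom card only gets the general treatment.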

The plugin could then, for example, check that if a sentence is marked “easy”, at the very least all of the words in that sentence are added to the user’s words deck and scheduled (introduced) for review. It may also be desirable to modify the schedule of the underlying words so they are not due on that day, but perhaps postponed by one day. This longer interval would naturally cause the next interval to be longer when the card is due for review. Perhaps always adding one hour (or one day) to underlying words is desirable. This can all be placed into some sort of settings plugin. Past data can be used to fine-tune the numbers and provide a better initial setting.
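A minimal sketch of that rule might look like the following. Everything here is an assumption made for illustration: the word deck is modelled as a plain mapping from word to due date, `on_sentence_easy` is a hypothetical name, new words are given their first review after the postponement, and the one-day default stands in for whatever the settings would provide.

```python
from datetime import date, timedelta
from typing import Dict, List


def on_sentence_easy(sentence_words: List[str],
                     word_deck: Dict[str, date],
                     today: date,
                     postpone: timedelta = timedelta(days=1)) -> Dict[str, date]:
    """Apply the rule for a sentence that was just marked "easy".

    - Words missing from the deck are introduced and scheduled.
    - Words already due today (or overdue) are pushed back by `postpone`.
    - Words due later are left alone.
    """
    for word in sentence_words:
        if word not in word_deck:
            word_deck[word] = today + postpone   # introduce the word
        elif word_deck[word] <= today:
            word_deck[word] = today + postpone   # not due on the same day
    return word_deck


deck = {"Zhōngguó": date(2024, 1, 1), "dà": date(2024, 1, 10)}
on_sentence_easy(["Zhōngguó", "hěn", "dà"], deck, today=date(2024, 1, 1))
```

After the call, “Zhōngguó” is pushed to the next day, “hěn” is newly introduced, and “dà” keeps its later due date. The postponement is a parameter so that it can be tuned from past data, as suggested above.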

Development will continue along these lines.

By Serena
