Possible implementation ideas for the K-1 algorithm. We will begin with a quote:
Spaced repetition is a solid idea with real results. But, there is a reason why it was invented in the 1970s and still hasn’t changed the world- it is not a magic bullet. Spaced repetition alone will not make you smart. In fact, if you use badly created material to study, it will even be less effect than an old fashioned approach like cramming textbooks. (Source: http://nihongoperapera.com/flashcards-insufficient.html)
Since K-0 doesn’t use SRS (technically speaking), why bother implementing it, if it isn’t a magic bullet?
Clues from the Classroom
Stephen Krashen, in his influential work, has demonstrated time and time again the power of reading when learning a language. Yes, SRS has applications outside of language learning, but to be completely honest, most of what Anki, SuperMemo, Mnemosyne, etc. are used for is language instruction. At the least, it’s our sole focus here, so we will examine things from that perspective. Language instruction is special because, unlike other forms of learning where one is expected to learn before using, it often requires a student to use the language as he is learning it.
So if SRS is in fact useful, why isn’t a greater focus paid to SRS by people like Krashen? The answer is simple: even according to my own research, reading material can be constructed from vocabulary lists of 50 words or fewer. In short, SRS isn’t really needed to teach a language, even in terms of saving time or efficiency. A massive comprehensible input (MCI) based approach can be used almost immediately with great effect, assuming the correct material is constructed or found and presented by a qualified instructor.
So really, what is the point of SRS? Is SRS just a red herring?
What’s the Point of SRS?
Okay, let’s say you want to learn Espanese and I give you the word “finklebottoms”, which I tell you means “feather pillow”. You study this information, and the next day I ask you what it is. Maybe you remember it, and maybe you don’t. The essential idea behind SRS is that if you knew what it meant, instead of showing it to you again the next day, I would increase the interval to maybe 4 days or 6 days, and if you didn’t remember it after that time I would shorten the interval, maybe start over, and require you to answer correctly again before increasing the interval again. The effect is that as time goes by, the cards you know will be shown to you less frequently than the cards you don’t know.
There are all sorts of ways to get this done. Let’s say I have “graded intervals” like 1 minute (A), 10 minutes (B), 30 minutes (C), 1 day (D), 4 days (E), a week (F), 21 days (G), and so on. You would get to interval F only after demonstrating interval E. If you failed at F, you might go back to interval E. Maybe interval D. Whatever. The point is that over time we expect that the average time between repetitions will be just small enough to refresh your memory.
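To make the graded-interval idea concrete, here is a minimal sketch in Python. It assumes the last two rungs of the ladder ("a week" and "21 days") carry the labels F and G. The `drop` parameter is hypothetical and only illustrates the "go back to E, maybe D" choice. None of this is taken from any particular SRS program.

```python
from datetime import timedelta

# Interval ladder from the text. Labels F and G are assumed for the
# last two entries ("a week" and "21 days").
LADDER = [
    ("A", timedelta(minutes=1)),
    ("B", timedelta(minutes=10)),
    ("C", timedelta(minutes=30)),
    ("D", timedelta(days=1)),
    ("E", timedelta(days=4)),
    ("F", timedelta(days=7)),
    ("G", timedelta(days=21)),
]


def next_step(step: int, remembered: bool, drop: int = 1) -> int:
    """Return the new ladder index after one review.

    A success climbs one rung (capped at the top of the ladder).
    A failure falls back `drop` rungs (hypothetical parameter:
    1 means F -> E, 2 means F -> D), never below the first rung.
    """
    if remembered:
        return min(step + 1, len(LADDER) - 1)
    return max(step - drop, 0)


def interval_for(step: int) -> timedelta:
    """Look up how long to wait before the next review at this rung."""
    return LADDER[step][1]
```

Starting at E (index 4), a correct answer moves the card to F; a miss at F with `drop=1` returns it to E, and with `drop=2` to D.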
There are however a number of problems with this theory (no matter how well it is implemented).
First, no matter how much data one may collect on the average human mind, the interval on any success is always a guess.
This is due to various factors far outside of the program’s control. Say the student is in a language course and trains a specific 20 words every week. The student will mark these words as learned relatively quickly and the program will be thrown out of whack because it cannot possibly account for the knowledge it does not contain. In short, SRS operates in a vacuum and is always playing catch-up with the real world.
What’s the Real Goal of SRS?
Let’s say you need to learn 20 words. Some of these words will be easier for you to know and some will not. Let’s go back to finklebottoms. Let’s say that, as it turns out, the next day you remember finklebottoms. And you remembered it on the fourth day. And on the 21st. And after six months. A good SRS algorithm might then guess you are doing well, or that your memory is good, and try to just show you some cards one or two months later instead of making you ramp up. The only problem with this is that every time a user is shown a card it is a “review” and will trigger the student’s memory. So the review process itself can affect the length of time a student remembers the card. Further, and again, if the student so much as reads and understands a single newspaper article, it is a powerful form of review for every word in that article. No algorithm will be able to account for such random outside reviews.
Relying on SRS considered harmful
Consider that instead of just telling you what finklebottoms meant, I told you twenty times, possibly over a period of days. And you did some number of the reviews that an SRS program seeks to avoid (but not a monotonous amount). Then you write it out twenty more times in the classroom, then twenty more times for homework. The fact is, you are going to remember what finklebottoms means a whole lot better with this sort of rote practice. Better, in fact, than if you used SRS alone. So depending on what your goals are (for example, learning a specific, short list, i.e. cramming), SRS may not be what you are looking for.
Where SRS really shines is in remembering large amounts of untargeted information. Meaning, if you are having trouble with finklebottoms (the word you absolutely need to know), any good SRS algorithm is bound to skip over the card and let you learn easier words first, words you don’t really care about. So while SRS will allow you to memorize as many words as you possibly could, thus maximizing your study time in the long run, it is no good if you need to remember a short, targeted list of words in the short term. So if you need to know finklebottoms for your oral exam in two weeks, and you do not care about any other words, SRS is not the algorithm you are looking for.
What about just making a deck with the words you need to know?
This is possible. It is in fact possible to “cram”, but to do this you need to bypass the SRS algorithm. As soon as you do this (e.g. “study ahead” in Anki) you are using the interval ratings as a card-ordering mechanism. It would work less optimally than the fscore system in K0, since you are only ever shown cards from the very top of the deck. It will also destroy the accuracy of your SRS data, as the program no longer has accurate interval data to judge your progress. So you can see right away that the two systems are designed for different goals and aren’t really interoperable, and that switching between them on a frequent basis cannot be recommended.
So if the question behind SRS is, “Did you really need to write it out twenty times, or was five times good enough?” I must say that it really depends on what exactly you are trying to do. If your teacher asked you to write it out twenty times, listening to Anki and writing it out just five times will not be good enough no matter what.
Safely Minimizing Repetitions
Although minimizing repetitions is no longer the primary goal, it can still be a secondary one, so long as it is done “safely”, i.e. without harming the primary goal. Let’s examine the common sense ways to do this while remaining conservative on the side of too many repetitions.
The Inertia Method (SM-4)
From my research this idea was originally implemented in SuperMemo as of SM-4. What did SuperMemo do differently? They began using a matrix to modify the optimal interval between reviews; instead of just modifying the previous interval based on the user’s response, they used the ease factor to select an interval set which was easier or more difficult. This amounted to an inertia factor which would assign shorter intervals to cards with a spotty history, while letting cards with a stellar history fly forward and receive larger intervals. Implementing the idea in the form of a matrix was interesting, as it would allow the program to fine-tune the interval periods along a curve, by determining what the user “really meant” by his own personal evaluation of a card.
For example, let’s say the user answered “finklebottoms” correctly three times in a row, but “finkleborkums” correctly once, then as unknown, then correctly and correctly again. What exactly did the user mean when he marked finkleborkums as “unknown” that one time? The program could guess what penalty to assign for that answer, and compare it to its success at guessing the next interval, and also going forward as the user successfully answers finkleborkums two more times. This would tend to minimize the effect of typos during review and still provide the ability to respond to short-term memory swings (i.e. inertia) for a card.
According to the SM-4 page,
In the course of repetitions, particular entries of the matrix were increased or decreased depending on quality of responses. For example, if the entry indicated the optimal interval to be X and the used interval was X+Y while the response quality after this interval was not lower than four, then the new value of the entry would fall between X and X+Y. (Source: http://www.supermemo.com/english/ol/sm4.htm)
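The one case described in that quote can be sketched as a simple blend between the stored entry and the interval actually used. This is only an illustration: the `weight` parameter and the function name are assumptions, and the SM-4 page does not say where between X and X+Y the new value lands.

```python
def updated_entry(optimal: float, used: float, quality: int,
                  weight: float = 0.5) -> float:
    """Nudge a matrix entry toward the interval that was actually used.

    Covers only the case quoted above: the used interval was longer than
    the entry (used = X + Y with Y > 0) and the grade was at least 4.
    `weight` (0 < weight < 1) is an assumed blending factor, not a value
    taken from SuperMemo.
    """
    if quality >= 4 and used > optimal:
        return optimal + weight * (used - optimal)
    # Every other case is left unchanged in this sketch.
    return optimal
```

With an entry of 10 days, a used interval of 14 days and a grade of 5, the result falls strictly between 10 and 14, as the quote requires.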
A matrix approach would allow the program to fine-tune what the user might have “really meant” by grading the card a certain way, but assigning an inertia variable on a per-card basis will have almost exactly the same effect. There is a tradeoff, but I believe the simplicity of a single-variable equation versus a matrix makes the algorithm more beautiful, akin to the type of short, descriptive equations you find in physics for modelling the real world. And I also believe the right equation will be more accurate than a matrix. It is interesting to note that SM-4 seeds the matrix with data from the SM-2 equation, so at least in the beginning it operates exactly like (is no more efficient than) SM-2. An approach using an equation might be able to solve this issue.
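As a rough illustration of what a per-card inertia variable could look like, here is a minimal sketch. Every constant in it (the step size, the clamp to [-1, 1], the maximum growth factor, the rule that a miss costs twice what a hit earns) is an assumption made for the example. It is not the K-1 equation and not anything from SM-4.

```python
from dataclasses import dataclass


@dataclass
class Card:
    interval_days: float = 1.0
    inertia: float = 0.0  # hypothetical per-card variable, clamped to [-1, 1]


def review(card: Card, remembered: bool,
           step: float = 0.25, max_growth: float = 3.0) -> Card:
    """Update the card's inertia from one answer, then rescale its interval."""
    # A miss costs twice what a hit earns (an arbitrary choice).
    delta = step if remembered else -2 * step
    card.inertia = max(-1.0, min(1.0, card.inertia + delta))
    trust = (card.inertia + 1) / 2  # 0.0 = spotty history, 1.0 = stellar
    if remembered:
        # Growth factor runs from 1.0 (no trust) up to max_growth (full trust).
        card.interval_days *= 1 + (max_growth - 1) * trust
    else:
        # A trusted card loses less; an untrusted one falls back toward 1 day.
        card.interval_days = max(1.0, card.interval_days * trust)
    return card


def replay(answers) -> Card:
    """Run a fresh card through a sequence of True/False answers."""
    card = Card()
    for remembered in answers:
        review(card, remembered)
    return card


# The two histories from the example above.
finklebottoms = replay([True, True, True])
finkleborkums = replay([True, False, True, True])
```

Under these assumptions the card with the clean history ends up with a much longer interval than the one with a single miss, while the single miss does not reset the second card's history entirely.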
Adding the Memory Curve (SM-6 to SM-11)
Algorithm SM-6 and SM-8 used collected data suggesting a curve with a retention rate of ~70% after 20 days.
Although it has been claimed that Algorithm SM-6 is not likely to ever be substantially improved (because of the substantial interference of daily casual involuntary repetitions with the highly tuned repetition spacing), the initial results obtained with Algorithm SM-8 are very encouraging and indicate that there is a detectable gain at the moment of introducing new material to memory, i.e. at the moment of the highest workload. After that, the performance of Algorithms SM-6 and SM-8 is comparable. The gain comes from faster convergence of memory parameters used by the program with actual memory parameters of the student. The increase in the speed of the convergence was achieved by employing actual approximation data obtained from students who used SuperMemo 6 and/or SuperMemo 7. (Source: http://www.supermemo.com/english/algsm8.htm)
By SM-11, the algorithm had been improved to the point where the author was reporting a user experience between 30% and 50% more efficient than SM-6. The implication, then, is that it would have been around 100% more efficient than SM-2 (Anki).
What was different? The author explains it succinctly:
“SuperMemo begins the effort to compute the optimum inter-repetition intervals by storing the recall record of individual items (i.e. grades scored in learning). This record is used to estimate the current strength of a given memory trace, and the difficulty of the underlying piece of knowledge (item). The item difficulty expresses the complexity of memories, and reflects the effort needed to produce unambiguous and stable memory traces. SuperMemo takes the requested recall rate as the optimization criterion (e.g. 95%), and computes the intervals that satisfy this criterion. The function of optimum intervals is represented in a matrix form (OF matrix) and is subject to modification based on the results of the learning process. Although satisfying the optimization criterion is relatively easy, the complexity of the algorithm derives from the need to obtain maximum speed of convergence possible in the light of the known memory models.”
In essence, SM-11 is just doing what SM-4 wanted to do all along, but the author had discovered the correct variables to use, and had solved the problem of the matrix not updating quickly enough by using ease factors in the matrix instead of intervals. Alongside this was years of collected data seeding the matrix and not “taken from the formulas used in the Algorithm SM-2.” SM-11 was simply the epitome of the inertia idea, implemented using a matrix.
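To show what "computing the interval that satisfies a requested recall rate" means in the simplest possible case, the sketch below assumes a plain exponential forgetting curve, R(t) = exp(-t / S). That model is an assumption for illustration only. SuperMemo's OF matrix is not reproduced here, and the function names are made up.

```python
import math


def stability_from_observation(retention: float, days: float) -> float:
    """Solve R = exp(-t / S) for S, given one observed (t, R) point."""
    return -days / math.log(retention)


def interval_for_recall(stability: float, target_recall: float) -> float:
    """Longest interval t (in days) at which exp(-t / S) still meets the target."""
    return -stability * math.log(target_recall)


# Fit the curve to the figure mentioned earlier (~70% retention after 20 days),
# then ask how soon a review is needed to hold recall at 95%.
s = stability_from_observation(0.70, 20)
t = interval_for_recall(s, 0.95)
```

Under this assumed curve, holding recall at 95% calls for a review after roughly three days, which shows how strongly the requested recall rate drives the workload.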
The Path from SM-2 to SM-11 is very logical and straightforward
While I will be the first to admit that I have only a very superficial understanding of SM-4 through SM-11, and I have probably made errors in the above analysis, the general idea seems sound: a logical path of reasoning and experimentation led naturally from SM-2 to SM-11. The author seems to say this in his notes as well.
I will probably implement K-1 without a matrix, but instead use a set of logically derived equations with variable inputs corresponding to the recall record of individual items. Again, I prefer this approach because it is more fine-grained, and I enjoy the beauty of simple equations versus tables of values. If this approach really cannot model user data based on experimental results, we can always construct a matrix of arbitrary size later on down the road.
I’ve seen Mnemosyne do this, and Kongzi Desktop has a feature for this too. It involves asking the user directly or indirectly how well he knows a card. This value can be used to seed the starting interval, to seed the inertia, or however one is organizing the cards. The program might even assign this as a variable or ‘status’ to the card and modify all such marked cards as it learns how the user rates cards the second time he sees them (e.g. “cards marked known are likely to be marked known on the second interval…”).
Card Status Method
A separate, and very easy improvement (which Anki has also implemented) would be a field or flag recording the status of the card — i.e. lapsed, on hold, hidden, and so forth.
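A status flag of this kind might be sketched as follows. The `ACTIVE` value, the class names and the `schedulable` helper are assumptions added for the example. Only lapsed, on hold and hidden come from the text, and this is not how Anki stores its own card state.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"    # assumed default; not named in the text
    LAPSED = "lapsed"
    ON_HOLD = "on hold"
    HIDDEN = "hidden"


@dataclass
class FlaggedCard:
    front: str
    back: str
    status: Status = Status.ACTIVE


def schedulable(cards):
    """Cards the scheduler may show: everything not on hold or hidden."""
    return [c for c in cards if c.status not in (Status.ON_HOLD, Status.HIDDEN)]
```

Lapsed cards stay schedulable in this sketch so they can be relearned; a real program might route them to a separate queue instead.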
So Is Minimizing repetitions worthwhile?
No. It misses the point.
What is more important is that the user is shown the right card at the right time. If there are no scheduled cards, this really means one of the following:
a) The algorithm is perfect and the user’s brain is full (the dream).
b) The algorithm is not perfect and the user can derive SOME possible benefit from being shown a card which is most effective for him to see at that time (the reality).
SRS proponents state that b) is a waste of time. This isn’t entirely true. It may not be an efficient use of time, that is true. But if it’s only 50% efficient, it is still 500% more effective than doing nothing if you do it ten times over. This is the essence of cramming and learning: using it to jumpstart into readers and to be able to ask questions and have a simple conversation in real life.
This is where SRS really fails; it is great at finding the optimum time for reviewing a card. It is terrible at finding the BEST card to show a user (the card he needs to see most) if in fact such a card lies outside the forget-curve.
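One very simple way to never answer "nothing to review" is to fall back to whichever card is closest to coming due. The sketch below is a hypothetical illustration of that fallback, ranking purely by due time. It is not the K0 fscore system, and it does not solve the harder problem of finding the card the user needs most.

```python
from datetime import datetime, timedelta


def pick_card(due_times, now):
    """Return (card, is_due) for the card with the earliest due time.

    If something is overdue, that is the most overdue card. If nothing is
    due yet, it is the card closest to coming due, so the user is never
    told there is nothing to study. Returns None for an empty deck.
    """
    if not due_times:
        return None
    card = min(due_times, key=due_times.get)
    return card, due_times[card] <= now


# Usage: nothing is due yet, so the card due soonest is offered anyway.
now = datetime(2024, 1, 1, 12, 0)
deck = {
    "finklebottoms": now + timedelta(days=3),
    "finkleborkums": now + timedelta(hours=2),
}
choice = pick_card(deck, now)
```

The `is_due` flag lets the caller decide whether an early review should count toward the card's interval history or be treated as extra practice.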
Is the problem poorly created notes?
I will freely admit this.
Let’s say a user enters keyword and gloss for 5,000 words, and gets to know them all quite well, and Anki tells that user there’s nothing left for him to review that day. Oh really. Well, what about an example sentence for each word? A keyword sentence like, “Pizza is made with cheese and [______] sauce.” could be added for each of the 5,000 words. This form of knowledge is well-known to be helpful. Suddenly, the user has an additional 5,000 flashcards to study! Where did this knowledge come from, and why did the program tell the user there was nothing left for him to learn if, in truth, there was?
What I am really saying is that SRS only gives back to you what you put into it. If a user goes to an SRS algorithm asking to learn, and it tells him he should not bother studying right now, that is both true and false. Given that data set, it’s true; given the real goal of learning a language, it is not.
Then there is the opportunity cost of entering those notes in the first place. If you study for ten minutes a day, that’s about 61 hours a year. Twenty minutes a day, about 122 hours. How long does it take to source 5,000 sentences and put them in? Well, no time if you can find a ready-made set, but being honest, a set that serves your purpose 100% does not currently exist (and likely never will). So you add them yourself, which is recommended anyway. Let’s say it takes an honest 30 seconds to source and enter a sentence for any particular word. You’re looking at 40 hours minimum. And being completely honest, we could probably say 60 hours. So at ten minutes a day you’re losing an entire year’s study time in order to add an entire year’s study material. Even if your algorithm was 100% more efficient, you may not see any benefit from doing this until after a full year, or more, of daily use.
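The arithmetic behind those figures is easy to check. The short sketch below assumes a 365-day year of daily study; the function names are made up for the example.

```python
def hours_per_year(minutes_per_day: float, days: int = 365) -> float:
    """Total study time in hours for a daily habit kept up all year."""
    return minutes_per_day * days / 60


def entry_hours(cards: int, seconds_per_card: float) -> float:
    """Time in hours needed to source and enter one sentence per card."""
    return cards * seconds_per_card / 3600


ten_min = hours_per_year(10)       # roughly 61 hours
twenty_min = hours_per_year(20)    # roughly 122 hours
entry = entry_hours(5000, 30)      # a little under 42 hours
```

At 30 seconds per sentence, data entry alone eats about two thirds of a year of ten-minute sessions. At the more pessimistic 60-hour estimate it consumes essentially the whole year.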
So does anyone really care about being 30% more efficient than SM-2?
No. Well, yes. Again, it depends on your goals. If you can be 100% more efficient just by adding cloze or example sentences, then no, nobody cares about the 30%. Time is what it is; a student who only devotes ten minutes a day is a bad student. We all know that a proper student does two or more hours a day per course in outside work just to tread water.
So even using something 50% as efficient as untuned basic SRS (SM-0, fondly referred to as “time bubbles”) or something like the ginormous Pimsleur intervals “guess”, is already so good that improving it by improving your notes and improving your recalls beyond simple front-back flashcards is going to give you more benefit than upgrading your algorithm from K0 or time bubbles to SM-2, or from SM-2 to SM-11, or SM-18, or whatever they’re up to these days. After a point, what being 30% more efficient translates into, is seeing less of the language in a week than you need to for normal studying and learning, and that begins to affect your learning if you don’t have any other input.
Given all of the above, I stand by K0 being a really great algorithm. But of course, SM-2 is better. So while getting to SM-2 efficiency is great, and a desirable goal, why not just skip it and go to SM-4, or something even better? After all, in a way, you don’t have to run faster than the bear, you just have to run faster than the guy beside you.
How fast is that? Apparently, SM-2.
What can be done?
If you use Anki, SuperMemo, Mnemosyne, or any other program, why not keep using it? You’ve invested considerable time into it, and there’s no reason to stop. As user response analysis shows, even an outdated algorithm is better than not using anything at all. The real key is not in accidentally reviewing a card 130 times instead of 100 times. It’s in adding quality flashcards that represent not just vocabulary, but grammar, puns, idioms, phrases of speech, slang, and many other concepts. Also, in expanding your learning by doing many kinds of knowledge representation; not just flashcards, but cloze. Incremental reading. Don’t forget, reading for reading’s sake. Getting out there and actually using the language. If you can do that, it doesn’t matter what program you use, or even if you use SRS at all. You’re learning a language. It becomes yours. Don’t fall into the trap of endless studying in a vacuum, even if it turns out that your study method is 30% or 50% more efficient than someone else’s.