Adding history & results to my vocabulary application

(I originally posted this to my special blog on this application but thought it was worth reposting here as well.)

I’ve built quizzes (aka vocabulary drills) several times when I’ve built this kind of application before. One problem I’m trying to solve this time (at all, or better than in previous attempts) is to make the drills more relevant to how well I already know the words (or in some cases combinations of words, like the titles of certain dishes on menus).

I got interested in this decades ago due to a simple situation. I once went to a very nice Italian restaurant in Carmel and its menu was entirely in Italian. At that time I knew very few Italian words so I was completely lost reading the menu. Of course, as the restaurant was in the U.S. with English as the native language of the waiters, it was easy enough to ask the waiters about the menu items, but that blew my pseudo-sophistication about being a worldly gourmet. As a result I set out to create a small “cheatsheet” I could carry in my wallet should I be faced with this situation again. I did this in the very early days of Net access so I was limited in the resources I could use to create my own Italian->English word list. I actually had to buy an Italian dictionary, which was very limited in its food-related words, so my list was never much good.

Years later when I had Net access I returned to my list (now on PC instead of Mac) and I was able to find a lot more. Plus I now had several Italian cookbooks with the recipe titles in both English and Italian (thus making it easy to deduce some individual food words). So using all the dictionaries or food pages I could find on the Net I generated a much larger list. In fact the list eventually got too big, with too many obscure terms (ingredients used in recipes but unlikely to be listed on menus), but worse was my obsession with list building (some people collect stuff, I collect knowledge – it’s cheaper and has some practical value). So I eventually found lists of wines, cheeses, pastas, etc. and ended up with a list that was way too large to study. Plus I had all the recipe phrases from both my cookbooks and various online sources.

So I needed some sort of drill program. At the time the only programming support I had was the old Visual Basic 6 (mostly pre-OOP). I got an application (since dead on an old archive from an older PC) and it worked and I learned many words. But as I never went to Italy, nor again to a U.S. Italian restaurant without English items, I never used what I learned and so, of course, mostly forgot it all. And as I don’t have either the vocabulary list or a working version of the application, if I ever get to go to Italy I’ll have to do this all over.

During that exercise, however, I wanted the quizzes/drills (i.e. a word with multiple choice answers) to be constructed in a clever way that used history as well as my knowledge. Rather than be drilled over and over on words I knew, I wanted to focus on words I didn’t know or that I had gotten wrong on previous drills. I mostly got that to work with the simple approach of generating a “score” (purely based on history, i.e. how often I’d drilled this word and how accurately I’d answered it). I could convert these scores into probabilities by simply summing all the scores and then dividing each individual score by that sum, then choose words randomly but via this non-uniform probability distribution. That helped but still had many flaws in terms of building the ideal learning tool. The biggest problem I had was incorporating time. While working on the project I’d do the drills often and I have good short-term memory so I’d learn words (or recipes), but then a month (or more) later I would have forgotten much of that since I had no occasion to use what I’d learned and thus reinforce it by repetition. But I didn’t forget everything equally, so merely adjusting the probabilities by a single measure (time since last drill) didn’t work very well; this time I’m really going to try to do that better. The problem was actually exacerbated by my score being tied to frequency of drill. As I got to know words better they came up less often (as expected) and thus the counts of certain words, the ones I knew best, were often lower. So when I’d repeat the drill months later those words tended to come up too frequently, so I really wasn’t factoring in time properly. And I wasn’t using the drill tool properly, i.e. forcing myself to do the drills fairly frequently (like at least one every few days on a consistent basis).
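For the curious, here is a rough sketch of the score-to-probability idea described above. It is in Python for brevity rather than the language of the actual application, and the details are assumptions: the `WordHistory` record, the particular score formula (a smoothed error rate), and the sample Italian words are all made up for illustration. It deliberately leaves out time, which is the part that never worked well.

```python
import random
from dataclasses import dataclass


@dataclass
class WordHistory:
    """Hypothetical per-word drill record (names are illustrative)."""
    word: str
    attempts: int = 0   # how many times the word has been drilled
    correct: int = 0    # how many of those attempts were answered correctly


def score(h: WordHistory) -> float:
    """Assumed score: higher means the word needs more drilling.

    Uses a smoothed error rate so a never-drilled word scores 0.5
    instead of causing a division by zero.
    """
    return (h.attempts - h.correct + 1) / (h.attempts + 2)


def probabilities(histories):
    """Divide each score by the sum of all scores, as described in the text."""
    scores = [score(h) for h in histories]
    total = sum(scores)
    return [s / total for s in scores]


def pick_words(histories, k, rng=random):
    """Draw k words (with replacement) from the non-uniform distribution."""
    weights = probabilities(histories)
    return [h.word for h in rng.choices(histories, weights=weights, k=k)]


if __name__ == "__main__":
    history = [
        WordHistory("pollo", attempts=10, correct=10),    # well known
        WordHistory("carciofo", attempts=10, correct=2),  # often missed
        WordHistory("vongole"),                           # never drilled
    ]
    for h, p in zip(history, probabilities(history)):
        print(f"{h.word}: {p:.3f}")
    print(pick_words(history, k=5))
```

With a score like this, the often-missed word gets the largest share of the draws, the never-drilled word sits in the middle, and the well-known word shows up only occasionally. Since nothing here depends on when a word was last drilled, it has the same blind spot about forgetting over time.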

I know repetition helps from an actual experience. Once I visited Quebec City at a time when the people there, despite being required to learn English in public school, adamantly insisted on using only French. Now I’d had a little French and so wasn’t entirely lost, but my vocabulary was way too small. Fortunately waiters would be much more helpful if at least you tried to speak French (as I noticed them being very rude to some U.S. people who wouldn’t even try speaking French and/or even complained about no English). But the real interesting effect I noticed was that being inserted in a place where I had to use the local language I rapidly relearned (and learned new) French words. The immersion really helps. Lots of restaurants had pictures or specials or whatever that provided clues, but mostly lots of repetition. A few weeks in a place like that and I would have learned a lot, so when it comes to foreign languages there just is no substitute for repetition and usage.

Now last fall when I thought I would die soon I really wanted to go to Spain. Despite living in California for decades and picking up a word here and there I knew very little Spanish. And on a vacation to Portugal where we tended to go more “native” (i.e. go to little mom-and-pop restaurants instead of the touristy ones) I learned, the hard way, that really visiting a foreign country, outside the big cities, really requires some knowledge of local language, certainly at least in terms of ordering food in restaurants. There are some things I want to eat and other things I definitely don’t want to eat so knowing food vocabulary is essential. So in anticipation of a trip to Spain, despite the globalization that exists today and ESL common in Europe, I wanted to go to out of the way small towns and find the small mom-and-pops and eat there, so I needed the vocabulary. With these previous experiences (Quebec, Germany, Japan (didn’t learn much there but they have plastic food displays and I knew the basics of Kanji so I could copy the title under the plastic food to show to the waiter), and Portugal) I had a pretty good idea what I needed to do to prepare myself for a long visit to Spain. I needed two things: a) large food vocabulary and a few phrases needed in restaurants, and, b) again, a good drill/practice program.

So I set out to create the list, but I went about it a bit naively. I’d developed tactics before (with my Italian list) to find multiple sources and consolidate these on a spreadsheet where I indexed each entry by the word (or recipe), by what source I’d found to mine (extract terms with translations) or, if I extracted the word from a menu listing, what online dictionary I’d used for translation, and then all the disparate definitions (given online sources are often wrong or use somewhat bizarre translations) so I could, after getting lots of raw data, generate a single and succinct translation. So far so good EXCEPT what I failed to realize is that Spanish food terms are not universal: those in Mexico (or esp. Puerto Rico) may be quite different than in Spain. Plus in more third-world places that use Spanish, or in smaller restaurants in Spain, unique vocabulary would be used (certainly true in Basque country of Spain, but even Catalan was different too). And I’d learned something interesting in China (where my hosts selected food but I asked them lots of questions): often local ingredients are used, esp. vegetables, that literally don’t have any English equivalent (either the word translation or even that such an ingredient would be available in the U.S.). So my Spanish list was a mish-mash. And then I got a different diagnosis and so dreams of a long visit to Spain evaporated and I dropped the list.

So here I am again, trying to compose a good drill program. This time I’m focused on the harder (less commonly used) English words but the basic problem is the same. And now, especially after my years as architect at EMC I have a better idea how to structure, especially better OOP, such a program so I can make it far more sophisticated and robust. So that’s what I’m doing and I’ll comment on this issue again in more technical detail in another post.

Even though I’m trying to write decent code for this third try at this I realize I’m still doing a bit too much quick-and-dirty coding, not following good design rules. In particular, the approach for drilling on less common English words differs from the approach for drilling on foreign language words. I should have realized this at the start and immediately designed and used an XML “configuration” file that would alter the behavior of the application based on the subject matter. I didn’t do that, so it will be a big hunk of refactoring to go back and add it now, but that’s the subject of another post.
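To give a feel for what such a configuration file might look like, here is a very rough sketch (again in Python for brevity, not the actual application's language). Everything in it is hypothetical: the element and attribute names, the two subjects, and the settings they carry are invented purely to illustrate per-subject behavior and are not the real design.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical configuration; every element and attribute name is invented.
CONFIG_XML = """
<drills>
  <subject name="english-rare-words" prompt="word" answer="definition" choices="4"/>
  <subject name="italian-food" prompt="foreign" answer="translation" choices="5"/>
</drills>
""".strip()


@dataclass
class DrillConfig:
    """Settings that could vary by subject matter (illustrative only)."""
    name: str
    prompt: str    # what is shown as the question
    answer: str    # what the multiple-choice options contain
    choices: int   # number of options per question


def load_configs(xml_text):
    """Parse the XML text into a dict of DrillConfig keyed by subject name."""
    root = ET.fromstring(xml_text)
    configs = {}
    for node in root.findall("subject"):
        cfg = DrillConfig(
            name=node.get("name"),
            prompt=node.get("prompt"),
            answer=node.get("answer"),
            choices=int(node.get("choices", "4")),  # assumed default of 4
        )
        configs[cfg.name] = cfg
    return configs


if __name__ == "__main__":
    for cfg in load_configs(CONFIG_XML).values():
        print(cfg)
```

The point of the design is that the drill code asks the configuration what to do for the current subject instead of hard-coding the differences between English and foreign-language drills.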

Not very many people write code, especially relatively large programs, for their own personal use. I do it because: a) I like to program, in fact, I started doing it purely for fun (although my first encounter, again had a practical value to me, my feeble and amateurish attempt to create a class scheduling program for my high school), b) the relatively few programs I could just buy (or download for free or use at some website) are fairly pathetic and my programs are a lot better, and, c) during the process of development I actually do some of the learning of the vocabulary itself (this can be a drawback, however).

If I did a really good job of this it might make a nice product with a nice alternative revenue stream (having a “free” player, but selling the vocabularies). It would also be interesting to do this as a website (but I dread using ASP.Net instead of a local Windows Forms app, or worse, trying to do JavaScript, or even worse, coding an Android app, which, of course, would be handy to actually take with me on a foreign trip). Bah-humbug at Apple because iOS is closed to casual developers like me, and a smaller bah-humbug at Google and Android makers for making Android development messy. A website, if I could ever attract people to visit, has the fun option of using some “big data” tricks (under the assumption I get lots of visits) to actually tune the vocabularies (use big data to determine “difficulty” of words, plus how to make the wrong choices on multiple choice harder so guessing is less possible), but I doubt I’ll bother with that since my patience on a single project only lasts a couple of months before I find some new “fun” project.

But as probably the last time I’ll do this in my lifetime I’m hoping this application I’m doing now will be my best which is part of the reason I’m writing all these blog posts so I’m thinking through all the issues fairly carefully, plus also recording ideas or alternative techniques that I would forget without written records (of course, assuming I go back and look at all this written history).

So this won’t be either the least or the last post on this thread, so, oh joy to you, Dear Reader, you get a chance to read more of these long posts.



About dmill96

old fat (but now getting trim and fit) guy, who used to create software in Silicon Valley (almost before it was called that), who used to go backpacking and bicycling and cross-country skiing and now geodashes, drives AWD in Wyoming, takes pictures, and writes long blog posts.