Adding history & results to my vocabulary application

(I originally posted this to my special blog on this application but thought it was worth reposting here as well.)

I’ve built quizzes (aka vocabulary drills) several times when I’ve built this kind of application before. One problem I’m trying to solve this time (at all, or at least better than in previous attempts) is to make the drills more relevant to how well I already know the words (or in some cases combinations of words, like the titles of certain dishes on menus).

I got interested in this decades ago due to a simple situation. I once went to a very nice Italian restaurant in Carmel and its menu was entirely in Italian. At that time I knew very few Italian words so I was completely lost reading the menu. Of course, as the restaurant was in the U.S. with English as the native language of the waiters, it was easy enough to ask them about the menu items, but that blew my pseudo-sophistication about being a worldly gourmet. As a result I set out to create a small “cheatsheet” I could carry in my wallet should I be faced with this situation again. I did this in the very early days of Net access, so I was limited in the resources I could use to create my own Italian->English word list. I actually had to buy an Italian dictionary, which was very limited in its food-related words, so my list was never much good.

Years later when I had Net access I returned to my list (now on PC instead of Mac) and I was able to find a lot more. Plus I now had several Italian cookbooks with the recipe titles in both English and Italian (thus making it easy to deduce some individual food words). So using all the dictionaries or food pages I could find on the Net I generated a much larger list. In fact the list eventually got too big, with too many obscure terms (ingredients used in recipes but unlikely to be listed on menus), and worse was my obsession with list building (some people collect stuff, I collect knowledge – it’s cheaper and has some practical value). I eventually found lists of wines, cheeses, pastas, etc. and so ended up with a list that was way too large to study. Plus I had all the recipe phrases from both my cookbooks and various online sources.

So I needed some sort of drill program. At the time the only programming support I had was the old Visual Basic 6 (mostly pre-OOP). I got an application working (since dead, on an old archive from an older PC) and I learned many words. But as I never went to Italy, nor again to a U.S. Italian restaurant without English items, I never used what I learned and so, of course, mostly forgot it all. And as I don’t have either the vocabulary list or a working version of the application, if I ever get to go to Italy I’ll have to do this all over.

During that exercise, however, I wanted the quizzes/drills (i.e. a word with multiple choice answers) to be constructed in a clever way that used history as well as my knowledge. Rather than be drilled over and over on words I knew, I wanted to focus on words I didn’t know or that I had gotten wrong on previous drills. I mostly got that to work with the simple approach of generating a “score” for each word (purely based on history, i.e. how often I’d drilled the word and how accurately I’d answered it). I could convert these scores into probabilities by simply summing all the scores and dividing each individual score by that sum, then choose words randomly via this non-uniform probability distribution. That helped but still had many flaws in terms of building the ideal learning tool. The biggest problem I had was incorporating time. While working on the project I’d do the drills often, and I have good short-term memory, so I’d learn words (or recipes), but a month (or more) later I would have forgotten much of that since I had no occasion to use what I’d learned and thus reinforce it by repetition. But I didn’t forget everything equally, so merely adjusting the probabilities by a single measure (time since last drill) didn’t work very well; this time I’m really going to try to do that better. The problem was actually made worse by my score being tied to frequency of drill. As I got to know words better they came up less often (as expected), and thus the drill counts of the words I knew best were often lower. So when I’d repeat the drill months later those words tended to come up too frequently, which showed I really wasn’t factoring in time properly. And I wasn’t using the drill tool properly either, i.e. forcing myself to do the drills fairly frequently (like at least one every few days on a consistent basis).
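The score-to-probability step described above can be sketched in a few lines. This is only an illustration in Python, not the original VB6 code: the function names, the sample words and scores, and the convention that a higher score means "needs more drilling" are all assumptions made for the example.

```python
import random


def selection_probabilities(scores):
    """Divide each word's score by the sum of all scores.

    `scores` maps word -> non-negative score (hypothetical convention:
    a higher score means the word needs more drilling).
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {word: score / total for word, score in scores.items()}


def pick_words(scores, k, rng=random):
    """Draw k words at random, weighted by score (with replacement)."""
    words = list(scores)
    weights = [scores[word] for word in words]
    # random.choices normalizes the weights itself, so passing raw
    # scores is equivalent to passing the probabilities computed above.
    return rng.choices(words, weights=weights, k=k)


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    sample_scores = {"antipasto": 1.0, "vitello": 3.0, "carciofi": 6.0}
    print(selection_probabilities(sample_scores))
    print(pick_words(sample_scores, k=5))
```

Note that drawing with replacement means the same word can appear twice in one drill; a real drill would probably want to filter out duplicates. The sketch also shows why time is hard to fit in: everything depends on a single number per word, so any time effect has to be folded into that score somehow.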

I know repetition helps from an actual experience. Once I visited Quebec City at a time when the people there, despite being required to learn English in public school, adamantly insisted on using only French. Now I’d had a little French and so wasn’t entirely lost, but my vocabulary was way too small. Fortunately waiters would be much more helpful if you at least tried to speak French (I noticed them being very rude to some U.S. people who wouldn’t even try speaking French and/or even complained about no English). But the really interesting effect I noticed was that, being immersed in a place where I had to use the local language, I rapidly relearned (and learned new) French words. The immersion really helps. Lots of restaurants had pictures or specials or whatever that provided clues, but mostly it was lots of repetition. A few weeks in a place like that and I would have learned a lot, so when it comes to foreign languages there just is no substitute for repetition and usage.

Now last fall when I thought I would die soon I really wanted to go to Spain. Despite living in California for decades and picking up a word here and there I knew very little Spanish. And on a vacation to Portugal where we tended to go more “native” (i.e. go to little mom-and-pop restaurants instead of the touristy ones) I learned, the hard way, that really visiting a foreign country, outside the big cities, requires some knowledge of the local language, certainly at least in terms of ordering food in restaurants. There are some things I want to eat and other things I definitely don’t want to eat, so knowing food vocabulary is essential. So in anticipation of a trip to Spain, despite the globalization that exists today and ESL being common in Europe, I wanted to go to out-of-the-way small towns and find the small mom-and-pops and eat there, so I needed the vocabulary. With these previous experiences (Quebec, Germany, Japan (didn’t learn much there but they have plastic food displays and I knew the basics of Kanji so I could copy the title under the plastic food to show to the waiter), and Portugal) I had a pretty good idea what I needed to do to prepare myself for a long visit to Spain. I needed two things: a) a large food vocabulary and a few phrases needed in restaurants, and, b) again, a good drill/practice program.

So I set out to create the list, but I went about it a bit naively. I’d developed tactics before (with my Italian list) to find multiple sources and consolidate these on a spreadsheet where I indexed each entry by the word (or recipe), by what source I’d found to mine (extract terms with translations) or, if I extracted the word from a menu listing, what online dictionary I’d used for translation, and then all the disparate definitions (given online sources are often wrong or use somewhat bizarre translations) so I could, after getting lots of raw data, generate a single and succinct translation. So far so good EXCEPT what I failed to realize is that Spanish food terms are not universal: those in Mexico (or esp. Puerto Rico) may be quite different than in Spain. Plus in more third-world places that use Spanish, or in smaller restaurants in Spain, unique vocabulary would be used (certainly true in the Basque country of Spain, but even Catalan was different too). And I’d learned something interesting in China (where my hosts selected food but I asked them lots of questions): often local ingredients are used, esp. vegetables, that literally don’t have any English equivalent (either a word translation or even the ingredient itself being available in the U.S.). So my Spanish list was a mish-mash. And then I got a different diagnosis and so dreams of a long visit to Spain evaporated and I dropped the list.

So here I am again, trying to compose a good drill program. This time I’m focused on the harder (less commonly used) English words but the basic problem is the same. And now, especially after my years as architect at EMC, I have a better idea how to structure such a program, especially with better OOP, so I can make it far more sophisticated and robust. So that’s what I’m doing and I’ll comment on this issue again in more technical detail in another post.

Even though I’m trying to write decent code for this third try, I realize I’m still doing a bit too much quick-and-dirty coding and not following good design rules. In particular, the approach for drills on less common English words differs from that for drills on foreign language words. I should have realized this at the start and immediately designed and used an XML “configuration” file that would alter the behavior of the application based on the subject matter. I didn’t do that, so it will be a big hunk of refactoring to go back and add it now, but that’s the subject of another post.

Not very many people write code, especially relatively large programs, for their own personal use. I do it because: a) I like to program, in fact, I started doing it purely for fun (although my first encounter again had a practical value to me: my feeble and amateurish attempt to create a class scheduling program for my high school), b) the relatively few programs I could just buy (or download for free or use at some website) are fairly pathetic and my programs are a lot better, and, c) during the process of development I actually do some of the learning of the vocabulary itself (this can be a drawback, however).

If I did a really good job of this it might make a nice product with a nice alternative revenue stream (having a “free” player, but selling the vocabularies). It would also be interesting to do this as a website (but I dread using ASP.Net instead of a local Windows Forms app, or worse, trying to do Javascript, or even worse, coding an Android app, which, of course, would be handy to actually take with me on a foreign trip). (Bah-humbug at Apple because iOS is closed to casual developers like me and a smaller bah-humbug at Google and Android makers for making Android development messy.) A website, if I could ever attract people to visit, has the fun option of using some “big data” tricks (under the assumption I get lots of visits) to actually tune the vocabularies (use big data to determine the “difficulty” of words, plus how to make the wrong choices on multiple choice harder so guessing is less likely to work), but I doubt I’ll bother with that since my patience on a single project only lasts a couple of months before I find some new “fun” project.

But as this is probably the last time I’ll do this in my lifetime, I’m hoping the application I’m doing now will be my best. That is part of the reason I’m writing all these blog posts: I’m thinking through all the issues fairly carefully, plus recording ideas or alternative techniques that I would forget without written records (assuming, of course, I go back and look at all this written history).

So this won’t be either the least or the last post on this thread, so, oh joy to you, Dear Reader, you get a chance to read more of these long posts.



Going to Boston soon

After dithering quite a bit I finally decided to attend the Centennial Celebration of the founding of the MIT chapter of my fraternity (Kappa Sigma) in a few weeks. Plus I hope to get an extra day and try (most of) the ascent of Mt. Washington, reliving another memory.

In general I don’t like reunions and have avoided them like the plague. In fact my 50th high school reunion was just held this summer and I did invite one attendee over for dinner who swore I/we (plus my wife was in same class) missed a great event. But even with that enthusiastic description I wasn’t sold and so don’t regret not going.

But in a couple of years I have my 50th MIT reunion which to me is a bit bigger deal, plus obviously a once in a lifetime event. But due to the circumstances of my undergrad days I actually had few friends outside of the people in my fraternity (which, then, was across the Charles and so isolated from other school social events).

So when I got the invitation to attend my fraternity’s Centennial that sounded more promising. Not only might I know more people at that event, but the obligations of brotherhood in a fraternity require everyone to be on best behavior and be civil and so forth. Even the young current members will be pleasantly attentive to us old farts because that’s their obligation. So this is more likely to be a better social event for my taste. And the agenda isn’t so packed, so there will be lots of time to see the surrounding area.

I have been back to Boston multiple times on business trips (only once for vacation) but I was usually too busy with work to do more sightseeing, so this trip fits the bill. And it settles the issue of attending 50th reunion (of course this event may reduce my aversion to reunions and I’ll attend that one too).

So soon off for hopefully a fun trip. Then a week later is our fall vacation so I get to throw the canyon country of Utah and later Santa Fe in for an interesting contrast to the intensely urban area of Boston. Now the question is: can I still drive like a maniac Boston driver?



Programming post; can’t crosspost it

I spent a long time writing a post based on a simulation I was doing about certain behavior in an application I’m writing. I thought I’d be able to crosspost it here when I was done, but that doesn’t appear to be a feature.

So here’s a link:  Testing randomness in my application



Back at Starbucks – 11

I thought I might be done with this post thread but here I am at Starbucks yet again, exiled (somewhat suddenly) from my home where I was busily and eagerly working on a personal programming project. I hate being interrupted in mid-thought when I’m working, as my stack of ToDo’s evaporates when I have to switch gears. And the seating arrangements and Starbucks environment are not very conducive to programming, so I guess I’ll find other things to do, my usual routines here.

For several months now the cause of my exile has not been present, but now it’s returning. As usual there is the steady creep back to the previous expected conditions of entitlement of the person who wants the services of my house. Of course that person would love me around, as another cook and waiter, too, but I’m at the very least not going to do that routine.

Meanwhile this situation gives me a chance to catch up on some posting. The good thing about being stuck at Starbucks is that it’s a good place to work on posts, and so not being here has been one of several reasons why my posting has been so light lately. The others are: a) my obsession with getting weight back under control, so much of my day is spent on exercise machines, b) home is not a good place to have the privacy to do these posts, c) my programming projects consume most of the rest of my available time. Hard to believe that, retired, I don’t have time for a few posts. Of course I’ve been posting other places as well.

And soon it’s football season and I’m stuck with season tickets (yet again, not my choice, hopefully most will get sold this year) and that really blows the weekend. Or, of course, geodashing trips that eat up at least one weekend. And at the end of September there will be another vacation. It seems like just yesterday I was on the road, but in fact it’s now about four months ago. I really wish I could do a lot more trips, especially solo, but it seems I have the pattern now: no (long) travel in summer or winter, fall is for “family” travel, and spring is my only chance for solo trips. Given I just turned 68 there won’t be many more opportunities for camping trips so I hope I can get a few more while I’m still able.

I’m not going to post my weight charts since I pigged out yesterday and blew my record (even after hitting new (recent) low during the week). The tyranny of my data collection is an occasional discouragement but I know today’s number was semi-bogus so I’ll just keep going and focusing on longer-term, but I was thinking with the interruptions of vacation and football games it’s going to be hard to hit my target. Sure I could try to maintain eating discipline (exercise is almost impossible) during these other activities, but that kills some of the fun so therefore I end up blowing a full week of progress just by having a little fun.

But enough of this, more posting to do other places and usual Sunday online chores.


Time to get back on the wagon – 2

Now 12 days into my renewed discipline I have only a bit of progress to show. I’ve succeeded in building exercise back up, probably adding 300 cals/day burned, but this isn’t enough. Unlike my original major weight loss I’m having a harder time trying to maintain calorie intake discipline. And my statistics are unclear as to whether I’m making much progress:


The chart above extends the previous one in my first post with this title with the 12 days in August (shown as red markers). Note that my first two days of renewed discipline actually hit recent highs and then weight began to decline. The regression line’s slope (1.43 lbs/wk) may actually overstate the loss, esp. given the usual effect that immediately upon starting weight discipline there is an early big drop. The really bad thing is the high variability, rather than a nice steady decline. Now part of this is undoubtedly the poor measurement (I’m not doing multiple weigh-ins and averaging like before, so I know each point is probably only good to around ± 1.5lbs). BUT, the good news is the gain has been arrested and at least I’m showing a few points down in the 192-193 range. To the extent I can trust the regression line as a future prediction it looks like I might hit my August target, but even assuming I do that’s still a long way to go to where I need to be, so we’ll see.
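For anyone curious how a slope like the one quoted above is obtained, here is a minimal ordinary least-squares sketch in Python. It is not the spreadsheet calculation behind the chart; the function name and the four sample weigh-ins are invented for illustration only.

```python
def fit_line(days, weights):
    """Ordinary least-squares fit of weight (lbs) against day number.

    Returns (slope in lbs/day, intercept in lbs).
    """
    n = len(days)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two (day, weight) pairs")
    mean_x = sum(days) / n
    mean_y = sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all readings fall on the same day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, weights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Invented daily weigh-ins, not my actual data.
    days = [0, 1, 2, 3]
    weights = [196.0, 195.8, 195.6, 195.4]
    slope, intercept = fit_line(days, weights)
    print(f"trend: {slope * 7:.2f} lbs/wk, starting near {intercept:.1f} lbs")
```

With only a dozen noisy points, each uncertain by a pound or more, the fitted slope can swing a lot from one day to the next, which is why it shouldn't be trusted too far as a prediction.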


Added another blog

I’m mostly creating this post to have a link to my new blog, A Place for Words, as it doesn’t appear that Google is indexing it yet. I’m not actually notifying you, Dear Reader, about this as I don’t (necessarily) want my loyal readers to check it out. This new blog represents a programming and knowledge project that won’t be of general interest here.


Time to get back on the wagon

For the past three months I’ve gone off my fanatical record keeping and weight control and I’ve paid the price. I have collected minimal daily weight data (a single reading from the scale, known to be inaccurate, plus a single (first) weight tends to understate the actual average of multiple readings due to some weirdness in the scale). But no matter how I look at it the last three months have been a bad trend I now must reverse before it gets worse. So here’s the bad news:


The gap between week 82 and week 86 was the time of my retreat to the Dakotas where I have no data at all. The first data in week 80 was the first week in May and the last data is for the first day of August. Piecing this together was a chore since I had multiple flaws in my data collection: a) before my retreat I had no absolute calendar dates in my solid record keeping so it took a lot of deduction to correctly set my scale, and, b) I had no absolute calendar dates (just day-of-week) in my sporadic weigh-ins since returning. But I did manage to piece the data together to create continuity between recent infrequent data and earlier (for 81 weeks) solid data. Then I could put this together for the graph above to show the overall consequences.

I was fairly steady, in April and May, before my retreat, right at my target (185lbs). Of course during the retreat I had no data at all but I can recall many days with substantial eating, other days with “normal” (not too much, but not diet either). And during the retreat, while I did some “real” exercise, my fanatical exercising (nearly 1000 cals/day) definitely dropped off. Since returning my exercise has probably been about 300 cals/day below what I was doing during intense weight control and fitness, which, by the usual rule of thumb of roughly 3,500 cals per pound, adds up to about 2.5lbs/month. And interestingly, that’s most of my weight gain.

IOW, I’ve managed to pack on about 0.9lbs/week in the past three months for a total gain of about 12lbs, i.e. about 4lbs/month. It appeared in late June and early July that I was doing an OK job of maintenance, but toward the end of July a lot of feasting undermined that and so I hit, for several days, my “trigger” weight (195lbs) where I have to get back to discipline and reverse the gains.

Contrary to my intuitive notion I’ve actually done a little better after the retreat than during it as shown below:


But it is a fairly steady trend upwards, although at somewhat lower gain/week, in the last eight weeks. I thought, wrongly, I was doing more or less OK for about 1.5 months, but taking the long view it’s clear I was not at maintenance and my infrequent weigh-ins were deceiving me.

So what do I learn from this:

  1. I must do daily weigh-ins (at least mostly) and pay attention to these. Contrary to a lot of the stupid weight control advice, more data, more weigh-ins is critical.
  2. I have to return to my higher level of exercise, getting those 300 cals/day back into my exercise plan.
  3. And, given I will have some eating days above my target calorie limit I have to compensate with days below, so at least my average is my target (I think my “good” eating days were probably just about my target level, so the “high” days are not compensated for).

The real point, though, is that I must continue to be moderately fanatical about record keeping so these graphs and analysis drive my discipline. Without all the statistical stuff (including getting back to intense exercise recording and analysis) I just get this drift up that goes largely unnoticed in short time periods but adds up over long intervals. Of course I can also tell this by the tightness of my pants but that takes at least a month to be noticeable. In short it’s the discipline each and every day that matters, not the less frequent data gathering and concern.

So where does this leave me. I gained 12 lbs during this three-month period of inattention, less exercise, and more eating. I need to reverse that. I need to try (possibly won’t succeed) to lose not just the 12 lbs I gained, but actually get a bit lower, so 15 lbs needs to be my target. But my most immediate goal is more like 10lbs. I suspect, if I really get my discipline back, I’ll do better during August than after I’m back down a bit.

So my targets are:

191.0 by Aug 31 (i.e. 5lb loss)

187.0 by Sep 30 (4lb loss)

184.0 by Oct 31 (3lb loss)
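To see what these targets demand week by week, here is a small Python sketch that converts each month's planned loss into a required pace. The month lengths and losses come from the list above; the function name and the idea of spreading each loss evenly over the month are assumptions for the sake of the arithmetic.

```python
def weekly_pace(days_in_month, loss_lbs):
    """Average loss per week needed to drop `loss_lbs` in the given days."""
    return loss_lbs / days_in_month * 7


# (month, days in month, planned loss in lbs) taken from the targets above.
targets = [("Aug", 31, 5.0), ("Sep", 30, 4.0), ("Oct", 31, 3.0)]

if __name__ == "__main__":
    for month, days, loss in targets:
        print(f"{month}: {weekly_pace(days, loss):.2f} lbs/wk")
```

The required pace falls each month, which fits the expectation that losses come easier right after getting back on the wagon than later on.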

These are doable but not easy, especially as the next three months are going to have days with over-eating (various dinner parties and eating out, possibly a trip in late Sep), so my real emphasis needs to be burning calories with exercise. During July I hit a record of “real” biking which also meant treadmill and stationary biking were lower and I need to reverse that. Hopefully I can make most of my progress in the next 1.5 months, since a possible trip at the end of September is going to mean some backsliding (definitely less exercise, probably more eating), so making my end-of-October goal may require aggressive loss during the time after returning from vacation.

But the key point of all this is that I can’t be casual about weight maintenance and have to use most of the same discipline (and record keeping and statistical analysis) I was using before during my very successful (and huge) weight loss.

One day at a time, with a few “bad” days that have to be compensated for with better-than-average days.

So, Dear Reader, you may get to start seeing all my charts and Nate Silver-ish analysis you probably thought had disappeared. Get ready for boring posts.
