Footnote on my ‘fortuity’ post

On a related note, I have another, though largely unrelated, example of fortuity.

I love my Kindle (the Paperwhite; I don’t like the Fire, especially when camping where I can’t recharge it, since long battery life is a critical feature for my needs). It’s perfect for long trips, or other occasions, where I need lots of reading material. Instead of hauling around many pounds of books, mostly not what I want, I can carry one lightweight device with enough books that I’ll have something I want to read (even if it is re-reading an old one).

So thank your fortuity (instead of saying ‘god’) for inspiring Amazon to invent the Kindle.

But that’s not the fortuity I’m now writing about. I also love Kindle (and spend too much money) for being able to download samples. So before leaving on a camping trip I downloaded a lot of samples. Fine, but what do I do in a comms deadzone if I want the whole book?

So, just for that purpose, I drove the few miles back to a tower (I’m getting good at watching for these since, unlike in civilized areas, they are scarce out in nowhere) and bought and downloaded the book. I was doing something else at the time but vaguely noticed the Kindle acting strange (it rebooted itself a couple of times). I didn’t pay any attention until I got back to camp and found an item I hadn’t requested: a letter explaining that my Kindle’s software had been updated, including a new feature, which is what I’ll get to in a moment, the gift of fortuity.

Now one other feature I like about Kindle is being able to touch a word and have its dictionary definition appear. Sometimes I sorta understand the word, but maybe just barely, or I don’t see how it fits in the context of what I’m reading. Having easy access (now a popup) to the definition is a big help and another nice feature of an eBook (making comments is another, but I’m still waiting for a good tool before I use that feature more, the raw feature not being enough for my needs).

So I realized that instantly looking up a word is helpful, but I will have forgotten what I learned in just a short time. So I’ve developed the practice of highlighting the words. And another wonderful feature of Kindle (compared to dictator Steve Jobs’ design of the iPod, with implications for Amazon’s Kindle app, i.e. no file system or access to anything stored on the iPod except through iTunes, unless jail-broken, which I don’t want to do) is that the Kindle runs an ordinary OS: when you connect it to a computer via USB it just looks like a disk, Windows can navigate the file system, and you can find a plain text file holding all your highlights and simply drag-and-drop that file to your computer.

I earlier developed a simple app for this. Rather than take on the harder job of actually reverse engineering and parsing that file, I just wrote a simple C# program to create a lexicon of all the words in it. But this is a bit tedious (to do repeatedly and to update a master file) because I then have to manually filter everything in the lexicon and keep just the terms I want.
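A minimal sketch of that kind of lexicon builder, in Python rather than the C# of the original program (which isn't shown here). It makes no attempt to parse the highlights file's structure; it just pulls out every word-like token. The function name `build_lexicon` and the `skip` parameter are made up for illustration, and the sample text stands in for the real file.

```python
import re
from collections import Counter

def build_lexicon(text, skip=frozenset()):
    """Count every word in text, lowercased, ignoring words listed in skip."""
    # Letters, optionally joined by an apostrophe or hyphen (e.g. "re-reads").
    tokens = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text)
    counts = Counter(t.lower() for t in tokens)
    for word in skip:
        counts.pop(word, None)  # drop words already known or already filed
    return counts

# Stand-in for the contents of the highlights file.
sample = "Fortuity favors the prepared. The prepared reader re-reads."
lexicon = build_lexicon(sample, skip={"reader"})

for word in sorted(lexicon):
    print(word, lexicon[word])
```

Passing the master list's existing words as `skip` would take some of the tedium out of the manual filtering, since only new terms would be left to review.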

Now what is my point?

Way back when, I went through a long test prep study for the SATs. You can definitely improve your score if you can expand your vocabulary. So I learned a ton of words that I have (mostly) forgotten since. Plus there are words I sorta know but almost never use. So, at minimum, I’d like to have a list of words, with tags (so I can filter) for at least three attributes:

  1. words I sorta know but not very accurately, IOW, I can probably understand, from my limited memory, the word in the context it occurs
  2. words I know, but that I don’t use, esp. while writing because they’re so specific (which makes them good words) I’d rarely need them
  3. words I don’t know but once I see the definition look like good words to know

Now there are some more types of words (and my types overlap) which is why I’d like tagging, so a word can be on multiple lists via a filter. I’d also like to tag words as: a) specialized vocabulary, i.e. molecular biology or software architecture or other science, not likely to be words I’d use for general writing nor even bother to try to memorize, but would like to easily access (my own private dictionary), b) words I find humorous or interesting, but again probably not to memorize and use in writing, c) words I doubt I’d ever use but still would like to have a list of, and so forth.
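The tagging idea can be sketched as a word entry carrying a set of tags, with a filter that returns the words having all the requested tags. This is only an illustration in Python; the `Entry` class, the `with_tags` helper, and the tag names are hypothetical, chosen to mirror the categories above.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    word: str
    tags: set = field(default_factory=set)  # one word can sit on many lists

def with_tags(entries, *wanted):
    """Return the words that carry every tag in wanted, sorted."""
    need = set(wanted)
    return sorted(e.word for e in entries if need <= e.tags)

# Hypothetical tag names for the categories described above.
entries = [
    Entry("fortuity", {"fuzzy", "interesting"}),
    Entry("apoptosis", {"unknown", "specialized"}),
    Entry("perspicacious", {"known-unused"}),
    Entry("defenestrate", {"known-unused", "humorous"}),
]

print(with_tags(entries, "known-unused"))
print(with_tags(entries, "known-unused", "humorous"))
```

Keeping tags as a set per word, instead of one list per category, is what lets overlapping categories work without copying the word around.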

So I need a tool. Reading lots of Kindle books gets me the raw data, but I need a tool to: a) consolidate and deduplicate the list (possibly recording the context and/or source), and b) provide some sort of organizing features, esp. for drilling. The only real way to learn vocabulary is through practice, not memorization. But relatively rare words (thus college vocabulary rather than the 8th grade vocabulary of most writing) that I might only use once a year are hard to remember (unless I’m cramming for the SAT, which, btw, worked for me, as I’m sure my scores were a bit higher than if I hadn’t done the prep, and, btw, I got 800 (the 800 on math was really easy), just to brag a little).

So, thanks to fortuity, Amazon unexpectedly added an app to my Kindle called Vocabulary Builder. Now feature-wise this isn’t what I want for a tool, BUT it is a good tool to collect the words (not very good at stemming, however; it’s completely literal, so I do get “dups”) and it does record the context.
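To illustrate the consolidation step, here is a rough Python sketch that merges literal duplicates such as "obfuscates" and "obfuscated" under one key while keeping each context and source. The suffix stripping in `crude_stem` is deliberately crude and will get some words wrong; a real stemmer would do better. All names and the sample marks are invented for the example, and nothing here reflects how Vocabulary Builder actually stores its data.

```python
def crude_stem(word):
    """Very rough normalizer: lowercase, then strip one common suffix."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least four letters remain, to spare short words.
        if w.endswith(suffix) and len(w) - len(suffix) >= 4:
            return w[: -len(suffix)]
    return w

def consolidate(marks):
    """Group (word, context, source) marks under a shared crude stem."""
    merged = {}
    for word, context, source in marks:
        entry = merged.setdefault(crude_stem(word), {"forms": set(), "contexts": []})
        entry["forms"].add(word.lower())
        entry["contexts"].append((context, source))
    return merged

# Invented sample marks standing in for words collected on the device.
marks = [
    ("Obfuscates", "He obfuscates the issue.", "Book A"),
    ("obfuscated", "The report was obfuscated.", "Book B"),
    ("sesquipedalian", "A sesquipedalian speech.", "Book A"),
]
merged = consolidate(marks)
for stem, entry in sorted(merged.items()):
    print(stem, sorted(entry["forms"]), len(entry["contexts"]))
```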

I haven’t yet tried to find what file it saves all that in or (assuming I can find it) to reverse-engineer Amazon’s format so I can use it as input for my own program, which won’t be hard for me to write. But it’s a good start, esp. as Vocabulary Builder sorts the words in the order I marked them, i.e. the more recently marked words come first in the list, so even if I have to retype them into my program it won’t be too difficult to periodically update my master list.

Now, the only hard part of this, when I’ve done something like it in the past, is actually fine-tuning the drill algorithm.

I’ve done my own drill program before, at the cost of tedious extraction of the vocabulary, once for Italian words generally used in cooking (and certainly on menus) and once for Spanish words. I’m not good at languages, so while I’d like to learn either Spanish or Italian I know that is not going to happen, but as I do like cooking (and eating), mastering food terms is useful.

There are a couple of challenges to the drill program. Obviously it’s easy to keep a cumulative score of my right/wrong answers, rate each word by the probability that I know it, and thus drill the words I know least well most often. But that’s not enough. It’s also easy to detect a few types of patterns, e.g. random scores (maybe right, maybe wrong) followed by a string of right answers, largely because the drill program, using the first metric, showed me those words more often. If I spent several days studying, my short-term memory improved but not my long-term memory, so seeing a word again that I just got right an hour ago is a waste. And I need some sense of time, i.e. my score should decrease as a function of how long ago I was last getting the word right, since I tend to do this kind of thing in spurts, then not use it for a long time, then come back, so the old scores from the last burst of study aren’t very relevant.
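One way the three ideas above (cumulative score, no repeats right after a correct answer, decay over time) might fit together is sketched below in Python. This is not the algorithm from the earlier drill programs, which isn't described in detail; the half-life and cooldown values, and the function names, are assumptions picked only to make the example concrete.

```python
HALF_LIFE_DAYS = 30.0   # assumption: confidence halves after a month away
COOLDOWN_HOURS = 12.0   # assumption: don't re-show a word answered right this recently

def knowledge(right, wrong, days_since_last_right):
    """Estimated chance the word is still known, decayed by time away."""
    raw = (right + 1) / (right + wrong + 2)   # smoothed right/wrong ratio
    decay = 0.5 ** (days_since_last_right / HALF_LIFE_DAYS)
    return raw * decay

def drill_weight(right, wrong, hours_since_last_right):
    """Higher weight means the word deserves to be drilled sooner."""
    if hours_since_last_right < COOLDOWN_HOURS:
        return 0.0   # just got it right; showing it again is a waste
    return 1.0 - knowledge(right, wrong, hours_since_last_right / 24.0)

# A word with a perfect record, last answered right an hour ago vs. a month ago.
print(drill_weight(5, 0, 1.0))
print(drill_weight(5, 0, 24 * 30))
```

The decay term is what makes scores from an old burst of study count for less: a perfect record from a month ago ends up weighted roughly like a shaky record from yesterday.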

But getting a good scoring algorithm (i.e. the probability that a word should be on my “flashcard”) is tricky. I’ve done better than purely random picking (the problem with simple flashcards like the Kindle Vocabulary Builder), but I’ve never gotten a very good algorithm.
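As a small illustration of picking by probability rather than purely at random, the Python sketch below draws flashcards so that words with a higher weight come up more often, and words with zero weight never do. The `pick_cards` name and the sample weights are made up; how the weights get computed is left open, since that is the tricky part.

```python
import random

def pick_cards(weights, count, rng=random):
    """Pick up to count distinct words, favoring higher weights."""
    pool = {word: wt for word, wt in weights.items() if wt > 0}
    chosen = []
    while pool and len(chosen) < count:
        words = list(pool)
        # random.choices draws one word with probability proportional to weight.
        word = rng.choices(words, weights=[pool[w] for w in words], k=1)[0]
        chosen.append(word)
        del pool[word]   # no repeats within one session
    return chosen

# Made-up weights: higher means less well known.
weights = {"fortuity": 0.9, "perspicacious": 0.5, "defenestrate": 0.1, "the": 0.0}
print(pick_cards(weights, 2))
```

Because the draw is weighted rather than strictly sorted, well-known words still turn up occasionally, which gives the long-term memory check that a plain "worst words first" list would miss.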

I researched, although not thoroughly, some online tools for this problem and those aren’t even as good as the ones I’ve coded for myself, but they have the virtue of being easier to set up, i.e. just sign up and use, with no need to write code or compile lists.

So, while I’ll blog on this subject again in a while, this app, combined with the new Kindle feature, is on my mind (and things like this I do intensely for a while, then get bored, then much later want to do again). And since I’m about to have a ton of excess free time with nothing to do, this goes on my project list, both to fill time and to keep in practice writing code (I hate you, Microsoft, for obsoleting all my developer tools, and I hate the terrible public domain tools, if you can even call a text editor and PHP (or Python or whatever) a tool).

So, done with disgorging this bunch of thoughts on my mind and using my time indoors to be online and writing.

p.s. My ideal tool would do this for all sources, not just Kindle, but that’s a pain. And most importantly, I’d like to integrate the tool with blogging, so I can, as a goal, use at least one of the words in my to-learn lists in every post and upgrade my own writing above 8th grade vocabulary (too bad I can’t conceive of a tool to do something about the quality of my writing, esp. conciseness).


About dmill96

old fat (but now getting trim and fit) guy, who used to create software in Silicon Valley (almost before it was called that), who used to go backpacking and bicycling and cross-country skiing and now geodashes, drives AWD in Wyoming, takes pictures, and writes long blog posts and does xizquvjyk.