Debased Data
Last weekend, before going to the library, I used its catalog to find audiobooks that would be on the shelves. I made a short list of five books I was interested in, but four of them weren’t there. After some help from the librarians, I found out why.
Seems my search of the library’s database (the catalog) wasn’t accurate because every time I modified the search, it reset some of the search parameters. All the time I thought I was narrowing my search, the system was actually broadening it.
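For the programmers reading, here is a rough sketch of the kind of bug I suspect. I haven’t seen the catalog’s code, so every name, field, and title below is made up. It only shows how a search form that rebuilds its filters on each change ends up broadening a search, while one that merges the change keeps narrowing it.

```python
# Hypothetical catalog records; the fields and titles are invented for illustration.
CATALOG = [
    {"title": "Book A", "format": "audiobook", "on_shelf": True},
    {"title": "Book B", "format": "audiobook", "on_shelf": False},
    {"title": "Book C", "format": "print", "on_shelf": True},
]


def search(filters):
    """Return the records that match every filter."""
    return [r for r in CATALOG if all(r[k] == v for k, v in filters.items())]


def modify_resetting(current, change):
    """The suspected bug: the new filter replaces the old ones."""
    return dict(change)


def modify_merging(current, change):
    """What the user expects: the new filter is added to the old ones."""
    return {**current, **change}


start = {"format": "audiobook"}   # first search: audiobooks only
change = {"on_shelf": True}       # refinement: only what is on the shelf

broadened = search(modify_resetting(start, change))
narrowed = search(modify_merging(start, change))

print([r["title"] for r in broadened])  # print book now sneaks back in
print([r["title"] for r in narrowed])   # still audiobooks only
```

With the resetting version, asking for “on the shelf” quietly drops the audiobook filter, which is the kind of thing that happened to my list.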
Why do I think this is worth a post? The reason is simple: the library catalog is only one of many databases I regularly use that rarely surrender their information without a fight. Extracting data from databases is becoming more and more difficult. Why?
To provide some perspective, I constantly battle with both Amazon’s database and the My HealtheVet database for Veterans. (I’ve given up on the Library of Congress Talking Books database. It takes the prize for the worst I’ve ever seen.)
I’m not new to databases. Back in the 60s, I wrote a natural language retrieval package for the Standard & Poor’s database. In the 80s, I wrote a video store system in the dBase language.
So I know a good database system when I see one. I use the excellent CommissaryRewards.gov site once or twice a month. You clip coupons online, then redeem them with your rewards card at checkout. It’s from the Defense Commissary Agency (DeCA).
The library retrieval is so lame, you can’t sort your results by author. Of course, the library arranges its books on the shelves by author. Imagine not being able to look for fiction by author!
Yet, as lame as that is, it’s good compared to the government’s My HealtheVet database. This monster has no clue how to update, and its output looks like it came from the 50s.
I could go on for hours about these problems, but I’m sure you’ve seen your share. I’m not writing this to bitch or point fingers. The question I’m asking is, how did things get this bad?
In the 50s and 60s, we were still learning how to efficiently create and search databases. Techniques kept improving over the decades, but not as fast as the hardware. Now the hardware is so fast, no one remembers the improved methods we learned.
It looks to me as if programmers now assume hardware speed and massive storage will solve all their problems. It’s like they’ve upgraded to a Ferrari, but removed the steering wheel.