The First Search Engines, Built By Librarians

Before the Internet became the advertisement generator we know and love today, interspersed with interesting information here and there, it was a network of computers linking mostly universities. This was even before the World Wide Web and HTML, which means that the people using these proto-networks, mostly researchers and other academics, had to build things we might take for granted from the ground up. One of those things was one of the first search engines, built by the librarians who were cataloging all of the research in their universities and using their relatively primitive computer networks to store and retrieve all of this information.

This search engine was called SUPARS, the Syracuse University Psychological Abstracts Retrieval Service. It was originally built for psychology research papers, and perhaps unsurprisingly the psychologists at the university also used this new system as the basis for understanding how humans would interact with computers. This was the 1970s after all, and most people had never used a computer, so documenting how they used the search engine led to some important breakthroughs in the way we think about the best ways of designing systems like these.

The search engine was technically revolutionary for the time as well. It was among the first to allow text to be searched within documents, and it saved previous searches for users and researchers to access and learn from. The experiment was driven by the need to support researchers in a future where reference librarians would need assistance dealing with more and more information in their libraries, and it highlighted the challenges of vocabulary control in free-text searching.

The visionaries behind SUPARS recognized the changing landscape of research and designed for the future that would rely on networked computer systems. Their contributions expanded the understanding of how technology could shape human communication and effectiveness, and while they might not have imagined the world we are currently in, they certainly paved the way for the advances that led to its widespread adoption even outside a university setting. There were some false starts along that path, though.

Simultaneous Invention, All The Time?

As Tom quipped on the podcast this week, if you have an idea for a program you’d like to write, all you have to do is look around on GitHub and you’ll find it already coded up for you. (Or StackOverflow, or…) And that’s probably pretty close to true, at least for really trivial bits of code. But it hasn’t always been thus.

I was in college in the mid 90s, and we had a lab of networked workstations that the physics majors could use. That’s where I learned Unix, and where I had the idea for the simplest program ever. It took the background screen color, in the days before wallpapers, and slowly random-walked it around in RGB space. This was set to be slow enough that anyone watching it intently wouldn’t notice, but fast enough that others occasionally walking by my terminal would see a different color every time. I assure you, dear reader, this was the very height of wit at the time.
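For the curious, here is a minimal sketch of that idea in Python. It is not the original program, which isn't reproduced here; the function names, the step size, and the starting color are all made up for illustration.

```python
import random


def step_color(rgb, max_step=1, rng=random):
    """Nudge each channel by a small random amount, clamped to 0..255."""
    return tuple(
        min(255, max(0, channel + rng.randint(-max_step, max_step)))
        for channel in rgb
    )


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def walk(start=(0, 0, 128), steps=5, seed=None):
    """Return the hex colors visited by a short random walk in RGB space."""
    rng = random.Random(seed)  # hypothetical seed parameter, for repeatability
    color = start
    visited = []
    for _ in range(steps):
        color = step_color(color, rng=rng)
        visited.append(to_hex(color))
    return visited


if __name__ == "__main__":
    for color in walk(steps=10):
        print(color)
```

On an X display, each string could be handed to something like `xsetroot -solid` with a long sleep between steps, so that nobody staring at the screen would notice the drift.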

With the late 90s came the World Wide Web and the search engine, and the world got a lot smaller. For some reason, I was looking for how to set the X terminal background color again, this time searching the Internet instead of reading up in a reference book, and I stumbled on someone who wrote nearly exactly the same random-walk background color changer. My jaw dropped! I had found my long-lost identical twin brother! Of course, I e-mailed him to let him know. He was stoked, and we shot a couple funny e-mails back and forth riffing on the bizarre coincidence, and that was that.

Can you imagine this taking place today? It’s almost boringly obvious that if you search hard enough you’ll find another monkey on another typewriter writing exactly the same sentence as you. It doesn’t even bear mentioning. Heck, that’s the fundamental principle behind Codex / Copilot – the code that you want to write has already been written so many times that it will emerge as the most statistically likely response from a giant pattern-matching, word-by-word completion neural net model.

Indeed, stop me if you’ve read this before.

Not On The Internet

Whenever you need to know something, you just look it up on the Internet, right? Using the search engine of your choice, you type in a couple keywords, hit enter, and you’re set. Any datasheet, any protocol specification, any obscure runtime error, any time. Heck, you can most often find some sample code implementing whatever it is you’re looking for. In a minute or so.

It is so truly easy to find everything technical that I take it entirely for granted. In fact, I had entirely forgotten that we live in a hacker’s utopia until a couple nights ago, when it happened again: I wanted to find something that isn’t on the Internet. Now, to be fair, it’s probably out there and I just need to dig a little deeper, but the shock of not instantly finding the answer to a random esoteric question reminded me how lucky we actually are 99.99% of the time when we do find the answer straight away.

So great job, global hive-mind of über-nerds! This was one of the founding dreams of the Internet, that all information would be available to everyone anywhere, and it’s essentially working. Never mind that we can stream movies or have telecons with people on the other side of the globe – when I want a Python library for decoding Kansas City Standard audio data, it’s at my fingertips. Detailed SCSI specifications? Check.

But what was my search, you ask? Kristina and I were talking about Teddy Ruxpin, and I thought that the specification for the servo track on the tape would certainly have been reverse engineered and well documented. And I’m still sure it is – I was just shocked that I couldn’t instantly find it. The last time this happened to me, it was the datasheet for the chips that make up a Speak & Spell, and it turned out that I just needed to dig a lot harder. So I haven’t given up hope yet.

And deep down, I’m a little bit happy that I found a hole in the Internet. It gives Kristina and me an excuse to reverse engineer the format ourselves. Sometimes ignorance is bliss. But for the rest of those times, when I really want the answer to a niche tech question, thanks everyone!

Linux Fu: Roll With The Checksums

We are often struck by how often we spend time trying to optimize something when we would be better off just picking a better algorithm. There is the old story about the mathematician Gauss who, when in school, was given busy work to add the integers from 1 to 100. While the other students laboriously added each number, Gauss realized that 100+1 is 101 and 99 + 2 is also 101. Guess what 98 + 3 is? Of course, 101. So you can easily find that there are 50 pairs that add up to 101 and know the answer is 5,050. No matter how fast you can add, you aren’t likely to beat someone who knows that algorithm. So here’s a question: You have a large body of text and you want to search for it. What’s the best way?
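The Gauss story can be sketched in a few lines of Python. This only illustrates the pairing trick described above, not the text-searching technique the question is leading to, and the function names are invented for the example.

```python
def sum_by_adding(n):
    """Add 1 + 2 + ... + n one term at a time, like the other students."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_pairing(n):
    """Use the pairing trick: n/2 pairs that each add up to n + 1."""
    # n * (n + 1) is always even, so integer division is exact.
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(sum_by_adding(100))   # 100 additions
    print(sum_by_pairing(100))  # one multiply and one divide
```

Both functions give the same answer, but the loop does work proportional to `n` while the formula takes the same handful of operations no matter how big `n` gets.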



Hackaday Links: January 5, 2020

It looks like the third decade of the 21st century is off to a bit of a weird start, at least in the middle of the United States. There, for the past several weeks, mysterious squads of multicopters have taken to the night sky for reasons unknown. Witnesses on the ground report seeing both solo aircraft and packs of them, mostly just hovering in the night sky. In mid-December when the nightly airshow started, the drones seemed to be moving in a grid-search pattern, but that seems to have changed since then. These are not racing drones, nor are they DJI Mavics; witnesses report them to be 6′ (2 meters) in diameter and capable of staying aloft for 90 minutes. These are serious professional machines, not kiddies on a lark. So far, none of the usual government entities have taken responsibility for the flights, so speculation is all anyone has as to their nature. We’d like to imagine someone from our community will get out there with radio direction finding gear to locate the operators and get some answers.

We all know that water and electricity don’t mix terribly well, but thanks to the seminal work of White, Pinkman et al. (2009), we also know that magnets and hard drives are a bad combination. But that didn’t stop Luigi Rizzo from using a magnet to recover data from a hard drive. He reports that the SATA drive had been in continuous use for more than 11 years when it failed to recover after a power outage. The spindle would turn but the heads wouldn’t move, despite several rounds of percussive maintenance. Reasoning that the moving coil head mechanism might need a magnetic jump-start, he probed the hard drive case with a magnetic parts holder until the head started moving again. He was then able to recover the data and retire the drive. Seems like a great tip to file away for a bad day.

It seems like we’re getting closer to a Star Trek future every day. No, we probably won’t get warp drives or transporters anytime soon, and if we’re lucky velour tunics and Spandex unitards won’t be making a fashion statement either. But we may get something like Dr. McCoy’s medical scanner thanks to work out of MIT using lasers to conduct a non-contact medical ultrasound study. Ultrasound exams usually require a transducer to send sound waves into the body and pick up the echoes from different structures, with the sound coupled to the body through an impedance-matching gel. The non-contact method uses pulsed IR lasers to penetrate the skin and interact with blood vessels. The pulses rapidly heat and expand the blood vessels, effectively turning them into ultrasonic transducers. The sound waves bounce off of other structures and head back to the surface, where they cause vibrations that can be detected by a second laser that’s essentially a sophisticated motion sensor. There’s still plenty of work to do to refine the technique, but it’s an exciting development in medical imaging.

And finally, it may actually be that the future is less Star Trek and more WALL-E in the unlikely event that Segway’s new S-Pod personal vehicle becomes popular. The two-wheel self-balancing personal mobility device is somewhat like a sitting Segway, except that instead of leaning to steer it, the operator uses a joystick. Said to be inspired by the decidedly not Tyrannosaurus rex-proof “Gyrosphere” from Jurassic World, the vehicle tops out at 24 miles per hour (39 km/h). We’re not sure what potential market for these things would need performance like that – it seems a bit fast for getting around the supermarket and a bit slow for keeping up with city traffic. So it’s a little puzzling, although it’s clearly easier to fully automate than a stand-up Segway.

Online Chip Reference Trims The Fat


Quick: which pins are used for I2C on an ATmega168 microcontroller?

If you’re a true alpha geek you probably already know the answer. For the rest of us, ChipDB is the greatest thing since the resistor color code cheat sheet. It’s an online database of component pinouts: common Atmel microcontrollers, the peripheral ICs sold by SparkFun, and most of the 4000, 7400 and LMxxx series parts.

The streamlined interface, reminiscent of Google, returns just the essential information much quicker than rummaging through PDF datasheets (which can also be downloaded there if you need them). And the output, being based on simple text and CSS, renders quite well on any device, even a dinky smartphone screen.

Site developer [Matt Sarnoff] summarizes and calls upon the hacking community to help expand the database:

“The goal of my site isn’t to be some comprehensive database like Octopart; just a quick reference for the chips most commonly used by hobbyists. However, entries still have to be copied in manually. If anyone’s interested in adding their favorite chips, they can request a free account and use the (very primitive at this point) part editor. Submissions are currently moderated, since this is an alpha-stage project.”

Homeland Security Issues Policy On Laptop Seizures


The US Department of Homeland Security recently disclosed a new policy that allows agents to seize laptops, or anything capable of storing information, “for a reasonable period of time”. Okay, so this seems normal: a government agency is declaring it may confiscate personal property. However, the strange part of this story is that under this policy, federal agents can confiscate these things without any suspicion of wrongdoing or any reason whatsoever. So what happens to your personal data after they seize your laptop? Apparently they share the data with federal agencies, and in some cases the private sector, as additional services such as file decryption or translation are needed. While this may seem like a major violation of privacy, it is important to note that this policy only applies to people entering the United States. However, given the direction that our federal government is moving in the area of security, it wouldn’t surprise me if this policy soon applies to domestic flights as well.

[photo: postmodern sleaze]

[via eff.org]