How Search Engines Enabled Finding Needles In A WWW-Sized Haystack

May 19, 2026 by Maya Posch 19 Comments

When the World Wide Web surged into existence during the 1990s, we were introduced to the problem of how to actually find something in this ever-ballooning construction zone that easily outpaced even the fastest post-WW2 urban sprawl. Although domain names provided a way to find servers using DNS rather than having to mash in IP addresses, you still somehow had to know the relevant URL.

A range of solutions was thought up over time, from printed Yellow Pages type guides, to online curated lists of resources, to things like web rings where one website would link to a relevant similar website. This was the time when word-of-mouth was also very relevant, with people proudly announcing their own website on Geocities or another hosting service.

Search engines already existed long before the WWW became the hot new thing during the 1990s, but it was the WWW that would really push them to their limits. As anyone who used search engines for the WWW can attest, they had many issues. Often you’d end up using multiple search engines to find something, and despite fierce competition between web search engines to become the starting page in users’ browsers, actually finding things on the WWW remained a tough problem.

Since a web search engine ‘just’ has to index the WWW and match a search query against the results, why was this such a hard problem that persisted until Google apparently cracked the code?
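To make the ‘just’ concrete, here is a minimal sketch of the index-and-match step using an inverted index in Python. The page URLs, the text and the function names are all made up for illustration, and the matching is a plain boolean AND with no ranking whatsoever, so it should not be read as how any particular search engine worked.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs containing every query token (boolean AND), sorted."""
    tokens = tokenize(query)
    if not tokens:
        return []
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return sorted(results)


# Hypothetical pages; the URLs and text are invented for this example.
PAGES = {
    "http://example.com/amiga": "Amiga hardware and retro computing tips",
    "http://example.com/ftp": "Public FTP archive of retro software",
    "http://example.com/cats": "Pictures of cats",
}
INDEX = build_index(PAGES)
```

Calling `search(INDEX, "retro software")` returns only the FTP page, while `search(INDEX, "retro")` returns both retro pages in no meaningful order of relevance. Matching is the easy part; deciding which of the matches deserves to be listed first is left out of this sketch entirely.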

Continue reading “How Search Engines Enabled Finding Needles In A WWW-Sized Haystack” →

Hackaday Links: May 3, 2026

May 3, 2026 by Tom Nardi 3 Comments

Software that collects public data from the Internet and uses it to provide half-assed answers to your questions might seem like a modern craze, but today we bid farewell to a website that helped pioneer pretend conversations all the way back in 1997 — as of May 1st, Ask Jeeves is no more.

Well, technically they dropped the “Jeeves” part back in 2006. Since then it’s just been Ask.com, but as the name implies the idea was more or less the same. Rather than the relatively rigid parameters and keywords required by traditional search engines, you could ask Jeeves questions about the world using natural language. Early advertisements showed the virtual valet answering arbitrary questions like “How many calories in a banana?,” which of course today seems commonplace and utterly unimpressive, but was pretty wild for the 1990s.

It might seem surprising that a site designed from day one to offer a human-like Q&A experience should fold right as such technology is becoming commonplace. But of course, that commonality is the problem. When Google can answer your questions just as well (or poorly…) as Jeeves or anyone else, what’s the benefit for the average Internet user to seek out another service? But it’s still somewhat ironic, which is probably why the farewell message on Ask.com ends with the line “Jeeves’ spirit endures.”

Continue reading “Hackaday Links: May 3, 2026” →

The most exciting search engine 68k can handle.

There’s Nothing Boring About Web Search On Retro Amigas

October 31, 2025 by Tyler August 23 Comments

Do you have a classic Amiga computer? Do you want to search the web with iBrowse, but keep running into all that pesky modern HTML5 and HTTPS? In that case, [Nihirash] created BoingSearch.com just for you!

BoingSearch was explicitly inspired by [ActionRetro]’s FrogFind search portal, and from an end-user perspective the two are quite similar: both serve as search engines and strip down the websites listed by the search to pure HTML so old browsers can handle them.
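As a rough idea of what stripping a page down to pure HTML can involve, here is a small Python sketch built on the standard library’s `html.parser`. It is not how BoingSearch or FrogFind do it (BoingSearch is written in Rust); the tag whitelist, the `Simplifier` class and the `simplify` helper are assumptions chosen for this example, and images are simply dropped.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose entire contents are thrown away (an assumption for this sketch).
DROP_WITH_CONTENT = {"script", "style"}
# Tags that survive; every attribute except a link's href is removed.
KEEP = {"html", "head", "title", "body", "h1", "h2", "h3",
        "p", "a", "ul", "ol", "li", "br"}


class Simplifier(HTMLParser):
    """Re-emit a page using only whitelisted tags and plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP:
            href = dict(attrs).get("href") if tag == "a" else None
            if href:
                self.out.append(f'<a href="{escape(href, quote=True)}">')
            else:
                self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text outside script/style is kept, re-escaped for safe output.
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def simplify(html_text):
    """Return a simplified copy of html_text."""
    parser = Simplifier()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Tags outside the whitelist, such as `div` or `img`, vanish while the text inside them is kept, so a page full of layout wrappers, inline styles and scripts collapses to headings, paragraphs and links.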

BoingSearch in its natural habitat, iBrowse on Amiga.

The biggest difference we can see betwixt the two is that FrogFind will link to images while BoingSearch either loads them inline or strips them out entirely, depending on the browser you test with and how the page was formatted to begin with. (Ironically, modern Firefox doesn’t get images from BoingSearch’s page simplifier.) BoingSearch also gives you the option of searching with DuckDuckGo or Google via the SerpAPI, though note that poor [Nihirash] is paying out-of-pocket for Google searches.

BoingSearch is explicitly aimed at the iBrowse browser for late-stage Amigas, but should work equally well with any modern browser. Apparently this project only exists because FrogFind went down for a week, and without the distraction of retrocomputer websurfing, [Nihirash] was able to bash out his own version from scratch in Rust. If you want to self-host or see how he did it, [Nihirash] put the code on GitHub under a donationware license.

If you’re scratching your head as to why on earth people are still going on about Amiga in 2025, here’s one take on it.

Keeping Track Of Old Computer Manuals With The Manx Catalog

December 24, 2024 by Maya Posch 4 Comments

An unfortunate reality of pre-1990s computer systems is that any manuals and documentation that came with them likely only existed on paper. That’s not to say there aren’t scanned-in (PDF) copies of those documents floating around, but with few of these scans being indexable by search engines like Google and DuckDuckGo, they can be rather tricky to find. That’s where the Manx catalog website seeks to make life easier. According to its stats, it knows about 22,060 manuals (9,992 online) across 61 websites, with a focus on minicomputers and mainframes.

The code behind Manx is GPL 2.0 licensed and available on GitHub, which is where any issues can be filed too. While not a new project by any stretch of the imagination, it’s yet another useful tool to find a non-OCR-ed scan of the programming or user manual for an obscure system. As noted in a recent Hacker News thread, the ‘online’ part of the above listed statistics means that for manuals where no online copy is known, you get a placeholder message. Using the Bitsavers website along with Archive.org may still be the most pertinent way to hunt down that elusive manual, with the Manx website recommending 1000bit for microcomputer manuals.

Have you used the Manx catalog, or any of the other archiving websites? What have been your experiences with them? Let us know in the comments.

Finding And Resurrecting Archie: The Internet’s First Search Engine

May 20, 2024 by Maya Posch 11 Comments

Back in the innocent days of the late 1980s the Internet as we know it today did not exist yet, but there were still plenty of FTP servers. Since manually keeping track of all of the files on those FTP servers would be a royal pain, [Alan Emtage] set to work in 1986 to create an indexing and search service called Archie to streamline this process. As a local tool, it’d regularly fetch the file listing from each FTP server in a list, making these listings available for easy local search.
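The fetch-then-search-locally idea can be sketched in a few lines of Python. This is purely illustrative and not based on Archie’s actual code: the server names, paths and function names are hypothetical, and the periodic fetching of listings over FTP is left out, with a hard-coded dictionary standing in for its result.

```python
def build_listing_index(listings):
    """Flatten per-server file listings into (server, path) records."""
    records = []
    for server, paths in listings.items():
        for path in paths:
            records.append((server, path))
    return records


def find_files(records, pattern):
    """Return the records whose file name contains pattern, ignoring case."""
    needle = pattern.lower()
    return [
        (server, path)
        for server, path in records
        # Only the last path component (the file name) is matched.
        if needle in path.rsplit("/", 1)[-1].lower()
    ]


# Hypothetical listings standing in for what a periodic fetch might return.
LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/README"],
    "ftp.example.org": ["/mirrors/emacs/emacs-18.59.tar.Z"],
}
RECORDS = build_listing_index(LISTINGS)
```

Here `find_files(RECORDS, "emacs")` lists both servers carrying the tarball, which is the whole point: one local lookup instead of logging in to each FTP server in turn.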

After its initial release in 1990, its feature set was expanded to include a World Wide Web crawler by version 3.5 in 1995. Years later, it was assumed that the source for Archie had been lost. That was until the folk over at [The Serial Port] channel managed to track down a still running Archie server in Poland.

The name Archie comes from the word ‘archive’ with the ‘v’ stripped, with no relation to the Archie comics. Even so, the assumed connection inspired the names of the Gopher search engines Jughead and Veronica. Of these, Jughead is still around, while Veronica’s original database was lost, although a re-implementation of it exists. Archie itself enjoyed a period of relative commercial success, with [Alan] starting Bunyip Information Systems in 1992, which lasted until 2003. To experience Archie today, [The Serial Port] has the Archie documentation online, along with a live server if you’re feeling like reclaiming the early Internet.

Continue reading “Finding And Resurrecting Archie: The Internet’s First Search Engine” →

The First Search Engines, Built By Librarians

June 11, 2023 by Bryan Cockfield 14 Comments

Before the Internet became the advertisement generator we know and love today, interspersed with interesting information here and there, it was originally a network of computers largely among various universities. This was even before the World Wide Web and HTML, which means that the people using these proto-networks, mostly researchers and other academics, had to build things we might take for granted from the ground up. Among those things was one of the first search engines, built by the librarians who were cataloging all of the research in their universities, and using their relatively primitive computer networks to store and retrieve all of this information.

This search engine was called SUPARS, the Syracuse University Psychological Abstracts Retrieval Service. It was originally built for psychology research papers, and perhaps unsurprisingly the psychologists at the university also used this new system as the basis for understanding how humans would interact with computers. This was the 1970s after all, and most people had never used a computer, so documenting how they used the search engine led to some important breakthroughs in the way we think about the best ways of designing systems like these.

The search engine was technically revolutionary for the time as well. It was among the first to allow text to be searched within documents, and it saved previous searches for users and researchers to access and learn from. The experiment was driven by the need to support researchers in a future where reference librarians would need assistance dealing with more and more information in their libraries, and it highlighted the challenges of vocabulary control in free-text searching.

The visionaries behind SUPARS recognized the changing landscape of research and designed for the future that would rely on networked computer systems. Their contributions expanded the understanding of how technology could shape human communication and effectiveness, and while they might not have imagined the world we are currently in, they certainly paved the way for the advances that led to its widespread adoption even outside a university setting. There were some false starts along that path, though.

Love AI, But Don’t Love It Too Much

December 7, 2022 by Jenny List 51 Comments

The up-and-coming Wonder of the World in software and information circles, and particularly in those circles that talk about them, is AI. Give a magic machine a lot of stuff, ask it a question, and it will give you a meaningful and useful answer. It will create art, write books, compose music, and generally Change The World As We Know It. All this is genuinely impressive stuff, as anyone who has played with DALL-E will tell you. But it’s important to think about what the technology can and can’t do that’s new so as to not become caught up in the hype, and in doing that I’m immediately drawn to a previous career of mine.

Continue reading “Love AI, But Don’t Love It Too Much” →