Finding And Resurrecting Archie: The Internet’s First Search Engine

Back in the innocent days of the late 1980s the Internet as we know it today did not exist yet, but there were already plenty of FTP servers. Since manually keeping track of all of the files on those servers would be a royal pain, [Alan Emtage] set to work in 1989 to create an indexing and search service called Archie to streamline the process. Starting out as a local tool, it would regularly fetch the file listings from a list of FTP servers and make them available for easy local searching.
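
The core idea is simple enough that a toy version fits in a few lines. The sketch below is purely illustrative of that approach (pull FTP listings, search them locally) and is not Archie’s actual code; the server names are placeholders.

```python
# Toy illustration of the Archie idea: fetch FTP file listings and search them locally.
from ftplib import FTP

SERVERS = ["ftp.example.org", "ftp.example.net"]  # placeholder hosts

def build_index(servers):
    index = {}  # file name -> list of servers offering it
    for host in servers:
        try:
            with FTP(host, timeout=30) as ftp:
                ftp.login()                     # anonymous login
                for name in ftp.nlst():         # top-level listing only, for brevity
                    index.setdefault(name, []).append(host)
        except OSError:
            continue                            # skip unreachable servers
    return index

def search(index, term):
    return {name: hosts for name, hosts in index.items() if term.lower() in name.lower()}

if __name__ == "__main__":
    print(search(build_index(SERVERS), "readme"))
```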

After its initial release in 1990, its feature set was expanded to include a World Wide Web crawler by version 3.5 in 1995. Years later, it was assumed that the source for Archie had been lost. That was until the folks over at the [The Serial Port] channel managed to track down a still-running Archie server in Poland.

The name Archie comes from the word ‘archive’ with the ‘v’ stripped out, and has no relation to the Archie comics. Even so, that assumed connection inspired the Gopher search engines Jughead and Veronica. Of these, Jughead is still around, while Veronica’s original database was lost, though a re-implementation of it lives on. Archie itself enjoyed a period of relative commercial success, with [Alan] starting Bunyip Information Systems in 1992, which lasted until 2003. To experience Archie today, [The Serial Port] has put the Archie documentation online, along with a live server if you feel like reliving the early Internet.

Continue reading “Finding And Resurrecting Archie: The Internet’s First Search Engine”

The Minimalistic Dillo Web Browser Is Back

Over the decades web browsers have changed from the fairly lightweight and nimble HTML document viewers of the 1990s to today’s top-heavy browsers that struggle to run on a system with less than a quad-core, multi-GHz CPU and gigabytes of RAM. All but a few, that is.

Dillo is one of a small number of browsers that require only a minimum of system resources and will happily run on an Intel 486 or thereabouts. Sadly, the project more or less ended back in 2016 when the rendering engine’s developer passed away, but with the recent 3.1.0 release the project seems to be back on track, courtesy of efforts by [Rodrigo Arias Mallo].

Although a number of forks were started after the Dillo project ground to a halt, only Dillo+ appears to still be active, which makes this revival of the main project a welcome boost, as is its port to Atari systems. As for Dillo’s feature set, it boasts support for a range of protocols, including Gopher, HTTP(S), Gemini, and FTP via extensions. It supports HTML 4.01 and some HTML 5, along with CSS 2.1 and some CSS 3 features, and of course no JavaScript.

On today’s JS-crazed web this means access can be somewhat limited, but maybe it will encourage websites to offer a no-JS fallback for the Dillo users out there. The source code and releases can be obtained from the GitHub project page, with contributions to the project gratefully accepted.

Thanks to [Prof. Dr. Feinfinger] for the tip.

Imperva Report Claims That 50% Of The World Wide Web Is Now Bots

Automation has been a part of the Internet since long before the appearance of the World Wide Web and the first web browsers, but it has become a significantly larger share of total traffic over the past decade. A recent report by cybersecurity company Imperva pins the level of automated traffic (‘bots’) at roughly fifty percent of the total, with about 32% of all traffic attributed to ‘bad bots’: automated traffic that crawls and scrapes content, for example to train large language models (LLMs) and generate automated content, as well as performs automated attacks on the countless APIs accessible on the Internet.

According to Imperva, this is the fifth year of rising ‘bad bot’ traffic, with the 2023 report noting another increase of a few percent. Meanwhile ‘good bot’ traffic also keeps increasing year over year, and while these bots are not directly nefarious, many of them can throw off analytics and of course generate increased costs, especially for smaller websites. Most worrisome are the automated attacks by the bad bots, which range from account takeover attempts to exploiting vulnerable web-based APIs. It’s not just Imperva making these claims, either; the idea that automated traffic will soon destroy the WWW has floated around since the late 2010s as the ‘Dead Internet theory’.

Although the idea that the Internet will ‘die’ is probably overblown, the increase in automated traffic makes it increasingly hard to distinguish human-generated content and human commentators from fake content and accounts. This is worrisome given how many of today’s opinions are formed and reinforced on e.g. ‘social media’ websites, while more and more comments, images and even videos are manipulated or machine-generated.

On Cloud Computing And Learning To Say No

Do you really need that cloud hosting package? If you’re just running a website — no matter whether large or very large — you probably don’t and should settle for basic hosting. This is the point that [Thomas Millar] argues, taking the reader through an example of a big site like Business Insider, and their realistic bandwidth needs.

From a few stories on Business Insider, the HTML itself comes down to about 75 kB compressed, so their approximately 200 million visitors a month would churn through around 30 TB of bandwidth for the HTML, assuming two articles read per visitor.

This works out to roughly 11 MB/s of HTML, which can be generated dynamically even with slow interpreted languages, or, as [Thomas] puts it, would allow the world’s websites to be hosted on a system featuring a single 192-core AMD Zen 5-based server CPU. So what’s the added value of the cloud here? A reduction in latency, and of course increased redundancy from having the site served from two or three locations around the globe. Rather than falling into the trap of ‘edge cloud hosting’ and the latency of inter-datacenter calls, databases should ideally be located on the same physical hardware and synchronized between datacenters.
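
The back-of-the-envelope numbers hold up; here is the arithmetic spelled out as a quick sketch (the 75 kB and 200 million figures come from the article, while the 30-day month is our assumption):

```python
# Quick check of the bandwidth figures quoted above.
visitors_per_month = 200_000_000
articles_per_visitor = 2
html_per_article_bytes = 75 * 1000              # ~75 kB of compressed HTML

monthly_bytes = visitors_per_month * articles_per_visitor * html_per_article_bytes
seconds_per_month = 30 * 24 * 3600              # ~2.6 million seconds in a month

print(f"{monthly_bytes / 1e12:.0f} TB per month")              # ~30 TB
print(f"{monthly_bytes / seconds_per_month / 1e6:.1f} MB/s")   # ~11.6 MB/s average
```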

In this scenario [Thomas] also sees no need for Docker, scaling solutions, and virtualization, massively cutting down on costs and complexity. For those among us who run large websites (in the cloud or not), do you agree or disagree with this notion? Feel free to sound off in the comments.

The Most Annoying Thing On The Internet Isn’t Really Necessary

We’re sure you’ll agree that there are many annoying things on the Web. Which of them we rate as most annoying is a matter of personal taste, but we’re guessing that quite a few of you will join us in putting the ubiquitous cookie pop-up at the top of the list. It’s the pesky EU demanding consent for tracking cookies, we’re told, nothing to do with whoever is demanding you click through screens and screens of slider switches to turn everything off before you can view their website.

Now [Bite Code] is here to remind us that it’s not necessary. Not in America for the somewhat obvious reason that it’s not part of the EU, and perhaps surprisingly, not even in the EU itself.

The EU does have a consent requirement, but the point made in the article is that its requirements are satisfied by the Do Not Track header, an HTTP feature that has been with us since 2009 but which almost nobody implemented, so it is now deprecated. It allowed a user to reject tracking at the browser level, making all the cookie popups irrelevant. That popups were chosen instead, the article concludes, is down to large websites preferring to make the process annoying enough that users simply click the consent button to make it go away, making tracking much more likely. We suspect that the plethora of cookie popups also has something to do with FUD among owners of smaller websites, who fear they somehow don’t comply with the law if they don’t have one.
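
As a concrete illustration of how simple honoring the header could be, here is a minimal sketch of a server checking Do Not Track before setting any tracking cookie. It is purely illustrative and assumes a Flask app; the route and cookie name are made up, and real consent handling involves rather more than this.

```python
# Illustrative only: a server that respects the (now deprecated) DNT header.
from flask import Flask, make_response, request

app = Flask(__name__)

@app.route("/")
def index():
    # A browser sending "DNT: 1" has asked not to be tracked.
    if request.headers.get("DNT") == "1":
        return "Welcome! No tracking cookie was set."
    resp = make_response("Welcome!")
    resp.set_cookie("visitor_id", "example-123")  # hypothetical tracking cookie
    return resp
```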

So, as we’d probably all agree, the tracking cookie situation is a mess. This post is being written on Firefox, which now silos cookies to the site which delivered them, but there seems to be little relief for the average user stuck with either of the big browsers. Perhaps we should all hope for a bit more competition in the future.

Cookies header: Lisa Fotios, CC0.

Internet Radio Built In Charming Cassette-Like Form Factor

You can listen to plenty of broadcast radio these days. There’s a lot of choice too, with stations on AM, FM, and digital broadcasts to boot. However, if you want the broadest possible choice, you want an internet radio. If that’s your bag, why not build a fun one like [indoorgeek’s] latest design?

The build is based around a PCB and 3D-printed components that roughly ape the design of a cassette tape. It even replicates the typical center window of a cassette by using a transparent OLED screen, which displays the user interface. In a neat touch, the graphics on the display are designed to line up with those on the PCB, which looks excellent.

An ESP32 is the heart of the operation, responsible for streaming audio over the Internet via its WiFi connection. It’s powered by a small lithium-polymer battery, and hooked up to a MAX98357 Class D amplifier driven via the chip’s I2S hardware. Audio is played out over a small speaker salvaged from an old smartphone.
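
For a feel of what that I2S hookup involves, below is a minimal MicroPython sketch of an ESP32 pushing PCM samples to a MAX98357. This is not [indoorgeek]’s firmware: the pin numbers are assumptions, and an actual internet radio would additionally fetch the stream and decode MP3/AAC into the PCM it writes out.

```python
# Minimal MicroPython sketch: ESP32 driving a MAX98357 over I2S with a test tone.
# Pin numbers are assumptions, not taken from the actual project.
import math
import struct
from machine import I2S, Pin

audio_out = I2S(
    0,                 # I2S peripheral ID
    sck=Pin(27),       # BCLK to the MAX98357
    ws=Pin(26),        # LRC (word select)
    sd=Pin(25),        # DIN (serial data)
    mode=I2S.TX,
    bits=16,
    format=I2S.MONO,
    rate=22050,
    ibuf=8192,
)

# One second of a 440 Hz tone as signed 16-bit PCM, standing in for decoded stream audio.
samples = bytearray()
for n in range(22050):
    samples += struct.pack("<h", int(10000 * math.sin(2 * math.pi * 440 * n / 22050)))

audio_out.write(samples)  # blocking write; a radio would keep feeding decoded audio
audio_out.deinit()
```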

While it’s obviously possible to play whatever you like on a smartphone these days, sometimes it’s fun to have simple devices that just do a single job. Plus, we can’t deny this project looks really neat. Video after the break.

Continue reading “Internet Radio Built In Charming Cassette-Like Form Factor”

The Gopher Revival Is Upon Us

A maxim for anyone writing a web page in the mid-1990s was to bring the whole thing (including graphics) in at around 30 kB in size. It was a time when the protocol still had some pretence of efficient information delivery, when information was self-published, before huge corporations brought everything under their umbrellas.

Recently, this idea of the small web has been experiencing something of a quiet comeback. [Serge Zaitsev]’s essay takes us back to a time before the Internet as we know it was born, and reminds us of a few protocols that have fallen by the wayside. Finger and Gopher are both things we remember from our student days, but neither was a match for the browser.

All is not lost though, because the Gemini protocol is a more modern take on minimalist Internet information sharing. It’s something like the web, but intentionally without layer upon layer of extraneous stuff, and it’s been slowly gathering steam. Every time we look at its software list it becomes more extensive, and we live in hope that it might catch on for use with internet-connected microcontroller-based computing. The essay is a reminder that the internet doesn’t have to be the web, and doesn’t have to be bloated either.
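
To show just how small that minimalism gets, the entire request side of Gemini is a TLS connection to port 1965 followed by a single URL line. The sketch below is a bare-bones illustrative client, not production code: Gemini servers commonly use self-signed certificates (trust-on-first-use), hence the relaxed TLS verification, and the URL is just an example.

```python
# Bare-bones Gemini client sketch: open TLS to port 1965, send the URL, read the reply.
import socket
import ssl

def gemini_fetch(url: str) -> str:
    host = url.removeprefix("gemini://").split("/")[0]
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # Gemini servers are typically self-signed,
    ctx.verify_mode = ssl.CERT_NONE   # so this sketch skips certificate verification
    with socket.create_connection((host, 1965), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall((url + "\r\n").encode("utf-8"))
            response = b""
            while chunk := tls.recv(4096):
                response += chunk
    # The first line is a status header such as "20 text/gemini"; the rest is the page body.
    return response.decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(gemini_fetch("gemini://geminiprotocol.net/"))
```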