Ask Hackaday: Is Bigger (E-mail) Better?

While pundits routinely predict the end of e-mail, we still get a ton of it and we bet you do too. E-mail has been around for a very long time and back in the day, it was pretty high-tech to be able to shoot off a note asking everyone where they wanted to go to lunch. What we had on our computers back then was a lot different, too. Consider that the first e-mail over ARPANET was in 1971. Back then some people had hardcopy terminals. Graphics were unusual and your main storage was probably a fraction of the smallest flash drive you currently have on your desk. No one was sending photographs, videos, or giant PDF files.

Today, things are different. Our computers have gigabytes of RAM and terabytes of storage. We produce and consume richly formatted documents, photographs at high resolutions, and even video. Naturally, we want to share those files with others, yet e-mail has come up woefully short. Sure, some systems will offer to stash your large file in the cloud and send a link, but e-mailing a multi-megabyte video to your friend across town is more likely to simply fail. Why?

The Life of an E-mail

As you might expect, some of this is due to historical artifacts. The system wasn’t made to handle your latest vacation video. More importantly, e-mail is meant to interoperate across a wide variety of systems. Sure, these days, it is unlikely any of your e-mails travel over a twisted pair linking two far-flung outposts, but they could.

To understand the genesis of the limits, you have to know about how an e-mail lives its life. It is a lot more complicated than you might expect. Today, mail travels using a protocol called SMTP — Simple Mail Transfer Protocol. Let’s take the simple cases first.

Let’s assume that we have a Hackaday.com e-mail address and that we run our own e-mail servers (we don’t, but go with me for a minute). A user connects to the SMTP server and sends e-mail to another address at Hackaday.com. That’s easy. The server recognizes that it is the final destination for the e-mail, so it accepts it and delivers it. What does deliver mean? That depends, but it usually means putting it in some file that the user will check with a mail program or with another kind of server like POP3 or IMAP. But for our purposes, it means the mail is put somewhere that the user can find it.

A slightly more difficult case is when someone from, say, wd5gnr.com wants to send e-mail to someone at Hackaday (again, assuming that domain is handling its own e-mail which isn’t as common as it used to be). In this case, the user connects to their own SMTP server. It realizes that @hackaday.com is an alien e-mail address (maybe in more ways than one) and looks up the domain in the DNS directory. One of the records in DNS (the MX record) is the domain’s mail server address. The SMTP server on wd5gnr.com will then send the mail to hackaday.com which will actually do the delivery.
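To make the two cases concrete, here is a minimal sketch of the decision a sending server makes. It is not how any particular mail server is written: the `FAKE_MX_TABLE` dictionary is a hypothetical stand-in for a real DNS query, the host names are placeholders, and `LOCAL_DOMAINS` and `next_hop` are names made up for illustration.

```python
# Hypothetical stand-in for DNS: domain -> list of (preference, mail host).
# A real server would ask DNS for the domain's MX records instead.
FAKE_MX_TABLE = {
    "hackaday.com": [(20, "backup-mx.example.net"), (10, "mx.example.net")],
}

# Domains this server delivers for itself (an assumption for the example).
LOCAL_DOMAINS = {"wd5gnr.com"}


def next_hop(recipient: str) -> str:
    """Return 'local' or the host the message should be handed to next."""
    domain = recipient.rsplit("@", 1)[1].lower()
    if domain in LOCAL_DOMAINS:
        # We are the final destination, so we deliver it ourselves.
        return "local"
    records = FAKE_MX_TABLE.get(domain)
    if not records:
        raise LookupError(f"no MX record known for {domain}")
    # Lower preference numbers are tried first.
    return min(records)[1]


print(next_hop("al@wd5gnr.com"))
print(next_hop("tips@hackaday.com"))
```

The first call is the easy case where the server is its own destination. The second is the "alien" address, where the server picks the most preferred mail host listed for the other domain.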

And Then…?

That’s all there is to it, right? Not exactly. These days, that DNS entry is more than likely pointing to some service like a Gmail server or some other hosting company that will actually accept the e-mail. Probably the sender is attaching to some vendor’s server, too. So maybe the sender attaches to Zoho’s SMTP server which then connects to the Gmail server to deliver the mail.

Even more complicated is when you get into private networks. If the wd5gnr network is behind a firewall, it may not be possible to directly contact the Hackaday server regardless of where it is. What then? An SMTP server can accept mail on one interface and then relay it to another server. This could go on for some time. For example, the wd5gnr server might relay everything to Gmail which then finds the main Hackaday server which further sends it to the internal mail server for the part of the world that particular user is in.

This may sound farfetched, but it happens all the time. For example, you might be on a ship at sea with intermittent connection to a satellite link. A server might collect e-mail and wait for a connection to the satellite. Then it sends the mail to a server via the satellite link that will take care of putting it out to the real network.
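If you are curious how many hops one of your own messages took, the headers usually tell the story, since each server that handles a message normally adds a Received line at the top. The sketch below uses Python's standard `email` module to list those lines in travel order. The message text, host names, and the `relay_hops` helper are invented for the example.

```python
from email import message_from_string

# A made-up message that passed through two servers on its way in.
RAW = """\
Received: from relay.example.net by mx.example.org; Mon, 1 Jan 2024 10:00:05 +0000
Received: from ship.example.com by relay.example.net; Mon, 1 Jan 2024 09:58:00 +0000
From: crew@ship.example.com
To: tips@example.org
Subject: Hello from sea

Test message.
"""


def relay_hops(raw_message: str) -> list[str]:
    """Return the Received headers, oldest hop first."""
    msg = message_from_string(raw_message)
    # The newest hop sits at the top, so reverse to get travel order.
    return list(reversed(msg.get_all("Received", [])))


for number, hop in enumerate(relay_hops(RAW), start=1):
    print(number, hop)
```

Try it on the raw source of a real message and you may be surprised how long the list gets.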

You can see why e-mail has been so ripe for exploitation. Early servers would allow anyone to connect and might even attempt to relay mail for anyone. Spam changed all that, of course, and now an open relay is a rarity. Some SMTP servers go as far as to reject mail that comes through known open servers. This is controversial in some circles as a form of censorship, but the majority of servers now will require you to authenticate and will only accept e-mail destined for certain domains without authentication. There’s also a rise in servers that check an SPF record to ensure that a server is known to handle mail for a particular domain. DKIM goes a step further and uses a cryptographic signature.
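An SPF record is just a line of text published in DNS, so the basic idea is easy to sketch. The function below is a deliberately simplified illustration, not a complete SPF evaluator: it only understands the `ip4:`, `ip6:`, and `all` terms, skips everything else (such as `include:` or `mx`), and treats anything other than an explicit pass as a rejection. The record and addresses use documentation-only ranges, and `spf_allows` is a made-up name.

```python
import ipaddress


def spf_allows(record: str, sender_ip: str) -> bool:
    """Simplified check: is sender_ip permitted by this SPF record?"""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # a term with no prefix means "pass"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return qualifier == "+"
        elif term == "all":
            # Catch-all term, normally last in the record.
            return qualifier == "+"
        # Other mechanisms are ignored in this sketch.
    return False


record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_allows(record, "192.0.2.25"))
print(spf_allows(record, "198.51.100.7"))
```

A receiving server would fetch the record for the sender's domain and run a check along these lines against the address of the machine that just connected.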

The Weight of History

So although mail rarely, these days, leaves the network or flows through an open relay, the point is that it can. Why does that matter? Let’s go back to the ship at sea problem. The server for the ship may have to encode our e-mails into some wacky format or compress them. The satellite ground station might only send e-mail via UUCP which has further processing to do. So large e-mails are more than just a network bandwidth issue. The server may have to process and convert files. SMTP, therefore, can require 7-bit ASCII which is great for text. But for anything fancier, the message (including attachments) gets encoded using something like base64 encoding as specified by MIME (Multipurpose Internet Mail Extensions). This also means that your attachment is likely to get even larger after encoding.
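How much larger? Base64 turns every 3 bytes into 4 characters, and encoded attachments are normally wrapped into lines of 76 characters, each ending in a two-byte line break. Here is a rough sketch of the arithmetic, assuming exactly that layout and ignoring the MIME headers themselves. The `mime_base64_size` helper is a name made up for this example.

```python
import base64
import os


def mime_base64_size(raw_bytes: int) -> int:
    """Bytes needed for base64 text wrapped at 76 characters with CRLF."""
    chars = 4 * ((raw_bytes + 2) // 3)   # 3 bytes in, 4 characters out
    lines = (chars + 75) // 76           # number of wrapped lines
    return chars + 2 * lines             # add a CR and LF per line


# Check the estimate against the standard library on a small random payload.
payload = os.urandom(30_000)
wrapped = base64.encodebytes(payload).replace(b"\n", b"\r\n")
print(len(payload), len(wrapped), mime_base64_size(len(payload)))

# The same arithmetic for a 30 MB attachment.
size = mime_base64_size(30_000_000)
print(size, round(size / 30_000_000, 3))
```

The ratio works out to a bit more than a third in overhead, so an attachment that squeaks in under a server's size limit on disk can still be rejected once it is encoded.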

Even a fully connected host may want to process e-mail messages. For example, it is common to check for spam or virus payloads. So even with infinite network bandwidth, there is still a processing overhead associated with large attachments.

Granted, processing power is not in short supply these days, either, but some mail servers may not be very state-of-the-art, while others may be getting hit by millions of e-mails, many of which are spam.

Ask Hackaday

How do you send large files?  Should e-mail take bigger files? Do you think e-mail is on the way out? Replaced by what? How could it be easier? There are services like WeTransfer and SendAnywhere, but honestly, we are more likely to just upload a file to one of our servers and send a link that way. Then there’s always the cloud storage like Dropbox, Drive, or iCloud.

We’ve come a long way since carrier pigeons. Then again, maybe your mail is going over that network, too. If you want to dig even deeper into SMTP, check out the video from Iowa State University, below.

30 thoughts on “Ask Hackaday: Is Bigger (E-mail) Better?”

  1. I had dialup from 1994 (BBS’s, and 1200 baud modem) to 2012 (ISP that included a shell account, 56K modem) and until 2001, Lynx, the text browser was what I used. Until 2012 I generally used Lynx, no wasting time/bandwidth on all the junk, though I did use a graphic browser as necessary.

    Not only were webpages bloated, but so was email. With dialup, I had to wait if there was large email, even if I was just checking email. I remember groups sending webpages, rather than pointing to the webpage with an email link. Nobody was thinking in terms of the user. Even though a prime consideration was “getting the message out”.

    So high speed internet since Oct 2012, and I pretty much use a graphic browser, since I use the tablet most of the time. But a lot of waste, and pages don’t load immediately.

    There’s very little reason for larger email. Stash the file somewhere and point to it. Then you can deal with it when you want.

    1. While I agree with your conclusion, your personal example raises questions: Why didn’t you have broadband internet before 2012? Was your neighborhood not served by broadband?

      1. Maybe it was needs-related… there wasn’t the “need” to have broadband.

        And even then, what defines “broadband” is regional. The 25Mbps link I have now is considered low-end broadband, in other parts of the world, it might as well be dial-up. It serves me just fine though.

        1. in 2012 I think anything not dialup would be considered broadband, in 06 there was a service in our area that did around 1mb over radio links (clearwire) and since no one else would serve our house it might as well have been fiber (and it was only a few bucks more than dialup … and portable, took it with us when we moved out of the armpit neutral zone of cockmast and farter cable)

        2. 25 MBit/s? Slow? You’re lucky!😁

          Here in Germany, a 56k connection is still considered “sufficient” for a “working internet connection” by the officials.
          Really, don’t laugh. Please. It hurts.

          https://marketresearchtelecast.com/right-to-fast-internet-56-kbit-s-upwards-there-are-still-many-questions-unanswered/223091/

          What’s even funnier, 10 MBit/s is considered the new minimum for a “quick” connection. It *maybe* will be lifted to 15 MBit/s minimum, even! Cool, now we can finally use our 10Base2 NICs from the 1980s to their full extent!

          https://www.thelocal.com/20220504/explained-how-germany-is-trying-to-tackle-its-slow-internet-problem/

          And now tell me people in Germany have no sense of humor. 😂

      2. Also, why no ISDN? 😉
        It offered stable 64kbit/s or 128kbit/s, respectively.

        Using 56k modems was no guarantee that the connection was running at 56k, afaik.
        It merely was the theoretical maximum.

        The quality of the land line (noise, electrical mismatch, loss etc) was a big factor, but also the compatibility to the ISP’s modems itself was.

        And then there was the computer’s serial port.
        A proper 16550A or higher FiFo was required, at least, for proper functioning.
        Without a working buffer, transfers weren’t working stable.

        Old 486es with ISA Multi I/O cards had still 16450 FiFos installed in some cases.
        That caused issues with speeds over 19200 Baud.
        Sure, it was possible to remove the 16450, add a socket and install a 16550A.
        But not all people knew of the problem.

        The latest V.90/V.92 used compression, also, which worked sometimes better, sometimes worse.

        External USB modems had a much higher theoretical bandwidth to the computer, also.

        RS232=115,200 Bit/s
        USB 1.0=1,5 MBit/s (low speed)

        ISDN cards/adapters always could handle the full bandwidth, I believe.

        (*Baud vs Bit/s.. Baud is a physical symbol rate. Ie, it’s uncompressed. And early modems did not use compression. If no compression is used, Baud equals Bit/s)

  2. well, since you asked…

    I have not deleted an email in 20 years – it is a searchable compendium of my entire history. If I am trying to figure out what the brand of that roof tar stuff I bought 10 years ago was – email. If I’m trying to remember who I was talking to about a particular movie? Email. When did my co-worker get married? Email. And as a bonus, all those attachments are still there – the documents, the pictures, even videos are still in the sent folder.

    Except not. Now there’s text, Teams, FB Messenger, Whatsapp and those are just the ones I use.

    The problem, for me, is not size, hell, I’ve got 2 TB in my pocket right now, the problem is cross-platform searchability. Solve that problem (which is actually pretty easy to solve, not so much to implement) and you’ve solved a much larger problem than storage or transfer speeds. For me at least.

    Google took a stab at the size problem, I think instead of sending a huge file they post it to Drive and send a link. So if I email that video to 30 people the infrastructure suffers 1 upload and 30 downloads.

    Unfortunately Email is not the primary communications medium at work, Teams is, and that has me suffering terribly because now I need to search both to find information; and Teams, well, not particularly intuitive.

    So sure, create another platform to replace email, and suffer another “the great thing about standards is there’s so many to choose from”

    1. I would be very leery about saving emails from long ago. Every day our life is redesigned by people with some type of new way to bring a lawsuit. Geico insurance company may have to pay 5.2 million dollars to a woman who got an STD in an insured car. Who knows what kind of craziness can happen going back through people’s lives in emails, with things that we’ve said or things that we’ve done.

  3. Build a better mousetrap? Silicon Valley and the likes have been doing this for decades and the problem is their next big idea is just a variation of an old idea with techno-fashionistas worrying about which logo is on their device.

  4. To be fair.
    I personally think Email is one of few things that is implemented very decently. (Dare I say “perfect”? No…)

    Sending basic formatted text to and from practically any device, is a very useful thing to have. Its wide support on devices is the killer feature of Email above all else.

    Not to mention that it is decentralized. However, setting up one’s own Email server is a bit of a hassle for most individuals and businesses and centralized solutions are generally having the advantage of portability where one can more or less access said server from almost anything as well. (not that one’s own email server couldn’t offer a web interface for remote access, or have an app. But this is more work to setup.)

  5. It’s a matter of Trust. Email was built for Arpanet, a network of people who knew each other or knew *of* each other. The first list of email addresses was published, like a phone book! It was a DEC salesman who broke the trust by spamming everyone in the published list.

    Every authentication layer, every trust layer, every feature (autoresponders, forwarders [by the way there’s no such thing as a forwarder, it’s just a re-send-without-saving-er] attachments etc) has been added on. DKIM, SPF, etc are all just band aids to try to make a product that’s built for Arpanet safe on the Internet, which it isn’t.

    As for attachments, base64 encoding adds about 33% to the attachment size. So, a 30MB attachment becomes 40-45MB in base64. It’s a mess.

    1. THIS. VERY MUCH THIS.

      And depending on how many users your email server has, the mail stores are.. large. VERY large. Because no one deletes any email, even if it’s trivial (canonical example: Dave, Cindy, Bob and Alice conversing over email about going somewhere for lunch) and most clients unless they are specifically using POP3, are storing all their messages on the server.

      I wear the “email admin” hard hat for my employer; the servers are a four node distributed HA cluster, and the mailstores for the ~1700 active accounts on it tip the scales at ~3 TB per node.

      I’m fond of saying “[mail server] is not a file storage device”, because some people use it in exactly that manner.

      As far as message transmission limits? While it’s historical reasons (limited bandwidth between companies, said internet links used to be quite expensive for the amount of bandwidth you did get, etc.), it largely does boil down now to performance throttling on mail servers. If I’m sending that 30 MB video to 50 of my internet friends, the mail server has to make 50 copies of that ~45 MB encoded file and send it to 50 different places that range from the desk on the other side of the cube farm to the other side of the world. That 45 MB attachment has multiplied to a shade over 2 Gbytes worth of data to send out.

      SMTP has always been at the core, a ‘store and forward’ protocol by its very design; mail gateway relay servers generally are not configured with a lot of space as it’s very much intended to be a ‘hold this email while I figure out what to do with it and where it needs to go’ for anything between 2 seconds to 2 days, depending on how the retry is configured.

      Where the bolted on security features kick in are to prevent things like having some random chucklehead send a message to half a million people at random with your server name as the return (forged sender attacks), relay abuse (which is why We Do Not Use Open Relays Anymore), and other fun things that spammers used to use before things like SBLs, reputation services, and other pre- SPF/DKIM/DMARC technology became a working thing.

      Speaking as a company that only got those three working proper-like in the last couple months, it’s still a work in progress because not everyone is an expert in configuring those technologies or has the resources to set it up correctly.

      I don’t think email is going to ever go away, because it is also at its core, a robust and deterministic means of sending messages from Point A to Point B and not have to worry if part of that path is over a pair of barbed wires with a field telephone at each end. :)

  6. It’s like no messaging application really wants to try to replace email. Have you ever seen a messaging service using the same paradigm of individually manageable messages with as much flexibility as you want?

    Decentralization and security is a factor in email’s survival, but I think the ease of interoperability and management as well as the customization capabilities (although these cause most of the bloat) are irreplaceable

  7. I love reading the world book encyclopedias from years ago. Some of the ideas in the field of science that grace the pages of the old books reveal a lot about how people thought back when. I don’t remember if it was a 1967 or 1968 book but it stated that the postal service was looking into a system where mail could be delivered by internet connected computers! They of course would charge per piece of mail sent just like a letter. If you have never seen a world book go to the library and see if they have copies. I think they are about 22 books arranged alphabetically and released yearly from about 1920 to present. There is some great reading in some of the older ones.

  8. ‘There are services like WeTransfer and SendAnywhere, but honestly, we are more likely to just upload a file to one of our servers and send a link that way. Then there’s always the cloud storage like Dropbox, Drive, or iCloud.’

    And that’s why email will continue. All of those are proprietary services that exist at the whim of their owners. They temporarily rise and fall but none have the long life of a standard that’s owned by everybody.

  9. Interesting stuff, but what I want to know is why text messages sent via phone sometimes don’t get delivered for days after they are sent. I notice this mostly when sending from an iPhone to an Android phone or the reverse. Where do those messages hide (or bounce around) for the days before they get delivered?

      1. Not a cell phone tech, but since I thought SMS was supposed to be best effort, like UDP, and that SMS, RCS, and iMessage could get lost, I went and did some reading.

        According to wiki https://en.wikipedia.org/wiki/SMS#GSM
        SMS as it currently works is part of the GSM 03.41 spec, and the messages go from your phone to a Short Message Service Center. Once there, that part of the network keeps trying to send it until it actually gets received.

        Looking at https://en.wikipedia.org/wiki/Short_Message_service_center
        It seems that the sending device can set a validity period; but that’s only a suggestion to the SMSC. Dialing the “secret” Android phone number *#*#4636#*#* might get you an SMSC configuration menu if your phone supports it and if Verizon (or whoever) hasn’t shut down all the cool android secret phone numbers.

  10. Subject:

    Email is a Document Delivery System, NOT a Document Storage System !!

    Once your post office is in house, email will keep IT people awake at night.

    Users don’t want to think about their email, and want the attachments to remain connected to them.

    I used to manage technology for a large (>500 users) law firm.

    Starting in the 90’s, we maintained our own email server starting with a 15GB Microsoft Exchange Server. We were taught the hard way that our users were using the system for their own personal document management system when we quickly exceeded the 15GB limit of Exchange as it stopped delivering emails without warning in a few months. We had plenty of disk. A quick update to “Exchange Enterprise” solved the storage issue, but taught us an important lesson about users on the loose.

    After a few years, an investigation showed that certain attorneys had 15GB inboxes with millions of items in the main folder – which they claimed to be able to find without trouble (and they could, even though no one else could). They just looked by date and then client name. Never mind that their secretaries and staff had to search their inbox for a document, rather than a quick search of the DMS.

    Later, Microsoft abandoned “single instance store” saying disk was cheap, so we now stored many copies of each email and their attachments (Duplicated many times due to the ease of forwarding an email, rather than a link to it).

    The lawyers were happy to buy multiple Exchange servers on a large SAN in order to “not think” about culling their emails.

    When I asked the partners and board for an extra million dollars plus 200K a year maintenance, plus an integrated external archiver for Exchange – they agreed to get real.

    We always had an industrial-strength Document Management System that was fully integrated into the “file-save” menu in all of our applications. Saving to DMS was the default choice, but forced people to choose a few keywords, and make a short description — so few of the attorneys would use it, even though the values were automatically defaulted to each person.

    We even hired people to invisibly move emailed documents to the DMS, replacing the actual document in the Exchange store with a link to the document in the DMS. The attorneys hated me, even though it was invisible to them due to a very good linking system.

    My replacement started a program to have secretaries and hired help “vet” all digital work product within the Firm using an email extension to the DMS, declaring items to be “documents” (worthy of keeping or archiving) or transitory. The system was so complicated and hard to maintain, they eventually moved everything to the DMS vendor’s cloud.

    Email in its present form is one of the biggest challenges an IT department might have. Yes, computers can handle big files, but try to manage a “library” of 200 million emails with big file attachments and back them up (Properly!!!!) – and it’s a nightmare.

  11. Every week I get a 20MB newsletter email, which contains text arranged in slides with adjacent pictures and animations, ALL ENCODED AS IMAGES.

    Every week I delete that email without reading it.

  12. Is there any server anywhere on the net now that still only passes 7 bits? That’s why BinHex was invented for encoding Macintosh files. By using only ASCII characters where the LSB was always zero the attachments or downloads would survive passing through any 7 bit server or relay.

    MacBinary was made to text encode Macintosh files with their resource forks into 8 bit ASCII, making it more compact than BinHex while still being able to go through servers that would only pass text.

    Remember when sites with downloads for Macintosh had some small test files? One in Stuffit archive, one in MacBinary, and one in BinHex. Since the resource fork of a Stuffit archive only contains an icon and the filetype code SIT! they can survive losing their resource fork and be extracted.

    So in the era of not-unlimited internet one would first try the Stuffit archive. If that got wrecked you’d try the MacBinary. If Stuffit Expander couldn’t decode it you’d try the BinHex test file. If *that* was also janked then something somewhere was seriously messing up. If one of the test files worked, then you’d know which version of the full download to get, and watch your data limit decrease by that much.

    With the elimination (we hope) of all 7 bit systems on the internet and web, and all of them passing both text and binary files (except gmail that blocks anything it considers executable and password protected archives it can’t scan), and with most places not having download limits, there’s no need for the test files. Though if you’re still using Classic Macintosh computers you still want to have downloads packed in Stuffit archives, BinHex, or MacBinary.

    So why were there 7 bit servers and relays in the first place? Since the Internet and ASCII were American inventions, the inventors of ASCII decided to assign bytes to all the characters used for English, plus some control codes, all the possible combinations of 8 bits where the least significant bit was zero. That way the amount of text data that needed to be sent was reduced by 1/8th and could be re-constituted on the receiving end simply by putting a 0 on that end of every byte.

    When your transfer speed is slow, not having to send 1/8th of the data makes quite a difference.

    But when methods were developed to attach and send non-text files with e-mail, or a text e-mail needed to be sent in some other language that needed Extended ASCII characters where that LSB was a 1, there had to be workarounds invented for all that 7bit-nes-mess.

    BTW, with Extended ASCII, there’s no need to use Unicode, UTF-8, “Friendly HTML” codes or any other special encoding for languages whose alphabet is fully contained within the Extended ASCII set. I looked into this years ago when I used Palm OS PDAs for ebooks. Of all the languages whose alphabets mostly use ‘english’ characters, there’s only one uncommonly used character in Norwegian that’s not in the EX-ASCII set. It even includes ‘fancy’ characters like left and right single and double quotes, en-dash, em-dash, letters with various accent marks, and many others. If you’re contemplating using unicode etc for punctuation, STOP, check to see if the character is in EX-ASCII. If it is, don’t waste extra bytes when not needed, especially if among your intended use cases are platforms (like Palm OS) that don’t support Unicode or similar methods.

    A HaD article on not using multi-byte character encoding when it’s not necessary would be a good one. Compare Extended ASCII to Unicode, UTF-8, “Friendly HTML” etc and how those overlap the Extended ASCII set, duplicating all the single byte characters while needlessly using extra, wasted, bits.

  13. Attachments are already a bane of the sysadmin’s existence. I have users with 50GB mailboxes that grew to that size over a period of three months. Others that have hit 1TB in their total history, and for legal reasons they have to keep everything. Plenty of file sharing services exist, getting people to actually use them and stop trying to stuff 50MB attachments on their emails is the real problem. None of that even touches on the problems that come with multiple recipients, shared mailboxes, groups, etc. That 50MB attachment suddenly becomes much larger when sent to a distribution list of a few dozen or hundred recipients.

    1. I work for a government organisation and for security and privacy reasons we are not allowed to use any third party services, which means that most of the internal file exchanges are through SharePoint. Externally it’s often a matter of encrypting and splitting large files with 7zip (gratuitously provided by IT) before sending.

  14. There’s work underway at IETF on standardising a link format for large attachments to email to make it easier to manage copies of the attached file.

    Lots of problems still to solve, but I think the appetite is there now for having something easier for users than uploading to drive and creating a link, in a way that works across email providers.

    And yeah, email isn’t going anywhere. The death of email has been predicted many times, but it’s still the biggest and best social network, as well as being your electronic memory!

  15. Having had my own mail server for more than 20 years, I have been through a lot of anti-spam measures.

    First the big mail providers would not accept mail from a private ISP-connection, so I had to move my server to a friend who had a company line.

    After working for some time we were flagged as spammers in Spamhaus. I checked both my servers and none of them were misconfigured. I had the firewall count all outgoing mail and there were very few. I tried to manually ask for delisting in Spamhaus but we were flagged again shortly after. It got a bit “spicy” when we were flagged with NO outgoing mails sent from our systems. I turned off the router, did the delisting … and everything was good until I turned on the router again. Only the router, nothing connected to it on the inside. And Boom! We were flagged shortly after. I contacted the ISP and told them that their router/modem was infected, expecting them to do a quick firmware update or send a new router. But no, they had a better solution that prevented recurrence.

    They just stopped all outgoing mail on the line, i.e. port 25, 465 and 587. I quickly set up a VPS and set the mail servers up to relay all mail through that server on port 24 while we changed to another ISP.

    Then came the demand for SPF and DKIM, which is properly set up, but broken at Microsoft’s end when sending there, often getting reports that SPF has failed despite coming from one of my SPF-designated outgoing servers. Sometimes Microsoft also messes up the headers in a way DKIM fails. Only Microsoft, not Google nor the other gaggle of mail providers have problems with my SPF and DKIM, and there are many posts on the internet about people complaining about this.
