Archiving The Entirety Of DPReview Before It’s Gone

Despite the popular adage about everything on the internet being there forever, every day pages of information and sometimes entire websites are lost to the sands of time. With the imminent shutdown of the DPReview website, nearly 25 years of reviews and specifications of cameras and related content are at risk of vanishing. Also lost will be the content of forum posts, which can still be requested from DPReview staff until April 6th. All because the owner of the site, Amazon, is looking to cut costs.

As announced on r/photography, the Archive.org team is busy trying to download as much of the site as possible, but due to bottlenecks may not finish in time. One way around these bottlenecks is what is called the Archive Team Warrior, which involves either a virtual machine or Docker image that runs on distributed systems. In early April an archiving run using these distributed systems is planned, in a last-ditch attempt to retain as much of the decades of content as possible.

The archived content will be made available in the WARC (Web ARChive) format, in order to retain as much information as possible, including metadata and different versions of content.
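To give a rough idea of what a WARC file holds, the sketch below builds a single minimal WARC/1.0 "response" record by hand: a block of named header fields, a blank line, and then the captured HTTP response as the payload. The function name `build_warc_record` and the example URL are made up for illustration, and this is not how the Archive Team tooling writes its files; real archives carry more fields and are usually compressed.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_payload: bytes) -> bytes:
    """Build one minimal WARC/1.0 'response' record (illustrative only)."""
    headers = [
        ("WARC-Type", "response"),
        # Every record gets its own unique identifier.
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # Capture time in UTC; this is what lets an archive keep
        # several versions of the same URL side by side.
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record ends with two CRLF pairs after the payload.
    return head.encode("utf-8") + http_payload + b"\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical captured response, stored verbatim including HTTP headers.
    payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>example</html>"
    print(build_warc_record("https://example.com/review", payload).decode("utf-8"))
```

Because the payload is the raw HTTP response, the server's own headers are preserved alongside the page content, which is where much of the retained metadata comes from.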

28 thoughts on “Archiving The Entirety Of DPReview Before It’s Gone”

  1. OK, I’m dumb. How much would it cost Amazon to keep that up and running per year?
    Can’t archive.org just ask Amazon to donate the gear? Or is that yet another dumb question?

    1. I couldn’t imagine it being hosted on anything other than an AWS instance.
      Turning off a VM isn’t going to save much money, so all I can see it doing is freeing up storage space that could be made available to a paying customer.

      I wouldn’t think that would be a huge cost at that scale. It would be like you or I claiming to delete a small file from our huge HDs in order to save money somehow.

      1. One small edit: Amazon’s claim of cutting costs is almost certainly based around keeping the ‘company’ behind dpreview up and running, not just their website, which I’m sure really is a notable expense.
        However, for just the scope of keeping the site up, with Amazon’s existing resources, that cost should be negligible.

        1. Agreed, maintenance and updates, moderating the forums and other user content, maybe a little bit of internet traffic is where the costs are. If archive.org could just get a deal to get the database of all the content, that would save them from using the infrastructure to download it all. And much of the content is floating around the internet anyway, I’ve seen my posts and images on other websites as well, even people claiming to be me and having shot the pictures that I’ve taken lol!
          Also, there’s an initiative to continue a forum by and for the users of dpr (dprforum.com, final name is still pending). There are entire communities built around the dpr forums that are suddenly gone as well, it’s not just about the content but also a great space where people can share their knowledge, hang out and learn.
          It’s trivial to have a WordPress/phpBB forum installed on a server instance, VM or container, it’s another cup of tea to keep it running securely and maintain a good, generally healthy ecosystem for people to participate in.

    2. According to a former editor (I guess they’re all former now) at DPReview, they had “15-18 people, tops. By my rough math once, our entire operating cost for the year was about 10-12 minutes worth of Amazon revenue.”
      Which, if Google is to be believed, is in the $2-3 million range, so chump change to Amazon.

      1. Docker is containerization software. It’s kind of like a VM lite, with a fair amount of isolation, while still having relatively low overhead to run. The idea is that you can very easily install a container and it will run without a bunch of configuration.

        Unraid is a Linux flavor that, among other things, allows easy use of docker containers. Never used it myself, but it would be pretty trivially easy to run the container on basically any Linux machine (and Windows, though I’ve never used docker on it before).

    1. You might want to add a “--restart=always” to the ‘extra parameters’ section for the Warrior Docker. Otherwise I found it kept stopping for some reason and had to be manually restarted.

  2. This is almost certainly not motivated by cost; instead, Amazon want to drive traffic to their own site to be the single source of reviews. Yes, we all know Amazon reviews are shit, doesn’t change their greed.

        1. I’m starting to think that if possible, sites like this in question and especially my beloved Hackaday, should include in any site purchase contracts a legal stipulation that says something like:

          “Condition of sale is only valid if the owner, upon wanting to wind down the site, gives a specific notice period of X months to specific people for a proper backup; otherwise the site is legally transferred to person Y immediately.

          And this is legally valid for the company buying, which agrees to make it valid and legally binding on any further entity that buys it.”

          Something like that. I don’t know whether that’s legally enforceable that way or not, but surely there has to be a way for sites that archive very specific information to include this in a sale clause just in case

  3. With most resources nowadays being published only online, archive.org has become invaluable.
    For written books, many countries have laws that require sending a copy to the national library.
    It would be silly to require that for every web page, but some provisions could be made to make archival easier, more than just the copyright exception.

    1. There’s no national body for that. Plus the internet really doesn’t have an attitude of permanence. All our HaD comments could disappear overnight, and who would weep for the loss?

      1. Some of the comments are tremendously valuable. Hackers sharing information with their peers is the point of the whole darn show, actually. Through projects, through comments.

        So, um, me. I would weep.

        If there were no value, we’d just close down the comment section and save the effort moderating.

  4. One big problem archive.org has is with one tiny file. ROBOTS.TXT
    When their web crawler finds ROBOTS.TXT it automatically stops, doesn’t copy anything in that directory or anything below it.

    The problem is that in ROBOTS.TXT can be explicit instructions to *allow archiving by crawlers*.

    There’s a massive amount of stuff lost forever thanks to whoever at the archive didn’t program the archiving crawler to actually open and read robots.txt files to see if they contained permission to archive. Instead, the person(s) just took the mere presence of the file as a blanket negative.

    It’s especially annoying when I’m looking for drivers for some old piece of hardware and the archive didn’t get it because of robots.txt, and I open that file (which is the only file they saved) and inside is “archive yes”: a directive that the site is allowed to be archived, and the archive did the opposite.

    There’s this from 2017. Did they go ahead with ignoring it and archiving anyway? https://teleread.org/2017/04/24/the-internet-archive-will-soon-stop-honoring-robots-txt-files/
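As a rough illustration of the point in this comment, a robots.txt file can grant access as well as deny it, so a crawler has to parse the rules rather than react to the file merely existing. The sketch below uses Python's standard `urllib.robotparser`; the crawler name `ExampleArchiveBot`, the rules and the URLs are all hypothetical, and it says nothing about how the Internet Archive's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named archive crawler is allowed everywhere,
# while every other crawler is kept out of /drivers/.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /drivers/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/drivers/old.zip"
    for agent in ("ExampleArchiveBot", "SomeOtherBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, url))
```

Under these example rules the named archive crawler is permitted to fetch the driver file while other crawlers are not, which is the distinction a presence-only check would miss.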

  5. Their copy is probably going a bit slow given how many people are (independently) doing it.

    Given the cost of the web site would be 99% made up of staff costs, they should just leave it up as read only for a year or two, but as the comments above say – they aren’t getting rid of it because of cost.

  6. Instead of asking Amazon to keep the site up, perhaps Archive Team could just ask them for a copy. An external hard drive or tape cartridge would be much cheaper and hopefully Amazon aren’t such a-holes as to deny that simple request to a charitable organization. Or is that a false hope?

  7. There have been many times that camera companies have refused warranty service for cameras purchased off Amazon and eBay. Many camera companies still refuse warranty service if a receipt is shown from either of these sources. Canon is one of the largest offenders. The only reasoning I can find is that Amazon and eBay are outlets for reselling gray market cameras. Gray market cameras are those intended for Europe, Asia or Latin American countries that are often imported and resold in the USA at a discount. Camera companies really don’t like this practice. Canon has sued vendors many times for this practice. To me, Amazon was simply not a good fit for dpreview for these reasons. Why would anyone purchase an expensive camera off Amazon if they know their warranty is null and void?

    1. Amazon actually started policing this quite a few years ago and you’d usually see one grey/international listing (or multiple ones) for any given piece of camera gear besides the official/US one, and there’d be some relatively clear stipulation about the warranty implications.

      I’m not *as familiar* with Canon as with Sony/Pana/Oly-OM, and they seem to be one of the more litigious of the bunch; the one most pressed by 3rd party reverse engineering and the one who’s done the most to stop it on their newer mount… So I’m not surprised they clashed with Amazon.
