Offering PDFs that download

This article was originally written in April 2006. By now, the situation with PDFs and browsers has changed a bit, as browsers are pretty good PDF viewers nowadays. Still, the info might be relevant to some people, so I decided to keep the article around.

When a PDF opens inside a browser window, most people are confused. They are using a browser, and they expect only a browser. To work around this, many websites open PDFs in a new window, which is also annoying and confusing. I have a much better solution.

There is also a translation to Spanish available thanks to Pablo Noel. It’s a bit outdated, though.

Offer for download

A browser can open only a couple of file types: HTML, CSS, XML and some images. If you open another type of file, you’re asked if you want to save it, or open it in its native application.

The ideal way for offering PDF files would be just that: ask the user to either save it, or view it in their native PDF viewer. Here is a demonstration, my MSc thesis.

Unfortunately, the most widely used PDF viewer, Adobe Acrobat Reader, installs itself as a browser plugin. This is the cause of the annoying PDF-opening-in-browser effect. There is a simple way of preventing this, though.

Every file that is offered in a response to an HTTP request is contained in an “HTTP response”. Easy, isn’t it? This HTTP response contains headers that tell your browser about the type of file, when it was last modified, what size it has, etc. In order to prevent Adobe Acrobat Reader from opening your file directly in the browser window, add the following HTTP header to the response:

Content-disposition: attachment

How do you do that? That’s what the following sections are about.
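To make the header concrete before getting to specific servers, here is a minimal Python sketch of the headers such a response might carry. The function name pdf_download_headers and its parameters are made up for illustration, and the sketch assumes the file name contains only simple characters (no quotes or backslashes).

```python
from email.utils import formatdate


def pdf_download_headers(size, mtime, filename=None):
    """Build response headers that ask the browser to download a PDF.

    size     -- length of the PDF in bytes
    mtime    -- last modification time as a Unix timestamp
    filename -- optional name to suggest in the save dialog (assumed safe)
    """
    disposition = "attachment"
    if filename:
        # The filename parameter is optional; it only suggests a name.
        disposition += '; filename="%s"' % filename
    return [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(size)),
        # HTTP dates are always expressed in GMT.
        ("Last-Modified", formatdate(mtime, usegmt=True)),
        ("Content-Disposition", disposition),
    ]
```

Of these, only the Content-Disposition line triggers the download prompt; the others are the usual type, size and modification date headers.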

Apache 2

On Apache 2, this is all trivially easy. Just place the following in a .htaccess file in the same directory as your PDFs, and you’re done.

SetEnvIf Request_URI "\.pdf$" requested_pdf=pdf
Header add Content-Disposition "attachment" env=requested_pdf

This requires mod_headers, which ships with Apache 2. On Ubuntu, enable it with the “a2enmod headers” command.
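After changing the configuration, it is worth checking that the header really arrives. Below is a small Python sketch of such a check. The helper names and the URL in the comment are hypothetical, and the sketch assumes the server answers HEAD requests.

```python
from urllib.request import Request, urlopen


def disposition_is_attachment(header_value):
    """Return True if a Content-Disposition value asks for a download."""
    if not header_value:
        return False
    # The disposition type is the token before the first ';'
    # and is compared case-insensitively.
    return header_value.split(";", 1)[0].strip().lower() == "attachment"


def check_url(url):
    """Send a HEAD request and report whether the response is an attachment."""
    with urlopen(Request(url, method="HEAD")) as response:
        value = response.headers.get("Content-Disposition")
    return disposition_is_attachment(value)


# Example (hypothetical URL):
# check_url("http://example.com/pdfs/thesis.pdf")
```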

Apache 1.3

On Apache 1.3 it is almost as easy as with Apache 2.0:

<FilesMatch "\.pdf$">
    Header add Content-Disposition "attachment"
</FilesMatch>

This also works with Apache 2 if you add the <FilesMatch> block above to your virtual host container. For more info, see the Apache documentation of the Files and FilesMatch directives.

Courtesy of Darren Clark

Django

This is very easily done in Django. Put the following in your urls.py:

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    ...
    ...
    (r'^pdfs/(?P<filename>[a-z0-9A-Z_\-]*\.pdf)$', 'pdf'),
    ...
    ...
)

The pdf() function in your view could then be something like this:

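import os

from django.http import HttpResponse

# PDF_PATH is assumed to be the directory that holds your PDFs.
PDF_PATH = '/path/to/pdfs'

def pdf(request, filename):
    fullpath = os.path.join(PDF_PATH, filename)
    # Read in binary mode so the PDF bytes are passed through unchanged.
    with open(fullpath, 'rb') as f:
        response = HttpResponse(f.read())
    response['Content-Type'] = 'application/pdf'
    response['Content-disposition'] = 'attachment'
    return response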

Please note that Django has been designed for dynamic content. If you can, use another solution for your static files, like Apache 2.

PHP

If you’re using PHP, you can write a simple PHP file that reads the PDF and outputs it to the browser. In this article, I’ll assume the script is named ‘mypdfscript.php’.

The best way of linking to that PHP script is by using ‘mypdfscript.php/somefile.pdf’ as the URL. Your webserver will understand it has to execute ‘mypdfscript.php’, but your browser thinks it’s downloading ‘somefile.pdf’ and will name the downloaded file as such.

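<?php
// Strip the leading slash from the path that follows the script name.
$pdf = isset($_SERVER['PATH_INFO']) ? substr($_SERVER['PATH_INFO'], 1) : '';

// Allow only plain names ending in a literal ".pdf" (note the escaped dot).
if(preg_match('/^[a-zA-Z0-9_\-]+\.pdf$/D', $pdf) == 0) {
	print "Illegal name: " . htmlspecialchars($pdf);
	return;
}

header('Content-type: application/pdf');
header('Content-disposition: attachment; filename=' . $pdf);
readfile('path_to_pdfs' . $pdf);
?>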
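The name check in this script is what keeps a request from reaching files outside the PDF directory, so it deserves some care. Here is a Python sketch of the same kind of whitelist check, with the dot before “pdf” escaped so that it matches only a literal dot. The names SAFE_PDF_NAME and is_safe_pdf_name are hypothetical.

```python
import re

# Letters, digits, underscore and hyphen, followed by a literal ".pdf".
SAFE_PDF_NAME = re.compile(r"[A-Za-z0-9_-]+\.pdf")


def is_safe_pdf_name(name):
    """Return True only for plain file names such as 'somefile.pdf'."""
    # fullmatch requires the whole string to match, so slashes,
    # trailing newlines and extra suffixes are all rejected.
    return SAFE_PDF_NAME.fullmatch(name) is not None
```

With this check, “somefile.pdf” passes, while “../secret.pdf” and “somefilexpdf” are rejected.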

Crappy server?

If your server doesn’t understand it has to execute ‘mypdfscript.php’ for the URL ‘mypdfscript.php/somefile.pdf’, try this code:

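<?php
// Take the file name from the query string; treat a missing value as empty.
$pdf = isset($_GET['pdf']) ? $_GET['pdf'] : '';

// Allow only plain names ending in a literal ".pdf" (note the escaped dot).
if(preg_match('/^[a-zA-Z0-9_\-]+\.pdf$/D', $pdf) == 0) {
	print "Illegal name: " . htmlspecialchars($pdf);
	return;
}

header('Content-type: application/pdf');
header('Content-disposition: attachment; filename=' . $pdf);
readfile('path_to_pdfs' . $pdf);
?>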

With the above code, you should link to the PDF as mypdfscript.php?pdf=somefile.pdf.
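The two link styles differ in the name a browser would fall back to if the filename parameter were missing from the header. A rough Python approximation of that fallback is sketched below. Real browsers apply more rules than this, and the function name and example URLs are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def fallback_download_name(url):
    """Approximate the file name a browser derives from the URL alone."""
    # Only the path counts; the query string is ignored.
    path = urlsplit(url).path
    # The last path segment is what ends up as the suggested name.
    return posixpath.basename(unquote(path))
```

For a link like http://example.com/mypdfscript.php/somefile.pdf this gives “somefile.pdf”, but for http://example.com/mypdfscript.php?pdf=somefile.pdf it gives “mypdfscript.php”. That is why the query-string variant depends on the filename parameter in the Content-disposition header.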

Publicfile

Publicfile is a very small and simple web server. Kenji Rikitake sent me a patch for version 0.52, the latest as of this writing. The patch adds the proper Content-disposition header to the HTTP reply, in case of a PDF.

Download the patch: filetype.c.diff

Microsoft IIS

How to apply this technique to Microsoft IIS is described by Microsoft themselves in How To Raise a “File Download” Dialog Box for a Known MIME Type.