Uploading Files

Anyone who’s built a rails application that deals with large file uploads probably has a few horror stories to tell about it. While some people love to overstate the issues for their own purposes, it’s still something that can be quite challenging to do well.

What’s the Problem?

As I mentioned in the article on File Downloads, your rails processes are a scarce resource. You need them to be free to handle your application’s requests; if they’re all busy, your users will be left waiting. When we optimised the download process we made sure that we used our web servers instead of tying up a rails process to spoon-feed the file out over the network to your users. Dealing with uploads has a similar problem.

When a browser uploads a file, it encodes the contents in a format called ‘multipart mime’ (it’s the same format that gets used when you send an email attachment). In order for your application to do something with that file, rails has to undo this encoding. Doing this requires reading the huge request body and matching each line against a few regular expressions. This can be incredibly slow and use a huge amount of CPU and memory.

While this parsing is happening your rails process is busy, and can’t handle other requests. Pity the poor user stuck behind a request which contains a 100MB upload!
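To make the cost concrete, here is a minimal sketch of what undoing a multipart encoding involves. It is not rails’ actual parser: the helper names `build_multipart_body` and `parse_multipart_body`, the boundary string and the field names are all made up for illustration, and it takes shortcuts a real parser can’t (for example, it assumes the boundary never appears inside a file’s contents). Even so, it shows the whole body being held in memory and picked apart with string splitting and regular expressions.

```ruby
CRLF = "\r\n"

# Build a multipart/form-data body the way a browser would (simplified).
# File fields are passed as { filename:, content: } hashes (an assumption
# made for this sketch), plain fields as strings.
def build_multipart_body(boundary, fields)
  body = +""
  fields.each do |name, value|
    body << "--#{boundary}#{CRLF}"
    if value.is_a?(Hash)
      body << %(Content-Disposition: form-data; name="#{name}"; filename="#{value[:filename]}"#{CRLF})
      body << "Content-Type: application/octet-stream#{CRLF}#{CRLF}"
      body << value[:content]
    else
      body << %(Content-Disposition: form-data; name="#{name}"#{CRLF}#{CRLF})
      body << value
    end
    body << CRLF
  end
  body << "--#{boundary}--#{CRLF}"
end

# Naive parser: split the whole body on the boundary, then use regular
# expressions to pull the field name and filename out of each part's headers.
def parse_multipart_body(body, boundary)
  parts = {}
  body.split("--#{boundary}").each do |chunk|
    # Skip the empty preamble and the closing "--" marker.
    next if chunk.strip.empty? || chunk.start_with?("--")

    headers, content = chunk.sub(/\A\r\n/, "").split(CRLF + CRLF, 2)
    name     = headers[/;\s*name="([^"]*)"/, 1]
    filename = headers[/;\s*filename="([^"]*)"/, 1]
    content  = content.chomp(CRLF) # drop the CRLF that precedes the next boundary

    parts[name] = filename ? { filename: filename, content: content } : content
  end
  parts
end

boundary = "----ExampleBoundary"
body = build_multipart_body(boundary, {
  "title" => "Holiday",
  "photo" => { filename: "beach.jpg", content: "pretend these are image bytes" }
})

parsed = parse_multipart_body(body, boundary)
puts parsed["title"]
puts parsed["photo"][:filename]
```

Every byte of the upload passes through that `split`, so the work grows with the size of the file, and all of it happens inside the process you wanted free for other requests.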

What’s not the problem?

Some people seem to think that the File Upload problem with rails is that the entire process is blocked while the browser sends the encoded body to you. This isn’t true, and hasn’t been for a long time. Whether you’re using nginx + mongrel, apache + mongrel or apache + passenger, your web server buffers the entire request before rails locks itself for processing. So no matter how slow a user’s connection is, your application isn’t locked while they upload their file.

What can you do?

There are a number of unattractive options to work around this slow multipart parsing. The most common I’ve seen is to send uploads to a non-rails process such as a CGI script or a merb/mongrel/rack application. CGI scripts have the obvious disadvantage that you need to write a script simple enough to start up quickly and featured enough to process your uploads. Doing it in rack leaves you relying on ruby’s threading to handle parallelism. This is probably not what you want, and your throughput will probably be much lower while an upload is being processed than it would be otherwise.

What else can you do?

Because neither of these options was acceptable, Pratik Naik and I have built Mod Porter, an Apache module that does the heavy lifting for your file uploads. All of the hard stuff is done by libapreq though, so you don’t have to worry about using C code written by two ruby programmers!

Porter is essentially the inverse of X-SendFile. It parses the multipart post in C inside your apache process and writes the files to disk. Once that work is done, it changes the request to look like a regular form POST which contains pointers to the temp files on disk. To maintain system security, it also signs the modified parameters so people can’t attack your system the way they could those old PHP apps.
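The signing step is what stops someone from posting a hand-crafted form that points at an arbitrary file on your server. The sketch below shows the general idea using HMAC-SHA256 from ruby’s standard OpenSSL library. This is an assumption for illustration only: the article doesn’t describe Porter’s actual signature scheme, and the parameter names (`path`, `filename`), the helper names and the secret are all hypothetical.

```ruby
require "openssl"

# Hypothetical secret shared between the web server module and the app.
SECRET = "replace-with-a-long-random-secret"

# Sign the rewritten upload parameters. Sorting the keys gives both sides
# the same canonical string regardless of the order the fields arrive in.
def sign_upload_params(params, secret)
  payload = params.sort.map { |key, value| "#{key}=#{value}" }.join("&")
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, payload)
end

# Recompute the signature and compare it byte by byte without stopping at
# the first difference, so the comparison time doesn't leak how much of a
# forged signature was correct.
def valid_signature?(params, signature, secret)
  expected = sign_upload_params(params, secret)
  return false unless expected.bytesize == signature.bytesize

  diff = 0
  expected.bytes.zip(signature.bytes) { |a, b| diff |= a ^ b }
  diff.zero?
end

# What the server-side module might hand to the application (made-up values).
upload = { "path" => "/tmp/upload-1234", "filename" => "beach.jpg" }
signature = sign_upload_params(upload, SECRET)

puts valid_signature?(upload, signature, SECRET)
puts valid_signature?(upload.merge("path" => "/etc/passwd"), signature, SECRET)
```

Without the shared secret an attacker can’t produce a signature that matches a path of their choosing, so the application can refuse any upload parameters that fail the check.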

This means that your rails processes don’t have to deal with anything more than a regular form post, which is nice and fast. In addition to the apache module, Porter also includes a Rails Plugin which hides all of this from you. It makes an upload handled by Porter look just like a regular Rails Upload.

How fast is it?

The speed of upload parsing isn’t particularly relevant; the reduced locking is far more important. Your user’s internet connection is much more important for the round-trip upload performance than your upload handler’s parser.

Having said all that, Porter runs significantly faster than the equivalent pure-ruby parsing code. Depending on the size and number of uploads, we’ve seen it respond between 30 and 200 times as fast. That’s not just compared to rails’ upload parser; it’s that much faster than every other ruby mime parser we tried.

Isn’t this just like the Nginx module?

Kinda. We’ve been thinking about this module ever since we started using lighttpd’s X-SendFile header. When I saw the nginx module get released I decided to start planning the Apache equivalent. Porter is completely transparent to your application: you don’t need a special form action, and you don’t need to tell Porter what form fields to pass through to the web application. This means you can use porter in production, and mongrel or thin in development, without any changes to your application.

The biggest improvement from this is that you don’t need to change your web server config every time you add a new input to a form, or a new file upload to your application, as you do with the nginx module. That is extremely tedious and error-prone, especially when making these changes involves a support ticket with your hosting provider. The major goal we have with Porter is to make sure it always ‘Just Works’, so you can put a file upload into any form without having to worry about your web server.

Getting Started

Porter is still beta software, so you’re strongly advised to test it first, but you already knew that. The porter website has the installation instructions. Once you’ve got that done you’ll need to add the rails plugin, and configure the module and the plugin to share a nice secure secret. Then, hopefully, your application will Just Work, but your uploads will be much less painful.

If you have any issues getting it running, leave us a note on the GitHub issues page.