Introduction to Scaling PHP Applications – Part 1
This is the first blog in a two-part series on scaling PHP applications. Part one will focus on replacing Apache while part two will go into more advanced topics such as Master-Master replication and session storage.
Making a website able to handle high amounts of traffic is one of the cornerstones of modern web applications. It’s a process that takes time, and it’s obvious to everyone when a website isn’t able to handle traffic. (i.e. HealthCare.gov) A lot of work is needed to make a web application able to handle high loads. I’ve done this specifically with code written in PHP.
I started out developing in PHP when LAMP (Linux + Apache + PHP + MySQL) was all the rage. It was great for quick prototyping and for small-sized applications. It seemed that at the time, LAMP was the greatest thing since sliced bread. That was until I started writing code that was used by hundreds of people every day. Traffic spikes, high CPU and RAM usage, and manual restarts just seemed like part of being a PHP developer.
PHP has a bad reputation for not handling well under pressure. After refactoring and refactoring with no success, many PHP developers will feel like calling it quits. If our LAMP model limits us just a few dozen users, why use PHP at all?
I was in that boat not too long ago, until I came across Steve Carona’s life-changing Scaling PHP to Millions of Users. I found ways to run my code +500% faster, and increase uptime to nearly 99%.
In this blog, I’ll go over my experience with scaling PHP applications. I’ll describe some of the strategies and pitfalls, and hopefully provide guidance to all those wayward developers who want to swear off PHP completely. My first-hand experience may help clarify areas of Steve’s book. Or, I could inspire someone to take control of their stack and save it from oblivion.
When I talk about scaling, I am referring to the ability of an application architecture to gracefully respond to high loads. This involves everything from the DNS to the web server to the version of PHP to the database. Some of the strategies described in this blog are specific to PHP, but others (e.g. Nginx, Percona XtraDB Cluster) can apply to other web applications.
The goals any scaling project should have are:
- Maximize performance under high loads
- Maximize automatic handling of failure
- Minimize code changes necessary to achieve previous two goals
Step 1 – Use Nginx for Static File Serving
Taking the first step is usually the hardest. If it fails, the rest of your plans slowly crumble. If it succeeds, the struggles in the next steps at least have some previous success to drive them forward. That is why I chose Nginx as the first step. It’s simple, easy to use, and gives huge performance gains over Apache. It’s so easy that even I can’t screw it up.
In LAMP, Apache has two primary jobs: serving static files and calling mod_php to evaluate PHP scripts. One service doing two jobs could be a pro, but in terms of scaling it is a con. When a user hits Apache with a request for a static file, the forked Apache process that retrieves the file has mod_php built into it. So all that RAM and CPU time used by that fork with mod_php is wasted for that request. That’s waste we can easily cut out.
This area is where Nginx shines. It can serve us static files, and not care that we’re using PHP as our server-side language. When it receives a PHP request, it can send that off to Apache without blocking the other forks. It doesn’t have mod_php in the forks, so no waste there either. Nginx forks are magnitudes smaller in both RAM and CPU usage than Apache forks. Nginx is event-driven while Apache is process-driven, so we get an asynchronous and non-blocking as a bonus.
Putting Nginx on the front with PHP requests served to Apache on the back is a really easy first step to take in this process. The only caveat I’ve experienced is with gzip and PHP’s flush() method. Nginx needs some special configuration for long running processes whose output is gzipped. I’ll leave that and reading the Nginx pitfalls/best practices as a homework assignment.
Step 2 – Use PHP-FPM for PHP Processing
Once you’ve made the first step and have committed to making your architecture better, you can’t wait to see how much better the other steps will make your stack. I know the first time that I ran benchmarks of Nginx vs. Apache, I was stunned. I calculated that the stack I was scaling could handle 200% to 300% more traffic while using less resources with Nginx on front.
Now, for my favorite part of this whole thing: getting rid of Apache. I remember back in the day when Apache was the bee’s knees. Sure, it took me a few hours to install due to resolving dozens of dependencies, but it worked and it worked well. Oh what a fool I was.
When it comes to evaluating PHP on a large scale, PHP-FPM is the gold standard. It and Nginx are the de facto winner of high load PHP processing. Rather than being somewhat good at hundreds of things like Apache, PHP-FPM is only concerned about one thing: evaluating PHP as quickly as possible.
And it’s really, really good at it. So good that it went from a user-maintained patch to being incorporated by the PHP team into PHP itself. PHP-FPM is a FastCGI server that bundles PHP with each fork. This allows for PHP requests to be evaluated as quickly as possible with as little red tape to go through as possible.
PHP-FPM has been supported by the PHP team since 5.3. If you’re not on 5.3 or newer yet, get on the bandwagon. It’s faster, more reliable and consistent, and allows for some pretty cool syntactic sugar. The process of upgrading can be hard, especially with a large code base. The advantages though make all the time spent worth it.
Along with using PHP-FPM, I’d highly recommend using an opcode cache. These extensions will cache PHP’s evaluation of a given file so that subsequent requests won’t have to do the evaluation again. I’ve tried the Zend OpCache, but had issues with seg faults for long-running processes. APC is the best one I’ve used, and would recommend it to anyone and everyone.
If you’re stuck on the single-server LAMP model, you could stop here and be very happy with your results. We still have no code changes (barring upgrades to a newer version of PHP), and will have anywhere from 300% to 500% better performance over Apache. The next steps are geared more towards a true scaled application, but finishing at this step is a great starting point.
Don’t miss Part Two!
— Zach Gardner, firstname.lastname@example.org