Scaling PHP

Introduction to Scaling PHP Applications – Part 1

by Zach Gardner on November 18, 2013 11:59 am

This is the first blog in a two-part series on scaling PHP applications. Part one will focus on replacing Apache, while part two will go into more advanced topics such as Master-Master replication and session storage.

Introduction

Making a website able to handle large amounts of traffic is one of the cornerstones of modern web applications. It’s a process that takes time, and when a website can’t handle its traffic, everyone notices (e.g., HealthCare.gov). A lot of work is needed to make a web application hold up under high loads. I’ve done this specifically with code written in PHP.

I started out developing in PHP when LAMP (Linux + Apache + MySQL + PHP) was all the rage. It was great for quick prototyping and for small applications. At the time, LAMP seemed like the greatest thing since sliced bread. That lasted until I started writing code that hundreds of people used every day. Traffic spikes, high CPU and RAM usage, and manual restarts seemed like just part of being a PHP developer.

PHP has a bad reputation for not holding up well under pressure. After refactoring again and again with no success, many PHP developers feel like calling it quits. If our LAMP model limits us to just a few dozen users, why use PHP at all?

I was in that boat not too long ago, until I came across Steve Corona’s life-changing Scaling PHP to Millions of Users. I found ways to make my code run upwards of 500% faster and to increase uptime to nearly 99%.

In this series, I’ll go over my experience with scaling PHP applications. I’ll describe some of the strategies and pitfalls, and I hope to provide guidance to all those wayward developers who want to swear off PHP completely. My first-hand experience may help clarify parts of Steve’s book. Or it might inspire someone to take control of their stack and save it from oblivion.

Scaling Goals

When I talk about scaling, I am referring to the ability of an application architecture to respond gracefully to high loads. This involves everything from DNS to the web server to the version of PHP to the database. Some of the strategies described in this series are specific to PHP, but others (e.g., Nginx, Percona XtraDB Cluster) apply to other web applications as well.

The goals any scaling project should have are:

  1. Maximize performance under high loads
  2. Maximize automatic handling of failure
  3. Minimize code changes necessary to achieve previous two goals

Step 1 – Use Nginx for Static File Serving

Taking the first step is usually the hardest. If it fails, the rest of your plans slowly crumble. If it succeeds, the struggles in the next steps at least have some previous success to drive them forward. That is why I chose Nginx as the first step. It’s simple, easy to use, and gives huge performance gains over Apache. It’s so easy that even I can’t screw it up.

In LAMP, Apache has two primary jobs: serving static files and calling mod_php to evaluate PHP scripts. One service doing two jobs could be a pro, but in terms of scaling it is a con. When a user asks Apache for a static file, the forked Apache process that retrieves the file has mod_php built into it. All the RAM and CPU time that mod_php ties up in that fork is therefore wasted on that request. That’s waste we can easily cut out.
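Here is a back-of-the-envelope sketch that puts rough numbers on that waste. It assumes Apache’s prefork model, where each in-flight request ties up one child process. The per-process sizes are hypothetical placeholders, not measurements. Plug in figures from `ps` or `top` on your own servers.

```python
# Hypothetical per-process memory sizes in MB. Measure your own servers.
APACHE_MOD_PHP_MB = 30.0  # assumed size of a prefork child with mod_php loaded
STATIC_ONLY_MB = 2.0      # assumed size of a process that only serves files


def wasted_memory_mb(concurrent_static_requests: int,
                     heavy_mb: float = APACHE_MOD_PHP_MB,
                     light_mb: float = STATIC_ONLY_MB) -> float:
    """Memory spent carrying mod_php while children serve static files.

    Under prefork, each concurrent request occupies one child. The waste
    therefore grows linearly with the number of in-flight static requests.
    """
    return concurrent_static_requests * (heavy_mb - light_mb)


print(wasted_memory_mb(100))  # 100 * (30 - 2) = 2800.0 MB
```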

This is where Nginx shines. It can serve static files without caring that we’re using PHP as our server-side language. When it receives a PHP request, it can pass that request to Apache without blocking its other workers. Its workers don’t have mod_php loaded, so there is no waste there either. Nginx workers are orders of magnitude smaller than Apache forks in both RAM and CPU usage. Nginx is event-driven while Apache is process-driven, so we get asynchronous, non-blocking request handling as a bonus.
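The split Nginx makes can be modeled as a simple routing decision:

- Anything ending in `.php` goes to the backend (Apache here, PHP-FPM later).
- Everything else is served straight from disk.

This is a Python model of the idea, not Nginx configuration. The document root and backend address are hypothetical.

```python
from pathlib import PurePosixPath

DOCROOT = "/var/www/html"       # hypothetical document root
PHP_BACKEND = "127.0.0.1:8080"  # hypothetical Apache (or PHP-FPM) address


def route(uri: str) -> tuple[str, str]:
    """Return ("static", file_path) or ("proxy", backend) for a request URI."""
    path = uri.split("?", 1)[0]          # ignore the query string
    if path.endswith("/"):
        path += "index.php"              # assume directory index pages are PHP
    if PurePosixPath(path).suffix == ".php":
        return ("proxy", PHP_BACKEND)    # hand off to the PHP tier
    return ("static", DOCROOT + path)    # serve directly from disk


print(route("/css/site.css"))   # ('static', '/var/www/html/css/site.css')
print(route("/index.php?x=1"))  # ('proxy', '127.0.0.1:8080')
```

In an actual Nginx setup, `location` blocks make this decision. A common pattern is a `location ~ \.php$` block containing `proxy_pass`, with static files served by the default location.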

Putting Nginx in front, with PHP requests passed back to Apache, is a really easy first step to take. The only caveat I’ve experienced involves gzip and PHP’s flush() function: Nginx needs some special configuration for long-running scripts whose output is gzipped. I’ll leave that, along with reading up on Nginx pitfalls and best practices, as a homework assignment.
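The gzip caveat comes from how deflate works. The compressor buffers input until it has enough to emit a block, so bytes your script pushes out with flush() can sit inside the compression layer. This Python sketch uses the standard `zlib` module to show that buffering and how a sync flush releases the data. It models the problem only; it is not how Nginx implements gzip.

```python
import zlib


def visible_after(chunks: list[bytes], sync_flush: bool) -> bytes:
    """What a client could decompress so far, after the server sends `chunks`.

    With sync_flush=True, each chunk is followed by Z_SYNC_FLUSH. That forces
    the compressor to emit everything it has buffered up to that point.
    """
    comp = zlib.compressobj(wbits=31)    # wbits=31 selects the gzip container
    decomp = zlib.decompressobj(wbits=31)
    sent = b""
    for chunk in chunks:
        sent += comp.compress(chunk)
        if sync_flush:
            sent += comp.flush(zlib.Z_SYNC_FLUSH)
    # No final flush: this models a long-running request that is still going.
    return decomp.decompress(sent)


progress = [b"progress 1\n", b"progress 2\n"]
print(visible_after(progress, sync_flush=False))  # b'' (still buffered)
print(visible_after(progress, sync_flush=True))   # b'progress 1\nprogress 2\n'
```

A common workaround is to turn gzip off (`gzip off;`) for the specific locations that stream output. Check the Nginx documentation for your version before relying on any particular setting.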

Step 2 – Use PHP-FPM for PHP Processing

Once you’ve taken the first step and committed to making your architecture better, you can’t wait to see how much better the other steps will make your stack. The first time I ran benchmarks of Nginx vs. Apache, I was stunned. I calculated that, with Nginx in front, the stack I was scaling could handle 200% to 300% more traffic while using fewer resources.

Now for my favorite part of this whole thing: getting rid of Apache. I remember back in the day when Apache was the bee’s knees. Sure, it took me a few hours to install because of the dozens of dependencies I had to resolve, but it worked, and it worked well. Oh, what a fool I was.

When it comes to evaluating PHP on a large scale, PHP-FPM is the gold standard. Paired with Nginx, it is the de facto winner for high-load PHP processing. Apache is somewhat good at hundreds of things. PHP-FPM is concerned with only one thing: evaluating PHP as quickly as possible.

And it’s really, really good at it. So good that it went from a user-maintained patch to being incorporated into PHP itself by the PHP team. PHP-FPM (FastCGI Process Manager) is a FastCGI server that manages a pool of worker processes, each with the PHP interpreter built in. This lets PHP requests be evaluated as quickly as possible, with as little red tape as possible.
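Nginx talks to PHP-FPM by sending binary FastCGI records, which is what makes it a "FastCGI server." Each record starts with the 8-byte header defined in the FastCGI 1.0 specification. The sketch below builds the FCGI_BEGIN_REQUEST record that opens a request. It illustrates the wire format only and is not a complete client.

```python
import struct

# Constants from the FastCGI 1.0 specification.
FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_RESPONDER = 1
FCGI_KEEP_CONN = 1


def record(rec_type: int, request_id: int, content: bytes) -> bytes:
    """Frame `content` as a FastCGI record, padded to a multiple of 8 bytes.

    Header layout: version, type, requestId (2 bytes), contentLength
    (2 bytes), paddingLength, and one reserved byte. All fields are big-endian.
    """
    padding = (-len(content)) % 8
    header = struct.pack("!BBHHBx", FCGI_VERSION_1, rec_type,
                         request_id, len(content), padding)
    return header + content + b"\x00" * padding


def begin_request(request_id: int, keep_conn: bool = False) -> bytes:
    """FCGI_BEGIN_REQUEST body: role (2 bytes), flags (1 byte), 5 reserved."""
    flags = FCGI_KEEP_CONN if keep_conn else 0
    body = struct.pack("!HB5x", FCGI_RESPONDER, flags)
    return record(FCGI_BEGIN_REQUEST, request_id, body)


print(begin_request(1).hex())  # 0101000100080000 0001000000000000, without the space
```

A full request would follow this record with the rest of the conversation:

- FCGI_PARAMS records carrying the CGI variables, such as SCRIPT_FILENAME.
- FCGI_STDIN records carrying the request body.

Nginx’s `fastcgi_pass` handles all of this for you.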

PHP-FPM has been bundled with PHP since 5.3.3. If you’re not on 5.3 or newer yet, get on the bandwagon. It’s faster, more reliable and consistent, and allows for some pretty cool syntactic sugar. Upgrading can be hard, especially with a large code base, but the advantages make the time spent worth it.

Along with PHP-FPM, I’d highly recommend using an opcode cache. These extensions cache the compiled bytecode (opcodes) for a given file, so subsequent requests don’t have to parse and compile it again. I’ve tried Zend OPcache, but had issues with segfaults in long-running processes. APC is the best one I’ve used, and I would recommend it to anyone and everyone.
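The core idea behind an opcode cache fits in a few lines. Compile a file once, key the result by its path and modification time, and reuse the compiled form until the file changes. This Python model uses Python’s own `compile()` as a stand-in for PHP’s compiler. APC and OPcache are C extensions that keep their cache in shared memory across worker processes, which this toy does not attempt to model.

```python
import os
import tempfile


class OpcodeCache:
    """Toy compile-once cache keyed by (path, modification time).

    Stale entries are never evicted in this sketch.
    """

    def __init__(self) -> None:
        self._cache = {}
        self.compilations = 0  # number of real compiles performed

    def load(self, path: str):
        key = (path, os.stat(path).st_mtime_ns)
        code = self._cache.get(key)
        if code is None:
            with open(path, encoding="utf-8") as f:
                source = f.read()
            code = compile(source, path, "exec")  # the expensive step we cache
            self._cache[key] = code
            self.compilations += 1
        return code

    def run(self, path: str) -> dict:
        namespace: dict = {}
        exec(self.load(path), namespace)
        return namespace


def demo() -> tuple[int, int, int]:
    """Run a file twice, edit it, then run it again.

    Returns (first result, result after edit, number of compilations).
    """
    cache = OpcodeCache()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "page.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write("x = 1 + 1\n")
        first = cache.run(path)["x"]
        cache.run(path)  # served from the cache, no recompile
        with open(path, "w", encoding="utf-8") as f:
            f.write("x = 40 + 2\n")
        st = os.stat(path)
        # Bump mtime explicitly so the edit is seen even on coarse clocks.
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        second = cache.run(path)["x"]
    return first, second, cache.compilations


print(demo())  # (2, 42, 2)
```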

Conclusion

If you’re stuck on the single-server LAMP model, you could stop here and be very happy with your results. We still have made no code changes (barring an upgrade to a newer version of PHP), and will have anywhere from 300% to 500% better performance than plain Apache. The next steps are geared more toward a truly scaled application, but stopping at this step still leaves you in a great position.
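Percentage gains are easy to misread, so here is a quick check on the arithmetic. Read "X% better" in the usual additive way: the new figure is (1 + X/100) times the old one. That makes 300% better a 4x gain, not 3x. The baseline below is a hypothetical 100 requests per second, not a measurement from this post.

```python
def improved(baseline: float, percent_better: float) -> float:
    """Value after an improvement of `percent_better` percent over `baseline`."""
    return baseline * (1 + percent_better / 100)


BASELINE_RPS = 100  # hypothetical Apache + mod_php throughput, not measured

for pct in (200, 300, 500):
    print(f"{pct}% better: {improved(BASELINE_RPS, pct):.0f} req/s "
          f"({1 + pct / 100:.0f}x the baseline)")
# 200% better: 300 req/s (3x the baseline)
# 300% better: 400 req/s (4x the baseline)
# 500% better: 600 req/s (6x the baseline)
```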

Stay tuned for part two, coming soon!

– Zach Gardner, asktheteam@keyholesoftware.com


11 Responses to “Introduction to Scaling PHP Applications – Part 1”

  1. Jeff says:

    I recently switched from Apache+mod_php to Nginx+php_fpm, and while I really like the increased efficiency and simple configuration it offers, I might have to switch to Nginx+php (via a reverse proxy to Apache+mod_php), or go back to Apache+mod_php altogether because of an issue I’ve been having with php_fpm. The issue didn’t show up in testing. Only in production, at scale. The issue is that a super small percentage (way less than 1%) of PHP requests don’t return anything at all. I’m talking like a white page! There are a bunch of 499, 502, and 504 errors in the access/error logs, but no other errors that I’ve found that might lead me to what is causing the issue. I’ve tried everything, but nothing has fixed the problem. It’s uber frustrating!

    • Zach Gardner says:

      Might be that Nginx is out of workers to process the request, and none free up before the timeout hits. Could also be true of PHP-FPM running out of workers.

      Did you do any benchmarks to validate that it’s running faster? I’d recommend siege because it uses less memory than ab. You can find out at what load you start getting 499s, 502s, and 504s. I’d bet it’s not due to either technology, but to not having them configured for really high loads.

      I didn’t talk about how to configure Nginx or PHP-FPM for high loads. The Scaling PHP book does talk about it. There are also a lot of Nginx tuning sites out there you can look at. They should also cover which OS settings help with high loads.

      • Jeff says:

        No, I was silly and did not think to benchmark before switching over.

        I’ve purchased the book, and have tweaked a few things. I’ll report back on my progress!

        • Zach Gardner says:

          Good luck on the process. I’m not sure if the book’s author will continue working on it or not, but it’s a treasure trove of information. I was able to get a lot out of it, even in its beta form.

      • Steve says:

        Disable APC altogether and use Zend OPcache. We had the same issue and tracked it down to a bug with APC not being compatible with the latest versions of PHP.

        • Zach Gardner says:

          Running APC on 5.4 worked really well for me. I had issues with long-running transactions (e.g., an import script that needed to run for 1.5 hours), where Zend OPcache caused seg faults.

          Out of curiosity, how did you both stumble on this article? Trying to get the word out about this and the next post.

    • Zach Gardner says:

      It could also be a seg fault. I would define an auto_prepend_file in your php.ini with something like this: https://gist.github.com/zgardner/7658233 (I didn’t test it, but it should give you the idea). If you write a START file when a request comes in and an END file when the script finishes, you can look for all requests that have a START but no END. If you still have scripts returning non-200s and every START has an END, then you know your issue isn’t with PHP.

      • Jeff says:

        I’ll give that a shot next. Thanks! I’m reading through the book now. It’s excellent! I haven’t implemented all of the suggestions yet, but was able to eliminate the 504 errors I was seeing with Socket.io, so that’s good! I’ll give this a try too…

      • Jeff says:

        Update update!

        It turns out it wasn’t nginx or PHP. It was the REST API I was using to communicate between PHP and Node.js. I replaced it with Redis Pub/Sub and the problem went away.

        Thanks for all of your help!

        Also, I never answered your question… I found this blog post through the PHP Weekly e-mail newsletter.

  2. […] is the second blog in a two-part series on scaling PHP applications. The first blog in the series focused on replacing Apache+mod_php with Nginx+PHP-FPM. This blog will go into advanced topics […]

  3. […] This was originally posted on the Keyhole Software blog on 11/18/2013. […]
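In the comments, Zach’s first guess for Jeff’s blank pages is that Nginx or PHP-FPM runs out of workers before the timeout hits. Little’s law gives a rough check: the average number of busy workers equals the request arrival rate times the average time each request holds a worker. This sketch turns that into a lower bound for PHP-FPM’s `pm.max_children` and compares it with how many workers fit in memory. The traffic figures, per-worker memory, and 1.5x headroom factor are all hypothetical.

```python
import math


def workers_needed(requests_per_sec: float, avg_seconds: float,
                   headroom: float = 1.5) -> int:
    """Little's law, L = lambda * W, plus a safety margin.

    L is the average number of requests holding a worker at once.
    `headroom` is an assumed multiplier, because averages hide bursts.
    """
    busy = requests_per_sec * avg_seconds
    return math.ceil(busy * headroom)


def max_children_by_ram(ram_mb: float, per_worker_mb: float) -> int:
    """How many workers fit in the memory you can give PHP-FPM."""
    return int(ram_mb // per_worker_mb)


# Hypothetical traffic and memory figures.
need = workers_needed(200, 0.25)     # 200 req/s at 250 ms each -> 75
fit = max_children_by_ram(4096, 40)  # 4 GB for PHP at 40 MB per worker -> 102
print(need, fit, "OK" if need <= fit else "add capacity or speed up requests")
```

The traffic bound can exceed the RAM bound. In that case, requests queue for a free worker and some eventually fail with the kind of 502 and 504 errors Jeff describes. The remedy is more capacity or faster requests, not a bigger pool on the same box.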
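Zach asks in the comments at what load the 499, 502, and 504s start. Answering that comes down to running siege (or ab) at increasing concurrency and scanning the results. This sketch assumes you have already gathered status codes per concurrency level into a dict, taken from the tool’s output or from the Nginx access log. Note that 499 is Nginx’s log code for a client that disconnected, so it only appears in the log. The sample data is made up.

```python
from typing import Optional

ERROR_CODES = {499, 502, 504}


def first_failing_level(results: dict[int, list[int]],
                        max_error_rate: float = 0.0) -> Optional[int]:
    """Lowest concurrency level whose 499/502/504 rate exceeds max_error_rate.

    `results` maps a concurrency level to the status codes seen at that level.
    Returns None if no level crosses the threshold.
    """
    for concurrency in sorted(results):
        codes = results[concurrency]
        if not codes:
            continue
        errors = sum(1 for code in codes if code in ERROR_CODES)
        if errors / len(codes) > max_error_rate:
            return concurrency
    return None


# Made-up sample data for illustration only.
SAMPLE = {
    50: [200] * 100,
    100: [200] * 99 + [502],
    200: [200] * 90 + [504] * 10,
}

print(first_failing_level(SAMPLE))                       # 100
print(first_failing_level(SAMPLE, max_error_rate=0.05))  # 200
```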
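Zach’s comment suggests bracketing every request with START and END markers. That leaves an analysis step: finding the requests that started but never finished. This sketch assumes a hypothetical naming scheme of `<request_id>.START` and `<request_id>.END` marker files in a single directory. The linked gist may name or store its markers differently, so adapt the parsing to whatever your prepend file actually writes.

```python
import os


def unfinished_requests(names: list[str]) -> set[str]:
    """IDs that have a START marker but no END marker.

    Assumes hypothetical marker names of the form '<request_id>.START'
    and '<request_id>.END'. Other names are ignored.
    """
    started, ended = set(), set()
    for name in names:
        request_id, _, marker = name.rpartition(".")
        if marker == "START":
            started.add(request_id)
        elif marker == "END":
            ended.add(request_id)
    return started - ended


def unfinished_in_dir(directory: str) -> set[str]:
    """Scan a directory of marker files (hypothetical layout)."""
    return unfinished_requests(os.listdir(directory))


print(unfinished_requests(["a1.START", "a1.END", "b2.START"]))  # {'b2'}
```

A non-empty result points at requests that died mid-script, such as a segfaulting worker. An empty result alongside non-200 responses suggests the problem lies outside PHP. That matches Jeff’s eventual finding: the culprit was the REST API between PHP and Node.js.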


Keyhole Software
8900 State Line Road, Suite 455
Leawood, KS 66206
ph: 877-521-7769
© 2014 Keyhole Software, LLC. All rights reserved.