Scaling PHP

Introduction to Scaling PHP Applications – Part 1

by Zach Gardner on November 18, 2013 11:59 am

This is the first blog in a two-part series on scaling PHP applications. Part one will focus on replacing Apache while part two will go into more advanced topics such as Master-Master replication and session storage.

Introduction

Making a website able to handle large amounts of traffic is one of the cornerstones of modern web applications. It’s a process that takes time, and it’s obvious to everyone when a website can’t handle its traffic (e.g. HealthCare.gov). A lot of work is needed to make a web application able to handle high loads. I’ve done this specifically with code written in PHP.

I started out developing in PHP when LAMP (Linux + Apache + MySQL + PHP) was all the rage. It was great for quick prototyping and for small applications. At the time, LAMP seemed like the greatest thing since sliced bread. That was until I started writing code that was used by hundreds of people every day. Traffic spikes, high CPU and RAM usage, and manual restarts just seemed like part of being a PHP developer.

PHP has a bad reputation for not holding up well under pressure. After refactoring and refactoring with no success, many PHP developers will feel like calling it quits. If our LAMP model limits us to just a few dozen users, why use PHP at all?

I was in that boat not too long ago, until I came across Steve Corona’s life-changing Scaling PHP to Millions of Users. I found ways to run my code 500% faster, and increase uptime to nearly 99%.

In this blog, I’ll go over my experience with scaling PHP applications. I’ll describe some of the strategies and pitfalls, and hopefully provide guidance to all those wayward developers who want to swear off PHP completely. My first-hand experience may help clarify areas of Steve’s book. Or, I could inspire someone to take control of their stack and save it from oblivion.

Scaling Goals

When I talk about scaling, I am referring to the ability of an application architecture to gracefully respond to high loads. This involves everything from the DNS to the web server to the version of PHP to the database. Some of the strategies described in this blog are specific to PHP, but others (e.g. Nginx, Percona XtraDB Cluster) can apply to other web applications.

The goals any scaling project should have are:

  1. Maximize performance under high loads
  2. Maximize automatic handling of failure
  3. Minimize the code changes necessary to achieve the previous two goals

Step 1 – Use Nginx for Static File Serving

Taking the first step is usually the hardest. If it fails, the rest of your plans slowly crumble. If it succeeds, the struggles in the next steps at least have some previous success to drive them forward. That is why I chose Nginx as the first step. It’s simple, easy to use, and gives huge performance gains over Apache. It’s so easy that even I can’t screw it up.

In LAMP, Apache has two primary jobs: serving static files and calling mod_php to evaluate PHP scripts. One service doing two jobs could be a pro, but in terms of scaling it is a con. When a user hits Apache with a request for a static file, the forked Apache process that retrieves the file has mod_php built into it. So all that RAM and CPU time used by that fork with mod_php is wasted for that request. That’s waste we can easily cut out.

This area is where Nginx shines. It can serve our static files without caring that we’re using PHP as our server-side language. When it receives a PHP request, it can hand that off to Apache without blocking its other work. Its worker processes don’t carry mod_php, so there’s no waste there either. Nginx workers are orders of magnitude smaller in both RAM and CPU usage than Apache forks. Nginx is event-driven while Apache is process-driven, so we get asynchronous, non-blocking request handling as a bonus.
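To make the event-driven idea a little more concrete, here is a minimal sketch in Python. It is an analogy only: it is not Nginx code and says nothing about how Nginx is implemented internally. A single process with a single thread services many simulated requests at once, because waiting on I/O never blocks the loop. The function names and the delay values are made up for illustration.

```python
import asyncio
import time
from typing import List, Tuple


async def handle_request(request_id: int, io_delay: float) -> int:
    # Simulate waiting on I/O (disk, an upstream server) without blocking the thread.
    await asyncio.sleep(io_delay)
    return request_id


async def serve_all(count: int, io_delay: float) -> List[int]:
    # One event loop interleaves every request; no process or thread per request.
    tasks = [handle_request(i, io_delay) for i in range(count)]
    return await asyncio.gather(*tasks)


def run_demo(count: int = 100, io_delay: float = 0.1) -> Tuple[List[int], float]:
    # Returns the handled request IDs and the wall-clock time the batch took.
    started = time.perf_counter()
    results = asyncio.run(serve_all(count, io_delay))
    elapsed = time.perf_counter() - started
    return results, elapsed
```

Calling run_demo() should finish in roughly the time of a single wait rather than the sum of all the waits. A process-per-request model would need a separate process for each of those requests to get the same overlap.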

Putting Nginx in front with PHP requests passed to Apache on the back end is a really easy first step to take in this process. The only caveat I’ve experienced is with gzip and PHP’s flush() function. Nginx needs some special configuration for long-running processes whose output is gzipped. I’ll leave that and reading the Nginx pitfalls/best practices as a homework assignment.

Step 2 – Use PHP-FPM for PHP Processing

Once you’ve made the first step and have committed to making your architecture better, you can’t wait to see how much better the other steps will make your stack. I know the first time that I ran benchmarks of Nginx vs. Apache, I was stunned. I calculated that the stack I was scaling could handle 200% to 300% more traffic while using fewer resources with Nginx in front.

Now, for my favorite part of this whole thing: getting rid of Apache. I remember back in the day when Apache was the bee’s knees. Sure, it took me a few hours to install due to resolving dozens of dependencies, but it worked and it worked well. Oh what a fool I was.

When it comes to evaluating PHP on a large scale, PHP-FPM is the gold standard. Together with Nginx, it is the de facto standard for high-load PHP processing. Rather than being somewhat good at hundreds of things like Apache, PHP-FPM is only concerned with one thing: evaluating PHP as quickly as possible.

And it’s really, really good at it. So good that it went from a user-maintained patch to being incorporated by the PHP team into PHP itself. PHP-FPM is a FastCGI server that bundles PHP with each fork. This allows for PHP requests to be evaluated as quickly as possible with as little red tape to go through as possible.

PHP-FPM has been bundled with PHP by the PHP team since 5.3.3. If you’re not on 5.3 or newer yet, get on the bandwagon. It’s faster, more reliable and consistent, and allows for some pretty cool syntactic sugar. The process of upgrading can be hard, especially with a large code base. The advantages, though, make the time spent worth it.

Along with using PHP-FPM, I’d highly recommend using an opcode cache. These extensions cache the compiled form (the opcodes) of a given PHP file so that subsequent requests don’t have to parse and compile it again. I’ve tried Zend OPcache, but had issues with segfaults in long-running processes. APC is the best one I’ve used, and I would recommend it to anyone and everyone.
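If the idea of an opcode cache is new, the following Python sketch shows the general principle. It is an analogy, not how APC or Zend OPcache are actually implemented: the class name, the in-memory dictionary, and the use of the file’s modification time as the freshness check are all my own assumptions for illustration. The point is only that the expensive parse-and-compile step runs once per file version instead of once per request.

```python
import os
from types import CodeType
from typing import Dict, Tuple


class CompiledFileCache:
    """Hypothetical cache: file path -> (mtime at compile time, compiled code)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, CodeType]] = {}
        self.compile_count = 0  # how many times the expensive step actually ran

    def load(self, path: str) -> CodeType:
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # cache hit: skip parsing and compiling entirely
        with open(path, "r", encoding="utf-8") as handle:
            source = handle.read()
        code = compile(source, path, "exec")  # the step the cache exists to avoid
        self.compile_count += 1
        self._entries[path] = (mtime, code)
        return code
```

Loading the same unchanged file twice compiles it once; editing the file changes its modification time, so the next load recompiles it.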

Conclusion

If you’re stuck on the single-server LAMP model, you could stop here and be very happy with your results. We still have no code changes (barring upgrades to a newer version of PHP), and will have anywhere from 300% to 500% better performance over Apache. The next steps are geared more towards a true scaled application, but finishing at this step is a great starting point.

Don’t miss Part Two!

— Zach Gardner, asktheteam@keyholesoftware.com


11 Responses to “Introduction to Scaling PHP Applications – Part 1”

  1. Jeff says:

    I recently switched from Apache+mod_php to Nginx+php_fpm, and while I really like the increased efficiency and simple configuration it offers, I might have to switch to Nginx+php (via a reverse proxy to Apache+mod_php), or go back to Apache+mod_php altogether because of an issue I’ve been having with php_fpm. The issue didn’t show up in testing. Only in production, at scale. The issue is that a super small percentage (way less than 1%) of PHP requests don’t return anything at all. I’m talking like a white page! There are a bunch of 499, 502, and 504 errors in the access/error logs, but no other errors that I’ve found that might lead me to what is causing the issue. I’ve tried everything, but nothing has fixed the problem. It’s uber frustrating!

    • Zach Gardner says:

      Might be that Nginx is out of workers to process the request, and none free up before the timeout hits. Could also be true of PHP-FPM running out of workers.

      Did you do any benchmarks to validate it’s running faster? I’d recommend using siege because it uses less memory than ab. You can find out at what load you start getting 499s, 502s, and 504s. I’d bet it’s not due to either technology but to not having them configured for really high loads.

      I didn’t talk about how to configure Nginx or PHP-FPM for high loads. The Scaling PHP book does talk about it. There are a lot of Nginx tuning guides out there you can also look at. They should also go into what settings are needed on the OS to make it better for high loads.
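A rough way to reason about whether a worker pool is running dry is Little’s Law: the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. The sketch below applies that to a pool where each worker handles one request at a time, as PHP-FPM workers do. The function names are hypothetical, and the inputs should come from your own benchmarks rather than from anything in this post.

```python
import math


def workers_needed(requests_per_second: float, seconds_per_request: float) -> int:
    # Little's Law: average requests in flight = arrival rate * time per request.
    return math.ceil(requests_per_second * seconds_per_request)


def is_saturated(requests_per_second: float,
                 seconds_per_request: float,
                 configured_workers: int) -> bool:
    # True when the average demand alone exceeds the configured pool size.
    return workers_needed(requests_per_second, seconds_per_request) > configured_workers
```

This only covers average load. Bursts and slow outliers need headroom on top of the number it returns, so treat the result as a lower bound.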

      • Jeff says:

        No, I was silly and did not think to benchmark before switching over.

        I’ve purchased the book, and have tweaked a few things. I’ll report back on my progress!

        • Zach Gardner says:

          Good luck on the process. I’m not sure if the book’s author will continue working on it or not, but it’s a treasure trove of information. I was able to get a lot out of it, even in its beta form.

      • Steve says:

        Disable Apc altogether and use zend opcache. We had the same issue and tracked it down to a bug with Apc not being compatible with the latest versions of php.

        • Zach Gardner says:

          Running APC on 5.4 worked really well for me. I had issues with long-running transactions (e.g. an import script that needed to run for 1.5 hours) with Zend OPcache causing segfaults.

          Out of curiosity, how did you both stumble on this article? Trying to get the word out about this and the next post.

    • Zach Gardner says:

      It could also be a seg fault. I would define an auto_prepend_file in your php.ini with something like this: https://gist.github.com/zgardner/7658233 I didn’t test it, but it should give you the idea. If you make sure that you write a START file when a request comes in and an END file when the script finishes, you can look for all requests that have a START but no END. If you still have scripts giving non-200’s and all STARTS have an END, then you know your issue isn’t with PHP.
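Once the marker files exist, finding the requests that started but never finished is a small offline job. Here is a minimal Python sketch of that check. It assumes each request writes marker files named <request id>.START and <request id>.END into a single directory; that naming scheme and the function name are my assumptions for illustration and are not taken from the gist.

```python
import os
from typing import List


def find_unfinished_requests(marker_dir: str) -> List[str]:
    """Return request IDs that have a START marker but no END marker."""
    started = set()
    finished = set()
    for name in os.listdir(marker_dir):
        request_id, extension = os.path.splitext(name)
        if extension == ".START":
            started.add(request_id)
        elif extension == ".END":
            finished.add(request_id)
    # Anything that started but never finished is a candidate crash.
    return sorted(started - finished)
```

An empty result alongside continued non-200 responses would point away from PHP, as described above.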

      • Jeff says:

        I’ll give that a shot next. Thanks! I’m reading through the book now. It’s excellent! I haven’t implemented all of the suggestions yet, but was able to eliminate the 504 errors I was seeing with Socket.io, so that’s good! I’ll give this a try too…

      • Jeff says:

        Update update!

        It turns out it wasn’t nginx or PHP. It was the REST API I was using to communicate between PHP and Node.js. I replaced it with Redis Pub/Sub and the problem went away.

        Thanks for all of your help!

        Also, I never answered your question… I found this blog post through the PHP Weekly e-mail newsletter.

