Wednesday, March 16, 2011

How To Improve Website Performance (With Drupal, PHP, MySQL and Apache)

SkyHi @ Wednesday, March 16, 2011
Recently, when the site for my new website security scanning startup launched, I was inundated with traffic, receiving several thousand hits in the space of a few hours. Combined with several website security scans my site was running at the time, the load was more than my server could handle, and I had to restart Apache several times. After this, I decided to focus on tuning my application's performance, measured by the number of requests served per second and by page load time. In this post, I’ll cover how I tuned the following tech stack components to improve those results:
  1. Infrastructure (Hardware) improvements
  2. Apache Settings
  3. PHP
  4. Application performance
Prepare yourself – this is a very in-depth tuning article, so grab a cup of coffee and get your tuning hat on!

Web Application Performance Tuning Methodology

When tuning server-based applications, it is best to use a pyramid approach. Rather than immediately profiling or tuning specific functions or code, it’s better to focus first on places in the technology stack where a single change improves as many downstream components as possible. So whenever I approach a tuning problem, I start with the hardware the application runs on and work up the stack until I get to code profiling and specific code bottlenecks.
Figure: Website Tuning Priority Matrix (pyramid)
For my current tuning exercise, I needed a way to benchmark various changes I made to the environment, and settled on two tools:

Request/second benchmarking: ApacheBench (ab) for measuring requests served per second, run with 50 concurrent users and 1000 requests.

Page load speed: Web Page Test, since it allows for an independent look at page load speed.
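The ab runs described above can be scripted so that every tuning change is measured the same way. This is a minimal sketch, not the exact script used for this post: the helper names parse_rps, ab_rps and pct_gain are hypothetical, and it assumes ab's usual "Requests per second:" summary line.

```shell
#!/bin/sh
# Sketch: benchmark a page with ApacheBench using the settings above
# (1000 requests, 50 concurrent) and report only the requests/second.
# Helper names are hypothetical.

# ab prints a summary line such as:
#   Requests per second:    42.17 [#/sec] (mean)
# parse_rps reads ab output on stdin and prints the number.
parse_rps() {
    awk '/^Requests per second:/ { print $4 }'
}

# Benchmark one URL. ab needs a path in the URL, e.g. http://localhost/
ab_rps() {
    ab -n 1000 -c 50 "$1" 2>/dev/null | parse_rps
}

# Percentage change between two requests/second figures, rounded.
pct_gain() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}
```

For example, record `ab_rps http://localhost/` before and after a change, then pass both figures to pct_gain to get a percentage improvement comparable to the ones quoted in this post. Single runs are noisy, so repeating each measurement a few times is worthwhile.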
The fundamental problem I am trying to solve is dealing with high traffic loads, so I am more concerned with the Apache benchmark results than page load speed right now, though I want to improve loading speed as much as possible as well.
I am running my site on a VPS with a 1.5GHz dedicated processor, 1GB of RAM, and CentOS 4.9. The application stack uses MySQL for the database, PHP with Apache for the web server, and Drupal 6 with normal caching as the application framework. My development server runs Fedora 14, so all commands shown are the ones used on Fedora, while the benchmarks were run against production.

Tuning the Hardware

Hardware is generally not something which can be tuned – we can pay to upgrade it, or potentially upgrade the operating system to take advantage of new hardware optimizations. In my case, the VPS I run on was purchased several years ago, and has not been upgraded since. I looked at various other hosting providers to see what options I had in terms of better hardware at the same price point. Once I saw some competitive offerings, I went back to my hosting company and negotiated with them for new hardware. As a result, they offered to upgrade me to new hardware without changing my pricing structure! All the benchmarks were performed on the old hardware prior to upgrading, but the same general results should hold true. When I have completed the hardware upgrade, I will update this post with the results.

Tuning Apache – Enable GZIP, modify configuration

There are a few things which are relatively painless to implement with Apache. One of these is enabling gzip. I actually thought I had this enabled already, but it pays to check using YSlow or a similar tool. I also tuned a few key Apache parameters. The result? Page load time decreased by nearly 32%, while requests served per second barely moved (they remained within the same confidence interval).

Figure: GZIP Page Speed Optimizations Realized

How to enable GZIP with Apache
Enabling gzip is a very easy process. Open your Apache configuration file and make sure mod_deflate is enabled. You should see a line like this (if you don’t, add it):
LoadModule deflate_module modules/mod_deflate.so
You generally want to compress everything except images (which are already compressed), so add the following lines to your httpd.conf file and restart Apache. You may want to customize this using the Apache documentation.
# Insert filter
SetOutputFilter DEFLATE
# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
# BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary
# Make sure proxies don't deliver the wrong content
# (the Header directive requires mod_headers)
Header append Vary User-Agent env=!dont-vary
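Rather than assuming compression is on, you can request a page while advertising gzip support and look at the Content-Encoding response header. This is a sketch assuming curl is installed; is_gzipped and check_gzip are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: confirm that Apache is actually compressing responses.
# Helper names are hypothetical.

# is_gzipped reads HTTP response headers on stdin and succeeds (exit 0)
# if they contain "Content-Encoding: gzip" (header names are case-insensitive).
is_gzipped() {
    grep -qi '^content-encoding:[[:space:]]*gzip'
}

# check_gzip URL: fetch the page advertising gzip support,
# discard the body and inspect only the response headers.
check_gzip() {
    if curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' "$1" | is_gzipped; then
        echo "gzip enabled: $1"
    else
        echo "NOT compressed: $1"
    fi
}
```

For example, `check_gzip http://localhost/` should report "gzip enabled" once the configuration above is loaded, while an image URL should come back uncompressed because of the SetEnvIfNoCase rule.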

Apache Configuration Tuning

There are a few key parameters which should also be tuned:
KeepAlive – Set it to On; this reduces overhead by letting a browser reuse one connection for several requests as users load pages.
KeepAliveTimeout – Set this to something reasonably low, 1.5 to 2 times your page load time.
TimeOut – The default is around 2 minutes, which is far too long to tie up a single process. I set mine to 20 seconds.
StartServers – The number of child processes Apache starts with. I set this to 2. Most sites should do fine with 2-5 startup processes; busy sites should tune this based on the number and speed of their CPUs.
MinSpareServers – How many idle processes should be kept running to quickly accept new connections. I keep this at one since I am CPU bound, and it has no major impact on connection speed at my current traffic levels.
MaxSpareServers – I set this to 2, so I don’t have many spare processes sitting around after a traffic burst.
MaxClients – The maximum number of requests Apache will serve simultaneously; further connection requests wait in the queue. The main constraint Apache usually hits is server memory, so a good rule of thumb is to set this based on the memory used per Apache process and your system's free memory. I did the following:
1) See the memory used by each Apache process (the RSS column, in kB):
ps -eafly | grep httpd | awk '{print $8}' | sort -n
2) Check free memory on the system:
free
Divide the free memory figure from step 2 by the average of the numbers from step 1 to get an idea of what MaxClients should be. Keep in mind that most Unix systems use otherwise idle memory for the file cache, so take this into account when determining what is safely usable by Apache (the memory really available is free memory plus buffers and cache).
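The two steps above can be combined into a rough calculation. This is a sketch under several assumptions: prefork Apache processes named httpd (the CentOS/Fedora name), the older procps free layout whose Mem: line reads total, used, free, shared, buffers, cached (newer versions print a single buff/cache column plus an "available" column instead), and sizes in kB. The helper names are hypothetical. Because RSS also counts pages shared between processes, the estimate tends to be conservative; still leave headroom for MySQL and anything else running on the box.

```shell
#!/bin/sh
# Sketch: estimate MaxClients as usable memory / average httpd process size.
# Assumptions: prefork processes named "httpd", the older procps `free`
# layout, sizes in kB. Helper names are hypothetical.

# Reads `ps -yl` output on stdin and prints the mean RSS (column 8),
# skipping the header line. Prints nothing if no processes are listed.
avg_rss_kb() {
    awk 'NR > 1 { sum += $8; n++ } END { if (n) print int(sum / n) }'
}

# Reads `free` output on stdin and prints free + buffers + cached
# from the "Mem:" line (total used free shared buffers cached).
usable_mem_kb() {
    awk '/^Mem:/ { print $4 + $6 + $7 }'
}

# How many average-sized processes fit in the usable memory (rounded down).
suggest_maxclients() {
    echo $(( $1 / $2 ))
}

# Run on the web server itself while Apache is handling typical traffic.
report_maxclients() {
    rss=$(ps -ylC httpd | avg_rss_kb)
    mem=$(free | usable_mem_kb)
    if [ -z "$rss" ] || [ "$rss" -eq 0 ]; then
        echo "no httpd processes found" >&2
        return 1
    fi
    echo "MaxClients $(suggest_maxclients "$mem" "$rss")"
}
```

Calling report_maxclients prints a suggested directive line; treat it as a starting point and confirm with ab under load.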

Tuning PHP

Drupal and other content management systems are known to be a bit demanding on server resources, using significant CPU and memory per Apache process. My Drupal site was taking upwards of 18MB per process, and WordPress used around 12MB.
The greatest PHP speed boost will generally come from opcode caching and full page caching. Opcode caching keeps compiled PHP scripts in memory, avoiding the overhead of parsing and compiling a script every time it is called. Full page caching goes a step further and caches the entire output of a page, serving it as if it were static content. A blog that is updated once a week, for instance, could cache all pages for a week at a time and serve them as static content, giving huge performance boosts (although a cached page would need to be invalidated after every comment). My site is too dynamic to allow for full page caching, but sites with largely static content may find it useful.

Enabling opcode caching on the server

To enable opcode caching, I benchmarked three systems: Zend Server Community Edition, eAccelerator, and APC. Each one starts working automatically once installed, without any extra configuration. I also took additional factors into account, such as availability and the APIs offered.

APC is available as a PECL extension for PHP. It offers an excellent API for caching data in shared memory. Supposedly, it will make it into PHP core in a future release.
To install:
yum install php-devel php-pear                 #install prerequisites (pecl comes with php-pear)
pecl install apc                               #install APC with pecl
echo "extension=apc.so" > /etc/php.d/apc.ini   #enable APC
/etc/init.d/httpd restart                      #restart Apache

Zend Server is a well-known PHP performance server with a powerful API and development framework. There is an open source Community Edition available, which includes a nice API, and which is the version I tested with.
To install, Zend has a great guide for various platforms.

eAccelerator is the continuation of MMCache, though in the latest version the caching API was removed. Now it essentially only performs opcode caching.
To install eAccelerator:
yum install php-eaccelerator

Opcode caching benchmark results

Overall, APC gave me the best results, increasing requests per second by 98%, compared to 87% for eAccelerator and only 51% for Zend CE. All slightly increased page load times, but not by a statistically significant amount.
Figure: PHP Opcode benchmark improvements
I also wanted to use partial page caching for commonly used elements in my code, which is possible with both APC and Zend. Given the APIs offered by these tools, I removed eAccelerator from the running and chose APC as my opcode caching framework. After monitoring APC for some time, I found I needed to increase the cache size from the default 32MB to 64MB to prevent fragmentation.
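The cache size change described above goes in the apc.ini file created during installation. A sketch, assuming GNU sed (for -i) and the /etc/php.d/apc.ini path used earlier; set_apc_shm_size is a hypothetical helper. The accepted value format depends on the APC version: older releases take a plain number of megabytes (64), while newer ones expect a suffix (64M), so check the documentation for the version you installed.

```shell
#!/bin/sh
# Sketch: set apc.shm_size in an APC ini file, replacing any existing value.
# Assumes GNU sed; the helper name is hypothetical.
set_apc_shm_size() {
    ini="$1"
    size="$2"
    # Drop any existing apc.shm_size line...
    sed -i '/^apc\.shm_size[[:space:]]*=/d' "$ini"
    # ...then append the new setting.
    echo "apc.shm_size=$size" >> "$ini"
}
```

For example, run `set_apc_shm_size /etc/php.d/apc.ini 64M` and restart Apache. The apc.php status page that ships with APC shows memory usage and fragmentation, which is a convenient way to check whether the new size is enough.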

Additional benchmarking considerations

I can’t say these figures will hold true for every site, since other benchmarks have shown different results for Drupal overall. Most notably, 2bits benchmarked APC vs. eAccelerator and APC vs. Zend CE.
Zend also published a benchmark study (using their commercial server) which showed even better results than shown by 2bits.
Finally, on my development server, eAccelerator performed the best, hitting more than 200 requests/second, compared to 150 for APC and 110 for Zend, so do some testing on your own install to see what will work for you.
There are other PHP optimization methods I didn’t cover, some of which are covered in a great set of slides for Apache and PHP performance tuning.

Application Tuning

Because I use Drupal and am not willing to modify core, I can generally only tune my own custom code and modules. I profiled several of the highest traffic pages using Xdebug, but found that Drupal functions were using more than 95% of the system resources. So I started looking at how I could reduce the number of function calls in my application. I wrote my own custom theme, and two places in it with a substantial number of function calls are the generation of the site header and footer. The footer always stays the same for users, but the header and menus differ for every user and page, due to title and meta tags. I therefore decided to use APC shared memory caching to cache the header and footer output on a per-user and per-page basis. I also wanted the code to keep working without APC, in case it was disabled or not working for whatever reason.
To accomplish this, I first moved the header and footer generation code out of my page.tpl.php (the main template file in Drupal) and into their own include files. Each include file builds its section into a variable, and page.tpl.php stores that variable in the APC cache if APC is available. On the next page request, the cached version is returned instead of being regenerated.

Header generation include file:
/*generate the entire page header and store in a
 *single variable for caching. Actual code
 *not shown, since there is a lot.
 */
$custom_header = 'lots of HTML and PHP';

Top section of page.tpl.php (header cache):
//make sure APC is enabled and working
if(function_exists('apc_fetch'))
{
    //check to see if we have a header in the cache for this user and this page
    $custom_header_apc = apc_fetch("header-{$user->uid}-{$head_title}");
    if($custom_header_apc === false)    //nothing in cache
    {
        //include file generates a header variable custom_header
        require_once 'generate_header.php';
        //store the header in the cache for next time
        apc_store("header-{$user->uid}-{$head_title}",$custom_header, 24*3600);
        //prepare to print the header
        $custom_header_apc = $custom_header;
    }
    echo $custom_header_apc;    //send the header
}
else    //APC is not installed
{
    //just generate the header variable and print it
    require_once 'generate_header.php';
    echo $custom_header;
}

This resulted in an additional 15% gain in requests served per second, with no change in page load time.
Figure: Performance Improvements from Shared Memory Cache
Additional gains could be made by caching node contents or entire pages. Nearly all of my high traffic pages are not cacheable, as they differ for each user and may vary greatly even within a single user session, but the more static a site is, the more of it can be cached.

What Else?

Notably missing from this analysis is MySQL tuning. I left it out since MySQL currently takes up no more than 15% of server resources during high load periods, and most queries, along with the data model, were optimized when they were written. However, database tuning should be an integral part of holistic application tuning in general, so be sure to check the MySQL slow query log and make sure all SQL joins use indexes where possible when doing your own application tuning.
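To check the advice about joins for a specific query, EXPLAIN shows how MySQL plans to read each table; an access type of ALL means a full table scan, which usually points at a missing index (though it is harmless on very small tables). A minimal sketch, assuming the tab-separated batch output of the mysql command-line client (-B); full_scans is a hypothetical helper name. To find which queries are worth checking in the first place, the mysqldumpslow tool that ships with MySQL summarizes the slow query log.

```shell
#!/bin/sh
# Sketch: list tables that a query reads with a full table scan (type ALL).
# Reads the tab-separated output of `mysql -B -e "EXPLAIN SELECT ..."` on
# stdin. Column positions come from the header row, so extra EXPLAIN
# columns in other MySQL versions don't matter. Helper name is hypothetical.
full_scans() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "table") tcol = i
                if ($i == "type")  ycol = i
            }
            next
        }
        ycol && $ycol == "ALL" { print $tcol }
    '
}
```

For example, `mysql -B -e "EXPLAIN SELECT ..." your_database | full_scans` prints the name of every table read with a full scan; your_database is a placeholder for your own schema.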
Are there other areas readers have modified to see improved requests per second?

REFERENCES