Friday, March 8, 2013

Benchmarking Composer autoloading, Symfony and eZ Publish 5

Context

One fine day, I decided to benchmark how fast can eZ Publish 5, and the underlying stack, serve requests without using a true reverse proxy, in which case I would benchmark the reverse proxy and not the PHP implementation that lies behind the product.
What I did in the first place, is installing a fresh version of eZ Publish 5 directly from GitHub (commit ID: 3af1f1ad78264f9b36eb250a14ce4301d556e7ab), setting up a site based on ezdemo and take some base results:
$ sudo ab -k -n 1000 http://ezpublish-community/eng/
[...]
Requests per second:    31.38 [#/sec] (mean)
Time per request:       31.870 [ms] (mean)
Time per request:       31.870 [ms] (mean, across all concurrent requests)
Transfer rate:          1181.94 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    30   32   2.7     32     109
Waiting:       28   30   2.6     30     106
Total:         30   32   2.7     32     109
About 32 reqs/second is what I can expect from a stock eZ Publish 5 installation of ezdemo.
The first thing to speed this up is to activate the content view caching using TTL, this is done by adding:
eng:
    content:
        view_cache: true
        ttl_cache: true
        default_ttl: 30
under the configuration of the siteaccess (here: "eng") in the ezpublish/config/ezpublish.yml file.
After I cleaned the cache manually ($ rm -rf ezpublish/cache/prod/*), I ran the benchmark again:
$ sudo ab -k -n 1000 http://ezpublish-community/eng/
[...]
Requests per second:    213.50 [#/sec] (mean)
Time per request:       4.684 [ms] (mean)
Time per request:       4.684 [ms] (mean, across all concurrent requests)
Transfer rate:          8030.29 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     4    5   0.2      5       7
Waiting:        4    4   0.2      4       6
Total:          4    5   0.2      5       7
This is of course a very impressive win since it allows caching the content of the whole page without having to connect to the database. Again, a real proxy server would outperform this by far, but this isn't the goal of this article which focus more on the PHP parts.
213 reqs/second is certainly not bad, but let's analyze where the time is spent to see if this can be improved further. To diagnose where the possible bottlenecks are, I am going to use xhprof, that gives me the following numbers:
Total Incl. Wall Time (µs): 6,510
Number of Function Calls: 2,700
Function Name Calls Calls% Incl. Wall Time (µs) IWall% Excl. Wall Time (µs) EWall%
load::prod/classes.php 1 0.0% 518 8.0% 518 8.0%
Composer\Autoload\ClassLoader::findFile 17 0.6% 389 6.0% 308 4.7%
Symfony\Component\HttpFoundation\ResponseHeaderBag::set 41 1.5% 630 9.7% 263 4.0%
Symfony\Component\HttpFoundation\HeaderBag::set 59 2.2% 303 4.7% 237 3.6%
Symfony\Component\HttpFoundation\ServerBag::getHeaders 4 0.1% 231 3.5% 230 3.5%
Symfony\Component\HttpFoundation\HeaderBag::get 48 1.8% 194 3.0% 194 3.0%
run_init::composer/autoload_classmap.php 1 0.0% 157 2.4% 157 2.4%
run_init::prod/classes.php 1 0.0% 136 2.1% 130 2.0%
Symfony\Component\HttpFoundation\Request::normalizeQueryString 12 0.4% 127 2.0% 120 1.8%
Symfony\Component\HttpFoundation\HeaderBag::has 45 1.7% 119 1.8% 119 1.8%
Symfony\Component\HttpFoundation\ParameterBag::get 92 3.4% 114 1.8% 114 1.8%
main() 1 0.0% 6,51 100.0% 112 1.7%
Symfony\Component\HttpFoundation\Request::prepareBaseUrl 4 0.1% 462 7.1% 111 1.7%
file_get_contents 6 0.2% 89 1.4% 89 1.4%
Symfony\Component\HttpFoundation\Response::prepare 4 0.1% 154 2.4% 87 1.3%
DateTime::__construct 12 0.4% 84 1.3% 84 1.3%
Symfony\Component\HttpFoundation\HeaderBag::__construct 8 0.3% 635 9.8% 81 1.2%
ComposerAutoloaderInit5b5f57c871b8c5b6e57138c3683a3315::getLoader 1 0.0% 399 6.1% 81 1.2%
Symfony\Component\HttpFoundation\Response::sendContent 1 0.0% 80 1.2% 80 1.2%
Symfony\Component\HttpFoundation\HeaderBag::parseCacheControl 12 0.4% 127 2.0% 70 1.1%
file_exists 17 0.6% 70 1.1% 70 1.1%
Symfony\Component\HttpFoundation\Request::create 3 0.1% 420 6.5% 69 1.1%
Symfony\Component\HttpKernel\HttpCache\Store::lookup 4 0.1% 1,198 18.4% 66 1.0%
DateTime::createFromFormat 8 0.3% 61 0.9% 61 0.9%
Symfony\Component\HttpFoundation\Response::getAge 11 0.4% 343 5.3% 61 0.9%
preg_match_all 12 0.4% 57 0.9% 57 0.9%
Symfony\Component\HttpKernel\HttpCache\HttpCache::handle@1 2 0.1% 2,351 36.1% 56 0.9%
Symfony\Component\HttpFoundation\Request::isSecure 16 0.6% 87 1.3% 55 0.8%
Symfony\Component\HttpFoundation\Request::prepareRequestUri 4 0.1% 250 3.8% 53 0.8%
Symfony\Component\ClassLoader\ClassCollectionLoader::load 1 0.0% 827 12.7% 52 0.8%
Symfony\Component\HttpFoundation\Request::initialize 4 0.1% 398 6.1% 52 0.8%
get_declared_classes 1 0.0% 51 0.8% 51 0.8%
Symfony\Component\HttpFoundation\Request::getHost 8 0.3% 112 1.7% 51 0.8%
Note that the initial benchmark was done without xhprof loaded so that the performances are optimal. This is why you can see request time about 6ms with xhprof, while the average is in reality about 4.684ms. This is due to xhprof's own overhead.
The first thing to note is that the most time consuming operation (based on "Excl. Wall Time" which doesn't take into account the time spent in children calls) is: load::prod/classes.php with about 8%. This file is supposed to contain the most frequently used classes combined in one file to minimize stats calls when a PHP bytecode cache mechanism is used like APC or Zend Optimizer+. This is triggered by: $kernel->loadClassCache(); in web/index.php file.
Removing that call and benchmarking the page again, here is the result I get:
$ sudo ab -k -n 1000 http://ezpublish-community/eng/
[...]
Requests per second:    254.53 [#/sec] (mean)
Time per request:       3.929 [ms] (mean)
Time per request:       3.929 [ms] (mean, across all concurrent requests)
Transfer rate:          9573.73 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     4    4   0.2      4       5
Waiting:        3    4   0.2      4       5
Total:          4    4   0.2      4       5
The improvement from 213.50 to 254.53 requests/sec. by removing the call to loadClassCache() means that by including this file we are loading way more classes than what is necessary to serve a cached page. This can probably be tuned, but since we just have better performance by omitting this optimization, let's just continue without it.
Running xhprof again, those are the new profiling numbers.
Total Incl. Wall Time (µs): 6,061
Number of Function Calls: 2,784
Function Name Calls Calls% Incl. Wall Time (µs) IWall% Excl. Wall Time (µs) EWall%
Composer\Autoload\ClassLoader::findFile 23 0.8% 468 7.7% 394 6.5%
Symfony\Component\HttpFoundation\ResponseHeaderBag::set 41 1.5% 647 10.7% 261 4.3%
Symfony\Component\HttpFoundation\HeaderBag::set 59 2.1% 310 5.1% 245 4.0%
Symfony\Component\HttpFoundation\ServerBag::getHeaders 4 0.1% 239 3.9% 238 3.9%
Symfony\Component\HttpFoundation\HeaderBag::get 48 1.7% 208 3.4% 200 3.3%
run_init::composer/autoload_classmap.php 1 0.0% 149 2.5% 149 2.5%
Symfony\Component\HttpFoundation\HeaderBag::has 45 1.6% 135 2.2% 135 2.2%
Symfony\Component\HttpFoundation\Request::normalizeQueryString 12 0.4% 140 2.3% 129 2.1%
Symfony\Component\HttpFoundation\Request::prepareBaseUrl 4 0.1% 513 8.5% 114 1.9%
Symfony\Component\HttpFoundation\ParameterBag::get 92 3.3% 111 1.8% 110 1.8%
main() 1 0.0% 6,061 100.0% 106 1.7%
Symfony\Component\HttpFoundation\Response::prepare 4 0.1% 182 3.0% 90 1.5%
Symfony\Component\HttpFoundation\HeaderBag::__construct 8 0.3% 665 11.0% 88 1.5%
file_get_contents 6 0.2% 86 1.4% 86 1.4%
DateTime::__construct 12 0.4% 84 1.4% 84 1.4%
ComposerAutoloaderInit5b5f57c871b8c5b6e57138c3683a3315::getLoader 1 0.0% 372 6.1% 78 1.3%
Composer\Autoload\ClassLoader::loadClass 13 0.5% 1,022 16.9% 75 1.2%
Symfony\Component\HttpFoundation\HeaderBag::parseCacheControl 12 0.4% 128 2.1% 74 1.2%
Symfony\Component\HttpFoundation\Response::sendContent 1 0.0% 71 1.2% 71 1.2%
file_exists 23 0.8% 71 1.2% 71 1.2%
Symfony\Component\HttpFoundation\Request::initialize 4 0.1% 586 9.7% 69 1.1%
Symfony\Component\HttpKernel\HttpCache\Store::lookup 4 0.1% 1,380 22.8% 68 1.1%
Symfony\Component\HttpFoundation\Request::create 3 0.1% 414 6.8% 66 1.1%
Symfony\Component\HttpFoundation\Response::getAge 11 0.4% 339 5.6% 61 1.0%
Symfony\Component\HttpKernel\HttpCache\HttpCache::handle@1 2 0.1% 2,512 41.4% 60 1.0%
Symfony\Component\HttpFoundation\Request::isSecure 16 0.6% 90 1.5% 58 1.0%
DateTime::createFromFormat 8 0.3% 57 0.9% 57 0.9%
Symfony\Component\HttpFoundation\Request::getHost 8 0.3% 125 2.1% 56 0.9%
Symfony\Component\HttpFoundation\Request::prepareRequestUri 4 0.1% 281 4.6% 54 0.9%
preg_match_all 12 0.4% 54 0.9% 54 0.9%
Symfony\Component\HttpFoundation\HeaderBag::getCacheControlHeader 7 0.3% 66 1.1% 52 0.9%
Symfony\Component\HttpKernel\HttpCache\HttpCache::handle 1 0.0% 4,172 68.8% 50 0.8%
eZ\Publish\Core\MVC\Symfony\Cache\Http\LocationAwareStore::getPath 8 0.3% 99 1.6% 50 0.8%
Symfony\Component\HttpKernel\HttpCache\HttpCache::restoreResponseBody@1 2 0.1% 1,047 17.3% 49 0.8%
load::HttpFoundation/Request.php 1 0.0% 47 0.8% 47 0.8%
Those numbers shows us that autoloading is one of the part that takes the most time and made me remember about the web/index.php file containing some APC related commented code:
// Use APC for autoloading to improve performance:
// Change 'ezpublish' to a unique prefix in order to prevent cache key conflicts
// with other applications also using APC.
//
// ( Not needed when using `php composer.phar dump-autoload --optimize` )
/*
$loader = new ApcClassLoader( 'ezpublish', $loader );
$loader->register( true );
*/
Reading at this comment, I realize I should also give a try to the --optimize flag proposed by Composer. Let's try both approaches.

Approach 1: Activating ApcClassLoader

After uncommenting the relevant lines in web/index.php file, I run the benchmark again:
$ sudo ab -k -n 1000 http://ezpublish-community/eng/
[...]
Requests per second:    270.42 [#/sec] (mean)
Time per request:       3.698 [ms] (mean)
Time per request:       3.698 [ms] (mean, across all concurrent requests)
Transfer rate:          10171.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     3    4   0.2      4       5
Waiting:        3    3   0.2      3       4
Total:          3    4   0.2      4       5
Performing that simple change decreased the request time by about 6%, let's look at xhprof's data again:
Total Incl. Wall Time (µs): 5,696
Number of Function Calls: 2,549
Function Name Calls Calls% Incl. Wall Time (µs) IWall% Excl. Wall Time (µs) EWall%
Symfony\Component\HttpFoundation\ServerBag::getHeaders 4 0.2% 278 4.9% 269 4.7%
Symfony\Component\HttpFoundation\ResponseHeaderBag::set 41 1.6% 627 11.0% 249 4.4%
Symfony\Component\HttpFoundation\HeaderBag::set 59 2.3% 307 5.4% 241 4.2%
Symfony\Component\HttpFoundation\HeaderBag::get 48 1.9% 187 3.3% 187 3.3%
run_init::composer/autoload_classmap.php 1 0.0% 164 2.9% 164 2.9%
Symfony\Component\HttpFoundation\HeaderBag::has 45 1.8% 133 2.3% 129 2.3%
Symfony\Component\HttpFoundation\Request::normalizeQueryString 12 0.5% 132 2.3% 119 2.1%
main() 1 0.0% 5,696 100.0% 116 2.0%
Symfony\Component\HttpFoundation\Request::prepareBaseUrl 4 0.2% 469 8.2% 116 2.0%
Symfony\Component\HttpFoundation\ParameterBag::get 92 3.6% 111 1.9% 111 1.9%
Symfony\Component\HttpFoundation\HeaderBag::__construct 8 0.3% 644 11.3% 89 1.6%
Symfony\Component\HttpFoundation\Response::prepare 4 0.2% 167 2.9% 87 1.5%
file_get_contents 6 0.2% 87 1.5% 87 1.5%
DateTime::__construct 12 0.5% 86 1.5% 86 1.5%
ComposerAutoloaderInit155d41b24b236d0fd561c62d5063d8bb::getLoader 1 0.0% 400 7.0% 84 1.5%
Symfony\Component\HttpFoundation\HeaderBag::parseCacheControl 12 0.5% 129 2.3% 72 1.3%
Symfony\Component\ClassLoader\ApcClassLoader::loadClass 12 0.5% 633 11.1% 72 1.3%
Symfony\Component\HttpFoundation\Response::sendContent 1 0.0% 71 1.2% 71 1.2%
Symfony\Component\HttpKernel\HttpCache\Store::lookup 4 0.2% 1,313 23.1% 70 1.2%
Symfony\Component\HttpFoundation\Request::create 3 0.1% 452 7.9% 69 1.2%
Symfony\Component\HttpFoundation\Request::initialize 4 0.2% 561 9.8% 69 1.2%
Symfony\Component\HttpFoundation\Response::getAge 11 0.4% 341 6.0% 63 1.1%
load::HttpFoundation/Request.php 1 0.0% 59 1.0% 59 1.0%
Symfony\Component\HttpFoundation\Request::isSecure 16 0.6% 90 1.6% 58 1.0%
DateTime::createFromFormat 8 0.3% 58 1.0% 58 1.0%
Composer\Autoload\ClassLoader::findFile 2 0.1% 68 1.2% 57 1.0%
preg_match_all 12 0.5% 57 1.0% 57 1.0%
Symfony\Component\HttpFoundation\Request::getHost 8 0.3% 114 2.0% 55 1.0%
From those numbers, we can see that Composer's methods that appeared previously in the topmost part of the most costing operations have disappeared
Now is time to compare this with the suggested second approach.

Approach 2: Using Composer classmap generation

Composer has a feature that generates a classmap of all classes by running:
$ ./composer.phar dump-autoload --optimize
This command created a 662KiB vendor/composer/autoload_classmap.php file containing an array that is a hash composed of the class name as index and the path to the file containing the class definition as value. At the time I am writing this post, this array is composed of 4168 entries.
Benchmarking again to see how it improves the average request time:
$ sudo ab -k -n 1000 http://ezpublish-community/eng/
[...]
Requests per second:    197.95 [#/sec] (mean)
Time per request:       5.052 [ms] (mean)
Time per request:       5.052 [ms] (mean, across all concurrent requests)
Transfer rate:          7445.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5    5   4.0      5     131
Waiting:        4    5   3.8      4     126
Total:          5    5   4.0      5     131
Although it should give us the most efficiant autoloading mechanism, it actually slows things down (from 254.53 reqs/second to 197.95). The reason being that even if the file is cached by APC, the PHP array containing the map with more than 4100 entries needs to be re-created at every single request. This is easily discovered looking at xhprof run report:
Total Incl. Wall Time (µs): 6,169
Number of Function Calls: 2,480
Function Name Calls Calls% Incl. Wall Time (µs) IWall% Excl. Wall Time (µs) EWall%
run_init::composer/autoload_classmap.php 1 0.0% 775 12.6% 775 12.6%
Symfony\Component\HttpFoundation\ServerBag::getHeaders 4 0.2% 260 4.2% 258 4.2%
Symfony\Component\HttpFoundation\ResponseHeaderBag::set 41 1.7% 616 10.0% 248 4.0%
Symfony\Component\HttpFoundation\HeaderBag::set 59 2.4% 303 4.9% 237 3.8%
Symfony\Component\HttpFoundation\HeaderBag::get 48 1.9% 187 3.0% 187 3.0%
Symfony\Component\HttpFoundation\HeaderBag::has 45 1.8% 122 2.0% 122 2.0%
Symfony\Component\HttpFoundation\Request::normalizeQueryString 12 0.5% 133 2.2% 118 1.9%
Symfony\Component\HttpFoundation\Request::prepareBaseUrl 4 0.2% 499 8.1% 115 1.9%
main() 1 0.0% 6,169 100.0% 112 1.8%
Symfony\Component\HttpFoundation\ParameterBag::get 92 3.7% 114 1.8% 111 1.8%
file_get_contents 6 0.2% 97 1.6% 97 1.6%
Symfony\Component\HttpFoundation\HeaderBag::__construct 8 0.3% 629 10.2% 86 1.4%
Symfony\Component\HttpFoundation\Response::prepare 4 0.2% 156 2.5% 85 1.4%
ComposerAutoloaderInit4fe043dafc1de5ca2f306c3fb5c6bd90::getLoader 1 0.0% 1,012 16.4% 82 1.3%
DateTime::__construct 12 0.5% 80 1.3% 80 1.3%
Symfony\Component\HttpFoundation\Response::sendContent 1 0.0% 77 1.2% 77 1.2%
Composer\Autoload\ClassLoader::loadClass 13 0.5% 593 9.6% 75 1.2%
Symfony\Component\HttpFoundation\HeaderBag::parseCacheControl 12 0.5% 129 2.1% 72 1.2%
Symfony\Component\HttpFoundation\Request::create 3 0.1% 434 7.0% 68 1.1%
Symfony\Component\HttpKernel\HttpCache\Store::lookup 4 0.2% 1,296 21.0% 66 1.1%
Symfony\Component\HttpFoundation\Request::initialize 4 0.2% 535 8.7% 66 1.1%
Symfony\Component\HttpFoundation\Response::getAge 11 0.4% 342 5.5% 63 1.0%
Symfony\Component\HttpKernel\HttpCache\HttpCache::handle@1 2 0.1% 2,384 38.6% 58 0.9%
DateTime::createFromFormat 8 0.3% 58 0.9% 58 0.9%
preg_match_all 12 0.5% 57 0.9% 57 0.9%
Symfony\Component\HttpFoundation\Request::isSecure 16 0.6% 89 1.4% 57 0.9%
Symfony\Component\HttpFoundation\Request::prepareRequestUri 4 0.2% 268 4.3% 54 0.9%
Symfony\Component\HttpFoundation\Request::getHost 8 0.3% 129 2.1% 53 0.9%
load::HttpFoundation/Request.php 1 0.0% 48 0.8% 48 0.8%
Knowing that most of the classes involved in the stack relies on PSR-0 let's try a simpler approach...

Simplifying the autoloading

PSR-0 gives us a nice "fully qualified classname to filename" approach but things gets a bit more complicated given that the different dependencies installed by Composer may have common namespaces (sub)paths and all of them starting from a different root directory.
Using Linux, I thought about leveraging my filesystem abilities and PSR-0 at the same time: let's create a tree made of symbolink links to recreate a straightforward "fully qualified classname to filename" strategy again:
$ mkdir vendor/PSR-0
$ cd vendor/PSR-0
$ ln -s ../kriswallsmith/assetic/src/Assetic/ ../ezsystems/ezpublish-kernel/eZ/ ../ezsystems/demobundle/EzSystems/ ../monolog/monolog/src/Monolog/ ../psr/log/Psr/ ../sensio/generator-bundle/Sensio/ ../twig/twig/lib/Twig/ ../qafoo/rmf/src/main/Qafoo/ .
$ mkdir Symfony
$ cd Symfony
$ ln -s ../../symfony/symfony/src/Symfony/Bridge/ ../../../vendor/symfony-cmf/routing/Symfony/Cmf/ ../../symfony/symfony/src/Symfony/Component/ .
$ mkdir Bundle
$ cd Bundle
$ ln -s ../../../symfony/assetic-bundle/Symfony/Bundle/AsseticBundle/ ../../../symfony/symfony/src/Symfony/Bundle/FrameworkBundle/ ../../../symfony/monolog-bundle/Symfony/Bundle/MonologBundle/ ../../../symfony/symfony/src/Symfony/Bundle/SecurityBundle/ ../../../symfony/swiftmailer-bundle/Symfony/Bundle/SwiftmailerBundle/ ../../../symfony/symfony/src/Symfony/Bundle/TwigBundle/ ../../../symfony/symfony/src/Symfony/Bundle/WebProfilerBundle/ .
... and modify the autoloader file (ezpublish/autoload.php) to contain only:
spl_autoload_register(
    function ( $className )
    {
        if (
            strpos( $className, "Symfony\\" ) === 0 ||
            strpos( $className, "Sensio\\" ) === 0 ||
            strpos( $className, "eZ\\" ) === 0 ||
            strpos( $className, "EzSystems\\" ) === 0 ||
            strpos( $className, "Monolog\\" ) === 0 ||
            strpos( $className, "Psr\\" ) === 0 ||
            strpos( $className, "Twig_" ) === 0 ||
            strpos( $className, "Assetic\\" ) === 0
        )
        {
            @include __DIR__ . "/../vendor/PSR-0/" . str_replace( array( "_", "\\" ), "/", $className ) . ".php";
        }
    },
    true
);

require __DIR__ . "/../ezpublish_legacy/autoload.php";
Now that this is in place, let's benchmark it:
$ sudo ab -k -n 1000 http://ezpublish-community/eng/
[...]
Requests per second:    286.38 [#/sec] (mean)
Time per request:       3.492 [ms] (mean)
Time per request:       3.492 [ms] (mean, across all concurrent requests)
Transfer rate:          11286.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     3    3   0.2      3       6
Waiting:        3    3   0.2      3       5
Total:          3    3   0.2      3       6
Using this technique, we gained yet another 16 requests/sec over ApcClassLoader that was the best results so far. Let's inspect xhprof once again:
Function Name Calls Calls% Incl. Wall Time (µs) IWall% Excl. Wall Time (µs) EWall%
Symfony\Component\HttpFoundation\ResponseHeaderBag::set 42 1.7% 669 11.8% 267 4.7%
Symfony\Component\HttpFoundation\ServerBag::getHeaders 4 0.2% 262 4.6% 259 4.6%
Symfony\Component\HttpFoundation\HeaderBag::set 60 2.4% 307 5.4% 243 4.3%
Symfony\Component\HttpFoundation\HeaderBag::get 48 1.9% 204 3.6% 204 3.6%
Symfony\Component\HttpFoundation\HeaderBag::has 45 1.8% 134 2.4% 129 2.3%
{closure} 15 0.6% 713 12.6% 126 2.2%
Symfony\Component\HttpFoundation\Request::normalizeQueryString 12 0.5% 131 2.3% 119 2.1%
Symfony\Component\HttpFoundation\ParameterBag::get 92 3.6% 118 2.1% 118 2.1%
Symfony\Component\HttpFoundation\Request::prepareBaseUrl 4 0.2% 479 8.5% 116 2.1%
main() 1 0.0% 5,646 100.0% 110 1.9%
DateTime::createFromFormat 8 0.3% 103 1.8% 103 1.8%
Symfony\Component\HttpFoundation\Response::prepare 4 0.2% 171 3.0% 97 1.7%
DateTime::__construct 12 0.5% 96 1.7% 96 1.7%
Symfony\Component\HttpFoundation\HeaderBag::__construct 8 0.3% 678 12.0% 92 1.6%
file_get_contents 6 0.2% 88 1.6% 88 1.6%
Symfony\Component\HttpFoundation\HeaderBag::parseCacheControl 12 0.5% 137 2.4% 71 1.3%
Symfony\Component\HttpFoundation\Request::create 3 0.1% 434 7.7% 70 1.2%
Symfony\Component\HttpKernel\HttpCache\Store::lookup 4 0.2% 1,347 23.9% 70 1.2%
Symfony\Component\HttpFoundation\Request::initialize 4 0.2% 549 9.7% 68 1.2%
Symfony\Component\HttpFoundation\Response::getAge 11 0.4% 411 7.3% 66 1.2%
preg_match_all 12 0.5% 66 1.2% 66 1.2%
Symfony\Component\HttpFoundation\Response::sendContent 1 0.0% 63 1.1% 63 1.1%
Symfony\Component\HttpFoundation\HeaderBag::getCacheControlHeader 7 0.3% 75 1.3% 62 1.1%
Symfony\Component\HttpKernel\HttpCache\HttpCache::handle@1 2 0.1% 2,500 44.3% 61 1.1%
Symfony\Component\HttpFoundation\Request::isSecure 16 0.6% 93 1.6% 60 1.1%
Symfony\Component\HttpFoundation\Request::prepareRequestUri 4 0.2% 259 4.6% 53 0.9%
load::HttpFoundation/Request.php 1 0.0% 52 0.9% 52 0.9%
Symfony\Component\HttpFoundation\Request::getHost 8 0.3% 115 2.0% 51 0.9%
It looks like from the latest technique used, we finally get rid of the autoloading performance impact. Maybe Composer could automate the creation of such a PSR-0 tree? Well, if you care, this has been asked already ;-) https://github.com/composer/composer/issues/1529
From the latest xhprof report, it seems that Symfony HttpFoundation components are the next bottlenecks.

Some craziness: s/Symfony/Cymfony/

Now, what if... those Symfony components were built in C? Well, the idea sounds a bit crazy at first but I started creating a simple PHP extension written in C (that I named "Cymfony") that implements natively the Symfony\\Component\\HttpFoundation\\HeaderBag and Symfony\\Component\\HttpFoundation\\ParameterBag classes. Those were simple enough to be written quickly while appearing in the xhprof report.
Let's activate the extension and run yet another benchmark:
$ sudo ab -k -n 1000 http://ezpublish-community/eng/
[...]
Requests per second:    328.93 [#/sec] (mean)
Time per request:       3.040 [ms] (mean)
Time per request:       3.040 [ms] (mean, across all concurrent requests)
Transfer rate:          12963.84 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     3    3   0.2      3       5
Waiting:        3    3   0.2      3       4
Total:          3    3   0.2      3       5
This is a pretty good result! By reimplementing classes natively, with 100% compatibility, I have been able to jump from 286.38 request/sec to 328.93 which is about a 15% boost. This percentage is very close to the original amount of time spent in the classes in their PHP implementation:
Symfony\Component\HttpFoundation\HeaderBag::set 4.3%
Symfony\Component\HttpFoundation\HeaderBag::get 3.6%
Symfony\Component\HttpFoundation\HeaderBag::has 2.3%
Symfony\Component\HttpFoundation\ParameterBag::get 2.1%
Symfony\Component\HttpFoundation\HeaderBag::__construct 1.6%
Symfony\Component\HttpFoundation\HeaderBag::parseCacheControl 1.3%
Symfony\Component\HttpFoundation\HeaderBag::getCacheControlHeader 1.1%
17.4%
Let's see if xhprof confirms those methods are not being the top most consuming parts:
Function Name Calls Calls% Incl. Wall Time (µs) IWall% Excl. Wall Time (µs) EWall%
Symfony\Component\HttpFoundation\ResponseHeaderBag::set 42 2.3% 436 9.2% 283 6.0%
Symfony\Component\HttpFoundation\ServerBag::getHeaders 4 0.2% 255 5.4% 253 5.3%
Symfony\Component\HttpFoundation\Request::prepareBaseUrl 4 0.2% 396 8.3% 129 2.7%
Symfony\Component\HttpFoundation\Request::normalizeQueryString 12 0.7% 134 2.8% 125 2.6%
{closure} 13 0.7% 591 12.4% 119 2.5%
Symfony\Component\HttpFoundation\Response::prepare 4 0.2% 123 2.6% 112 2.4%
main() 1 0.1% 4,756 100.0% 111 2.3%
file_get_contents 6 0.3% 95 2.0% 95 2.0%
DateTime::__construct 12 0.7% 89 1.9% 89 1.9%
Symfony\Component\HttpFoundation\HeaderBag::set 42 2.3% 80 1.7% 80 1.7%
Symfony\Component\HttpFoundation\Request::create 3 0.2% 380 8.0% 77 1.6%
Symfony\Component\HttpFoundation\HeaderBag::__construct 8 0.4% 400 8.4% 76 1.6%
Symfony\Component\HttpFoundation\Request::initialize 4 0.2% 421 8.9% 70 1.5%
Symfony\Component\HttpKernel\HttpCache\Store::lookup 4 0.2% 1,121 23.6% 69 1.5%
Symfony\Component\HttpFoundation\Response::getAge 11 0.6% 257 5.4% 67 1.4%
Symfony\Component\HttpFoundation\Request::prepareRequestUri 4 0.2% 197 4.1% 67 1.4%
Symfony\Component\HttpFoundation\HeaderBag::getDate 8 0.4% 66 1.4% 66 1.4%
Symfony\Component\HttpFoundation\Request::isSecure 16 0.9% 63 1.3% 63 1.3%
Symfony\Component\HttpFoundation\Response::sendContent 1 0.1% 63 1.3% 63 1.3%
Symfony\Component\HttpKernel\HttpCache\HttpCache::handle@1 2 0.1% 2,049 43.1% 59 1.2%
load::HttpFoundation/Request.php 1 0.1% 55 1.2% 55 1.2%
Symfony\Component\HttpFoundation\Request::getHost 8 0.4% 90 1.9% 55 1.2%
Symfony\Component\HttpFoundation\Response::getMaxAge 10 0.6% 95 2.0% 54 1.1%
Symfony\Component\HttpKernel\HttpCache\HttpCache::restoreResponseBody@1 2 0.1% 885 18.6% 51 1.1%
Symfony\Component\HttpKernel\HttpCache\HttpCache::handle 1 0.1% 3,460 72.8% 49 1.0%
load::HttpKernel/Kernel.php 1 0.1% 49 1.0% 49 1.0%
eZ\Publish\Core\MVC\Symfony\Cache\Http\LocationAwareStore::getPath 8 0.4% 87 1.8% 48 1.0%
The rewritten classes are effectively more efficient and doesn't take as much processing time. This is encouraging to try reimplementing classes like Symfony\\Component\\HttpFoundation\\ResponseHeaderBag, Symfony\\Component\\HttpFoundation\\ServerBag or Symfony\\Component\\HttpFoundation\\Request. Those are, however, much more time consuming to implement in C.

Conclusion

In my previous blog post I was suggesting using a PSR-0 tree already, but I lacked some stats, this issue is now solved.
Spending some time tracking the bottlenecks can drastically improve the throughput, autoloading being frequently one of them. Hopefully, we do have Xdebug and xhprof that are two great tools to profile a PHP application.
The Cymfony project that I started as a proof-of-concept might, in fact, interest some people working with Symfony 2 and that want to optimize some parts of the framework just by installing a PHP extension. However, speeding up the path of delivering cached page is probably not the best use case because if someone is able to install a PHP extension, he is probably able to install a decent reverse proxy which would speed up the throughput much more! What are the classes that might be worth being optimized in Symfony 2 according to you?

6 comments:

Jordi Boggiano said...

Very interesting results, but I think that a few things are missing to properly test the classmap vs PSR-0 approach in Composer:

- What about testing a real page view (not pre-cached html)? The current runs you showed don't load many classes so they probably don't offset the cost of loading the classmap.
- Could you do the test on 5.5 with O+ enabled? Curious to see if it handles loading the huge classmap better than APC (who knows).

Anonymous said...

I wonder how things would look if you cache the class map array insure the APC user cache which is optimized for fetching arrays.

regards
Lukas

Larry Garfield said...

Cool data! I agree that the mass-concat-file approach requires proper tuning, and that tuning is probably per-install-specific.

Another advantage of a unified PSR-0 tree for autoloading is that it reduces the number of directories to go through for developers to find a class. The Drupal community has been struggling with that for a while. Symlinks would address both problems.

"Drop the expensive stuff to C" is one of the key advantages of autoloading and cleanly injected code. I'm a bit surprised that the Bags are so expensive. They may have optimization opportunity on their own. Worth investigating. I suspect the EventDispatcher would be another good C-ification target. Especially the ContainerAwareEventDispatcher has a ton of stack calls for every dispatch. IMO it needs to be compiled like the Container, but C-ifying it is an interesting idea, too.

Florin said...

Thank you for your efforts.

Could you also repeat the tests by adding the following patches?
https://github.com/symfony/symfony/pull/7071
and
https://github.com/symfony/symfony/pull/7064

Also, if you look on the last patch, it should help you to add some classes from ezPublish as well to the cache that Symfony2 creates (ofc, don't add too many classes).

Best regards.

Patrick Allaert said...

@Jordi:
Using fresh content (non cached HTML):

With APC[1]:
stock: 30.94 reqs/sec (avg: 32.320 ms)
removing loadClassCache(): 27.53 reqs/sec (avg: 36.324 ms)
(restoring loadClassCache() as it seems interesting for non cached pages)
activating ApcClassLoader: 35.52 reqs/sec (avg: 28.152 ms)
Registering the whole classmap in APC as suggested by Anonymous: 30.95 reqs/sec (avg: 32.307 ms)
Using Composer --optimize: 32.26 reqs/sec (avg: 31.000 ms)
Using PSR-0 tree: 35.50 reqs/sec (avg: 28.166 ms)

ApcClassLoader seems a very little faster than the PSR-0 tree but I would say they would be about the same

With Zend Optimizer+[2]:
stock: 29.21 reqs/sec (avg: 34.238 ms)
removing loadClassCache(): 27.75 reqs/sec (avg: 36.042 ms)
(restoring loadClassCache() as it seems interesting for non cached pages)
Using Composer --optimize: 30.67 reqs/sec (avg: 32.608 ms)
Using PSR-0 tree: 32.69 reqs/sec (avg: 30.590 ms)

[1] APC settings:
apc.max_file_size=5M
apc.shm_segments=4

[2] ZO+ settings:
zend_optimizerplus.memory_consumption=256
zend_optimizerplus.interned_strings_buffer=16
zend_optimizerplus.max_accelerated_files=4000
zend_optimizerplus.revalidate_freq=0
zend_optimizerplus.fast_shutdown=1
zend_optimizerplus.enable_cli=1
zend_optimizerplus.save_comments=0

jxn said...

I was a bit shocked by the performance of the class map. I'm going to see if I can get a chance to informally reproduce it, and also check if the same holds true for some of my other symfony stack projects.

Also, Cymfony sounds exciting... have you released anything?