December 4 Johannes Schmitt schmittjoh

Composer, the GC performance improvement, and who else might be affected

Chances are you might have heard of sensational performance improvements in PHP’s package manager, composer, of 30-90% of original runtime by changing not even a single line of code.

I initially suggested this change to the composer team on December 1st, and after some positive reports, it made its way into the production version of composer soon after, and is since then available to all PHP users.

How was the issue found?

At Scrutinizer, we run a lot of CLI-based analysis tools, some of which are written in PHP, and some of these create a lot of objects f.e. during type-inference.

Very early in Scrutinizer’s development (maybe something like 1.5 years ago), we noticed that the analysis time of bigger packages was not increasing linearly, but rather expontentially. Of course, we used profilers like XHProf to track down the issue, but the results were inconclusive; time was spent randomly in different places.

After some research, we came to the conclusion that garbage collection might be responsible, and indeed after disabling it, the analysis of a bigger package like Symfony2 went down from over 50 minutes to less than 10 minutes (this was more than a year ago).

It was only until a few days ago, that we realized that composer did not contain this fix, and might benefit from this, too. When a performance improvement that Nils made had a significantly different impact depending on how many dependencies someone had. The rest of the story is history now :)

Is your app affected, too?

Chances are most likely, no. This particular bad behavior of PHP’s GC only manifests itself in long running CLI processes that create tens of thousands of objects. Web-requests are likely not affected.

If you profile your application, and time is randomly spent in different places from run to run. This is the usual pattern to look for. If you see this, only then try running with GC disabled.

There are also some initiatives now to provide better support for profilers to measure the time spent in GC which will make this even easier.

Is memory consumption increased when disabling GC?

The short, and counter-intuitive answer is no. You will not see a change in your memory consumption.

First, the call to gc_disable() does not turn off garbage collection entirely, it just disables one particular GC strategy that cleans up circular references.

Second, if you see this particular performance degradation, the garbage collection is not working anyway. It basically tries to clean-up your objects, only to recognize that it cannot clean them up, and it does that frequently. This is where the time is lost and the huge improvement is coming from when disabling it.

For a detailed explanation of the different GC strategies in PHP, check out this blog post from Anthony.

Which other tools might be affected?

Like said above, this is mostly something that affects CLI tools. One prime candidate that comes to mind is phpunit. If you have a big test-suite, try running it with GC disabled:

php -d zend.enable_gc=0 vendor/bin/phpunit

October 28 Johannes Schmitt schmittjoh

Security Audit of Request Data for PHP

The Open Web Application Security Project (OWASP) considered Injection attacks to be the most critical security risk in 2013. There are many different forms of injection attacks such as SQL injection, Path injection, Command injection, and many more. Injection attacks especially hurtful because they are easy to exploit, and can cause severe damage.

At Scrutinizer, we already notified you of SQL injection issues f.e. if you did not use parameter binding offered by PHP’s PDO abstraction, or libraries like Doctrine. However, there is a whole range of other PHP functions that are safe per-se, but can lead to severe security issues if passed arbitrary user input. file_get_contents for example is one of these functions. Malicious users can use it to gain access to sensitive credentials from your application.

Flagging each call to file_get_contents would be a very naive approach and not really helpful. However, at Scrutinizer we have invested into a solid foundation. Our PHP analyzer is almost like a compiler for PHP and uses advanced techniques such as data flow analysis, and abstract interpretation which distinguishes us from other competitors, or also existing open-source tools that only use AST-based approaches. It are these very analysis techniques that help us understand how data flows through your application.

In the most recent version of PHP analyzer, we have added a security analysis framework that specifically focuses on making sure your request data is not ending up in one of PHP’s sensitive functions, and that consequentially could make you vulnerable to injection attacks. Let’s take a look at an example:

use Symfony\Component\Yaml\Yaml;
use Symfony\Component\HttpFoundation\Request;

class Instance
{
    private $config;

    public function __construct($config)
    {
        $this->config = $config;
    }

    public function getParsedConfig()
    {
        return Yaml::parse($this->config);
    }
}

class MyController
{
    public function createAction(Request $request)
    {
        $instance = new Instance($request->request->get('config'));
        // ... save new instance, etc.
    }
}

In this example, we are assigning a YAML configuration from request data to a class property, and then save the new instance. Let’s see what Scrutinizer finds when analyzing this code:

File-Expansion Security Issue

This might come a bit unexpected. There is no problem with database persistence here, Doctrine takes care of escaping the content for us, but we still introduced a security issue by passing raw request data to the Yaml::parse() function which in turn calls file_get_contents(). Let’s take a look at it to see what is happening there:

class Yaml
{
    public static function parse($input, $exceptionOnInvalidType = false, $objectSupport = false)
    {
        // if input is a file, process it
        $file = '';
        if (strpos($input, "\n") === false && is_file($input)) {
            if (false === is_readable($input)) {
                throw new ParseException(sprintf('Unable to parse "%s" as the file is not readable.', $input));
            }

            $file = $input;
            $input = file_get_contents($file);
        }

        // ...
    }
}

This function has a feature that expands the input in case it is a valid file path, and replaces it with the contents of that path before parsing its actual contents. If someone were to pass a path such as ../app/config/parameters.yml, he might gain access to the parameters file of a Symfony2 application.

I would like to point out that we have not hard-coded the Yaml::parse function in PHP analyzer, but since we not only analyze your application, but also all your dependencies. We also assess the security relevance of the specific dependency version that your application uses. Analyzing your dependencies is even more important as you might not be as familiar with your dependencies’ code and consequentially also not aware of some of their features such as the file expansion shown above. This also allows us to compile a stack trace for you that aids in assessing the security issue more easily.

We are proud to now add this additional level of security to your PHP applications on Scrutinizer. Currently, we support tracking request data from Symfony2’s, and Zend’s request abstraction, as well as usage of PHP’s super globals like $_GET. If you are using another request data abstraction, just let us know by opening an issue.

Happy & secure coding!


September 25 Johannes Schmitt schmittjoh

Better insight in how containers are used

This is a commonly requested feature, and I’m happy to announce that this is now available. If one of your tasks is waiting and not executed, you can now view which other tasks are currently using your plan’s containers.

Image that shows the tasks that use a container

We also provide some statistics on how long your tasks are in waiting state before being run on your billing page helping you decide whether you might need to add another container to your plan:

Image from the billing page that shows containers

Enjoy!


July 28 Johannes Schmitt schmittjoh

PHP Analyzer: New Features and Bug Fixes

We have recently deployed a new version of PHP Analyzer. This minor version upgrade contains mostly bug fixes and a couple of new features/improvements.

New: Deadlock Detection

One of the major new features in this upgrade is deadlock detection. This is mainly intended for CLI scripts where such deadlocks might go unnoticed for a while. PHP Analyzer now checks the exit conditions of loops for whether they are either always satisfied, or can never be satisfied.

Let’s have a look at an example:

function isMergeable(GitHubRepository $repository)
{
    $i = 0;
    do {
        if ($i > 0) {
            $waitTime = pow(2, $i) * 1;
            sleep($waitTime);
        }

        $prDetails = $this->api->getPrDetails($repository->getLogin(), $repository->getName(), $prNumber);
        if ($prDetails['mergeable'] === true) {
            return true;
        }
    } while ($i < 3);

    return false;
}

This code fragment comes from our code-base. It is run as a background process whenever GitHub notifies us of a pull-request. We pull the GitHub API to check whether a pull-request is mergeable and return true or false. However, you might have spotted a small mistake in the loop, we are actually missing a $i++ or similar to increase the counter, and as such the condition $i < 3 is always true.

This did go unnoticed for a while since the actual result, i.e. creating an inspection when the pull-request was mergeable or ignoring the pull-request if it was not mergeable was achieved. However, we saw some errors because jobs exceeded their maximum runtime which eventually led us to find this error.

Even if the cost of this error was not high in this case, PHP Analyzer now makes sure that we do not slip in any unsatisfiable loops. This check is enabled by default if you are using the tools configuration, and can be enabled via the checks configuration as follows:

# .scrutinizer.yml
# Only add this if you already have a "checks" section.
checks:
    php:
        deadlock_detection_in_loops: true

Bug Fixes & Improvements

This release also contains a couple of bug fixes, and minor improvements. Some of the highlights below:

  • Various fixes to our PHP stubs: This mainly improves checks where we verify that you pass the right number of arguments, and the right types. PHP stubs tell the analyzer which types are built-in into the PHP runtime.
  • Type inference around associative arrays: PHP Analyzer is more accurate at inferring types of arrays that are used as maps with named keys.
  • Side-effects analysis supports some cases better: This makes various checks more accurate that draw on the results of the side-effects analysis. The side-effects analysis checks whether a function invocation has any effect after the invocation is done.

... and some more improvements, see the full changelog.

Thanks, and happy inspecting! :)


June 2 Johannes Schmitt schmittjoh

Support for Merging Code Coverage Data

A frequent request we got, especially when you start to parallelize your test-suite, is to merge the code coverage data that is produced. We now support this feature natively. All you need to do, is to define the runs option in your configuration:

# .scrutinizer.yml
tools:
    external_code_coverage:
        runs: 2    # Wait for two code coverage submissions

This will make Scrutinizer wait for two code coverage submissions instead of proceeding directly after receiving the first one.

Happy inspecting :)