Array Functions: Initializing an Array Using [...] or array()? - PHP Review #2

This is another installment in the PHP review series. This time, I decided to take a closer look at the array() language construct and what really lies behind it. As far as I remember, my colleagues at work considered array() outdated and said it should no longer be used. To verify this, I pulled the repository of Rector, a framework for refactoring and upgrading PHP code. After all, what tool is better suited to judging array syntax than one designed for refactoring and enforcing best coding practices?

A Deep Dive into the Shallow End

To my surprise, I didn’t find many examples. In fact, the only place where array() actually appeared was in documentation recommending against it: the rule definition of Rector’s LongArrayToShortArrayRector:

public function getRuleDefinition() : RuleDefinition
{
    return new RuleDefinition('Long array to short array', [new CodeSample(<<<'CODE_SAMPLE'
class SomeClass
{
    public function run()
    {
        return array();
    }
}
CODE_SAMPLE
, <<<'CODE_SAMPLE'
class SomeClass
{
    public function run()
    {
        return [];
    }
}
CODE_SAMPLE
)]);
}

This is telling, considering that LongArrayToShortArrayRector.php lives in Rector’s Php54 package: the short array syntax was introduced in PHP 5.4, and the shorter notation has been recommended ever since.
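As a quick sanity check, here is a minimal sketch showing that both notations build the same value. The variable names are only illustrative.

```php
<?php
// Both notations build the same value; only the spelling differs.
$long  = array('a' => 1, 'b' => array(2, 3));
$short = ['a' => 1, 'b' => [2, 3]];

var_dump($long === $short); // bool(true)

// The two notations can also be nested inside each other freely.
$mixed = array('a' => 1, 'b' => [2, 3]);
var_dump($mixed === $short); // bool(true)
```

So whatever difference exists, it is not in the array that comes out at runtime.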

However, this still doesn’t fully answer the question: what is the fundamental difference between writing array() and []? What really happens behind the scenes? When executing a PHP script, you first need to bake a cake… To bake a cake, you need to:

Creating an Abstract Syntax Tree – The Cake

The key is understanding the AST (Abstract Syntax Tree), a tree structure that decouples the parser from the compiler. The lexer turns the source code into tokens, which the parser then processes to build the AST. The AST is compiled into opcodes, which opcache can cache and which the Zend virtual machine then executes. To learn more about this topic, I recommend watching the following video:

Climbing the Abstract Syntax Tree - James Titcumb - Forum PHP 2017

So, Where Exactly is the Difference Between [] and array()?

The first difference I noticed was in the compiler:

#define ZEND_ARRAY_SYNTAX_LIST 1  /* list() */
#define ZEND_ARRAY_SYNTAX_LONG 2  /* array() */
#define ZEND_ARRAY_SYNTAX_SHORT 3 /* [] */

The compiler uses these constants when it validates the targets of a destructuring assignment:

static void zend_verify_list_assign_target(zend_ast *var_ast, zend_ast_attr array_style) {
    if (var_ast->kind == ZEND_AST_ARRAY) {
        if (var_ast->attr == ZEND_ARRAY_SYNTAX_LONG) {
            zend_error_noreturn(E_COMPILE_ERROR, "Cannot assign to array(), use [] instead");
        }
        if (array_style != var_ast->attr) {
            zend_error_noreturn(E_COMPILE_ERROR, "Cannot mix [] and list()");
        }
    } else if (!zend_can_write_to_variable(var_ast)) {
        zend_error_noreturn(E_COMPILE_ERROR, "Assignments can only happen to writable values");
    }
}

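In PHP code, this has the following consequences. The check runs on the elements of a destructuring target, so it shows up with nested patterns:

[array($a, $b)] = [[4, 5]];  // Fatal error: Cannot assign to array(), use [] instead
[[$a, $b]] = [[4, 5]];       // Correct: $a = 4, $b = 5

A top-level array($a, $b) = [4, 5]; never reaches this check, because the parser already rejects it as a syntax error. Note also that the targets must be writable: [1, 2, 3] = [4, 5, 6]; fails with "Assignments can only happen to writable values".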

Honestly, I didn’t expect to find such quirks in the compiler, but this is a significant difference.
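For comparison, here is a short sketch of the destructuring forms that do compile, assuming PHP 7.1 or later (where [] was allowed as a destructuring target). The variable names are arbitrary.

```php
<?php
// Short syntax as a destructuring target.
[$a, $b] = [4, 5];

// list() is the long-form counterpart for destructuring.
list($c, $d) = [6, 7];

// Nesting works as long as one style is used throughout.
[$e, [$f, $g]] = [1, [2, 3]];
list($h, list($i, $j)) = [1, [2, 3]];

// Keyed destructuring.
['x' => $x, 'y' => $y] = ['x' => 10, 'y' => 20];

// These would not compile, so they are left as comments:
// [$k, list($l)] = [1, [2]];  // Cannot mix [] and list()
// [array($m)]    = [[1]];     // Cannot assign to array(), use [] instead
```

In other words, list() and [] are interchangeable as targets (as long as they are not mixed), while array() is only ever a value.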

Another difference is visible in the lexer, specifically in the zend_language_scanner.l file:

<ST_IN_SCRIPTING>"array" {
    RETURN_TOKEN_WITH_IDENT(T_ARRAY);
}

<ST_IN_SCRIPTING>"["|"(" {
    enter_nesting(yytext[0]);
    RETURN_TOKEN(yytext[0]);
}

<ST_IN_SCRIPTING>"]"|")" {
    /* Check that ] and ) match up properly with a preceding [ or ( */
    RETURN_EXIT_NESTING_TOKEN(yytext[0]);
}
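The effect of these rules can be observed from PHP itself. Below is a minimal sketch that uses token_get_all() from the tokenizer extension (assumed to be enabled, as it is by default); describe_tokens() is a hypothetical helper written for this post, not part of PHP.

```php
<?php
// Hypothetical helper: list the tokens the lexer produces for a snippet.
function describe_tokens(string $code): array
{
    $names = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        if (is_array($token)) {
            // Named tokens arrive as [id, text, line].
            if ($token[0] === T_OPEN_TAG || $token[0] === T_WHITESPACE) {
                continue; // skip the opening tag and whitespace
            }
            $names[] = token_name($token[0]);
        } else {
            // Single-character tokens arrive as plain strings.
            $names[] = $token;
        }
    }
    return $names;
}

print_r(describe_tokens('array();')); // T_ARRAY, (, ), ;
print_r(describe_tokens('[];'));      // [, ], ;
```

So array() reaches the parser as three tokens (T_ARRAY plus the two parentheses), while [] reaches it as two.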

What Conclusions Can We Draw?

  • Neither form is a single token. For array(), the lexer returns T_ARRAY followed by ( and ); for [], it returns only [ and ]. Brackets and parentheses go through the same nesting rule, so [] does not require any extra lexer work compared to array().
  • The parser builds the same kind of node for both, ZEND_AST_ARRAY, and records the notation in the node’s attribute (ZEND_ARRAY_SYNTAX_LONG or ZEND_ARRAY_SYNTAX_SHORT). As far as I can tell, when the array is used as a value the compiler does not distinguish between the two, so the resulting arrays are identical at runtime.
  • In a PHP setup without opcache, code must be processed every time it is executed, meaning that both array() and [] go through the lexer and parser on every run to generate the AST and opcodes.
  • Any difference in parsing time is therefore limited to a token or so per array literal. Nothing in the code above suggests that array() is parsed faster than []; if anything, array() gives the lexer one more token to produce.

Who would have thought that to determine the real difference between [] and array(), I would have to delve so deeply into PHP? The only difference I could confirm is the one in the compiler: [] can be a destructuring target, and array() cannot. Even a tiny parsing-time difference could show up only where code is lexed and parsed on every run:

  • When code is compiled without an opcode cache (opcache), which is rare.
  • When using eval(), whose code opcache does not cache.
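Parsing time is easier to measure than to guess. Here is a rough micro-benchmark sketch, assuming PHP 7.3 or later for hrtime(); time_eval() is a hypothetical helper, and eval() is used precisely because its code is compiled again on every call. The numbers will vary between machines and PHP versions, so treat them as an indication only.

```php
<?php
// Hypothetical helper: time how long eval() takes to compile and run a snippet.
// eval() bypasses opcache, so lexing and parsing are repeated on every call.
function time_eval(string $code, int $iterations): int
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        eval($code);
    }
    return (int) (hrtime(true) - $start); // elapsed nanoseconds
}

$iterations = 100000;
$long  = time_eval('$x = array(1, 2, 3);', $iterations);
$short = time_eval('$x = [1, 2, 3];', $iterations);

printf("array(): %.1f ms\n", $long / 1e6);
printf("[]:      %.1f ms\n", $short / 1e6);
```

Running it several times and swapping the order of the two measurements helps separate a real difference from noise.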


