Sharing variables between child processes in PHP?


Solution 1

Forked children get their own dedicated copy of any memory as soon as they write to it; this is "copy-on-write". While shmop does provide access to a common memory location, the PHP variables defined in the script are NOT shared between the children.

Doing $x = 7; in one child will not make the $x in the other children also become 7. Each child will have its own dedicated $x that is completely independent of everyone else's copy.
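
To make this concrete, here is a minimal sketch of the behaviour described above. It assumes the pcntl extension and a Unix-like CLI build of PHP; the variable name and values are purely illustrative.

```php
<?php
// Assumes the pcntl extension (CLI, Unix-like OS).
$x = 1;

$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "fork failed\n");
    exit(1);
}

if ($pid === 0) {
    // Child: this write only touches the child's private copy of $x.
    $x = 7;
    exit(0);
}

// Parent: wait for the child to finish, then inspect its own $x.
pcntl_waitpid($pid, $status);
echo "parent sees x = $x\n"; // the parent's copy is still 1
```

The child's assignment is lost when the child exits, which is why some explicit channel (shared memory, a socket, a file) is needed to get results back to the parent.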

Solution 2

As long as the parent and the children know the key(s) of the shared memory segment, it is fine to call shmop_open before pcntl_fork. But remember that pcntl_fork returns 0 in the child process and -1 on failure to create the child (check your code near the /* confusion */ comment). In the parent, $pid holds the PID of the child process just created.

Check it here:

http://php.net/manual/es/function.pcntl-fork.php
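
As a rough sketch of the approach above, the following opens a segment before forking and handles all three return values of pcntl_fork. It assumes the pcntl and shmop extensions on a Unix-like CLI build; the temporary key file, the 64-byte segment size, and the message are illustrative choices, not anything from the question.

```php
<?php
// Assumes the pcntl and shmop extensions (CLI, Unix-like OS).
// Key file, segment size, and message are illustrative.
$keyFile = tempnam(sys_get_temp_dir(), 'shm');
$size    = 64;
$shm     = shmop_open(ftok($keyFile, 'a'), 'c', 0600, $size); // opened before the fork
if ($shm === false) {
    fwrite(STDERR, "could not open shared memory\n");
    exit(1);
}

$pid = pcntl_fork();
if ($pid === -1) {
    // Failure: no child was created.
    shmop_delete($shm);
    unlink($keyFile);
    fwrite(STDERR, "fork failed\n");
    exit(1);
} elseif ($pid === 0) {
    // Child: pcntl_fork() returned 0. Write a padded message into the segment.
    shmop_write($shm, str_pad('hello from child', $size), 0);
    exit(0);
}

// Parent: $pid is the PID of the child that was just created.
pcntl_waitpid($pid, $status);
$message = rtrim(shmop_read($shm, 0, $size));
shmop_delete($shm); // mark the segment for removal
unlink($keyFile);
echo $message, "\n";
```

Because the parent waits for the child before reading, no extra locking is needed here; with several concurrent writers that would no longer hold.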

stevendesu
Author by

stevendesu


Updated on July 20, 2022

Comments

  • stevendesu
    stevendesu almost 2 years

    I'm sure what I'm trying is very simple, but I've never quite worked with multithreading before so I'm not sure where to start.

    I'm using PCNTL to create a multithreaded PHP application. What I wish to do is have 3 functions running concurrently and I want their returned values merged into a single array. So logically I need either some variable shared among all children to which they append their results, or three variables shared only between a single child and the parent - then the parent can merge the results later.

    Problem is - I have no idea how to do this. The first thing that comes to mind is using shared memory, but I feel like there should be an easier method.

    Also, if it has any effect, the function which forks the process is a public class method. So my code looks something like the following:

    <?php
        class multithreaded_search {
            /* ... */
            /* Constructors and such */
            /* ... */
            public function search( $string = '' ) {
                $search_types = array( 'tag', 'substring', 'levenshtein' );
                $pids = array();
                foreach( $search_types as $type ) {
                    $pid = pcntl_fork();
                    $pids[$pid] = $type;
                    if( $pid == 0 ) { // child process
                        /* confusion */
                        $results = call_user_func( 'multithreaded_search::'.$type.'_search', $string );
                        /* What do we do with $results ? */
                    }
                }
                for( $i = 0; $i < count( $pids ); $i++ ) {
                    $pid = pcntl_wait( $status ); // pcntl_wait() requires a status variable
                    /* $pids[$pid] tells me the type of search that just finished */
                    /* If we need to merge results in the parent, we can do it here */
                }
                /* Now all children have exited, so the search is complete */
                return $results;
            }
            private function tag_search( $string ) {
                /* perform one type of search */
                return $results;
            }
            private function substring_search( $string ) {
                /* perform one type of search */
                return $results;
            }
            private function levenshtein_search( $string ) {
                /* perform one type of search */
                return $results;
            }
        }
    ?>
    

    So will I need to use shmop_open before I call pcntl_fork to create shared memory and save the results there, or do the children share class variables? Or do they only share global variables? I'm sure the answer is easy... I just don't know it.

    Answers (for anyone who finds this)

    I've got a few more years of experience, so I'll try to impart some knowledge.

    First, there are two important distinctions to understand when it comes to implementing multiprocessing in your applications:

    • Threads versus processes versus forked processes
    • Shared memory versus message passing

    Threads, processes, forked processes

    • Threads: Threads have very low overhead since they run in the same process as the parent and share its memory address space. This means fewer OS calls are needed to create or destroy a thread. Threads are the "cheap" alternative if you plan to create and destroy them often. PHP does not have native support for threads. However, as of PHP 7.2, there are PHP extensions (written in C) that provide threaded functionality. For example: pthreads
    • Processes: Processes have a much larger overhead because the operating system must allocate memory for each one, and in the case of interpreted languages like PHP, there's often a whole runtime that must be loaded and initialized before your own code executes. PHP does have native support for spawning processes via exec (synchronous) or proc_open (asynchronous)
    • Forked processes: A forked process splits the difference between these two approaches. The child starts as a copy of the current process, including its memory at the moment of the fork, so no new runtime has to be loaded, but later writes on either side are not visible to the other. There is also native support for this via PCNTL
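
For comparison with forking, here is a small sketch of the "separate process" option using proc_open. The inline script and the use of PHP_BINARY to locate the interpreter are assumptions made for illustration.

```php
<?php
// Spawn a separate PHP process and read its stdout through a pipe.
// PHP_BINARY and the inline script are illustrative choices.
$cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg('echo 6 * 7;');
$descriptors = [
    1 => ['pipe', 'w'], // the child's stdout, readable by the parent
];

$process = proc_open($cmd, $descriptors, $pipes);
if ($process === false) {
    fwrite(STDERR, "could not start the child process\n");
    exit(1);
}

// proc_open() returns immediately, so the parent could do other work here.
$output = stream_get_contents($pipes[1]); // blocks until the child closes its stdout
fclose($pipes[1]);
proc_close($process);

echo $output, "\n";
```

The child here loads a fresh PHP runtime, which is the startup cost the "Processes" bullet refers to; a forked child skips that step.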

    Choosing the proper tool for the job often is a matter of asking the question: "How often will you be spinning up additional threads/processes"? If it's not that often (maybe you run a batch job every hour and the job can be parallelized) then processes might be the easier solution. If every request that comes into your server requires some form of parallel computation and you receive 100 requests per second, then threads are likely the way to go.

    Shared memory, message passing

    • Shared memory: This is when more than one thread or process is allowed to write to the same section of RAM. This has the benefit of being very fast and easy to understand - it's like a shared whiteboard in an office space. Anyone can read or write to it. However it has several drawbacks when it comes to managing concurrency. Imagine if two processes write to the exact same place in memory at the exact same time, then a third process tries to read the result. Which result will it see? PHP has native support for shared memory via shmop, but to use it correctly requires locks, semaphores, monitors, or other complex systems engineering processes
    • Message passing: This is the "hot new thing"™ that has actually been around since the 70's. The idea is that instead of writing to shared memory, you write into your own memory space and then tell the other threads / processes "hey, I have a message for you". The Go programming language has a famous motto related to this: "Do not communicate by sharing memory; instead, share memory by communicating". There are a multitude of ways to pass messages, including: writing to a file, writing to a socket, writing to stdout, writing to shared memory, etc.
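
To illustrate why shared memory needs locking, here is a sketch in which three forked children increment one counter in a shmop segment. It assumes the pcntl and shmop extensions on a Unix-like CLI build, and it uses flock on a temporary file as the lock, which is just one possible choice; the counts and names are illustrative.

```php
<?php
// Assumes the pcntl and shmop extensions (CLI, Unix-like OS).
// The lock file, segment size, and counts are illustrative choices.
$lockFile = tempnam(sys_get_temp_dir(), 'lock');
$shm = shmop_open(ftok($lockFile, 'c'), 'c', 0600, 4);
shmop_write($shm, pack('N', 0), 0); // 32-bit counter starts at 0

$children = [];
for ($i = 0; $i < 3; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        fwrite(STDERR, "fork failed\n");
        break;
    }
    if ($pid === 0) {
        // Open the lock file after the fork: flock() locks belong to the
        // open handle, so an inherited handle would not exclude siblings.
        $lock = fopen($lockFile, 'c');
        for ($n = 0; $n < 100; $n++) {
            flock($lock, LOCK_EX); // enter the critical section
            $value = unpack('N', shmop_read($shm, 0, 4))[1];
            shmop_write($shm, pack('N', $value + 1), 0);
            flock($lock, LOCK_UN); // leave the critical section
        }
        exit(0);
    }
    $children[] = $pid;
}

foreach ($children as $childPid) {
    pcntl_waitpid($childPid, $status);
}
$total = unpack('N', shmop_read($shm, 0, 4))[1];
shmop_delete($shm);
unlink($lockFile);
echo "total = $total\n";
```

With three children doing 100 locked increments each, the total should come out to 300. Without the flock calls, two children can read the same value and both write back value + 1, silently losing an update.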

    A basic socket solution

    First, I'll attempt to recreate my solution from 2012. @MarcB pointed me towards UNIX sockets. This page explicitly mentions fsockopen, which opens a socket as a file pointer. It also includes in the "See Also" section a link to socket_connect, which gives you a bit lower-level control over sockets.

    At the time I likely spent a long time researching these socket_* functions until I got something working. Now I did a quick Google search for socket_create_pair and found this helpful link to get you started.

    I've rewritten the code above so that each child writes its results to a UNIX socket and the parent process reads them back:

    <?php
    /*
     * I retained the same public API as my original StackOverflow question,
     * but instead of performing actual searches I simply return static data
     */
    
    class multithreaded_search {
        private $a, $b, $c;
        public function __construct($a, $b, $c) {
            $this->a = $a;
            $this->b = $b;
            $this->c = $c;
        }
    
        public function search( $string = '' ) {
            $search_types = array( 'tag', 'substring', 'levenshtein' );
            $pids = array();
            $threads = array();
            $sockets = array();
            foreach( $search_types as $type ) {
                /* Create a socket to write to later */
                $sockets[$type] = array();
                socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $sockets[$type]);
                $pid = pcntl_fork();
                $pids[] = $pid;
                $threads[$pid] = $type;
                if( $pid == 0 ) { // child process
                    /* no more confusion */
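                    /* Call the instance method; a 'class::method' string is rejected for non-static methods in PHP 8 */
                    $results = call_user_func( array( $this, $type.'_search' ), $string );
                    /* What do we do with $results ? Write them to a socket! */
                    $data = serialize($results);
                    /* Fixed 1024-byte frame: a longer payload would be truncated */
                    socket_write($sockets[$type][0], str_pad($data, 1024), 1024);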
                    socket_close($sockets[$type][0]);
                    exit();
                }
            }
            $results = [];
            for( $i = 0; $i < count( $pids ); $i++ ) {
                $pid = $pids[$i];
                $type = $threads[$pid];
                pcntl_waitpid($pid, $status);
                /* $threads[$pid] tells me the type of search that just finished */
                /* If we need to merge results in the parent, we can do it here */
                $one_result = unserialize(trim(socket_read($sockets[$type][1], 1024)));
                $results[] = $one_result;
                socket_close($sockets[$type][1]);
            }
            /* Now all children have exited, so the search is complete */
            return $results;
        }
    
        private function tag_search() {
            return $this->a;
        }
    
        private function substring_search() {
            return $this->b;
        }
    
        private function levenshtein_search() {
            return $this->c;
        }
    }
    
    $instance = new multithreaded_search(3, 5, 7);
    var_dump($instance->search());
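
The code above sends a fixed 1024-byte frame, so larger results would be cut off. One possible way around that, sketched here under the same assumptions (pcntl and sockets extensions, Unix-like CLI), is to prefix each message with its length. The helper names send_message and read_exact are hypothetical, and the payload is made up for the demonstration.

```php
<?php
// Assumes the pcntl and sockets extensions (CLI, Unix-like OS).
// send_message() and read_exact() are hypothetical helper names.

function send_message($socket, string $payload): void
{
    $frame = pack('N', strlen($payload)) . $payload; // 4-byte length prefix
    while ($frame !== '') {
        $written = socket_write($socket, $frame);
        if ($written === false) {
            throw new RuntimeException(socket_strerror(socket_last_error($socket)));
        }
        $frame = (string) substr($frame, $written); // keep whatever was not sent yet
    }
}

function read_exact($socket, int $length): string
{
    $buffer = '';
    while (strlen($buffer) < $length) {
        $chunk = socket_read($socket, $length - strlen($buffer));
        if ($chunk === false || $chunk === '') {
            throw new RuntimeException('socket closed before the full message arrived');
        }
        $buffer .= $chunk;
    }
    return $buffer;
}

socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "fork failed\n");
    exit(1);
}
if ($pid === 0) {
    socket_close($pair[1]);
    $results = range(1, 5000); // serialized form is far larger than 1024 bytes
    send_message($pair[0], serialize($results));
    socket_close($pair[0]);
    exit(0);
}

socket_close($pair[0]);
// Read before waiting, so a large payload cannot leave the child blocked in socket_write().
$length  = unpack('N', read_exact($pair[1], 4))[1];
$results = unserialize(read_exact($pair[1], $length));
socket_close($pair[1]);
pcntl_waitpid($pid, $status);
echo count($results), " results received\n";
```

Note the order in the parent: it reads first and waits second. Waiting first only works while the whole message fits in the socket buffer.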
    

    Notes

    This solution uses forked processes and message passing over a local (in-memory) socket. Depending on your use case and setup, this may not be the best solution. For instance:

    • If you wish to split the processing among several separate servers and pass the results back to a central server, then socket_create_pair won't work. In this case you'll need to create a socket, bind the socket to an address and port, then call socket_listen to wait for results from the child servers. Furthermore, pcntl_fork wouldn't work in a multi-server environment since a process space can't be shared among different machines
    • If you're writing a command-line application and prefer to use threads, then you can either use pthreads or a third-party library that abstracts pthreads
    • If you don't like digging through the weeds and just want simple multiprocessing without having to worry about the implementation details, look into a library like Amp/Parallel
  • stevendesu
    stevendesu over 12 years
    That makes sense, although how do I allow the child to edit a variable in the parent's scope, then? Do I have to use shared memory, or is there an alternate method?
  • stevendesu
    stevendesu over 12 years
    Thank you for the correction. I actually hadn't run the code yet (since I haven't solved the shared memory issue) so that would have caused some interesting debugging down the road. I'll edit the question, as well. Is there a way to modify a variable in the parent's scope without shared memory, or should I use shared memory?
  • Marc B
    Marc B over 12 years
    You can't. PHP doesn't provide the necessary low-level memory access that'd let you find out where the parent's memory is and talk to it directly. And even if you could access the parent process's memory via pointers or somesuch, there's no guarantee that the parent's memory layout remains the same as the child's. You'd have to deal with the php engine's internal memory maps to figure out where the variable is, etc... Use shared memory, or open a bi-directional communications channel between the two processes and build a little api to send data back and forth.
  • stevendesu
    stevendesu over 12 years
    I've written an answer using shared memory now, but I like the idea of the bi-directional communication channel (particularly because this puts no size limit on the return value, whereas you must define the size of the shared memory in bytes). How do I go about creating this?
  • Marc B
    Marc B over 12 years
    a local domain socket is easiest. have the parent open one with fsockopen for each child immediately before the fork. that way you can have one comm channel per child: php.net/manual/en/transports.unix.php and php.net/manual/en/transports.unix.php
  • MutantMahesh
    MutantMahesh about 9 years
    @stevendesu Were you able to solve this problem with php.net/manual/en/transports.unix.php ? Can you please share the code.
  • stevendesu
    stevendesu over 5 years
    @MutantMahesh This question / answer were from 2012 (and I just now noticed it again when anonymous007 commented in 2018). Unfortunately I don't know if I have the original code anymore, but I can try to put something together and amend the question with the "final answer" at the bottom for anyone else who stumbles upon this. I've gotten a lot more experience writing multithreaded apps recently.