Building a distributed app with PHP and Net_Gearman, part 1

I’d been hearing a lot of great things about Gearman, and I was just looking for a reason to use it in a project.  Well, the opportunity finally arose, and as I was playing with the Net_Gearman PEAR library, I struggled a bit with the documentation, so I thought I might share some lessons learned.

To use Gearman with PHP, you can either go the PECL route or the PEAR route.  In my case, I was working with OS X 10.4 boxes, and it’s enough trouble just to get a modern PHP on there.  I didn’t want to mess with building PECL modules.  So I opted for Net_Gearman, the PEAR library.

 A few things I learned along the way:

  • You need to submit the arguments to a gearman job as an array; PHP is loosely typed, so it will gladly accept a simple string, but when your worker then indexes into the arguments, it gets single characters from that string instead of full argument strings
  • The PEAR library doesn’t seem to have a simple $client->do() method to easily send a single job to gearman (like some other Gearman libraries)
  • Most examples I’ve seen take the approach of creating a task object, adding it to a set, and running the task set, but Net_Gearman uses the __call() mechanism to allow you to invoke a job as if it were a function, which is pretty cool; this is not obvious in the documentation
  • There are some really nice callbacks you can use in your worker process at the start, completion, or failure of the job; these are not obvious in the documentation
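
To make the first point concrete, here’s a minimal sketch in plain PHP, with no Gearman involved. The first_argument() function is a made-up stand-in for what a job class does when it reads its first argument.

```php
<?php
// Hypothetical stand-in for a job's run($arg) method: it reads the first argument.
function first_argument($arg)
{
    return $arg[0];
}

// Wrong: a bare string. Indexing a string returns a single character.
$from_string = first_argument('ls -l /tmp');

// Right: a one-element array. Index 0 is the whole command string.
$from_array = first_argument(array('ls -l /tmp'));

echo "string: {$from_string}\n"; // prints "string: l"
echo "array:  {$from_array}\n";  // prints "array:  ls -l /tmp"
```

PHP raises no error in the string case, which is why this mistake is easy to miss until the worker tries to run the one-letter "command".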

Download the source for this tutorial series.  Don’t be overwhelmed by the number of files in the tarball.  There is a client/server file pair for each “lesson” in the tutorial.

1. The basics

Let’s look at a basic implementation.  We’ll start by defining a class library with some configuration options.  Here is gm_shared.php:

Obviously, you’ll need to change the first three values in this class.  (You can use the localhost address shown above, but you’ll probably want your distributed app to run on more than one server, so you’ll need to use the right IP address.)  Note that I added a log_msg function to the class: any unattended process like a gearman worker will need to log messages somewhere reasonable so that you can diagnose problems after the fact.
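
For a rough idea of the shape of such a class, here is a minimal sketch. It is not the gm_shared.php from the download: the class name GmShared, the property names, the port and the log path are all assumptions made for illustration, so adjust them to match your own setup.

```php
<?php
// Hypothetical shared settings class; every name and value here is a placeholder.
class GmShared
{
    // host:port of each gearman job server (use whatever port your gearmand listens on)
    public static $servers = array('127.0.0.1:4730');

    // File that unattended workers append their messages to
    public static $log_file = '/tmp/gm_example.log';

    // Append one timestamped line, tagged with the process id, to the log file.
    public static function log_msg($msg)
    {
        $line = sprintf("[%s] [pid %d] %s\n", date('Y-m-d H:i:s'), getmypid(), $msg);
        file_put_contents(self::$log_file, $line, FILE_APPEND | LOCK_EX);
    }
}

GmShared::log_msg('worker starting');
```

Including the process id in each line helps when several workers on the same box write to one log file.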

Here is the source to our worker script, gm_worker1.php:

We require the shared library so we can use its centralized definition of the gearman servers as well as the logging capabilities.  We instantiate a new Net_Gearman_Worker object, and then we tell the gearman server about our capabilities.  In this case, we’re able to run tasks of type “Example”.  Once the worker has informed the gearman server of its capabilities, it calls beginWork(), which goes into an infinite loop: waiting for jobs, running them, and then waiting again.

The code responsible for actually running these tasks is found in a class called Example1.php:

This is a very simple job class.  It takes the first argument passed in and runs it as a command.  While this is a bit contrived, you could do real work with something like this: imagine the command line being an ffmpeg call to transcode a video file.  You could build a distributed transcode solution this way.
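
Stripped of the Gearman plumbing, the heart of a job like this is just “take the first argument and execute it”. Here’s a minimal sketch of that step as a standalone function, so it can be run without a gearman server. The run_command() helper and its return format are my own invention for illustration, not part of Net_Gearman or of Example1.php.

```php
<?php
// Hypothetical helper showing the core of a "run this command" job.
// $arg is the argument array the client submitted; $arg[0] is the command line.
function run_command(array $arg)
{
    $output = array();
    $status = 0;
    // exec() fills $output with the command's stdout lines and $status with its exit code.
    exec($arg[0], $output, $status);

    return array(
        'status' => $status,
        'output' => implode("\n", $output),
    );
}

$result = run_command(array('echo hello'));
echo $result['status'], ': ', $result['output'], "\n"; // prints "0: hello"
```

Keep in mind that a worker built this way will run whatever command a client sends it, so it only makes sense where every client is trusted.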

The last piece is our client code, the code responsible for queuing up the jobs.   Here is the source for the client, gm_client1.php:

There’s really not much to this script.  We require the shared library so that we can use its centralized definition of the gearman servers, and we instantiate a new Net_Gearman_Client object.

We then use the “magic” method Example() to run a task of type “Example”.  Notice how we pass in our single argument as a 1-element array, not as a string value.  Again, you have to be careful here, because if you do pass in a string value, PHP will index into the string and the job class will get a single character when it references $arg[0].  Obviously, the worker won’t be able to execute that command.
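
If you haven’t run into __call() before, here’s a toy sketch of the mechanism. ToyClient is not Net_Gearman: it is a made-up class that just records which “job” was requested, to show how a call to an undefined method like Example() can be turned into a job name plus an argument array.

```php
<?php
// Toy client, not Net_Gearman itself: it only illustrates the __call() pattern.
class ToyClient
{
    // Every "job" requested through a magic method call is recorded here.
    public $submitted = array();

    // PHP calls __call() when an undefined method is invoked on the object.
    // $name is the method name the caller used; $arguments is its argument list.
    public function __call($name, $arguments)
    {
        $job_args = isset($arguments[0]) ? $arguments[0] : array();
        $this->submitted[] = array('job' => $name, 'args' => $job_args);
        return count($this->submitted);
    }
}

$client = new ToyClient();
$client->Example(array('ls -l /tmp'));

echo $client->submitted[0]['job'], "\n";     // prints "Example"
echo $client->submitted[0]['args'][0], "\n"; // prints "ls -l /tmp"
```

A real client would send the job name and arguments to the gearman server at this point instead of storing them in an array.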

So now we have a working client/server gearman application.  You can start up the gm_worker1.php script, and it will wait for incoming jobs.  Run gm_client1.php in another shell, and you should see the worker script start up the job and show you its output.

This should be enough to whet your appetite.  In the next installment, we’ll dig a little deeper into ways you can track your jobs and get data from them.

Part 2 · Part 3 · Part 4

 
