I’m a big fan of the Guzzle PHP HTTP client. I use it whenever I need to make requests to third-party APIs from my applications. If you’re still writing cURL requests by hand or have rolled your own HTTP client, I highly recommend checking out Guzzle.
My heaviest use of Guzzle right now is in my photo-a-day project, Flaming Archer, where it fetches photo data from Flickr. To keep from hammering the Flickr API, I’m caching all of those requests. Guzzle makes caching ridiculously easy by way of its plugin system and its HTTP Cache plugin.
The problem with the caching plugin, at least at first blush, is figuring out how to bypass the cache in the specific cases where caching isn’t appropriate. The docs are a little light in this area, so it took me a few minutes to get it sorted out. Let’s start at the top.
The Guzzle Client
“Clients create requests, send requests, and set responses on a request object. When instantiating a client object, you can pass an optional ‘base URL’ and optional array of configuration options.”
Here’s an example of creating a Guzzle Client, based on my use case of making requests against the Flickr API.
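A minimal sketch of that kind of client, assuming Guzzle 3.7 or later installed via Composer. The endpoint URL, the `FLICKR_API_KEY` environment variable, and the particular query parameters are illustrative assumptions, not necessarily what Flaming Archer uses:

```php
<?php
// Assumes Guzzle 3.7+ installed via Composer; adjust the autoloader path to your project.
require 'vendor/autoload.php';

use Guzzle\Http\Client;

// Flickr's REST endpoint serves as the base URL for every request.
$client = new Client('https://api.flickr.com/services/rest/');

// Default query string parameters, merged into each request the client creates.
// FLICKR_API_KEY is a hypothetical environment variable holding the API key.
$client->setDefaultOption('query', array(
    'api_key'        => getenv('FLICKR_API_KEY'),
    'format'         => 'json',
    'nojsoncallback' => 1,
));
```

With defaults like these, a call such as `$client->get()` should produce a request whose query string already carries the key and format parameters.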
I use the client for the GET requests I need to make against the Flickr API. Each request will include the above default options in the query string. Nice!
Adding Caching
Since I don’t want to hammer the crap out of the Flickr API and start hitting the rate limit[1], I want to cache each request. Thankfully, Guzzle has an awesome plugin system that includes an HTTP Cache plugin.
“Guzzle can leverage HTTP’s caching specifications using the Guzzle\Plugin\Cache\CachePlugin. The CachePlugin provides a private transparent proxy cache that caches HTTP responses.”
Rather than rolling my own caching strategy (my first solution was to write a caching decorator), I decided to use Guzzle’s native plugin and leave all the caching work to it.
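Here is a rough sketch of wiring up the plugin, following the pattern in the Guzzle 3 documentation. It assumes the `doctrine/cache` package is installed alongside Guzzle. The filesystem backend and the cache directory are assumptions chosen for illustration, and any Doctrine cache driver should slot in the same way:

```php
<?php
// Assumes Guzzle 3.x and doctrine/cache installed via Composer.
require 'vendor/autoload.php';

use Doctrine\Common\Cache\FilesystemCache;
use Guzzle\Cache\DoctrineCacheAdapter;
use Guzzle\Http\Client;
use Guzzle\Plugin\Cache\CachePlugin;
use Guzzle\Plugin\Cache\DefaultCacheStorage;

$client = new Client('https://api.flickr.com/services/rest/');

// Hypothetical cache location; point this somewhere writable in your app.
$cacheDir = sys_get_temp_dir() . '/guzzle-cache';

// Storage layer: a Doctrine filesystem cache wrapped in Guzzle's adapter.
$storage = new DefaultCacheStorage(
    new DoctrineCacheAdapter(new FilesystemCache($cacheDir))
);

// Attach the plugin so it can listen to the client's request events.
$client->addSubscriber(new CachePlugin(array(
    'storage' => $storage,
)));
```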
The cache plugin will now intercept and cache GET and HEAD requests made by the client.
Custom Caching Decisions
So what if, now that you’re caching each GET request, there’s a request or requests you don’t want cached? Guzzle makes solving that problem trivial by allowing for “custom caching decisions”, but the documentation on how to make those custom decisions is decidedly light.
“… you can set a custom can_cache object on the constructor of the CachePlugin and provide a Guzzle\Plugin\Cache\CanCacheInterface object. You can use the Guzzle\Plugin\Cache\CallbackCanCacheStrategy to easily make a caching decision based on an HTTP request and response.”
Wat?
That was clear as mud to me, so I spent a few minutes digging through the source. This is what I came up with:
- The CallbackCanCacheStrategy lets you hand callbacks to the cache plugin. Each callback returns a boolean that determines whether a particular request or response should be cached.
- The CallbackCanCacheStrategy accepts two optional arguments to its constructor: a callable that will be invoked for requests and a callable that will be invoked for responses. The request callback gets an instance of Guzzle\Http\Message\RequestInterface, and the response callback gets an instance of Guzzle\Http\Message\Response.
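To make those two constructor arguments concrete, here is a small sketch assuming Guzzle 3.x. The rules themselves (GET only, plain 200 responses only) are arbitrary examples I picked to show the shape of each callback:

```php
<?php
// Assumes Guzzle 3.x installed via Composer.
require 'vendor/autoload.php';

use Guzzle\Http\Message\RequestInterface;
use Guzzle\Http\Message\Response;
use Guzzle\Plugin\Cache\CallbackCanCacheStrategy;

$strategy = new CallbackCanCacheStrategy(
    // First argument: is this request eligible for caching?
    function (RequestInterface $request) {
        return $request->getMethod() === 'GET';
    },
    // Second argument: may this response be stored?
    function (Response $response) {
        return $response->getStatusCode() == 200 && $response->canCache();
    }
);
```

Either argument can be left out, in which case the strategy should fall back to its default decision for that side.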
Bypassing Cache
In my case, I want to cache everything except for calls to the flickr.photos.search API method. Since all of the GET requests I’m making include a method query string param, it was trivial to write the callback that got me where I needed to go.
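A sketch of what such a request callback could look like, assuming Guzzle 3.x. The GET/HEAD guard is my own defensive assumption: as far as I can tell from the source, a custom request callback stands in for the plugin's default check, so I repeat that restriction here even though it may be redundant:

```php
<?php
// Assumes Guzzle 3.x installed via Composer.
require 'vendor/autoload.php';

use Guzzle\Http\Message\RequestInterface;
use Guzzle\Plugin\Cache\CallbackCanCacheStrategy;

$canCache = new CallbackCanCacheStrategy(
    function (RequestInterface $request) {
        // Defensive assumption: keep caching limited to GET and HEAD.
        if (!in_array($request->getMethod(), array('GET', 'HEAD'))) {
            return false;
        }

        // Skip the cache whenever the Flickr "method" param is the search call.
        return $request->getQuery()->get('method') !== 'flickr.photos.search';
    }
);
```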
Putting It All Together
Now that I’ve built my $client, $storage, and $canCache strategy, here’s how I put it all together.
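As a self-contained sketch of the whole arrangement, assuming Guzzle 3.x plus `doctrine/cache`: the variable names follow the ones above, while the endpoint, cache directory, and filesystem backend are illustrative assumptions. The part that matters is the final call, where the `storage` and `can_cache` options are passed to the plugin together:

```php
<?php
// Assumes Guzzle 3.x and doctrine/cache installed via Composer.
require 'vendor/autoload.php';

use Doctrine\Common\Cache\FilesystemCache;
use Guzzle\Cache\DoctrineCacheAdapter;
use Guzzle\Http\Client;
use Guzzle\Plugin\Cache\CachePlugin;
use Guzzle\Plugin\Cache\CallbackCanCacheStrategy;
use Guzzle\Plugin\Cache\DefaultCacheStorage;

$client = new Client('https://api.flickr.com/services/rest/');

// Hypothetical cache directory; any Doctrine cache driver could be used instead.
$storage = new DefaultCacheStorage(
    new DoctrineCacheAdapter(new FilesystemCache(sys_get_temp_dir() . '/guzzle-cache'))
);

// Cache everything except flickr.photos.search requests.
$canCache = new CallbackCanCacheStrategy(function ($request) {
    return $request->getQuery()->get('method') !== 'flickr.photos.search';
});

// Hand both the storage and the caching decision to the plugin.
$client->addSubscriber(new CachePlugin(array(
    'storage'   => $storage,
    'can_cache' => $canCache,
)));
```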
Now all of my GET requests are cached except for those using the flickr.photos.search method. BOOM.
[1] I can’t find the documentation on API rate limiting right now, but I know it’s limited and I don’t want to hit that limit.