PHP Classes

Does it scale?

Subject:Does it scale?
Summary:writers may not get a turn if the file is busy
Messages:6
Author:Colin McKinnon
Date:2013-08-01 11:11:35
Update:2013-08-02 03:06:27
 

  1. Does it scale?
Colin McKinnon - 2013-08-01 11:11:35
I see the code uses shared locks for readers and exclusive locks for writers, but I can't see how it ensures that a writer will not be blocked for a long time.

Since we're talking about a cache (mostly read) rather than, say, a log file (mostly write), a writer could be blocked indefinitely by a large number of readers, e.g.

reader1 calls flock($f, LOCK_SH) - lock returned immediately
writer1 calls flock($f, LOCK_EX) - blocked
reader2 calls flock($f, LOCK_SH) - lock returned immediately
reader1 unlocks
reader3 calls flock($f, LOCK_SH) - lock returned immediately
reader2 unlocks
...

As long as there are overlapping shared locks, no LOCK_EX will be granted.
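
The sequence above can be replayed in a single script as a rough illustration. This is only a sketch: it assumes a platform where PHP's flock() maps to the native flock(2) call, so that each fopen() handle carries its own lock, and it uses LOCK_NB so the "blocked" writer shows up as a refused request instead of hanging. The handle names and the $log array are made up for the demo.

```php
<?php
// Assumption: locks are tracked per fopen() handle (native flock(2)),
// so handles in one process can stand in for separate processes.
$path = tempnam(sys_get_temp_dir(), 'lockdemo');

$reader1 = fopen($path, 'r');
$reader2 = fopen($path, 'r');
$reader3 = fopen($path, 'r');
$writer  = fopen($path, 'c'); // 'c' opens for writing without truncating

$log = [];
$log[] = flock($reader1, LOCK_SH | LOCK_NB); // reader1: granted
$log[] = flock($writer, LOCK_EX | LOCK_NB);  // writer1: refused, reader1 holds a shared lock
$log[] = flock($reader2, LOCK_SH | LOCK_NB); // reader2: granted alongside reader1
flock($reader1, LOCK_UN);                    // reader1 unlocks
$log[] = flock($writer, LOCK_EX | LOCK_NB);  // writer1 retries: refused, reader2 still reading
$log[] = flock($reader3, LOCK_SH | LOCK_NB); // reader3: granted
flock($reader2, LOCK_UN);                    // reader2 unlocks
$log[] = flock($writer, LOCK_EX | LOCK_NB);  // writer1 retries: refused, reader3 still reading
flock($reader3, LOCK_UN);                    // the last reader finally leaves
$log[] = flock($writer, LOCK_EX | LOCK_NB);  // only now can the writer get in

flock($writer, LOCK_UN);
foreach ([$reader1, $reader2, $reader3, $writer] as $handle) {
    fclose($handle);
}
unlink($path);
```

The writer only succeeds once no shared lock is held at all, which under steady read traffic may never happen.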

  2. Re: Does it scale?
Manuel Lemos - 2013-08-01 12:05:32 - In reply to message 1 from Colin McKinnon
That is a pertinent question.

This class is widely used on the PHP Classes site. Sometimes the locking fails without leaving any trace of the actual reason.

Your scenario may explain the failures. I do not have enough information to evaluate that, because I am not sure whether there is an implicit timeout for failed locks. I would need to study this further.

Anyway, do you have a proposal for a better solution?
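
On the timeout question: a plain flock() call waits until the lock is granted, while adding LOCK_NB makes it return immediately. One way to get an explicit, loggable timeout is to poll with LOCK_NB, as in the sketch below. The function name lock_exclusive_with_timeout and its parameters are hypothetical and not part of the class under discussion. Polling only bounds the wait; it does nothing to stop readers from starving the writer.

```php
<?php
/**
 * Hypothetical helper: try to take an exclusive lock on an open handle,
 * giving up after $timeout seconds instead of waiting forever.
 */
function lock_exclusive_with_timeout($handle, float $timeout, int $pollMicroseconds = 50000): bool
{
    $deadline = microtime(true) + $timeout;
    do {
        // LOCK_NB makes flock() return false at once if the lock is busy
        if (flock($handle, LOCK_EX | LOCK_NB)) {
            return true;
        }
        usleep($pollMicroseconds); // back off briefly before retrying
    } while (microtime(true) < $deadline);

    return false;
}

// Usage on a throwaway file
$path    = tempnam(sys_get_temp_dir(), 'lock');
$handle  = fopen($path, 'c');
$gotLock = lock_exclusive_with_timeout($handle, 2.0);
if ($gotLock) {
    // ... rewrite the cache file here ...
    flock($handle, LOCK_UN);
} else {
    // a failure now leaves a trace that can be investigated
    error_log("gave up waiting for an exclusive lock on $path");
}
fclose($handle);
unlink($path);
```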

  3. Re: Does it scale?
Colin McKinnon - 2013-08-01 13:23:17 - In reply to message 2 from Manuel Lemos
A better solution?

For a really scalable way of doing it, I'd go with an event-based server, a linked list of lock requests and a new lock type for cache files. But this would be overkill for most sites.

I came across your class while trying to solve this problem myself - my code is shown below. As yet I've not tested it. It does not preclude two or more processes from trying to refresh the cache, but it ensures that the cache file only contains a complete data set written by a single process:

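<?php
define("SSICACHE", "/var/cache/ssi"); // cached output
define("ORIG_SCRIPT", "/var/cacheable/fragments"); // input
define("CACHETTL", 86400); // cache lifetime in seconds

$cachefile = SSICACHE . $_SERVER['REQUEST_URI'];
ignore_user_abort(true); // finish writing the cache even if the client disconnects

// serve the cached copy if it is still fresh
if (is_readable($cachefile)) {
    if (CACHETTL > time() - filemtime($cachefile)) {
        readfile($cachefile);
        exit;
    }
}

// regenerate: send the output to the client and keep a copy for the cache
ob_start();
include(ORIG_SCRIPT . $_SERVER['SCRIPT_NAME']);
writecache(ob_get_flush(), $cachefile);
exit;

function writecache($content, $cachefile)
{
    // check it hasn't been updated since last time
    clearstatcache();
    if (file_exists($cachefile) && CACHETTL > time() - filemtime($cachefile)) {
        return;
    }
    // write the complete data set to a per-process scratch file first
    $scratch = $cachefile . "." . getmypid();
    if (file_put_contents($scratch, $content)) {
        // note: unlink() fails if $cachefile does not exist yet,
        // in which case nothing is linked into place
        if (@unlink($cachefile)) {
            @link($scratch, $cachefile);
        }
        @unlink($scratch);
    }
}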

  4. Re: Does it scale?
Colin McKinnon - 2013-08-01 13:27:28 - In reply to message 3 from Colin McKinnon
There's a very small window where the target file does not exist, between

if (@unlink($cachefile)) {

and

@link($scratch, $cachefile);

  5. Re: Does it scale?
Colin McKinnon - 2013-08-01 13:40:45 - In reply to message 4 from Colin McKinnon
(did I mention this is currently a work in progress?)

The missing file problem can be solved (on a non-Windows system) by using rename(), which replaces the target in a single step:

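function writecache($content, $cachefile)
{
    // check it hasn't been updated since last time
    clearstatcache();
    if (file_exists($cachefile) && CACHETTL > time() - filemtime($cachefile)) {
        return;
    }
    // write the complete data set to a per-process scratch file,
    // then move it over the old cache file
    $scratch = $cachefile . "." . getmypid();
    if (file_put_contents($scratch, $content)) {
        @rename($scratch, $cachefile);
    }
}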
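
The write-then-rename idea can be tried outside the cache script with a small standalone helper. This is a sketch only: the name atomic_write is hypothetical, and it swaps getmypid() for tempnam() to pick the scratch name. It assumes a POSIX filesystem, where rename() within one filesystem replaces the target atomically.

```php
<?php
/**
 * Hypothetical helper: write $content to $target so that readers see
 * either the old complete file or the new complete file.
 */
function atomic_write(string $target, string $content): bool
{
    // keep the scratch file in the target's directory so rename()
    // stays on the same filesystem
    $scratch = tempnam(dirname($target), 'tmp');
    if ($scratch === false) {
        return false;
    }
    if (file_put_contents($scratch, $content) === false || !rename($scratch, $target)) {
        @unlink($scratch); // clean up the partial scratch file
        return false;
    }
    return true;
}

// Usage: overwrite the same target twice and read it back
$target = sys_get_temp_dir() . '/atomic_write_demo_' . getmypid() . '.cache';
$firstOk  = atomic_write($target, 'first version');
$secondOk = atomic_write($target, 'second version');
$result   = file_get_contents($target);
unlink($target);
```

Note that tempnam() creates the scratch file readable by its owner only, so a real cache might need a chmod() before the rename.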

  6. Re: Does it scale?
Manuel Lemos - 2013-08-02 03:06:27 - In reply to message 3 from Colin McKinnon
Well, not being able to ensure that only one script can update the cache at any moment is not really acceptable for many applications, as they rely on that guarantee to ensure that the code that generates the content will only run once.

Another approach would be to rename the old cache file if it is outdated and a new one needs to be created. In the worst case, readers that already hold a lock will keep reading from the outdated cache file for a short while.

There could still be some race conditions, but I think it is in general safer, at least under Unix-like systems, which allow you to rename a file that may already be open.
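
One possible reading of the rename approach is sketched below: a process "claims" the outdated file by renaming it, and because the source path disappears after the first successful rename(), any later claimant fails. The function name claim_stale_cache and the ".stale." suffix are hypothetical. The sketch does not cover readers that arrive while no cache file exists, which is one of the remaining race conditions.

```php
<?php
/**
 * Hypothetical helper: try to claim an outdated cache file for refreshing.
 * Returns the path of the renamed file for the winner, or null for everyone else.
 */
function claim_stale_cache(string $cacheFile): ?string
{
    $claimed = $cacheFile . '.stale.' . getmypid();
    // rename() fails once the source is gone, so only one claimant succeeds;
    // handles that already have the old file open keep reading it
    if (@rename($cacheFile, $claimed)) {
        return $claimed;
    }
    return null;
}

// Usage: two attempts against the same outdated file
$cacheFile = sys_get_temp_dir() . '/claim_demo_' . getmypid() . '.cache';
file_put_contents($cacheFile, 'outdated content');

$first  = claim_stale_cache($cacheFile); // wins: the file is renamed
$second = claim_stale_cache($cacheFile); // loses: nothing left to rename

$oldContent = ($first !== null) ? file_get_contents($first) : null;
if ($first !== null) {
    unlink($first); // the winner would regenerate the cache before this
}
```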