Question

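I'm building a log parser in PHP. The parser runs in an infinite loop, scans log lines, and does some additional processing on each line.

The parser uses inotify to detect when the log file has been modified, then opens the file again, skips to the previously processed line number, and continues processing. The previously processed line number is stored in a variable that is incremented each time a log line is processed. It is also saved to a file, so that if the parser crashes it can resume where it left off.

My problem is that when the log is modified, the parser does not pick up the new content through the file handle it opened before the modification. After the loop reaches the end of the log, it waits for inotify to signal that the file has been modified, which is fine, but then it reopens the whole file and walks through it line by line again up to the last processed line. If the log contains many lines, this can cause performance problems. How can I avoid this and get the file updates immediately, without reopening the file and skipping the N already-processed lines again?

Sample code: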

$ftp_log_file = '/var/log/proftpd/my_log.log';
$ftp_log_status_file = '/var/log/proftpd/log_status.log';
if ( ! file_exists($ftp_log_status_file)) {
  die("failed to load the ftp log status file $ftp_log_status_file!\n");
}
$log_status = json_decode(file_get_contents($ftp_log_status_file));

if ( ! isset($log_status->read_position)) {
  $read_position = 0;
} else {
  $read_position = $log_status->read_position;
}

// Open an inotify instance
$inoInst = inotify_init();
$watch_id = inotify_add_watch($inoInst, '/var/log/proftpd/my_log.log', IN_MODIFY);

while (1) {
  $current_read_index = 0;
  $events = inotify_read($inoInst);

  $fd = fopen($ftp_log_file, 'r+');
  if ($fd === false)
    die("unable to open $ftp_log_file!\n");

  while ($line = trim(fgets($fd))) {
    $current_read_index++;
    if ($current_read_index < $read_position) {
      continue;
    }

    // DO SOME LOG PROCESSING

    $read_position++;
    $log_status->read_position++;
    file_put_contents($ftp_log_status_file, json_encode($log_status));
  }
  fclose($fd);
}

// stop watching our directory
inotify_rm_watch($inoInst, $watch_id);

// close our inotify instance
fclose($inoInst);

Answer 1

fgets seems to remember that end-of-file has been reached, and subsequent fgets calls fail silently. An explicit fseek() before fgets() seems to fix the problem.

<?php

$inoInst = inotify_init();
inotify_add_watch($inoInst, 'foo.txt', IN_MODIFY);

$f = fopen('foo.txt', 'r');

for (;;) {
    while (($line = fgets($f)) !== false) { // strict check: a final line of "0" is falsy
        echo $line;
    }

    inotify_read($inoInst);
    fseek($f, 0, SEEK_CUR); // make fgets work again
}
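
Because the handle stays open, the position worth persisting across crashes can be a byte offset rather than a line count, so a restart can fseek() straight to it instead of skipping N lines. The following is a minimal sketch of that idea, not part of the original answer: the helper name `readNewLines`, the temporary file, and the choice to store the value of ftell() are all assumptions made for illustration, and inotify is left out so the snippet runs without the extension (PHP 7.1+ assumed for the array destructuring).

```php
<?php
// Hypothetical helper: read every line available from the current position
// and return the lines together with the byte offset that was reached.
function readNewLines($handle): array
{
    // Seek to the current position; this clears the stream's EOF flag.
    fseek($handle, 0, SEEK_CUR);
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n");
    }
    return [$lines, ftell($handle)];
}

// Stand-in for the real log file (assumption: any writable temp file).
$log = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($log, "first\nsecond\n");

$f = fopen($log, 'r');
[$batch1, $offset] = readNewLines($f);   // picks up "first" and "second"

// Simulate another process appending to the log.
file_put_contents($log, "third\n", FILE_APPEND);
[$batch2, $offset] = readNewLines($f);   // picks up only "third"
fclose($f);

// Simulate a restart: reopen and jump straight to the saved byte offset.
file_put_contents($log, "fourth\n", FILE_APPEND);
$f = fopen($log, 'r');
fseek($f, $offset);
[$batch3, $offset] = readNewLines($f);   // picks up only "fourth"
fclose($f);
unlink($log);
```

In the real parser, `$offset` would be written to the status file in place of the line counter. A saved offset is only meaningful while the file keeps growing; if the log is rotated or truncated, the offset would need to be reset.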

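Note that there is still the problem of incomplete lines. The line you are currently reading may not be finished yet (for example, proftpd might complete it with its next write() call).

Since fgets won't tell you whether it stopped at a newline or at end-of-file, I can't think of a convenient way to handle this off the top of my head. The only thing I can think of is to read N bytes at a time and split the lines yourself.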
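
The "read N bytes and split it yourself" idea could look roughly like the sketch below. It is one possible reading of that suggestion, not the answerer's code: the helper name `splitCompleteLines` and the sample strings are made up for illustration, and the chunks are passed in as plain strings so the snippet runs without a log file (PHP 7.1+ assumed).

```php
<?php
// Hypothetical helper: join the leftover from the previous call with a newly
// read chunk, return the complete lines, and keep the unfinished tail.
function splitCompleteLines(string $pending, string $chunk): array
{
    $buffer = $pending . $chunk;
    $lastNewline = strrpos($buffer, "\n");
    if ($lastNewline === false) {
        // No newline yet: the whole buffer is still an incomplete line.
        return [[], $buffer];
    }
    $complete = substr($buffer, 0, $lastNewline);
    // Everything after the last newline is carried over to the next call.
    $rest = (string) substr($buffer, $lastNewline + 1);
    return [explode("\n", $complete), $rest];
}

// Simulate two reads where the writer was interrupted in the middle of a line.
[$lines1, $pending] = splitCompleteLines('', "line one\nline tw");
[$lines2, $pending] = splitCompleteLines($pending, "o\nline three\n");
```

In the watch loop, each chunk would come from something like `fread($f, 8192)` after inotify_read() returns, and only the lines returned by the helper would be processed; the pending tail waits for the next write.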
