Redirecting crawlers to an internal microservice in NGINX

Asked: 2018-05-09 23:41:42

Tags: php reactjs nginx opengraph

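I am running a client-side rendered React application built with create-react-app, and I need to use OpenGraph meta tags. I have written some PHP (based on https://rck.ms/angular-handlebars-open-graph-facebook-share/) that serves the OpenGraph meta tags for a specific page from the contents of a JSON file. What I need to do is pass requests from crawler user agents to this PHP page from within NGINX.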

server {
    server_name example.com www.example.com;

    root /var/www/example;
    index index.html;

    listen 80;

    location @crawler {
        fastcgi_pass unix:/run/php/php7.0-fpm.sock;
        fastcgi_index crawler.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }

    location / {
        if ($http_user_agent ~* "linkedinbot|googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
            proxy_pass @crawler;

        }
        try_files $uri /index.html;
    }
}

This causes NGINX to fail with the following error:

May 10 00:01:59 ip-172-31-14-46 nginx[10400]: nginx: [emerg] invalid URL prefix in /etc/nginx/sites-enabled/example.com:23
May 10 00:01:59 ip-172-31-14-46 systemd[1]: nginx.service: Control process exited, code=exited status=1
May 10 00:01:59 ip-172-31-14-46 systemd[1]: Reload failed for A high performance web server and a reverse proxy server.

For reference, here are the contents of the PHP file:

<?php
// 1. get the request URI; its MD5 hash is the name of the metadata file
$uri = $_SERVER['REQUEST_URI'];
$hash = hash('md5', $uri);

// 2. get the content from a flat file (or API, or Database, or ...)
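$path = "./meta/" . $hash . ".json";
$data = array();
if (is_readable($path)) {
    // cast to array: json_decode() returns an object (or null on invalid JSON)
    $data = (array) json_decode(file_get_contents($path));
}
// array_merge() only accepts arrays; convert back to an object for makePage()
$data = (object) array_merge((array) json_decode(file_get_contents("./meta/default.json")), $data);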

// 3. return the page
return makePage($data); 

function makePage($data) {
    // 1. get the page
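    // $uri from the global scope is not visible inside a function, so read the superglobal
    $pageUrl = "https://example.com" . $_SERVER['REQUEST_URI'];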
    // 2. generate the HTML with open graph tags
    $html  = '<!doctype html>'.PHP_EOL;
    $html .= '<html>'.PHP_EOL;
    $html .= '<head>'.PHP_EOL;
    $html .= '<title>'.$data->title.'</title>'.PHP_EOL;
    $html .= '<meta property="og:title" content="'.$data->title.'"/>'.PHP_EOL;
    $html .= '<meta property="og:description" content="'.$data->description.'"/>'.PHP_EOL;
    $html .= '<meta property="og:image" content="'.$data->poster.'"/>'.PHP_EOL;
    $html .= '<meta http-equiv="refresh" content="0;url='.$pageUrl.'">'.PHP_EOL;
    $html .= '</head>'.PHP_EOL;
    $html .= '<body></body>'.PHP_EOL;
    $html .= '</html>';
    // 3. return the page
    echo $html;
}

1 Answer:

Answer 0 (score: 1)

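Judging from the error, the address passed to proxy_pass is missing its URL prefix: proxy_pass only accepts a URL beginning with http:// or https:// (a UNIX socket is written as http://unix:/path/to/socket:), so a named location such as @crawler is rejected. Because PHP-FPM speaks FastCGI rather than HTTP, the socket itself should stay on fastcgi_pass without any prefix, as it already is in the question: fastcgi_pass unix:/run/php/php7.0-fpm.sock;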

For the same problem, see this Q&A: Nginx invalid URL prefix
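
A named location can only be reached through an internal redirect (for example via error_page or the last argument of try_files), not through proxy_pass. The following is a minimal sketch of one common way to do that, not a tested drop-in configuration. It assumes that crawler.php sits in the document root, that status code 418 is not used for anything else on the site (it serves only as an internal jump signal), and it shortens the user-agent list from the question for readability.

```nginx
server {
    server_name example.com www.example.com;

    root /var/www/example;
    index index.html;

    listen 80;

    # Reached only via the error_page redirect below.
    location @crawler {
        include fastcgi_params;
        # Always run crawler.php, whatever URI was requested
        # (assumption: the script lives in the document root).
        fastcgi_param SCRIPT_FILENAME $document_root/crawler.php;
        fastcgi_pass unix:/run/php/php7.0-fpm.sock;
    }

    location / {
        # Hand any 418 raised in this location to the named location;
        # "=" lets the PHP response determine the final status code.
        error_page 418 = @crawler;

        # Shortened list; use the full pattern from the question here.
        if ($http_user_agent ~* "facebookexternalhit|twitterbot|linkedinbot") {
            return 418;
        }

        try_files $uri /index.html;
    }
}
```

With this layout the original request URI is still passed to PHP through REQUEST_URI (set in fastcgi_params), so the script can keep hashing it to find the matching JSON file. Using return inside if is one of the few directives that is considered safe in that context.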