Question

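I am trying to build a social bookmarking website with PHP and MySQL.

When I save a site's URL, I want to store the site's title, icon, and description in a table in my database, and then print them on my page using AJAX.

How can I extract these elements from a website?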

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<?php
$myServer = "localhost";
$myUser = "root";
$myPass = "'100pushups'";
$myDB = "social_bookmarking";

//connection to the database
$connect = mysqli_connect($myServer,$myUser, $myPass)
or die("Couldn't connect to SQLServer on $myServer");

//select a database to work with
$selected = mysqli_select_db($connect, $myDB)
or die("Couldn't open database $myDB");

var_dump($_POST);
//declare the SQL statement that will query the database
$url = "INSERT INTO url (url ) VALUES ('$_POST[url]')";
if (isset($_POST['value']))    
    {    
         // Instructions if $_POST['value'] exist
         echo 'Your url is ' .$url; 
            }
$data = get_meta_tags($url);
print_r($data);
if (!mysqli_query($connect, $url)) {
    die('Error: ' . mysqli_error($connect));
}
else
{
    echo "Your information was added to the database";  
}

mysqli_close($connect);
?>
</body>
</html>

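I know I am doing something wrong with $url there, but I don't know how to use a variable as the argument to get_meta_tags, since that function only accepts a filename or a string.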

Answer 1

You can get the title like this (courtesy of https://stackoverflow.com/users/54680/jonathan-sampson):

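<?php
    if ( !empty( $_POST["url"] ) ) {
        $doc = new DOMDocument();
        // @ suppresses the warnings that malformed HTML would otherwise trigger
        @$doc->loadHTML( file_get_contents( $_POST["url"] ) );
        $xpt = new DOMXPath( $doc );
        $title = $xpt->query("//title")->item(0);
        // item(0) is null when the page has no <title> element
        $output = $title ? $title->nodeValue : "Title not found";
    } else {
        $output = "URL not provided";
    }
    echo $output;
?>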

You can get the icon like this:

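<?php
    $url = $_POST['url'];
    $doc = new DOMDocument();
    $doc->strictErrorChecking = FALSE;
    // @ suppresses the warnings that malformed HTML would otherwise trigger
    @$doc->loadHTML(file_get_contents($url));
    $xml = simplexml_import_dom($doc);
    // Match both the legacy "shortcut icon" and the plain "icon" rel values
    $arr = $xml->xpath('//link[@rel="shortcut icon" or @rel="icon"]');
    // Print the href only if a matching <link> element was found
    if (!empty($arr)) {
        echo $arr[0]['href'];
    }
?>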

Finally, you can get the description like this:

<?php
    $tags = get_meta_tags($_POST['url']);
    // The key is missing when the page has no description meta tag
    $description = $tags['description'] ?? '';
    echo $description;
?>
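
The three snippets above each download the page separately. As a rough sketch only, the same lookups can be run against a single HTML string that has already been fetched. The function name extract_page_info and the sample markup are hypothetical and not part of the original answer, and the XPath attribute matches are case-sensitive, unlike get_meta_tags().

```php
<?php
// Hypothetical helper: pulls title, description and icon href out of an HTML string.
function extract_page_info(string $html): array
{
    $info = ['title' => null, 'description' => null, 'icon' => null];
    if (trim($html) === '') {
        return $info; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // First <title> element, if any
    $titleNode = $xpath->query('//title')->item(0);
    if ($titleNode !== null) {
        $info['title'] = trim($titleNode->textContent);
    }

    // Same field that get_meta_tags() exposes as $tags['description']
    $metaNode = $xpath->query('//meta[@name="description"]/@content')->item(0);
    if ($metaNode !== null) {
        $info['description'] = trim($metaNode->nodeValue);
    }

    // Accept both the legacy "shortcut icon" and the plain "icon" rel values
    $iconNode = $xpath->query('//link[@rel="icon" or @rel="shortcut icon"]/@href')->item(0);
    if ($iconNode !== null) {
        $info['icon'] = $iconNode->nodeValue;
    }

    return $info;
}

// Usage with an inline sample page (assumed markup, for illustration only)
$sample = '<html><head><title>Example page</title>'
    . '<meta name="description" content="A sample description">'
    . '<link rel="shortcut icon" href="/favicon.ico">'
    . '</head><body></body></html>';

// json_encode() produces a payload that an AJAX handler on the page can consume.
echo json_encode(extract_page_info($sample));
```

Fields that are not found stay null, so the caller can decide what to store in the database for pages that lack a title, description, or icon.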

Answer 2

You can fetch a site's favicon with the file_get_contents() function (unless the site blocks you by using https). For example:

$icon = file_get_contents("http://stackoverflow.com/favicon.ico");
// now save it
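
The "now save it" step could look roughly like the sketch below. The helper name save_icon, the placeholder bytes, and the target path are assumptions made for illustration. In real use the bytes would come from file_get_contents(), which returns false on failure, so that result should be checked before it is passed in.

```php
<?php
// Hypothetical helper: writes already-downloaded icon bytes to disk.
function save_icon(string $bytes, string $path): bool
{
    // An empty body usually means the download failed; do not create an empty file.
    if ($bytes === '') {
        return false;
    }
    // file_put_contents() returns the number of bytes written, or false on failure.
    return file_put_contents($path, $bytes) !== false;
}

// Usage: placeholder bytes stand in for the result of file_get_contents() on a favicon URL.
$iconBytes = "placeholder icon bytes";
$target = sys_get_temp_dir() . '/favicon-example.ico';
var_dump(save_icon($iconBytes, $target));
```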

Another option is to use curl. It is a great PHP extension if you know how to use it.
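
A minimal sketch of the curl route, assuming the curl extension is installed. The function name fetch_with_curl and the 10-second default timeout are illustrative choices, not something the answer specifies.

```php
<?php
// Hypothetical helper: fetches a URL with the curl extension.
// Returns the response body as a string, or null if the request fails.
function fetch_with_curl(string $url, int $timeoutSeconds = 10): ?string
{
    if (!function_exists('curl_init')) {
        return null; // curl extension is not available
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit time spent connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit total request time
    ]);

    // curl_exec() returns the body as a string here, or false on failure.
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}
```

A call such as fetch_with_curl($_POST['url']) would then return the page HTML (or the raw favicon bytes for an icon URL), which can be handed to a parser or written to disk.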

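With these methods you can also fetch the HTML content of a site. You can then parse it with any HTML parser library for PHP, or with regular expressions (which experts generally do not recommend).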

Answer 3

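There are some very clever scripts/classes that help you get content out of the DOM, for example by using smart selectors. I recommend using one of them.

Here is a good example: http://simplehtmldom.sourceforge.net/

To get the page content, use file_get_contents or a similar function.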

How do I extract the name, description, and icon from a website?
