Question

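I want to retrieve the HTML code of a link (web page) in PHP. For example, if the link is

then I want the HTML code of the page that is served. I want to retrieve this HTML code and store it in a PHP variable.

How can I do this?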

Answer 1

If your PHP server allows URL fopen wrappers, the simplest way is:

$html = file_get_contents('http://stackoverflow.com/questions/ask');
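As a minimal sketch of the same call with basic failure handling: file_get_contents() returns false when the request fails, so it can help to wrap it. The helper name fetch_html, the timeout, and the user-agent string below are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: fetch a URL into a string, or return null on failure.
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    // Options under the 'http' key apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleFetcher/1.0', // placeholder value
        ],
    ]);

    // file_get_contents() returns false on failure and emits a warning;
    // the warning is suppressed so the caller can check for null instead.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

A caller would then write something like $html = fetch_html('http://stackoverflow.com/questions/ask'); and check for null before using the result.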

If you need more control, look at the cURL functions:

$c = curl_init('http://stackoverflow.com/questions/ask');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)

$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);
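The snippet above reads the status code but does not act on it. One possible way to use it, sketched under the assumption that anything outside the 2xx range should count as a failure, is shown below. The function name fetch_with_curl and the timeout values are hypothetical.

```php
<?php
// Hypothetical helper: return the body for 2xx responses, throw otherwise.
function fetch_with_curl(string $url): string
{
    $c = curl_init($url);
    curl_setopt_array($c, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds; placeholder value
        CURLOPT_TIMEOUT        => 15,   // seconds; placeholder value
    ]);

    $body   = curl_exec($c);
    $error  = curl_error($c);
    $status = curl_getinfo($c, CURLINFO_HTTP_CODE);
    // The handle is released when $c goes out of scope.

    if ($body === false) {
        throw new RuntimeException("cURL error: $error");
    }
    if ($status < 200 || $status >= 300) {
        throw new RuntimeException("Unexpected HTTP status: $status");
    }
    return $body;
}
```

Throwing an exception instead of calling die() leaves it to the caller to decide whether a failed fetch should stop the script.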

Answer 2

Also, if you want to manipulate the retrieved page in some way, you may want to try a PHP DOM parser. I found PHP Simple HTML DOM Parser very easy to use.

Answer 3

You may want to look at Yahoo's YQL library: http://developer.yahoo.com/yql

The task at hand is as simple as:

select * from html where url = 'http://stackoverflow.com/questions/ask'

You can try this out in the console: http://developer.yahoo.com/yql/console (requires login)

Also see Chris Heilmann's screencast for an idea of what else you can do: http://developer.yahoo.net/blogs/theater/archives/2009/04/screencast_collating_distributed_information.html

Answer 4

The simple way: use file_get_contents():

$page = file_get_contents('http://stackoverflow.com/questions/ask');

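Note that allow_url_fopen must be enabled in php.ini to use the URL-aware fopen wrappers.

The more advanced way: if you cannot change the PHP configuration and allow_url_fopen is disabled, but ext/curl is installed, use the cURL library to connect to the desired page.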
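The choice between the two approaches can be made at runtime. The sketch below is one way to do it, assuming a hypothetical helper named pick_fetch_method that only reports which mechanism is available.

```php
<?php
// Hypothetical helper: report which fetch mechanism this PHP setup can use.
function pick_fetch_method(): string
{
    // ini_get() returns the setting as a string such as "1" or "";
    // FILTER_VALIDATE_BOOLEAN turns that into a real boolean.
    $urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    if ($urlFopen) {
        return 'file_get_contents';
    }
    if (function_exists('curl_init')) {
        return 'curl';
    }
    return 'none';
}
```

A script could branch on the returned string and call file_get_contents() or the cURL functions accordingly.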

Answer 5

Take a look at this function:

http://ru.php.net/manual/en/function.file-get-contents.php

Answer 6

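If you want to store the source in a variable, you can use file_get_contents, but cURL is better practice.

$html = file_get_contents('http://example.com');
echo $html;

This solution will display the web page on your site. However, cURL is a better option.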

Answer 7

Here are two different, simple ways to get content from a URL:

1) First method

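Enable allow_url_fopen on your host (in php.ini or wherever your host exposes it)

<?php
// file_get_contents() returns the page as a string;
// readfile() would print it and return only the byte count
$variableee = file_get_contents("http://example.com/");
echo $variableee;
?>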

or

2) Second method

Enable php_curl, php_imap and php_openssl

<?php
// you can add other curl options too
// see here - http://php.net/manual/en/function.curl-setopt.php
function get_dataa($url) {
  $ch = curl_init();
  $timeout = 5;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
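  // WARNING: the next two options disable TLS certificate verification;
  // drop them outside of local testing
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);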
  curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}

$variableee = get_dataa('http://example.com');
echo $variableee;
?>

Answer 8

include_once('simple_html_dom.php');
$url="http://stackoverflow.com/questions/ask";
$html = file_get_html($url);

You can use this code to get the whole HTML in parsed form. Download the 'simple_html_dom.php' file here: http://sourceforge.net/projects/simplehtmldom/files/simple_html_dom.php/download

Answer 9

$output = file("http://www.example.com"); did not work until I enabled allow_url_fopen, allow_url_include, and file_uploads in php.ini for PHP 7.
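Note that file() returns an array of lines rather than a single string. A small sketch of how that result might be handled follows. The helper name fetch_lines is an assumption for illustration.

```php
<?php
// Hypothetical helper: read a URL (or local path) into an array of lines.
function fetch_lines(string $source): array
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each element.
    // file() returns false on failure; the warning is suppressed here.
    $lines = @file($source, FILE_IGNORE_NEW_LINES);

    return $lines === false ? [] : $lines;
}

// Joining the lines gives one string, similar to file_get_contents():
// $html = implode("\n", fetch_lines('http://www.example.com'));
```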

Answer 10

You can also use DOMDocument to pull out the contents of individual HTML tags:

$homepage = file_get_contents('https://www.example.com/');
$doc = new DOMDocument;
$doc->loadHTML($homepage);
$titles = $doc->getElementsByTagName('h3');
echo $titles->item(0)->nodeValue;
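Real pages are rarely valid HTML, and loadHTML() emits a warning for each problem it finds. The sketch below shows one way to silence those warnings and collect the text of every matching tag. The function name extract_tag_text is hypothetical.

```php
<?php
// Hypothetical helper: return the text of every element with the given tag name.
function extract_tag_text(string $html, string $tag): array
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

Calling extract_tag_text($homepage, 'h3') would return all h3 headings as an array instead of only the first one.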

Answer 11

I tried this code and it worked for me.

$html = file_get_contents('http://www.google.com');
$myVar = htmlspecialchars($html, ENT_QUOTES);
echo($myVar);
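file_get_contents() treats a string without a scheme as a local file path, so a bare host name such as www.google.com is looked up on disk rather than fetched over HTTP. A minimal sketch of a guard against that, using a hypothetical helper named ensure_scheme:

```php
<?php
// Hypothetical helper: prepend a default scheme when the URL has none.
function ensure_scheme(string $url, string $default = 'http'): string
{
    // A scheme is a letter followed by letters, digits, "+", "." or "-",
    // and is followed here by "://".
    if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url) === 1) {
        return $url;
    }
    // Strip leading slashes so "//example.com" does not become "http:////...".
    return $default . '://' . ltrim($url, '/');
}
```

With this in place, file_get_contents(ensure_scheme('www.google.com')) requests the page over HTTP.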

How do I get the HTML code of a web page in PHP?

11 answers: