multipart/form-data with PHP cURL

Time: 2013-09-07 18:06:45

Tags: php curl ocr

I am using the OCR service at i2ocr.com to convert images to text.

I need to automate this in my project, so I am using PHP to fetch the text of an image.

On the OCR site, the POST data is sent as multipart/form-data, like this:

-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_options"\r\n
\r\n
url\r\n
-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_uploadedfile"\r\n
\r\n
\r\n
-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_url"\r\n
\r\n
http://www.murraydata.co.uk/wp-content/uploads/2013/02/ocr-font-500x220.jpg\r\n
-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_languages"\r\n
\r\n
gb,eng\r\n
-----------------------------32642708628732--\r\n
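To make the wire format above easier to read, here is a minimal sketch that assembles the same kind of body by hand. `build_multipart_body` is a hypothetical helper written for this illustration, not a PHP or cURL function, and the code makes no request. Note that each delimiter line is `--` followed by the boundary value declared in the Content-Type header, and the final delimiter ends with an extra `--`.

```php
<?php
// Hypothetical helper (not part of PHP or cURL): builds a multipart/form-data
// body for simple text fields, only to illustrate the format of the dump above.
function build_multipart_body(array $fields, string $boundary): string
{
    $body = '';
    foreach ($fields as $name => $value) {
        // Every part starts with "--" plus the boundary from the header.
        $body .= '--' . $boundary . "\r\n";
        $body .= 'Content-Disposition: form-data; name="' . $name . '"' . "\r\n";
        // A blank line separates the part headers from the field value.
        $body .= "\r\n";
        $body .= $value . "\r\n";
    }
    // The closing delimiter carries a trailing "--".
    $body .= '--' . $boundary . "--\r\n";
    return $body;
}

// Boundary as it would appear in the Content-Type header: 27 dashes plus digits.
$boundary = str_repeat('-', 27) . '32642708628732';

$fields = array(
    'i2ocr_options'      => 'url',
    'i2ocr_uploadedfile' => '',
    'i2ocr_url'          => 'http://www.murraydata.co.uk/wp-content/uploads/2013/02/ocr-font-500x220.jpg',
    'i2ocr_languages'    => 'gb,eng',
);

$body = build_multipart_body($fields, $boundary);
echo $body;
```

In practice there is rarely a reason to build this string yourself: when `CURLOPT_POSTFIELDS` receives an array, cURL produces the multipart body and picks its own boundary.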

In PHP, I am using:

$ch = curl_init();
$dt = array();
$dt['i2ocr_options'] = 'url';
$dt['i2ocr_uploadedfile'] = '';
$dt['i2ocr_url'] = 'http://www.murraydata.co.uk/wp-content/uploads/2013/02/ocr-font-500x220.jpg';
$dt['i2ocr_languages'] = 'gb,eng';


curl_setopt($ch, CURLOPT_URL, "http://www.i2ocr.com/process_form");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0");
curl_setopt($ch, CURLOPT_ENCODING, "gzip,deflate");
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: multipart/form-data; boundary=---------------------------32642708628732"));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, "http://www.i2ocr.com/");
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "$dt");
$html = curl_exec($ch);

print_r($html);

This code produces no errors, but I get no output either.

I need help getting output from this cURL request.
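One detail in the code above is worth checking: `CURLOPT_POSTFIELDS` is given `"$dt"`, which is the array interpolated into a string. The sketch below uses a shortened copy of the array and makes no request; it only shows what that expression evaluates to. If this is indeed the cause, passing `$dt` itself (without quotes) would let cURL encode the fields as multipart/form-data with its own boundary, in which case the hand-written Content-Type header is probably best left out.

```php
<?php
// Shortened copy of the question's field array, for illustration only.
$dt = array(
    'i2ocr_options'   => 'url',
    'i2ocr_languages' => 'gb,eng',
);

// Interpolating an array into a string yields the literal text "Array".
// PHP reports an "Array to string conversion" notice or warning for this
// (depending on the version); @ silences it so only the result is shown.
$asString = @"$dt";

var_dump($asString);     // string(5) "Array"
var_dump(is_array($dt)); // bool(true): the array itself is what cURL can encode
```

So the request body would be the five characters `Array` rather than the form fields, which could explain why the server returns nothing useful.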

1 answer:

Answer 0 (score: 0)

Like this:

<?php
function get($url, $refer, $ch)
{
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_POST, 0);
    curl_setopt ($ch, CURLOPT_COOKIEJAR, realpath('cookie.txt')); // cookie.txt
    curl_setopt ($ch, CURLOPT_COOKIEFILE, realpath('cookie.txt'));
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux i586; de; rv:5.0) Gecko/20100101 Firefox/5.0');
    curl_setopt ($ch, CURLOPT_REFERER, $refer);
    $result = curl_exec($ch);
    return $result;
}
function post($url, $refer, $parametros, $ch)
{
    curl_setopt ($ch, CURLOPT_URL,$url); 
    curl_setopt ($ch, CURLOPT_POST, 1); 
    curl_setopt ($ch, CURLOPT_POSTFIELDS, $parametros); 
    curl_setopt ($ch, CURLOPT_COOKIEJAR, realpath('cookie.txt')); // cookie.txt 
    curl_setopt ($ch, CURLOPT_COOKIEFILE, realpath('cookie.txt'));
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux i586; de; rv:5.0) Gecko/20100101 Firefox/5.0');
    curl_setopt ($ch, CURLOPT_REFERER, $refer);
    $result= curl_exec($ch);
    return $result;                 
}
function hazlo() {
    $ch = curl_init();
    /* STEP 1. Visit the first page to pick up its cookies */
    get ("http://www.i2ocr.com/", "http://www.i2ocr.com/", $ch);

    // STEP 2. Build an array with the POST data
    $data = array(
        'i2ocr_options' => 'url',
        'i2ocr_uploadedfile' => '',
        'i2ocr_url' => 'http://www.murraydata.co.uk/wp-content/uploads/2013/02/ocr-font-500x220.jpg',
        'i2ocr_languages' => 'gb,eng'
    );
    $data2 = http_build_query($data);

    // STEP 3. Send the array via POST
    echo post ("http://www.i2ocr.com/process_form", "http://www.i2ocr.com/", $data2, $ch);
}
hazlo();
?>

Use "view source" on the response HTML and you can see the text from the image (sorry for my English). Works 100% :)
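Note that the answer runs the fields through `http_build_query`, so the body it sends is a URL-encoded string rather than multipart/form-data; according to the answer, the site accepted that. As a small sketch of what that string looks like (no request is made, and the array is a shortened copy of the one above):

```php
<?php
// Shortened copy of the answer's $data array, for illustration only.
$data = array(
    'i2ocr_options'      => 'url',
    'i2ocr_uploadedfile' => '',
    'i2ocr_languages'    => 'gb,eng',
);

// The separator is passed explicitly so the result does not depend on the
// arg_separator.output ini setting.
$encoded = http_build_query($data, '', '&');

echo $encoded, "\n";
// prints: i2ocr_options=url&i2ocr_uploadedfile=&i2ocr_languages=gb%2Ceng
```

When `CURLOPT_POSTFIELDS` receives a string like this, cURL sends it as `application/x-www-form-urlencoded`; when it receives the array itself, cURL sends multipart/form-data instead.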