How do I interact with page elements when scraping a website with PHP?

Time: 2011-05-09 16:29:25

Tags: php curl web-scraping

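I need to go to http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx, enter data into the fields, and click a button to get the results. The results page gives me a table of data, but it is split across 5 pages.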

I can do all of the above with cURL, but this is where I am stuck.

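Once on the results page, I need to click the "Date" header twice to sort the data by date in descending order, and then go through the current day's results.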

Any idea how to do this, either in detail or at a conceptual level? Either would help.

Thanks!

3 Answers:

Answer 0 (score: 1)

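The problem is that the click actually performs a postback via JavaScript. Because PHP and cURL cannot execute JavaScript, you need to inspect what the browser sends over HTTP (the GET and POST parameters and the cookies) and emulate it. Keep in mind that some values may depend on the session. I don't have time to work this out for you right now, but I know that ASP.NET sites can be quite tricky in some cases. There may be an easier way, but it always comes down to this, because this is what actually happens.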
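To make the emulation step concrete, here is a minimal sketch of how such a postback body could be assembled in PHP. It assumes the page follows the usual ASP.NET Web Forms pattern, where hidden inputs such as `__VIEWSTATE` are echoed back along with `__EVENTTARGET` and `__EVENTARGUMENT`. The helper names `extractHiddenFields` and `buildPostbackBody` are hypothetical, and the sample HTML is made up so the code runs without a network connection; the event target and field names are the decoded ones from the request shown in Answer 2.

```php
<?php
// Hypothetical helper: collect every <input type="hidden"> (for example __VIEWSTATE)
// from a page so the values can be sent back with the next request.
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well formed, so parser warnings are suppressed.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $fields = [];
    foreach ($xpath->query('//input[@type="hidden"]') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Hypothetical helper: build the URL-encoded body that the page's JavaScript
// postback is assumed to submit for a given event target.
function buildPostbackBody(
    array $hiddenFields,
    string $eventTarget,
    array $formValues = [],
    string $eventArgument = ''
): string {
    $fields = array_merge($hiddenFields, $formValues, [
        '__EVENTTARGET'   => $eventTarget,
        '__EVENTARGUMENT' => $eventArgument,
    ]);
    return http_build_query($fields);
}

// Made-up sample page standing in for the real response.
$sampleHtml = '<html><body><form>'
    . '<input type="hidden" name="__VIEWSTATE" value="abc123" />'
    . '<input type="hidden" name="__EVENTTARGET" value="" />'
    . '</form></body></html>';

$hidden = extractHiddenFields($sampleHtml);
$body = buildPostbackBody(
    $hidden,
    'Search:dgSearch:_ctl2:_ctl1',
    ['Search:ddlMonth' => '5', 'Search:ddlYear' => '2011']
);
echo $body, PHP_EOL;
```

Because the hidden values are read from each response rather than hard-coded, the session-dependent values mentioned above are refreshed on every request.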

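If you are not set on PHP, you have a whole range of options open. For example, an aggregator in a project I am working on can actually execute (controlled) JavaScript specifically for these kinds of tasks/pages (albeit on a much larger scale).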

Answer 1 (score: 0)

I couldn't get a valid set of results. It would help if you could post some dummy data that returns results.

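As a general answer, you need something that can manipulate the DOM. You could do this server-side with something like PHP and WebDriver, or purely client-side with Selenium. Simulate the click, grab the resulting HTML, and parse it.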
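For the final "parse it" step, a minimal sketch using PHP's built-in DOM extension might look like the following. The helper name `parseTableRows`, the table markup, and the sample rows are all made up for illustration; the real results table may be structured differently. The sketch also sorts the rows by date in descending order locally, which is a substitute for clicking the "Date" header twice, not a reproduction of it.

```php
<?php
// Hypothetical helper: return each <tr> of the first table matched by $tableXPath
// as an array of trimmed cell texts.
function parseTableRows(string $html, string $tableXPath = '//table'): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from non-well-formed HTML
    $xpath = new DOMXPath($doc);

    $table = $xpath->query($tableXPath)->item(0);
    if ($table === null) {
        return [];
    }

    $rows = [];
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample data; dates are assumed to use the M/D/YYYY format.
$sampleHtml = '<table id="results">'
    . '<tr><th>Case</th><th>Date</th></tr>'
    . '<tr><td>A-1</td><td>5/3/2011</td></tr>'
    . '<tr><td>A-2</td><td>5/9/2011</td></tr>'
    . '<tr><td>A-3</td><td>5/6/2011</td></tr>'
    . '</table>';

$rows = parseTableRows($sampleHtml, '//table[@id="results"]');
array_shift($rows); // drop the header row

// Sort newest first; rows whose date cannot be parsed sink to the end.
usort($rows, function (array $a, array $b): int {
    $da = DateTime::createFromFormat('!n/j/Y', $a[1]);
    $db = DateTime::createFromFormat('!n/j/Y', $b[1]);
    $ta = $da === false ? PHP_INT_MIN : $da->getTimestamp();
    $tb = $db === false ? PHP_INT_MIN : $db->getTimestamp();
    return $tb <=> $ta;
});

foreach ($rows as $row) {
    echo implode("\t", $row), PHP_EOL;
}
```

Sorting locally only gives the right answer once the rows from all result pages have been collected, so it trades one sorting postback for fetching every page.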

Answer 2 (score: 0)

This should work. Try this.

$url    ='http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx';

## Do a cURL request to $url with cookies enabled.


## After that, do this.

$url    =$url.'?'.'__EVENTTARGET=Search%3AdgSearch%3A_ctl2%3A_ctl1&__EVENTARGUMENT=&__VIEWSTATE=dDwtMjk2Mjk5NzczO3Q8O2w8aTwxPjs%2BO2w8dDw7bDxpPDE%2BOz47bDx0PDtsPGk8Mz47aTwxNz47aTwxOT47PjtsPHQ8dDw7cDxsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0PjtpPDU%2BOz47bDxwPDIwMDY7MjAwNj47cDwyMDA3OzIwMDc%2BO3A8MjAwODsyMDA4PjtwPDIwMDk7MjAwOT47cDwyMDEwOzIwMTA%2BO3A8MjAxMTsyMDExPjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VmlzaWJsZTs%2BO2w8bzx0Pjs%2BPjs%2BOzs%2BO3Q8QDA8cDxwPGw8Q3VycmVudFBhZ2VJbmRleDtQYWdlQ291bnQ7XyFJdGVtQ291bnQ7XyFEYXRhU291cmNlSXRlbUNvdW50O0RhdGFLZXlzOz47bDxpPDA%2BO2k8ND47aTwxMD47aTw0MD47bDw%2BOz4%2BOz47Ozs7Ozs7Ozs7PjtsPGk8MD47PjtsPHQ8O2w8aTwyPjtpPDM%2BO2k8ND47aTw1PjtpPDY%2BO2k8Nz47aTw4PjtpPDk%2BO2k8MTA%2BO2k8MTE%2BOz47bDx0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzNjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8xNjE3NzE0OSAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFNVTlRSVVNUIE1PUlRHQUdFIElOQzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TkFUSEFOSUVMIEdBQkJBUkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDEzNTQgVkFOREVSVkVFUiBBVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0MTU7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMTk2MzQ4ODUgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvMi8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxUSElSRCBGRURFUkFMIFNBVklOR1MgQU5EIExPQU4gQVNTTiBPRiBDTEVWRUxBTkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEdBWUxFIE5BU0g7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDg5NDEgQ09YIFJEIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTUwMztodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMjY1
MTYxMiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS85LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFUgUyBCQU5LIE4gQTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TE9VSVMgTUlSTUFOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw2OTkxIEdBUlkgTEVFIERSIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQ5MjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMzk3NTc5MiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS82LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEZJRlRIIFRISVJEIE1PUlRHQUdFIENPOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxSQVlNT05EIFNURUlOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMzU5IFRIUlVTSCBBVkUgRkFJUkZJRUxELCBPSCA0NTAxNDs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDM4O2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI0NzgyOTYzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzMvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8V0VMTFMgRkFSR08gQkFOSyBOIEE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEpBTkVUIEJPRUhNOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw4NjA4IEdPTERGSU5DSCBXQVkgV0VTVCBDSEVTVEVSLCBPSCA0NTA2OTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDQwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI1NTkwMjAzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8RklGVEggVEhJUkQgQkFOSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VEhFT0RPUkUgQ09PSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8UE8gQk9YIDE3MTEgV0VTVCBDSEVTVEVSLCBPSCA0NTA3MTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNS
AxNDkwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI2ODY3MDkxICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzYvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Q0lUSUZJTkFOQ0lBTCBJTkM7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPERPTk5BIE1BUkRJUzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NjU0OSBDQU5BU1RPVEEgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0Njg7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMjk4NzU2MDIgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvNS8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxDSVRJTU9SVEdBR0UgSU5DOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxNQVRUSEVXIEJMVU5ERUxMOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxNDEyIEhFTE1BIEFWRSBIQU1JTFRPTiwgT0ggNDUwMTM7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzMjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8zMjI0MzYxNyAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFdFTExTIEZBUkdPIEJBTksgTiBBOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxKT0hOIEJPV01BTjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Jm5ic3BcOzs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDYzO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzQyMjcwMTE5ICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VSBTIEJBTksgTkFUSU9OQUwgQVNTT0NJQVRJT047Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEJSWUFOIFNDSE1JRFQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDI4OTUgV0VFUElORyBXSUxMT1cgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47Pj47Pj47Pj47Pj47Pj47PtVTse1TdIXrxq%2FXrY%2Fp22QQ7pAh&Search%3AddlMon
th=5&Search%3AddlYear=2011&Search%3AtxtCompanyName=&Search%3AtxtLastName=&Search%3AtxtCaseNumber=';

## Do the cURL request again, with cookies still enabled.
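The two "do cURL with cookies" steps are left as comments above. A minimal sketch of what they could look like follows. The helper names `buildCurlOptions` and `fetchWithCookies` and the cookie file path are hypothetical. Unlike the snippet above, which appends the fields to the query string, this sketch can also send them as a POST body the way a browser form submission would; whether the site treats both the same is an assumption that needs checking.

```php
<?php
// Hypothetical helper: cURL options for one request in a cookie-preserving session.
function buildCurlOptions(string $url, string $cookieFile, ?string $postBody = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here for this request
    ];
    if ($postBody !== null) {
        $options[CURLOPT_POST] = true;
        $options[CURLOPT_POSTFIELDS] = $postBody; // already URL-encoded string
    }
    return $options;
}

// Hypothetical helper: perform the request and return the response body.
function fetchWithCookies(string $url, string $cookieFile, ?string $postBody = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $cookieFile, $postBody));
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $response;
}
```

A first call such as `fetchWithCookies($url, $cookieFile)` would load the search page and store the session cookie. A second call with the same cookie file and the encoded form fields as `$postBody` would then submit the postback within the same session.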