Unable to get some tags using soup.findAll?

Time: 2018-12-07 10:47:42

Tags: python-2.7 beautifulsoup

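This is the HTML code. You can see that it contains two kinds of tags, <code> and <img>. What I want to draw attention to is that, if you scroll a little to the right, you will see an img tag sitting next to a code tag on the same line.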

Problem

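The main problem is that I want all of the code tags. I am using bs4 for this, but I cannot get the code tags that come immediately after an img tag, and I don't know why. Any ideas?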

<code style="display: none" id="bpr-guid-1535430">
      {&quot;data&quot;:{&quot;mediaConfig&quot;:{&quot;mprConfig&quot;:{&quot;sizes&quot;:[{&quot;width&quot;:60,&quot;height&quot;:30,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:60,&quot;height&quot;:36,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:90,&quot;height&quot;:45,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:90,&quot;height&quot;:54,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:50,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:100,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:120,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:120,&quot;height&quot;:72,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:127,&quot;height&quot;:30,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:127,&quot;height&quot;:46,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:150,&quot;height&quot;:75,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:150,&quot;height&quot;:90,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:191,&quot;height&quot;:45,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:191,&quot;height&quot;:69,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:100,&quot;$type&quot;:&q
uot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:120,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:200,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:254,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:254,&quot;height&quot;:92,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:337,&quot;height&quot;:120,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:400,&quot;height&quot;:400,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:506,&quot;height&quot;:180,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:674,&quot;height&quot;:240,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:750,&quot;height&quot;:750,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;}],&quot;filters&quot;:{&quot;cover&quot;:&quot;https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}&quot;,&quot;contain&quot;:&quot;https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}&quot;,&quot;original&quot;:&quot;https://media.licdn.com/media{+id}&quot;,&quot;fill&quot;:&quot;https://media.licdn.com/mpr/mpr/shrink_{width}_{height}{+id}&quot;,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorFilters&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorConfig&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaConfig&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.Configuration&quot;},&quot;included&quot;:[]}
    </code>

<img src="" style="display: none" class="datalet-bpr-guid-1535430"><code style="display: none" id="bpr-guid-1535431">
  {&quot;data&quot;:{&quot;canBrowseProfiles&quot;:false,&quot;reactivationFeaturesEligible&quot;:false,&quot;canViewJobAnalytics&quot;:false,&quot;canViewWVMP&quot;:false,&quot;premiumFreeTrialEligible&quot;:true,&quot;canViewCompanyInsights&quot;:false,&quot;$type&quot;:&quot;com.linkedin.voyager.premium.FeatureAccess&quot;},&quot;included&quot;:[]}
</code>

<code style="display: none" id="datalet-bpr-guid-1535431">
  {"request":"/voyager/api/premium/featureAccess?name\u003DreactivationFeaturesEligible","status":200,"body":"bpr-guid-1535431"}
</code>

<img src="" style="display: none" class="datalet-bpr-guid-1535431"><code style="display: none" id="bpr-guid-1535432">
  {&quot;data&quot;:{&quot;companies&quot;:[],&quot;$deletedFields&quot;:[&quot;paidProducts&quot;,&quot;postJobsEnabled&quot;],&quot;memberGroup&quot;:&quot;FREE&quot;,&quot;showStaticLearning&quot;:false,&quot;$type&quot;:&quot;com.linkedin.voyager.common.Nav&quot;,&quot;$id&quot;:&quot;M8x5UY0Zt6eGdBCiy+iKhA&#61;&#61;,root&quot;},&quot;included&quot;:[]}
</code>

<code style="display: none" id="datalet-bpr-guid-1535432">
  {"request":"/voyager/api/nav","status":200,"body":"bpr-guid-1535432"}
</code>

Below is the code I am using in Python.

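import sys
from HTMLParser import HTMLParser  # Python 2.7 module name

import requests
from bs4 import BeautifulSoup

h = HTMLParser()  # parser instance (not used in this snippet)

# Company name to search for, taken from the command line
companyname = sys.argv[1]

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
}
url = 'https://www.linkedin.com/search/results/all/?keywords='+companyname+'&origin=GLOBAL_SEARCH_HEADER'
req = requests.get(url, headers=headers)
finding = BeautifulSoup(req.content, 'lxml')

# Print every <code> tag that BeautifulSoup found in the response
for x in finding.findAll('code'):
    print x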
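
One way to narrow this down is to count the code tags with a second, independent parser and compare the result with what findAll returns. The following is only a minimal sketch, not a confirmed fix: the names TagCounter, code_tag_ids and SAMPLE are made up for illustration, and SAMPLE is a shortened stand-in for the HTML above with the JSON payloads replaced by {}. It uses only the standard library and should run on both Python 2.7 and Python 3.

```python
# Minimal sketch: list the ids of all <code> start tags using the
# standard-library parser, independently of BeautifulSoup.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class TagCounter(HTMLParser):
    """Collects the id attribute of every start tag with a given name."""

    def __init__(self, tag_name):
        HTMLParser.__init__(self)
        self.tag_name = tag_name
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == self.tag_name:
            self.ids.append(dict(attrs).get('id'))


def code_tag_ids(markup):
    """Return the ids of all <code> tags in the markup, in document order."""
    counter = TagCounter('code')
    counter.feed(markup)
    counter.close()
    return counter.ids


# Hypothetical, shortened stand-in for the HTML shown in the question.
SAMPLE = (
    '<code style="display: none" id="bpr-guid-1535430">{}</code>\n'
    '<img src="" style="display: none" class="datalet-bpr-guid-1535430">'
    '<code style="display: none" id="bpr-guid-1535431">{}</code>\n'
    '<code style="display: none" id="datalet-bpr-guid-1535431">{}</code>\n'
)

print(code_tag_ids(SAMPLE))
```

Against the real response, len(code_tag_ids(req.text)) could be compared with len(finding.findAll('code')). If the two numbers match, the missing tags are probably absent from the response that requests received (for example because it differs from the page seen in a logged-in browser). If they differ, the parser passed to BeautifulSoup would be the more likely suspect.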

0 answers:

No answers yet