Question

I am using Python to scrape data from a Japanese website that is available in both English and Japanese.

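The problem is that I get the data I need, but in the wrong language (both languages share the same URL). I inspected the HTML page and saw that the "lang" attribute differs between the two versions: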


<html xmlns="http://www.w3.org/1999/xhtml" lang="ja" xml:lang="ja" class="">

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

Here is the code I used:

import requests
import lxml.html as lh
import pandas as pd

url='https://data.j-league.or.jp/SFMS01/search?team_ids=33&home_away_select=0'
page = requests.get(url)
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//tr')

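At this point I get the header row of the table from the page, but in the Japanese version. I am new to Python and still not very familiar with it. Is there any way to get the data in English? Pointers to existing examples, templates, or other resources would be even better.

Thanks!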

Answer 1

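I visited the site you linked. For English it sets a cookie: in your browser's developer tools, open the Network tab and look at the headers for Request URL: https://data.j-league.or.jp/SFMS01/search?team_ids=33&home_away_select=0, where you will see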
Set-Cookie: SFCM01LANG=en; Max-Age=63072000; Expires=Tue, 18-Oct-2022 19:14:29 GMT; Path=/
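
As a side note, if you would rather not copy the cookie name and value by hand, a header value like the one above can be parsed with the standard library. This is only a minimal sketch: the helper name cookies_from_header is made up for illustration, and the header string is simply the one shown above pasted in as text.

```python
from http.cookies import SimpleCookie


def cookies_from_header(set_cookie_value):
    """Parse a Set-Cookie header value into a {name: value} dict.

    Attributes such as Max-Age, Expires and Path are cookie metadata,
    so they do not show up as separate entries in the result.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return {name: morsel.value for name, morsel in jar.items()}


# Header value copied from the browser's Network tab (see above)
header = ("SFCM01LANG=en; Max-Age=63072000; "
          "Expires=Tue, 18-Oct-2022 19:14:29 GMT; Path=/")
lang_cookie = cookies_from_header(header)
print(lang_cookie)
```

The resulting dict has the shape that the cookies argument of requests.get accepts.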

So I simply sent that cookie along with the request, changing your snippet to this:

import requests
import lxml.html as lh
import pandas as pd

url='https://data.j-league.or.jp/SFMS01/search?team_ids=33&home_away_select=0'
page = requests.get(url, cookies={'SFCM01LANG':'en'})
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//tr')
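
If you plan to request several pages, here is a hedged sketch of the same idea using a requests.Session, so the cookie is sent on every request, plus a check of the "lang" attribute you found on the html element. The function names page_language and fetch_rows are my own, not part of any library, and the check assumes the site keeps setting lang="en" / lang="ja" on the root element as in your inspection.

```python
import requests
import lxml.html as lh

URL = 'https://data.j-league.or.jp/SFMS01/search?team_ids=33&home_away_select=0'


def page_language(content):
    """Return the lang attribute of the root <html> element, or None."""
    doc = lh.document_fromstring(content)
    return doc.get('lang')


def fetch_rows(url, lang='en'):
    """Fetch url with the language cookie set and return its <tr> elements."""
    with requests.Session() as session:
        # Assumption: the site picks its language from the SFCM01LANG cookie
        session.cookies.set('SFCM01LANG', lang)
        page = session.get(url, timeout=30)
        page.raise_for_status()
    # Fail loudly if the server ignored the cookie
    found = page_language(page.content)
    if found != lang:
        raise RuntimeError(f'expected lang={lang!r}, got {found!r}')
    return lh.document_fromstring(page.content).xpath('//tr')
```

Calling fetch_rows(URL) should then give you the same tr_elements as above, or raise an error if the page did not come back in the requested language.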
