I have a very long HTML table output containing many records. Here is a sample:
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">3</td>
<td class="user">
<span class="name">Mosse John</span>
</td>
<td class="number">3332</td>
<td class="number">497</td>
<td class="number">20</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">2</td>
<td class="user">
<span class="name">Sus Peter</span>
</td>
<td class="number">3332</td>
<td class="number">450</td>
<td class="number">20</td>
</tr>
Now I want to extract the record that belongs to 90687, so I ran:
sed my_html_file -e '/window.location.*90687/,/window.location/ !d'
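Unfortunately, it also prints the first line of the next record, which I want to avoid. I did work through the "101 sed and awk tricks", but the only solution I found was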
sed my_html_file -e '/window.location.*90687/,+9 !d'
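which means I want the 9 lines that follow the pattern. The problem is that I can't rely on "9" or any other fixed number. Is there any way around this? By the way, I'd prefer a sed solution.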
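For illustration, the sketch below reproduces the problem on a trimmed copy of the two sample records (the temporary file and the trimming are assumptions made for the demo). A sed range `/start/,/end/` is inclusive: the line that matches the end pattern is selected too, and because `window.location` also appears on the next record's `<tr>` line, that line gets printed. The `addr1,+N` form used as a workaround is a GNU sed extension.

```shell
#!/bin/sh
# Reproduce the over-printing: a /start/,/end/ range includes its end line.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# Trimmed copy of the two records from the question (demo data only).
cat > "$tmp" <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Mosse John</span>
<td class="number">20</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Sus Peter</span>
<td class="number">20</td>
</tr>
EOF

# The question's command, with the options placed before the file name.
out=$(sed -e '/window.location.*90687/,/window.location/ !d' "$tmp")
printf '%s\n' "$out"
```

The last line of the output is the `<tr>` line for team 342465, which is exactly the extra line the question wants to avoid.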
Answer 0 (score: 0)
A simple solution for this data:
sed my_html_file -e '/window.location.*90687/,/<\/tr>/ !d'
This prints every line up to and including the first closing </tr> tag.
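A minimal check of the simple solution, again on a trimmed copy of the sample (demo data only). Because the range now ends at the first `</tr>`, the next record's `<tr>` line is never reached. This assumes `</tr>` does not share a line with the next `<tr>`; the second answer deals with that case. The `-n ... p` form below is the usual equivalent of `... !d`.

```shell
#!/bin/sh
# End the range at the first </tr> instead of at the next window.location.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# Trimmed copy of the two records from the question (demo data only).
cat > "$tmp" <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Mosse John</span>
<td class="number">20</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Sus Peter</span>
<td class="number">20</td>
</tr>
EOF

# -n suppresses auto-printing; p prints only the lines inside the range.
out=$(sed -n -e '/window\.location.*90687/,/<\/tr>/p' "$tmp")
printf '%s\n' "$out"
```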
A more complex solution:
sed my_html_file -n -e '/window.location.*90687/,/window.location/ { H;x; /window.location.*window.location/ !{ x;p }} '
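This prints every line up to, but not including, the line that contains the next window.location. Note that the hold space is never cleared, so if several records match, the <tr> line of every match after the first is dropped.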
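If the goal is only to leave out the terminating line, one hold-space-free alternative (a sketch, not the answer's method) keeps the same range but prints a line only if it is the matching `<tr>` line or carries no `window.location` at all. It assumes the record after the match belongs to a different team: if two matching records are adjacent, the range ends on the second one's `<tr>` line and its remaining rows are dropped.

```shell
#!/bin/sh
# Same range as the question, but skip the other record's <tr> line.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# Trimmed copy of the two records from the question (demo data only).
cat > "$tmp" <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Mosse John</span>
<td class="number">20</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Sus Peter</span>
<td class="number">20</td>
</tr>
EOF

out=$(sed -n '/window\.location.*90687/,/window\.location/{
  /window\.location.*90687/p
  /window\.location/!p
}' "$tmp")
printf '%s\n' "$out"
```

The first command prints the matching `<tr>` line; the second prints every line without `window.location`, so the closing line of the range (the 342465 `<tr>`) is printed by neither.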
Answer 1 (score: 0)
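If you are not sure whether every record ends on a line of its own (the closing </tr> may share a line with the next record's <tr>), you can try this:
sed -n -E '/window\.location.*90687/,/<\/tr>/ {
/<\/tr>/! { p; }
/<\/tr>/ { s/(.*)<\/tr>.*$/\1<\/tr>/p; }
}' input.txt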
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">3</td>
<td class="user">
<span class="name">Mosse John</span>
</td>
<td class="number">3332</td>
<td class="number">497</td>
<!-- Confusing Row -->
<td class="number">20</td></tr> <tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">2</td>
<td class="user">
<span class="name">Sus Peter</span>
</td>
<td class="number">3332</td>
<td class="number">450</td>
<td class="number">20</td>
</tr>
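There may be more elegant solutions, but this one also handles input like the sample above, where the first record's </tr> shares a line with the next record's <tr>.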
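To see the inline case handled, the sketch below runs the answer's script (written with a `;` before each closing brace so it also parses under BSD sed) on a trimmed copy of the sample above. Inside the range, lines without `</tr>` are printed as-is; on the line that holds `</tr>`, everything after the tag is cut off before printing.

```shell
#!/bin/sh
# Extract one record even when its </tr> shares a line with the next <tr>.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# Trimmed copy of the inline sample (demo data only).
cat > "$tmp" <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Mosse John</span>
<!-- Confusing Row -->
<td class="number">20</td></tr> <tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<span class="name">Sus Peter</span>
<td class="number">20</td>
</tr>
EOF

out=$(sed -n -E '/window\.location.*90687/,/<\/tr>/{
  /<\/tr>/!p
  /<\/tr>/s/(.*)<\/tr>.*$/\1<\/tr>/p
}' "$tmp")
printf '%s\n' "$out"
```

The output ends with `<td class="number">20</td></tr>`; the rest of that line, including the 342465 `<tr>`, is discarded.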