sed: cutting out part of an HTML file

Time: 2018-04-09 20:00:40

Tags: sed

I have a very long HTML table output that contains many records. A sample looks like this:

<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
  <td class="number">163124</td>
  <td class="img">3</td>
  <td class="user">
    <span class="name">Mosse John</span>
  </td>
  <td class="number">3332</td>
  <td class="number">497</td>
  <td class="number">20</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
  <td class="number">163124</td>
  <td class="img">2</td>
  <td class="user">
    <span class="name">Sus Peter</span>
  </td>
  <td class="number">3332</td>
  <td class="number">450</td>
  <td class="number">20</td>
</tr>

Now I would like to extract the part that contains the user belonging to 90687, so I typed:

sed my_html_file -e '/window.location.*90687/,/window.location/ !d'

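Unfortunately, this also outputs the first line of the next record, which I would like to avoid. I went through 101 sed and awk tips, but the only solution I found is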

sed my_html_file -e '/window.location.*90687/,+9 !d'

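This means I am interested in the 9 lines that follow the pattern. The problem is that I cannot rely on "9" or any other fixed number. Is there any way to solve this? By the way, I am specifically interested in a sed solution.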

2 answers:

Answer 0: (score: 0)

A solution for simple data:

sed my_html_file -e '/window.location.*90687/,/<\/tr>/ !d'

This prints all lines up to and including the first closing </tr> tag after the match.

A more complex solution:
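As a minimal sketch of how this command behaves, the script below writes a shortened two-record sample to a temporary file and runs the same range on it. The trimmed records, the temporary file, and the variable name simple_out are assumptions made for illustration, and GNU sed is assumed, as in the question.

```shell
#!/bin/sh
# Build a small sample file; the records are shortened versions of the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
  <td class="number">163124</td>
  <span class="name">Mosse John</span>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
  <td class="number">163124</td>
  <span class="name">Sus Peter</span>
</tr>
EOF

# Delete every line outside the range that starts at the 90687 row
# and ends at the first closing </tr> after it.
simple_out=$(sed -e '/window.location.*90687/,/<\/tr>/ !d' "$tmp")
printf '%s\n' "$simple_out"

rm -f "$tmp"
```

Because the range ends at </tr> rather than at the next window.location, the opening row of the following record is never part of the range.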

sed my_html_file -n -e '/window.location.*90687/,/window.location/ { H;x; /window.location.*window.location/ !{ x;p }} '

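This prints all lines until the second window.location is encountered. Each line in the range is appended to the hold space, and a line is printed only while the accumulated text still contains a single window.location, so the opening row of the next record is suppressed.

Answer 1: (score: 0)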
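The hold-space variant can be tried in the same way. This is only a sketch under the same assumptions (shortened sample records, a temporary file, GNU sed, where "." also matches a newline inside the pattern space); the variable name hold_out is hypothetical.

```shell
#!/bin/sh
# Shortened two-record sample, modeled on the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
  <td class="number">163124</td>
  <span class="name">Mosse John</span>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
  <td class="number">163124</td>
  <span class="name">Sus Peter</span>
</tr>
EOF

# H   : append the current line to the hold space
# x   : swap, so the pattern space holds everything collected so far
# test: if the collected text has two window.location matches, print nothing;
#       otherwise swap back (x) and print the current line (p)
hold_out=$(sed -n -e '/window.location.*90687/,/window.location/ { H;x; /window.location.*window.location/ !{ x;p }} ' "$tmp")
printf '%s\n' "$hold_out"

rm -f "$tmp"
```

On this sample the range still ends at the second window.location row, but that row is the one line in the range that is not printed.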

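If you are not sure whether a record might end on the same line where the following record begins, you can try this:

sed -n -E '
/window\.location.*90687/,/<\/tr>/ {
  /<\/tr>/! { p }
  /<\/tr>/ {
    s/(.*)<\/tr>.*$/\1<\/tr>/
    p
  }
}' input.txt

Here is sample input in which the first record ends inline with the start of the next one: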

<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
  <td class="number">163124</td>
  <td class="img">3</td>
  <td class="user">
    <span class="name">Mosse John</span>
  </td>
  <td class="number">3332</td>
  <td class="number">497</td>

  <!-- Confusing Row -->
  <td class="number">20</td></tr> <tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">

  <td class="number">163124</td>
  <td class="img">2</td>
  <td class="user">
    <span class="name">Sus Peter</span>
  </td>
  <td class="number">3332</td>
  <td class="number">450</td>
  <td class="number">20</td>
</tr>

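There may be more elegant solutions, but this one also handles input like the sample above: on the line that closes the record, everything after </tr> is dropped before the line is printed.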
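A short sketch of this behavior, assuming GNU sed and a trimmed version of the confusing input (the temporary file and the variable name inline_out are hypothetical):

```shell
#!/bin/sh
# Sample in which the first record closes on the same line that opens the next one.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
  <td class="number">497</td>
  <td class="number">20</td></tr> <tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
  <td class="number">450</td>
</tr>
EOF

# Inside the range, lines without </tr> are printed unchanged; on the line
# containing </tr>, the text after the closing tag is removed before printing.
inline_out=$(sed -n -E '
/window\.location.*90687/,/<\/tr>/ {
  /<\/tr>/! { p }
  /<\/tr>/ {
    s/(.*)<\/tr>.*$/\1<\/tr>/
    p
  }
}' "$tmp")
printf '%s\n' "$inline_out"

rm -f "$tmp"
```

Note that the capture group is greedy, so if a single line held several </tr> tags, the text would be cut after the last one on that line rather than the first.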
