How to match a pattern that spans multiple lines

Date: 2018-10-12 03:01:57

Tags: r regex

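I am trying to match a pattern in a text file. It works fine as long as the pattern sits on a single line. However, in some cases the pattern can span two lines. I have the following code: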

# Regex for the issuer name; the capture group holds the text between the tags
name_pattern = '<nameOfIssuer>([^<]*)</nameOfIssuer>'

# Keep only the lines that match the pattern (thepage holds the file's lines)
datalines = grep(name_pattern, thepage, value = TRUE)

# Use gregexpr and gsub to extract the information without the XML tags
# First, a helper that cuts each match out of its string
getexpr = function(s, g) substring(s, g, g + attr(g, 'match.length') - 1)
gg = gregexpr(name_pattern, datalines)
matches = mapply(getexpr, datalines, gg)
result = gsub(name_pattern, '\\1', matches)
result <- gsub("&amp;", "&", result)

names(result) = NULL

It works fine when the text is:

<nameOfIssuer>Posco ADR</nameOfIssuer>

But it does not work when the text looks like this:

<nameOfIssuer>Bank of
  America Corp</nameOfIssuer>
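A minimal sketch of why the line-by-line approach misses this case, assuming `thepage` is a character vector with one element per line (as `readLines()` returns); the two sample lines are hypothetical input copied from the example above. `grep()` tests each element on its own, so neither half contains the whole element:

```r
# Hypothetical input: the two-line case, one element per line.
thepage <- c("<nameOfIssuer>Bank of", "  America Corp</nameOfIssuer>")
name_pattern <- "<nameOfIssuer>([^<]*)</nameOfIssuer>"

# Each line is tested separately: the first lacks the closing tag,
# the second lacks the opening tag, so neither matches.
hits_per_line <- grepl(name_pattern, thepage)
print(hits_per_line)

# Joining the lines first restores the complete element; [^<] also
# matches the newline, so the unchanged pattern now finds it.
hit_joined <- grepl(name_pattern, paste(thepage, collapse = "\n"))
print(hit_joined)
```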

Does anyone know how to handle both cases with the same code?

The full text is below:

<SEC-DOCUMENT>0001437749-18-018038.txt : 20181009
<SEC-HEADER>0001437749-18-018038.hdr.sgml : 20181009
<ACCEPTANCE-DATETIME>20181005183736
ACCESSION NUMBER:       0001437749-18-018038
CONFORMED SUBMISSION TYPE:  13F-HR
PUBLIC DOCUMENT COUNT:      2
CONFORMED PERIOD OF REPORT: 20180930
FILED AS OF DATE:       20181009
DATE AS OF CHANGE:      20181005
EFFECTIVENESS DATE:     20181009

FILER:

    COMPANY DATA:   
        COMPANY CONFORMED NAME:         DAILY JOURNAL CORP
        CENTRAL INDEX KEY:          0000783412
        STANDARD INDUSTRIAL CLASSIFICATION: NEWSPAPERS:  PUBLISHING OR PUBLISHING & PRINTING [2711]
        IRS NUMBER:             954133299
        STATE OF INCORPORATION:         SC
        FISCAL YEAR END:            0930

    FILING VALUES:
        FORM TYPE:      13F-HR
        SEC ACT:        1934 Act
        SEC FILE NUMBER:    028-15782
        FILM NUMBER:        181111587

    BUSINESS ADDRESS:   
        STREET 1:       915 EAST FIRST STREET
        CITY:           LOS ANGELES
        STATE:          CA
        ZIP:            90012
        BUSINESS PHONE:     2132295300

    MAIL ADDRESS:   
        STREET 1:       915 EAST FIRST STREET
        CITY:           LOS ANGELES
        STATE:          CA
        ZIP:            90012

    FORMER COMPANY: 
        FORMER CONFORMED NAME:  DAILY JOURNAL CO
        DATE OF NAME CHANGE:    19870427
</SEC-HEADER>
<DOCUMENT>
<TYPE>13F-HR
<SEQUENCE>1
<FILENAME>primary_doc.xml
<TEXT>
<XML>
<?xml version="1.0" encoding="UTF-8"?>
<edgarSubmission xmlns="http://www.sec.gov/edgar/thirteenffiler" xmlns:com="http://www.sec.gov/edgar/common">
  <headerData>
    <submissionType>13F-HR</submissionType>
    <filerInfo>
      <liveTestFlag>LIVE</liveTestFlag>
      <flags>
        <confirmingCopyFlag>false</confirmingCopyFlag>
        <returnCopyFlag>true</returnCopyFlag>
        <overrideInternetFlag>false</overrideInternetFlag>
      </flags>
      <filer>
        <credentials>
          <cik>0000783412</cik>
          <ccc>XXXXXXXX</ccc>
        </credentials>
      </filer>
      <periodOfReport>09-30-2018</periodOfReport>
    </filerInfo>
  </headerData>
  <formData>
    <coverPage>
      <reportCalendarOrQuarter>09-30-2018</reportCalendarOrQuarter>
      <filingManager>
        <name>DAILY JOURNAL CORP</name>
        <address>
          <com:street1>915 EAST FIRST STREET</com:street1>
          <com:city>LOS ANGELES</com:city>
          <com:stateOrCountry>CA</com:stateOrCountry>
          <com:zipCode>90012</com:zipCode>
        </address>
      </filingManager>
      <reportType>13F HOLDINGS REPORT</reportType>
      <form13FFileNumber>028-15782</form13FFileNumber>
      <provideInfoForInstruction5>N</provideInfoForInstruction5>
    </coverPage>
    <signatureBlock>
      <name>Gerald L. Salzman</name>
      <title>Chief Executive Officer, President, CFO, Treasurer</title>
      <phone>213-229-5300</phone>
      <signature>/s/ Gerald L. Salzman</signature>
      <city>Los Angeles</city>
      <stateOrCountry>CA</stateOrCountry>
      <signatureDate>10-05-2018</signatureDate>
    </signatureBlock>
    <summaryPage>
      <otherIncludedManagersCount>0</otherIncludedManagersCount>
      <tableEntryTotal>4</tableEntryTotal>
      <tableValueTotal>159459</tableValueTotal>
      <isConfidentialOmitted>false</isConfidentialOmitted>
    </summaryPage>
  </formData>
</edgarSubmission>
</XML>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>INFORMATION TABLE
<SEQUENCE>2
<FILENAME>rdgit100518.xml
<TEXT>
<XML>
<?xml version="1.0" encoding="us-ascii"?>
<informationTable xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable">
<infoTable>
<nameOfIssuer>Bank of
  America Corp</nameOfIssuer>
<titleOfClass>Common Stock</titleOfClass>
<cusip>060505104</cusip>
<value>67758</value>
<shrsOrPrnAmt>
<sshPrnamt>2300000</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>2300000</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
<infoTable>
<nameOfIssuer>Posco ADR</nameOfIssuer>
<titleOfClass>Sponsored ADR</titleOfClass>
<cusip>693483109</cusip>
<value>643</value>
<shrsOrPrnAmt>
<sshPrnamt>9745</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>9745</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
<infoTable>
<nameOfIssuer>US Bancorp</nameOfIssuer>
<titleOfClass>Common Stock</titleOfClass>
<cusip>902973304</cusip>
<value>7393</value>
<shrsOrPrnAmt>
<sshPrnamt>140000</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>140000</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
<infoTable>
<nameOfIssuer>Wells Fargo &amp;amp; Co</nameOfIssuer>
<titleOfClass>Common Stock</titleOfClass>
<cusip>949746101</cusip>
<value>83665</value>
<shrsOrPrnAmt>
<sshPrnamt>1591800</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>1591800</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
</informationTable>
</XML>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>

2 Answers:

Answer 0 (score: 2)

Assuming your document may contain more than one matching <nameOfIssuer> tag, and you want to match all of them, we can try using gregexpr together with regmatches:
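A minimal sketch of the gregexpr/regmatches approach, assuming the file has already been read into `thepage` with one element per line (for example via `readLines()`); the sample lines are hypothetical, and this is an illustration rather than the answer's posted code. Because the negated class `[^<]` also matches a newline, joining the lines with `"\n"` is enough for the existing pattern to cross the line break:

```r
# Hypothetical input: one element split across two lines, one on a single line.
thepage <- c(
  "<infoTable>",
  "<nameOfIssuer>Bank of",
  "  America Corp</nameOfIssuer>",
  "<nameOfIssuer>Posco ADR</nameOfIssuer>",
  "</infoTable>"
)

name_pattern <- "<nameOfIssuer>([^<]*)</nameOfIssuer>"

# Join all lines into one string so a match can span the original breaks.
doc <- paste(thepage, collapse = "\n")

# gregexpr() locates every match; regmatches() extracts the matched text.
matches <- regmatches(doc, gregexpr(name_pattern, doc))[[1]]

# Keep only the captured name, then squeeze the newline and indentation
# inside a split name down to a single space.
issuers <- sub(name_pattern, "\\1", matches)
issuers <- gsub("\\s+", " ", issuers)

print(issuers)
```

`regmatches()` does the same job as the question's `getexpr()` helper combined with `mapply()`, so that helper is not needed here.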

Answer 1 (score: 0)

Using Tim's solution together with the collapse option of paste, the program works correctly. The code is as follows:

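A minimal sketch of plugging `paste(..., collapse = " ")` into the question's own pipeline, assuming `thepage` holds the file's lines; the sample input is hypothetical and the code is an illustration, not the answerer's posted code. Only the matching step changes: the lines are joined into one string before `gregexpr()` runs, and the leftover run of spaces inside a split name is squeezed afterwards:

```r
# Hypothetical sample input, one element per line.
thepage <- c(
  "<nameOfIssuer>Bank of",
  "  America Corp</nameOfIssuer>",
  "<nameOfIssuer>Wells Fargo &amp; Co</nameOfIssuer>"
)

name_pattern <- "<nameOfIssuer>([^<]*)</nameOfIssuer>"

# Instead of grep()-ing line by line, collapse everything into one string.
datalines <- paste(thepage, collapse = " ")

# Same helper as in the question: cut each match out of its string.
getexpr <- function(s, g) substring(s, g, g + attr(g, "match.length") - 1)
gg <- gregexpr(name_pattern, datalines)

# SIMPLIFY = FALSE plus unlist() yields a plain, unnamed character vector.
matches <- unlist(mapply(getexpr, datalines, gg, SIMPLIFY = FALSE),
                  use.names = FALSE)

result <- gsub(name_pattern, "\\1", matches)
result <- gsub("&amp;", "&", result, fixed = TRUE)
# The joined line break leaves several spaces in a split name; squeeze them.
result <- gsub("\\s+", " ", trimws(result))

print(result)
```

With a single collapsed string that yields several matches, the question's original `mapply()` call would simplify the result into a matrix (or a list when counts differ), which is why the sketch switches to `SIMPLIFY = FALSE`.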