PHP: Skip/remove lines starting with ###

Time: 2018-04-12 21:37:21

Tags: php file

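I need to skip the first 20 lines of a file, which start with ###. (Actually, 18 lines start with ### and two start with ;.)

Everything I have tried fails to skip the same two lines, and I have no idea why.

This is what I have tried (this is only the relevant part of my code):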

elseif($sourceformat == "Babylon") {

    $line = fgets($source_file);
    if($line[0] === '#') {
        continue;
    }
    if(strpos(trim($line), '#') === 0) {
        continue;
    }
    if(substr($line, 0, 1) == "#") {
        continue;
    }

    $source = trim(fgets($source_file));

    if(empty($source)) {
        continue;
    }

    $target = trim(fgets($source_file));
}
// then I proceed to writing the extracted terms into a new file that has a different format.

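I have tried the three approaches above both separately and together ($line[0] === '#', strpos(trim($line), '#') === 0, and substr($line, 0, 1) == '#'), but the same lines are always missed (not detected). This is what the whole header section looks like (it is the header of a Babylon glossary file: .gls, but plain text):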

### Glossary title:Cheeseus Muzik
### Author:Cheeseus
### Description:English - Bulgarian and Bulgarian - English glossary of musical terms
### Source language:Bulgarian
### Source alphabet:Cyrillic
### Target language:Bulgarian
### Target alphabet:Cyrillic
### Icon:
### Browsing enabled?Yes
### Type of glossary:00000000
### Case sensitive words?0
; DO NOT EDIT THE NEXT **SIX** LINES  - Babylon-Builder generated text !!!!!!
### Glossary id:0265922f91878d6e846e9c869d8a89447c6e719e8585886b8692955f91887a9b8474859a85616a279a929ca07f6881507056895d6881304b5142515f42ba6c992e2b23828188719469656840908429504d595b486965418931312d5b47ad7843525650833a233a47514270695543449f31373b7179484e435a8c428827
### Confirmation string:8A148GOK
### File build number:0121DA07
### Build:80"0)2"0
### Glossary settings:00000000
### Gls type:00000001
; DO NOT EDIT THE PREVIOUS **SIX** LINES  - Babylon-Builder generated text !!!!!!

### Glossary section:

a piacere
а пиачере, по желание

a tempo
а темпо, завръщане към основното темпо след отклонение

ad libitum
ат либитум, свободно, по желание

adagio
адажио (бавно)

allargando
аларгандо, забавяне

allegretto
алегрето, весело, бързичко

allegro
алегро, бързо, весело

allentando
алентандо, със забавяне

... (This is the actual glossary: source term on one line, target term on the next, followed by an empty line, then again source term, target term, empty line. I only want these lines, and I want to discard the glossary header lines above.) The code I have successfully removes all lines starting with # except the one below (the glossary ID), and it also removes the two lines starting with a semicolon.

This is the line I can't seem to get rid of:

### Glossary id:0265922f91878d6e846e9c869d8a89447c6e719e8585886b8692955f91887a9b8474859a85616a279a929ca07f6881507056895d6881304b5142515f42ba6c992e2b23828188719469656840908429504d595b486965418931312d5b47ad7843525650833a233a47514270695543449f31373b7179484e435a8c428827

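I suspect this is because the line is very long (or perhaps because the previous line starts with a semicolon?). I tried specifying the maximum number of bytes to read per line in fgets: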

$line = fgets($source_file, 8192);

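But that did not work either. I hope you can help.

The whole code is too long to post here, and it already works fine, except for getting rid of this one line.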
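
On the line-length suspicion: as far as I know, fgets called without a length argument keeps reading until the end of the line, so a long header line should still arrive in one piece. The sketch below is one way to check that. The temporary file and the 10,000-character header are made up for illustration and are not part of the original code.

```php
<?php
// Build a temporary file whose first line is a very long "###" header
// (hypothetical data: 16 prefix characters plus 10,000 zeros).
$path = tempnam(sys_get_temp_dir(), 'gls');
$longHeader = '### Glossary id:' . str_repeat('0', 10000);
file_put_contents($path, $longHeader . "\n" . "adagio\n");

$handle = fopen($path, 'r');
$first = fgets($handle);   // no length argument: reads up to the end of the line
$second = fgets($handle);  // the next call starts on the following line
fclose($handle);
unlink($path);

$firstLength = strlen(rtrim($first, "\n"));
$firstIsHeader = ($first[0] === '#');
$secondLine = rtrim($second, "\n");
```

If a length smaller than the line were passed instead, the line would come back in several pieces, and the later pieces would not start with #.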

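Solution (based on @Mehdi Bounya's answer)

It seems I was not performing the checks I already had in the right place. This is the code that does exactly what I need: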

elseif($sourceformat == "Babylon") {

    if($targetformat == "Wordfast") {
        $converted_source_target_delimiter = "\t";
        $converted_term_delimiter = "\r\n";
    }

    $source = trim(fgets($source_file));

    if(empty($source)) {
        continue;
    }
    if($source[0] === '#') {
        continue;
    }
    if($source[0] === ';') {
        continue;
    }

    $target = trim(fgets($source_file));
}
$exported_entry = $source.$converted_source_target_delimiter.$target.$converted_term_delimiter;
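
For readers who want to try the idea outside the full converter, here is a minimal self-contained sketch of the same approach: the line that will be used as the source term is the one that gets checked. The function name readGlossaryPairs and the shortened sample data are assumptions made for illustration. They are not part of the original script, which runs inside a larger loop.

```php
<?php
/**
 * Hypothetical helper: reads a Babylon-style glossary and returns
 * [source, target] pairs, skipping blank lines and header lines.
 */
function readGlossaryPairs(string $path): array
{
    $pairs = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $pairs;
    }
    while (($line = fgets($handle)) !== false) {
        $source = trim($line);
        // Check the same line that will be used as the source term.
        if ($source === '' || $source[0] === '#' || $source[0] === ';') {
            continue;
        }
        // The line right after a source term is taken as its target term.
        $next = fgets($handle);
        $target = ($next === false) ? '' : trim($next);
        $pairs[] = [$source, $target];
    }
    fclose($handle);
    return $pairs;
}

// Usage with a shortened, made-up sample in the same layout as the file above.
$sample = "### Glossary title:Cheeseus Muzik\n"
    . "; DO NOT EDIT THE NEXT **SIX** LINES\n"
    . "### Glossary id:0265922f\n"
    . "\n"
    . "### Glossary section:\n"
    . "\n"
    . "a piacere\nа пиачере, по желание\n"
    . "\n"
    . "adagio\nадажио (бавно)\n";
$path = tempnam(sys_get_temp_dir(), 'gls');
file_put_contents($path, $sample);
$pairs = readGlossaryPairs($path);
unlink($path);
```

Like the code above, this assumes a header line is never immediately followed by a glossary term, since the target line itself is not checked.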

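Thanks to everyone who helped!

2 answers:

Answer 0: (score: 3)

You can open the file with fopen, loop over its lines, and simply check whether each line starts with one of the characters you want to skip.

This function takes two arguments: $file is the file path, and $startWith is an array of the characters to skip: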

function skipLines($file, array $startWith = []){
    $handle = fopen($file, "r");
    if ($handle) {
        // fgets returns false at end of file, which ends the loop
        while (($buffer = fgets($handle)) !== false) {
            // strict comparison of the first character against the list
            if(in_array($buffer[0], $startWith, true)){
                // Your code if line starts with one of the characters in $startWith
            } else {
                // Your code if line does not start with any of them
                echo $buffer;
            }
        }
        fclose($handle);
    }
}

skipLines("sample.txt", ['#']); // Result 1


skipLines("sample.txt", [';']); // Result 2


skipLines("sample.txt", ['#', ';']); // Result 3

Result 1:

; DO NOT EDIT THE NEXT **SIX** LINES  - Babylon-Builder generated text !!!!!!
; DO NOT EDIT THE PREVIOUS **SIX** LINES  - Babylon-Builder generated text !!!!!!

Result 2:

### Glossary title:Cheeseus Muzik
### Author:Cheeseus
### Description:English - Bulgarian and Bulgarian - English glossary of musical terms
### Source language:Bulgarian
### Source alphabet:Cyrillic
### Target language:Bulgarian
### Target alphabet:Cyrillic
### Icon:
### Browsing enabled?Yes
### Type of glossary:00000000
### Case sensitive words?0
### Glossary id:0265922f91878d6e846e9c869d8a89447c6e719e8585886b8692955f91887a9b8474859a85616a279a929ca07f6881507056895d6881304b5142515f42ba6c992e2b23828188719469656840908429504d595b486965418931312d5b47ad7843525650833a233a47514270695543449f31373b7179484e435a8c428827
### Confirmation string:8A148GOK
### File build number:0121DA07
### Build:80"0)2"0
### Glossary settings:00000000
### Gls type:00000001

### Glossary section:

Result 3:

// Nothing...

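Answer 1: (score: 1)

Similar to @Mehdi Bounya's answer above, this code stores all lines that do not start with "#" in an array. The comparison uses substr as an alternative.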

$correct_lines = [];

$handle = fopen("logs.txt", "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        if (substr($line, 0, 1) !== "#") {
            array_push($correct_lines, $line);
        }
    }

    fclose($handle);
} else {
    echo "Error opening the file";
}

foreach ($correct_lines as $line) {
    echo $line;

    // ; DO NOT EDIT THE NEXT **SIX** LINES  - Babylon-Builder generated text !!!!!!
    // ; DO NOT EDIT THE PREVIOUS **SIX** LINES  - Babylon-Builder generated text !!!!!!
}
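
If the file is small enough to load at once, the same filtering could probably be done with file() and array_filter instead of an explicit fgets loop. This is only a sketch. The function name linesNotStartingWith and the sample data are assumptions, not something taken from either answer.

```php
<?php
/**
 * Hypothetical helper: returns the lines of $path whose first character
 * is not one of $prefixes. Empty lines are kept.
 */
function linesNotStartingWith(string $path, array $prefixes): array
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each line.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return [];
    }
    $kept = array_filter($lines, function (string $line) use ($prefixes): bool {
        return $line === '' || !in_array($line[0], $prefixes, true);
    });
    return array_values($kept); // reindex from 0
}

// Usage with a made-up four-line sample.
$path = tempnam(sys_get_temp_dir(), 'gls');
file_put_contents($path, "### Author:Cheeseus\n; DO NOT EDIT\n\nadagio\n");
$withoutHash = linesNotStartingWith($path, ['#']);
$withoutBoth = linesNotStartingWith($path, ['#', ';']);
unlink($path);
```

Reading line by line with fgets, as in the answers above, is likely the better fit for large files, since file() loads everything into memory.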