Parsing Wiki markup with PHP

Date: 2012-10-17 01:00:44

Tags: php regex parsing

I have a text file containing Wiki markup. For example:

[[April]]

April is the fourth month of the year. It has 30 days. The name April comes from that Latin word aperire which means "to open". This probably refers to growing plants in spring. April begins on the same day of week as July in all years and also January in leap years.

April's flower is the Sweet Pea. Its birthstone is the diamond. The meaning of the diamond is innocence.

== April in poetry ==

Poets use April to mean the end of winter. For example: April showers bring May flowers.

== Events in April ==

[[August]]

August is the eighth month of the year in the Gregorian calendar, coming between July and September. It has 31 days, the same number of days as the previous month, July, and is named after Roman Emperor Augustus Caesar.

== The Month ==

This month was first called Sextilis in Latin, because it was the sixth month in the old Roman calendar. The Roman calendar began in March about 735 BC with Romulus. October was the eighth month. August was the eighth month when January or February were added to the start of the year by King Numa Pompilius about 700 BC. Or, when those two months were moved from the end to the beginning of the year by the decemvirs about 450 BC (Roman writers disagree). In 153 BC January 1 was determined as the beginning of the year.

August is named for Augustus Caesar who became Roman consul in this month.  The month has 31 days because Julius Caesar added two days when he created the Julian calendar in 45 BC. August is after July and before September.

August, in either hemisphere, is the seasonal equivalent of February in the other. In the Northern hemisphere it is a summer month and it is a winter month in the Southern hemisphere. In a common year, no other month begins on the same day of the week as August, though in leap years, February starts on the same day as August. August always ends on the same day of the week as November.

August's flower is the Gladiolus with the birthstone being peridot. The astrological signs for August are Leo (July 24 - August 22) and Virgo (August 23 - September 23).

== August observances ==

=== Fixed observances and events ===

=== Moveable and Monthlong events ===

== Selection of Historical Events ==

== References ==

April and August are both wiki articles. I managed to extract the titles with the following:

$fh = fopen("wiki2.txt", "r");
if ($fh) {
    while (($line = fgets($fh)) !== false) {
        preg_match_all('#\\[\\[(.*?)\\]\\]#',$line,$matches,PREG_SET_ORDER);
        foreach($matches as $m) {
            echo $m[0]."<br />";
        }
    }
    fclose($fh);
}

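However, I would also like to pull out the text of each article. Does anyone have a suggestion (a regex or some other approach) for extracting the article data?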

Thanks!

1 answer:

Answer 0 (score: 1)

I think you are overthinking this. (Also, wiki markup is no more regular than HTML is.)

Why not do something like this:

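$fh = fopen("wiki2.txt", "r");
$HeaderNumber = 0;
// Entry 0 collects any text that appears before the first [[Title]] line.
$Document = array();
$Document[$HeaderNumber] = array('Title' => "Default", 'Lines' => array());
if ($fh) {
    while (($line = fgets($fh)) !== false) {
        // strpos() takes (haystack, needle) and returns false when nothing is found.
        if (strpos($line, '[[') !== false && strpos($line, ']]') !== false) {
            // A title line starts a new article.
            $HeaderNumber++;
            $title = trim(str_replace(array("[[", "]]"), "", $line));
            $Document[$HeaderNumber] = array('Title' => $title, 'Lines' => array());
            continue;
        }

        $Document[$HeaderNumber]['Lines'][] = rtrim($line, "\r\n");
    }
    fclose($fh);
}
// Join the collected lines into a Text field for every article, including the last one.
foreach ($Document as $i => $doc) {
    $Document[$i]['Text'] = implode("\n", $doc['Lines']);
    unset($Document[$i]['Lines']);
}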

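This creates a numerically indexed array in which each element has a Title field and a Text field containing what you would expect from the names. You can then use the Text_Wiki module from the PEAR library to process the text further into HTML.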
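
If it helps, the same idea can be wrapped in a function that works on a string instead of a file handle. This is only a minimal sketch: the function name splitWikiArticles and the sample text are made up for illustration, and it assumes that an article title always sits alone on its own line as [[Title]].

```php
<?php
// Hypothetical helper (not part of any library): split wiki text into
// articles, treating a line that consists only of [[Title]] as a boundary.
function splitWikiArticles(string $text): array
{
    $articles = [];
    $current = null;
    foreach (preg_split('/\R/', $text) as $line) {
        // A line that is exactly [[Title]] starts a new article.
        if (preg_match('/^\[\[(.+?)\]\]\s*$/', $line, $m)) {
            if ($current !== null) {
                $articles[] = $current;
            }
            $current = ['Title' => $m[1], 'Lines' => []];
            continue;
        }
        // Text before the first title is ignored in this sketch.
        if ($current !== null) {
            $current['Lines'][] = $line;
        }
    }
    if ($current !== null) {
        $articles[] = $current;
    }
    // Join each article's lines into a single trimmed Text field.
    return array_map(function (array $a): array {
        return ['Title' => $a['Title'], 'Text' => trim(implode("\n", $a['Lines']))];
    }, $articles);
}

// Made-up sample in the same shape as the file in the question.
$sample = "[[April]]\n\nApril is the fourth month.\n\n== April in poetry ==\n\n[[August]]\n\nAugust is the eighth month.\n";
$articles = splitWikiArticles($sample);
foreach ($articles as $article) {
    echo $article['Title'], ': ', strlen($article['Text']), " characters\n";
}
```

Anchoring the pattern to the whole line is a deliberate choice here: an inline link such as [[July]] in the middle of a sentence will not be mistaken for the start of a new article, which the simple strpos check cannot guarantee.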