Question

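Basically, I have an old static HTML website (http://www.brownwatson.co.uk/brochure/page1.html). I need to add a search box to it that searches a folder called /brochure, which contains the HTML documents, images, and so on. The search needs to find ISBN numbers, book reference numbers, titles, etc. There is no database available from the hosting provider, so I was trying to create something like this: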

<div id="contentsearch">
         <form id="searchForm" name="searchForm" method="post" action="search.php">
           <input name="search" type="text" value="search" maxlength="200" />
           <input name="submit" type="submit" value="Search" />
           </form>
         <?php
$dir = "/brochure/";

// Open a known directory, and proceed to read its contents
if (is_dir($dir)) {
if ($dh = opendir($dir)) {
    while (($file = readdir($dh)) !== false) {
        if($file == $_POST['search']){
            echo('<a href="'.$dir . $file.'">'. $file .'</a>'."\n");
        }
    }
    closedir($dh);
}
}
?>
       </div>

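I know, I know, this is pretty bad and it doesn't work. Any ideas? I haven't built anything like this in years, and I have pretty much just taken bits of code and stuck them together!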

Answer 1

There are quite a few solutions available for this. In no particular order:

Free or open source

Google Custom Search Engine
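Tapir - hosted service that indexes the pages on your RSS feed.
Tipue - self-hosted JavaScript plugin, well documented, includes options for pinned search results.
lunr.js - JavaScript library.
phinde - self-hosted search engine based on PHP and Elasticsearch.

See also http://indieweb.org/search#Software

Subscription (aka paid) services: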

Google Site Search
Swiftype - offers a free plan for personal sites/blogs.
Algolia
Amazon Cloud Search

Answer 2

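A very, very lazy option (to avoid setting up a Google Custom Search Engine) is to make a form that points at Google and uses a hidden query element to limit the search to your own site: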

<div id="contentsearch">
  <form id="searchForm" name="searchForm" action="http://google.com/search">
    <input name="q" type="text" value="search" maxlength="200" />
    <input name="q" type="hidden" value="site:mysite.com"/>
    <input name="submit" type="submit" value="Search" />
  </form>
</div>

Aside from being lazy, this method gives you more control over the appearance of the search form than a CSE does.
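To see what this form actually sends, the request can be reproduced outside the browser. The following is a minimal Python sketch, assuming the form is submitted with the default GET method (the markup sets no `method`); the function name `google_site_search_url` and the sample search term are made up for illustration, and a real browser's encoding may differ in minor details.

```python
from urllib.parse import urlencode


def google_site_search_url(term, site):
    """Build the URL that a GET form with two `q` fields would request."""
    params = [
        ("q", term),            # the visible text input
        ("q", "site:" + site),  # the hidden input that restricts the site
        ("submit", "Search"),   # a named submit button is sent as well
    ]
    return "http://google.com/search?" + urlencode(params)


# Hypothetical search term; mysite.com is the placeholder from the form above.
url = google_site_search_url("isbn 12345", "mysite.com")
print(url)
```

The point is only that both `q` values travel in the same query string, so the site restriction rides along with whatever the visitor types.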

Answer 3

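If your website is well indexed by Google, a quick, ready-made solution is to use Google CSE.

Beyond that, for a static website with hard-coded HTML pages and a directory of images: yes, it is possible to build a search mechanism. But trust me, it is more hectic and resource-consuming than creating a dynamic website.

Searching through directories and files with PHP is very inefficient. Rather than offering a complicated PHP workaround, I would suggest going for a dynamic, CMS-driven website.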
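To make the cost concrete, here is a rough sketch of what a brute-force search over files involves, written in Python rather than PHP. It is an illustration under assumptions, not a recommended implementation: the function name `scan_directory`, the regex-based tag stripping, and the two demo pages are all hypothetical. Note that every query re-reads every HTML file, so the work grows with the total size of the site.

```python
import os
import re
import tempfile

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, good enough for a sketch


def scan_directory(root, term):
    """Return relative paths of .html files whose text contains term."""
    needle = term.lower()
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            # Each query opens and reads the whole file again.
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                text = TAG_RE.sub(" ", f.read())
            if needle in text.lower():
                hits.append(os.path.relpath(path, root))
    return sorted(hits)


# Demo on a throwaway directory with two made-up pages.
with tempfile.TemporaryDirectory() as demo_root:
    pages = {
        "page1.html": "<html><body><h1>Farm Animals</h1><p>Ref BW100</p></body></html>",
        "page2.html": "<html><body><h1>Fairy Tales</h1><p>Ref BW200</p></body></html>",
    }
    for name, html in pages.items():
        with open(os.path.join(demo_root, name), "w", encoding="utf-8") as f:
            f.write(html)
    demo_hits = scan_directory(demo_root, "bw200")

print(demo_hits)
```

For a small brochure folder this may be tolerable; the inefficiency shows up as the number of pages or the query rate grows, which is why a prebuilt index is the usual alternative.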

Answer 4

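I was looking for a search solution for my blog, which is built with Jekyll, but did not find a good one. Google Custom Search also gave me ads and results from subdomains, so it was not a good fit. So I created my own solution. I wrote an article about it, how to create search for static site like Jekyll; it is in Polish and was translated using Google Translate.

I will probably create a better manual translation or a rewrite on my English blog soon.

The solution is a Python script that creates an SQLite database from the HTML files, plus a small PHP script that shows the search results. It does require that your static site hosting also supports PHP.

Just in case the article becomes unavailable, here is the code. It was written just for my blog (my HTML and file structure), so it needs tweaking to work with yours.

Python script: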

import os, sys, re, sqlite3
from bs4 import BeautifulSoup
def get_data(html):
    """return dictionary with title url and content of the blog post"""
    tree = BeautifulSoup(html, 'html5lib')
    body = tree.body
    if body is None:
        return None
    for tag in body.select('script'):
        tag.decompose()
    for tag in body.select('style'):
        tag.decompose()
    for tag in body.select('figure'): # ignore code snippets
        tag.decompose()
    text = tree.findAll("div", {"class": "body"})
    if len(text) > 0:
      text = text[0].get_text(separator='\n')
    else:
      text = None
    title = tree.findAll("h2", {"itemprop" : "title"}) # my h2 have this attr
    url = tree.findAll("link", {"rel": "canonical"}) # get url
    if len(title) > 0:
      title = title[0].get_text()
    else:
      title = None
    if len(url) > 0:
      url = url[0]['href']
    else:
      url = None
    result = {
      "title": title,
      "url": url,
      "text": text
    }
    return result

if __name__ == '__main__':
  if len(sys.argv) == 2:
    db_file = 'index.db'
    # remove the old index file
    if os.path.exists(db_file):
      os.remove(db_file)
    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    c.execute('CREATE TABLE page(title text, url text, content text)')
    for root, dirs, files in os.walk(sys.argv[1]):
      for name in files:
        # my files are in 20.* directories (eg. 2018) [/\\] is for windows and unix
        if name.endswith(".html") and re.search(r"[/\\]20[0-9]{2}", root):
          fname = os.path.join(root, name)
          with open(fname, "r") as f:
            data = get_data(f.read())
          if data is not None:
            # keep the dict intact so data['url'] is still available below
            row = (data['title'], data['url'], data['text'])
            c.execute('INSERT INTO page VALUES(?, ?, ?)', row)
            print("indexed %s" % data['url'])
            sys.stdout.flush()
    conn.commit()
    conn.close()

And the PHP search script:

function mark($query, $str) {
    return preg_replace("%(" . $query . ")%i", '<mark>$1</mark>', $str);
}
if (isset($_GET['q'])) {
  $db = new PDO('sqlite:index.db');
  $stmt = $db->prepare('SELECT * FROM page WHERE content LIKE :var OR title LIKE :var');
  $wildcarded = '%'. $_GET['q'] .'%';
  $stmt->bindParam(':var', $wildcarded);
  $stmt->execute();
  $data = $stmt->fetchAll(PDO::FETCH_ASSOC);
  $query = str_replace("%", "\\%", preg_quote($_GET['q']));
  $re = "%(?>\S+\s*){0,10}(" . $query . ")\s*(?>\S+\s*){0,10}%i";
  if (count($data) == 0) {
    echo "<p>No results</p>";
  } else {
    foreach ($data as $row) {
      if (preg_match($re, $row['content'], $match)) {
        echo '<h3><a href="' . $row['url'] . '">' . mark($query, $row['title']) . '</a></h3>';
        $text = trim($match[0], " \t\n\r\0\x0B,.{}()-");
        echo '<p>' . mark($query, $text) . '</p>';
      }
    }
  }
}

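In my code, I wrap this PHP script in the same layout as the other pages by adding front matter to the PHP file.

If you can't use PHP on your hosting, you can try sql.js, which is SQLite compiled to JS with Emscripten. Here is an example how to use ajax to load a file.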
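The lookup that the PHP script performs can also be tried outside PHP, which is handy for checking an index before uploading it. The sketch below is a Python illustration under assumptions, not the author's code: it uses an in-memory database with the same `page(title, url, content)` table, the function name `search` and the sample rows are made up, and it adds escaping of the LIKE wildcards `%` and `_`, which the PHP version does not do.

```python
import sqlite3


def search(conn, term):
    """Return (title, url) rows whose title or content contains term."""
    # Escape LIKE wildcards so a literal % or _ in the term matches as text.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    cur = conn.execute(
        "SELECT title, url FROM page "
        "WHERE content LIKE ? ESCAPE '\\' OR title LIKE ? ESCAPE '\\'",
        (pattern, pattern),
    )
    return cur.fetchall()


# In-memory database with the same table layout as the indexing script.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page(title text, url text, content text)")
conn.executemany(
    "INSERT INTO page VALUES(?, ?, ?)",
    [
        # Hypothetical sample rows, not real catalogue data.
        ("Farm Animals", "/brochure/page1.html", "ISBN 978-0-00-000000-2 board book"),
        ("Fairy Tales", "/brochure/page2.html", "Reference BW100, 100% recycled paper"),
    ],
)

print(search(conn, "isbn"))  # SQLite's LIKE ignores ASCII case by default
print(search(conn, "100%"))
```

Binding the pattern as a parameter keeps user input out of the SQL text, which is the same idea as the prepared statement in the PHP script.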

Need to add search to a static HTML website

4 answers: