Need to add a search to static HTML site

Solution 1

There are quite a few solutions available for this. In no particular order:

Free or Open Source

  1. Google Custom Search Engine
  2. Tapir - hosted service that indexes pages on your RSS feed.
  3. Tipue - self-hosted JavaScript plugin, well documented, includes options for pinned search results.
  4. lunr.js - JavaScript library.
  5. phinde - self-hosted search engine based on PHP and Elasticsearch

See also http://indieweb.org/search#Software

Subscription (aka paid) Services:

  1. Google Site Search
  2. Swiftype - offers a free plan for personal sites/blogs.
  3. Algolia
  4. Amazon Cloud Search

Solution 2

A very, very lazy option (to avoid setting up a Google Custom Search Engine) is to make a form that points at Google with a hidden query element that limits the search to your own site:

<div id="contentsearch">
  <form id="searchForm" name="searchForm" action="http://google.com/search">
    <input name="q" type="text" value="search" maxlength="200" />
    <input name="q" type="hidden" value="site:mysite.com"/>
    <input name="submit" type="submit" value="Search" />
  </form>
</div>

Aside from the laziness, this method gives you a bit more control over the appearance of the search form, compared to a CSE.
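
The intent of the form is a Google query that combines the visitor's terms with a `site:` restriction. As a rough illustration, the same search can be written as a single `q` parameter. The sketch below builds that URL in Python; the helper name `site_search_url` is made up for this example and `mysite.com` is the placeholder domain from the form above.

```python
from urllib.parse import urlencode

def site_search_url(terms, site="mysite.com"):
    """Build a Google search URL restricted to one site.

    The function name and the default domain are illustrative only.
    """
    # The site: operator asks Google to limit results to the given domain.
    query = "%s site:%s" % (terms, site)
    # urlencode takes care of escaping spaces and the colon.
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("book reference 1234"))
```

Pasting the printed URL into a browser is a quick way to check what results the form would give before adding it to the site.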

Solution 3

I was looking for a search solution for my blog, which is built with Jekyll, but didn't find a good one. Google Custom Search showed ads and returned results from subdomains, so it was not a good fit, and I created my own solution. I've written an article about how to create search for a static site like Jekyll; it's in Polish and translated using Google Translate.

I will probably create a better manual translation or rewrite it on my English blog soon.

The solution is a Python script that creates an SQLite database from the HTML files and a small PHP script that shows the search results. It does require that your static site hosting also supports PHP.

In case the article goes down, here is the code. It was created just for my blog (my HTML and file structure), so it needs to be tweaked to work with your blog.

Python script:

import os, sys, re, sqlite3
from bs4 import BeautifulSoup
def get_data(html):
    """return dictionary with title url and content of the blog post"""
    tree = BeautifulSoup(html, 'html5lib')
    body = tree.body
    if body is None:
        return None
    for tag in body.select('script'):
        tag.decompose()
    for tag in body.select('style'):
        tag.decompose()
    for tag in body.select('figure'): # ignore code snippets
        tag.decompose()
    text = tree.findAll("div", {"class": "body"})
    if len(text) > 0:
      text = text[0].get_text(separator='\n')
    else:
      text = None
    title = tree.findAll("h2", {"itemprop" : "title"}) # my h2 tags have this attr
    url = tree.findAll("link", {"rel": "canonical"}) # get url
    if len(title) > 0:
      title = title[0].get_text()
    else:
      title = None
    if len(url) > 0:
      url = url[0]['href']
    else:
      url = None
    result = {
      "title": title,
      "url": url,
      "text": text
    }
    return result

if __name__ == '__main__':
  if len(sys.argv) == 2:
    db_file = 'index.db'
    # remove the old database file
    if os.path.exists(db_file):
      os.remove(db_file)
    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    c.execute('CREATE TABLE page(title text, url text, content text)')
    for root, dirs, files in os.walk(sys.argv[1]):
      for name in files:
        # my files are in 20.* directories (eg. 2018) [/\\] is for windows and unix
        if name.endswith(".html") and re.search(r"[/\\]20[0-9]{2}", root):
          fname = os.path.join(root, name)
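          # assumes the generated HTML files are UTF-8 encoded
          with open(fname, "r", encoding="utf-8") as f:
            data = get_data(f.read())
          if data is not None:
            # keep the dict intact so the URL can still be printed below
            row = (data['title'], data['url'], data['text'])
            c.execute('INSERT INTO page VALUES(?, ?, ?)', row)
            print("indexed %s" % data['url'])
            sys.stdout.flush()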
    conn.commit()
    conn.close()

and PHP search script:

function mark($query, $str) {
    return preg_replace("%(" . $query . ")%i", '<mark>$1</mark>', $str);
}
if (isset($_GET['q'])) {
  $db = new PDO('sqlite:index.db');
  $stmt = $db->prepare('SELECT * FROM page WHERE content LIKE :var OR title LIKE :var');
  $wildcarded = '%'. $_GET['q'] .'%';
  $stmt->bindParam(':var', $wildcarded);
  $stmt->execute();
  $data = $stmt->fetchAll(PDO::FETCH_ASSOC);
  $query = str_replace("%", "\\%", preg_quote($_GET['q']));
  $re = "%(?>\S+\s*){0,10}(" . $query . ")\s*(?>\S+\s*){0,10}%i";
  if (count($data) == 0) {
    echo "<p>No results</p>";
  } else {
    foreach ($data as $row) {
      if (preg_match($re, $row['content'], $match)) {
        echo '<h3><a href="' . $row['url'] . '">' . mark($query, $row['title']) . '</a></h3>';
        $text = trim($match[0], " \t\n\r\0\x0B,.{}()-");
        echo '<p>' . mark($query, $text) . '</p>';
      }
    }
  }
}

In my code and in the article I've wrapped this PHP script in the same layout as the other pages by adding front matter to the PHP file.

If you can't use PHP on your hosting you can try to use sql.js which is SQLite compiled to JS with Emscripten.
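
If the host offers Python rather than PHP, the same `LIKE` query can be run with the standard `sqlite3` module. What follows is a minimal sketch, not the code from the article: the function names `search` and `mark` are illustrative, and the in-memory table only stands in for `index.db` with the schema created by the indexer above. Unlike the PHP script, it highlights matches in the whole stored text instead of cutting a ten-word excerpt.

```python
import html
import re
import sqlite3

def mark(query, text):
    """HTML-escape text and wrap case-insensitive matches of query in <mark>."""
    pattern = re.compile("(" + re.escape(query) + ")", re.IGNORECASE)
    pieces = pattern.split(text)
    # With one capture group, re.split puts the matched text at odd indices.
    return "".join(
        "<mark>%s</mark>" % html.escape(p) if i % 2 else html.escape(p)
        for i, p in enumerate(pieces)
    )

def search(conn, query):
    """Return (title, url, highlighted content) for pages matching query."""
    if not query:
        return []
    # Same LIKE query as the PHP script, with bound parameters.
    wildcarded = "%" + query + "%"
    rows = conn.execute(
        "SELECT title, url, content FROM page "
        "WHERE content LIKE ? OR title LIKE ?",
        (wildcarded, wildcarded),
    ).fetchall()
    return [(title, url, mark(query, content or "")) for title, url, content in rows]

if __name__ == "__main__":
    # In-memory stand-in for index.db (assumed schema: title, url, content).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page(title text, url text, content text)")
    conn.execute(
        "INSERT INTO page VALUES(?, ?, ?)",
        ("Hello", "/2018/hello.html", "Static sites can still have search."),
    )
    for title, url, excerpt in search(conn, "search"):
        print(title, url, excerpt)
```

Escaping each piece before adding the `<mark>` tags keeps indexed page text from injecting markup into the results page.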

Solution 4

If your site is well indexed by Google, a quick, ready-made solution is to use Google CSE.

Other than that, for a static website with hard-coded HTML pages and a directory of images, yes, it is possible to create a search mechanism. But trust me, it is more hectic and resource-consuming than creating a dynamic website.

Using PHP to search in directories and within files will be very inefficient. Instead of providing complicated PHP workarounds, I would suggest going for a dynamic, CMS-driven website.
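
To make the cost concrete: without an index, every query has to open and read each file again. The rough sketch below shows such a scan in Python purely for illustration (the answer talks about PHP); the function name `scan_for` is made up, and the `brochure` directory name is taken from the question.

```python
import os

def scan_for(term, directory):
    """Return paths of .html files under directory whose text contains term."""
    term = term.lower()
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.endswith(".html"):
                continue
            path = os.path.join(root, name)
            # Every query re-reads every file, so the cost grows with site size.
            with open(path, encoding="utf-8", errors="replace") as f:
                if term in f.read().lower():
                    matches.append(path)
    return matches

if __name__ == "__main__":
    # Prints the matching paths, or an empty list if the directory is missing.
    print(scan_for("isbn", "brochure"))
```

For a handful of pages this is tolerable, but the work is repeated in full on every search, which is the inefficiency described above.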

Author by

user3839812

Updated on April 28, 2021

Comments

  • user3839812
    user3839812 about 3 years

    Basically, I've got an old static HTML site ( http://www.brownwatson.co.uk/brochure/page1.html ) and I need to add a search box to it that searches a folder called /brochure. Within that folder are HTML documents, images, etc. I need the search to find ISBN numbers, book reference numbers, titles, etc. There's no database, but the hosting provider has PHP. I was trying to create something like this:

    <div id="contentsearch">
             <form id="searchForm" name="searchForm" method="post" action="search.php">
               <input name="search" type="text" value="search" maxlength="200" />
               <input name="submit" type="submit" value="Search" />
               </form>
             <?php
    $dir = "/brochure/";
    
    // Open a known directory, and proceed to read its contents
    if (is_dir($dir)) {
    if ($dh = opendir($dir)) {
        while (($file = readdir($dh)) !== false) {
            if($file == $_POST['search']){
                echo('<a href="'.$dir . $file.'">'. $file .'</a>'."\n");
            }
        }
        closedir($dh);
    }
    }
    ?>
           </div>
    

    I know, I know, this is pretty bad and doesn't work. Any ideas? I haven't created anything like this in years, and have pretty much just taken bits of code and stuck them together!

    • Paul Dessert
      Paul Dessert almost 10 years
      What exactly doesn't work? Any errors?
    • Darren
      Darren almost 10 years
      If you are looking for exact matching, I'd grab all the files using either glob()/scandir()/DirectoryIterator and then check them with something like in_array(). Alternatively, you could use similar_text() to match the strings :-)
    • user3839812
      user3839812 almost 10 years
      Many thanks for your replies. It's not showing any results; I'm probably doing something really stupid. See link: brownwatson.co.uk/search.php
    • davidcondrey
      davidcondrey almost 10 years
      Here's a regex expression I found that searches for ISBN10 and ISBN13 nums, maybe you'll find use for it: Expression ISBN(-1(?:(0)|3))?:?\x20(\s)*[0-9]+[- ][0-9]+[- ][0-9]+[- ][0-9]*[- ]*[xX0-9]
  • user3839812
    user3839812 almost 10 years
    Thanks, thought as much I've just used the google search option for now. Probably will end up updating to a CMS system in the future.
  • Shehroz Ahmed
    Shehroz Ahmed almost 5 years
    I wonder why this answer has no up votes? This is the best answer!! Although, I had to spend some time to understand it and fix the python code to make it work, but it worked well for me. Thanks!
  • jcubic
    jcubic almost 5 years
    @ShehrozAhmed because it was added years later than other answers.
  • Stefan
    Stefan about 4 years
    Not sure what people mean by "static site". Static sites may use client-side JavaScript but no server-side PHP. The lack of server-side programming features limits search implementations to client-side (limited) or service-based (often expensive) solutions.
  • jcubic
    jcubic about 4 years
    @Stefan if the static site is on shared hosting, there is no problem to have static site side by side with PHP or other server side script, I have this on my Jekyll blog. It depends where this static site is deployed, for instance I think (not sure) that you can't add server side script to Netlify.
  • Stefan
    Stefan about 4 years
    Yes @jcubic, you are right. There are dynamic sites with static content/parts that do not need (PHP) processing and are served directly from a datastore/filesystem. What you wrote last was my concern. You can deploy "pure static sites" virtually anywhere. There are few or no requirements on the web/app server regarding configuration and CGI/Java/PHP/... support. So if one hears "static site generator", she immediately knows where and how the sites can be deployed. No PHP, no database, little configuration, easy caching, fast... So it makes sense to distinguish between static and dynamic sites.
  • jcubic
    jcubic about 4 years
    @Stefan the solution uses PHP, so it should be obvious that one can't use it on a platform that doesn't have any server-side handling. But it's useful because there are a lot of people who use simple shared hosting, which has PHP, and some even have Node.js or Python (like mine), so this will work for them. Also, "static site" describes the way the files are generated (as the name "static site generator" suggests); it has nothing, or almost nothing, to do with hosting.
  • F. Müller
    F. Müller over 3 years
    The Tipue link is dead (404). Also, it seems that website owner is not the same anymore.
  • jcubic
    jcubic about 3 years
    @Stefan just FYI, I've added another solution. It's not a fully working one (whole code), but you can use sql.js, which is SQLite compiled to WebAssembly, to read the SQLite database and provide search on the client. I'm planning on creating something like this for the documentation of my open source project.
  • Stefan
    Stefan about 3 years
    Thanks for the update @jcubic. Just curious. Why didn't You choose one of the solutions proposed below? I'm developing a Gatsby jamstack site hosted on Netlify. At the moment, we do "search by service". We will evaluate JavaScript based solutions in some weeks.