mikey.bike . archive . other writings . lists . about
This site runs on the Hakyll static site generator which has
so far been great with one exception: it lacks a system for asset
hashing. The problem is that if I make a change to a css file, if
visitors have cached that asset, they won’t see the change if the file
hasn’t been renamed. So I found myself manually renaming files like a
heathen – base-5.css to base-6.css, etc. – as well as updating
their references in my templates.
Ideally, this process is automated for me, and instead of incrementing a number, a hash of the file contents is generated every time I make a css tweak. Luckily, we are programmers! And we can fix this with the advanced technique of googling “hakyll asset hashing” and copying from the first result.
Here’s what I ended up doing, using this thread as a starting point.
The core hash function looks like this:
{-# LANGUAGE BangPatterns #-}
import qualified Crypto.Hash.SHA256            as SHA256
import qualified Data.ByteString               as BS
import qualified Data.ByteString.Base16        as Base16
import qualified Data.ByteString.Char8         as BS8
hash :: FilePath -> IO String
hash path = do
  !h <- SHA256.hash <$> BS.readFile path
  pure $! BS8.unpack $! Base16.encode hNow, I’ll be honest, I don’t really understand what I’m doing with
regard to strictness here; but a) it seems reasonable to avoid
unevaluated thunks of files floating around and b) the thread above
sprinkled in ! and $! so I did too. Oddly, though, they used
ByteString.Lazy – I don’t really know why and I switched it to the
strict bytestrings. (Let me know if you know why that’s dumb.)
Laziness aside, hash gets a digest of a bytestring from a file,
using the cryptohash-256 library, and then encodes that in
base64. Easy!
The next idea from that thread is to build a map, keyed from Hakyll
Identifiers (which represent items to compile, such as css files)
to their hashes:
import qualified Data.Map                      as Map
import           Data.Map                       ( Map )
import           Hakyll                         ( Identifier
                                                , fromFilePath
                                                , getRecursiveContents
                                                )
type FileHashes = Map Identifier String
mkFileHashes :: FilePath -> IO FileHashes
mkFileHashes dir = do
  allFiles <- getRecursiveContents (\_ -> pure False) dir
  fmap Map.fromList $ forM allFiles $ \innerPath -> do
    let fullPath = dir </> innerPath
    !h <- hash fullPath
    pure (fromFilePath fullPath, h)getRecursiveContents is a nice Hakyll helper function that just
walks a directory. We can point it at say an assets/ directory and
run the hash function for each file.
Given this mapping of identifiers to their hashes, we can produce a route that interpolates that hash:
import           Hakyll                         ( Routes
                                                , customRoute
                                                , fromFilePath
                                                , toFilePath
                                                )
import           System.FilePath                ( (</>) )
import           System.FilePath.Posix          ( takeBaseName
                                                , takeDirectory
                                                , takeExtension
                                                )
assetHashRoute :: FileHashes -> Routes
assetHashRoute fileHashes = customRoute $ \identifier ->
  let maybeHash = Map.lookup identifier fileHashes
      path      = toFilePath identifier
  in  maybe path (addHashToUrl path) maybeHash
addHashToUrl :: FilePath -> String -> String
addHashToUrl path hash =
  let baseName  = takeBaseName path
      extension = takeExtension path
      dir       = takeDirectory path
  in  dir </> baseName <> "-" <> hash <> extensionThe basic idea is that we can use assetHashRoute when producing
routes for some asset directory, and Hakyll will rewrite their routes
with the addHashToUrl function.
Here’s how I do it with my various asset directories:
import           Hakyll
-- Top-level site generation function.
site :: IO ()
site = do
  hakyll $ do
    -- Generate the mappings
    imageHashes <- preprocess (mkFileHashes "images")
    cssHashes   <- preprocess (mkFileHashes "stylesheets")
    jsHashes    <- preprocess (mkFileHashes "js")
    match "images/**/*" $ do
      route $ assetHashRoute imageHashes
      compile copyFileCompiler
    match "js/**/*" $ do
      route idRoute
      compile copyFileCompiler
    match "stylesheets/*.css" $ do
      route $ assetHashRoute cssHashes
      compile copyFileCompilerSo, we’ve moved assets to the correct paths, but how do we link to
them? Once again the thread suggested an implementation, which feels a
little hacky to me, but I’m already in too deep and don’t have any
better ideas. Basically, we can run a url-rewriting function over the
content of our html files, so that if it finds a link to base.css it
rewrites as base-[HASH].css. Hakyll gives us a helper functions to
make this a lot easier, withUrls.
import           Hakyll                         ( Compiler
                                                , Item(itemIdentifier)
                                                , fromFilePath
                                                , getRoute
                                                , withUrls
                                                )
rewriteAssetUrls :: FileHashes -> Item String -> Compiler (Item String)
rewriteAssetUrls hashes item = do
  route <- getRoute $ itemIdentifier item
  pure $ case route of
    Nothing -> item
    Just r  -> fmap (rewriteAssetUrls' hashes) item
rewriteAssetUrls' :: FileHashes -> String -> String
rewriteAssetUrls' hashes = withUrls rewrite
  where rewrite url = maybe url
                            (addHashToUrl url)
                            (lookupHashForUrl hashes url)
lookupHashForUrl :: FileHashes -> String -> Maybe String
lookupHashForUrl hashes url = 
  let urlWithoutRootSlash = dropWhile (== '/') url
  in Map.lookup (fromFilePath urlWithoutRootSlash) hashesrewriteAssetUrls delegates to rewriteAssetUrls' which takes in a
string of HTML, and produces a rewritten string given our FileHashes
map. We filter out any Hakyll item that doesn’t have a route because
– well, because that’s how it was implemented in the thread, and I
assume that’s to avoid “ephemeral” items like templates.
We can add this function to our compilers like so:
-- Combine all asset hash maps that we generated above
let assetHashes = imageHashes <> cssHashes <> jsHashes
match "index.html" $ do
  route idRoute
  compile
    $   getResourceBody
    >>= loadAndApplyTemplate "templates/layout.html" defaultContext
    >>= rewriteAssetUrls assetHashesAnd so on.
Now, if you view source on this page (if you dare), you’ll see some lovely hashed asset URLs. Just lovely. Was SHA256 overkill? Probably.
If you want to see the actual implementation, it lives here.