mikey.bike . archive . other writings . lists . about
This site runs on the Hakyll static site generator which has
so far been great with one exception: it lacks a system for asset
hashing. The problem is that if I make a change to a css file, if
visitors have cached that asset, they won’t see the change if the file
hasn’t been renamed. So I found myself manually renaming files like a
heathen – base-5.css to base-6.css, etc. – as well as updating
their references in my templates.
Ideally, this process is automated for me, and instead of incrementing a number, a hash of the file contents is generated every time I make a css tweak. Luckily, we are programmers! And we can fix this with the advanced technique of googling “hakyll asset hashing” and copying from the first result.
Here’s what I ended up doing, using this thread as a starting point.
The core hash function looks like this:
{-# LANGUAGE BangPatterns #-}
import qualified Crypto.Hash.SHA256 as SHA256
import qualified Data.ByteString as BS
import qualified Data.ByteString.Base16 as Base16
import qualified Data.ByteString.Char8 as BS8
hash :: FilePath -> IO String
hash path = do
!h <- SHA256.hash <$> BS.readFile path
pure $! BS8.unpack $! Base16.encode hNow, I’ll be honest, I don’t really understand what I’m doing with
regard to strictness here; but a) it seems reasonable to avoid
unevaluated thunks of files floating around and b) the thread above
sprinkled in ! and $! so I did too. Oddly, though, they used
ByteString.Lazy – I don’t really know why and I switched it to the
strict bytestrings. (Let me know if you know why that’s dumb.)
Laziness aside, hash gets a digest of a bytestring from a file,
using the cryptohash-256 library, and then encodes that in
base64. Easy!
The next idea from that thread is to build a map, keyed from Hakyll
Identifiers (which represent items to compile, such as css files)
to their hashes:
import qualified Data.Map as Map
import Data.Map ( Map )
import Hakyll ( Identifier
, fromFilePath
, getRecursiveContents
)
type FileHashes = Map Identifier String
mkFileHashes :: FilePath -> IO FileHashes
mkFileHashes dir = do
allFiles <- getRecursiveContents (\_ -> pure False) dir
fmap Map.fromList $ forM allFiles $ \innerPath -> do
let fullPath = dir </> innerPath
!h <- hash fullPath
pure (fromFilePath fullPath, h)getRecursiveContents is a nice Hakyll helper function that just
walks a directory. We can point it at say an assets/ directory and
run the hash function for each file.
Given this mapping of identifiers to their hashes, we can produce a route that interpolates that hash:
import Hakyll ( Routes
, customRoute
, fromFilePath
, toFilePath
)
import System.FilePath ( (</>) )
import System.FilePath.Posix ( takeBaseName
, takeDirectory
, takeExtension
)
assetHashRoute :: FileHashes -> Routes
assetHashRoute fileHashes = customRoute $ \identifier ->
let maybeHash = Map.lookup identifier fileHashes
path = toFilePath identifier
in maybe path (addHashToUrl path) maybeHash
addHashToUrl :: FilePath -> String -> String
addHashToUrl path hash =
let baseName = takeBaseName path
extension = takeExtension path
dir = takeDirectory path
in dir </> baseName <> "-" <> hash <> extensionThe basic idea is that we can use assetHashRoute when producing
routes for some asset directory, and Hakyll will rewrite their routes
with the addHashToUrl function.
Here’s how I do it with my various asset directories:
import Hakyll
-- Top-level site generation function.
site :: IO ()
site = do
hakyll $ do
-- Generate the mappings
imageHashes <- preprocess (mkFileHashes "images")
cssHashes <- preprocess (mkFileHashes "stylesheets")
jsHashes <- preprocess (mkFileHashes "js")
match "images/**/*" $ do
route $ assetHashRoute imageHashes
compile copyFileCompiler
match "js/**/*" $ do
route idRoute
compile copyFileCompiler
match "stylesheets/*.css" $ do
route $ assetHashRoute cssHashes
compile copyFileCompilerSo, we’ve moved assets to the correct paths, but how do we link to
them? Once again the thread suggested an implementation, which feels a
little hacky to me, but I’m already in too deep and don’t have any
better ideas. Basically, we can run a url-rewriting function over the
content of our html files, so that if it finds a link to base.css it
rewrites as base-[HASH].css. Hakyll gives us a helper functions to
make this a lot easier, withUrls.
import Hakyll ( Compiler
, Item(itemIdentifier)
, fromFilePath
, getRoute
, withUrls
)
rewriteAssetUrls :: FileHashes -> Item String -> Compiler (Item String)
rewriteAssetUrls hashes item = do
route <- getRoute $ itemIdentifier item
pure $ case route of
Nothing -> item
Just r -> fmap (rewriteAssetUrls' hashes) item
rewriteAssetUrls' :: FileHashes -> String -> String
rewriteAssetUrls' hashes = withUrls rewrite
where rewrite url = maybe url
(addHashToUrl url)
(lookupHashForUrl hashes url)
lookupHashForUrl :: FileHashes -> String -> Maybe String
lookupHashForUrl hashes url =
let urlWithoutRootSlash = dropWhile (== '/') url
in Map.lookup (fromFilePath urlWithoutRootSlash) hashesrewriteAssetUrls delegates to rewriteAssetUrls' which takes in a
string of HTML, and produces a rewritten string given our FileHashes
map. We filter out any Hakyll item that doesn’t have a route because
– well, because that’s how it was implemented in the thread, and I
assume that’s to avoid “ephemeral” items like templates.
We can add this function to our compilers like so:
-- Combine all asset hash maps that we generated above
let assetHashes = imageHashes <> cssHashes <> jsHashes
match "index.html" $ do
route idRoute
compile
$ getResourceBody
>>= loadAndApplyTemplate "templates/layout.html" defaultContext
>>= rewriteAssetUrls assetHashesAnd so on.
Now, if you view source on this page (if you dare), you’ll see some lovely hashed asset URLs. Just lovely. Was SHA256 overkill? Probably.
If you want to see the actual implementation, it lives here.