mikey.bike . archive . other writings . lists . about
This site runs on the Hakyll static site generator which has
so far been great with one exception: it lacks a system for asset
hashing. The problem is that if I make a change to a css file, if
visitors have cached that asset, they won’t see the change if the file
hasn’t been renamed. So I found myself manually renaming files like a
heathen – base-5.css
to base-6.css
, etc. – as well as updating
their references in my templates.
Ideally, this process is automated for me, and instead of incrementing a number, a hash of the file contents is generated every time I make a css tweak. Luckily, we are programmers! And we can fix this with the advanced technique of googling “hakyll asset hashing” and copying from the first result.
Here’s what I ended up doing, using this thread as a starting point.
The core hash function looks like this:
{-# LANGUAGE BangPatterns #-}
import qualified Crypto.Hash.SHA256 as SHA256
import qualified Data.ByteString as BS
import qualified Data.ByteString.Base16 as Base16
import qualified Data.ByteString.Char8 as BS8
hash :: FilePath -> IO String
= do
hash path !h <- SHA256.hash <$> BS.readFile path
pure $! BS8.unpack $! Base16.encode h
Now, I’ll be honest, I don’t really understand what I’m doing with
regard to strictness here; but a) it seems reasonable to avoid
unevaluated thunks of files floating around and b) the thread above
sprinkled in !
and $!
so I did too. Oddly, though, they used
ByteString.Lazy
– I don’t really know why and I switched it to the
strict bytestrings. (Let me know if you know why that’s dumb.)
Laziness aside, hash
gets a digest of a bytestring from a file,
using the cryptohash-256 library, and then encodes that in
base64. Easy!
The next idea from that thread is to build a map, keyed from Hakyll
Identifier
s (which represent items to compile, such as css
files)
to their hashes:
import qualified Data.Map as Map
import Data.Map ( Map )
import Hakyll ( Identifier
, fromFilePath
, getRecursiveContents
)
type FileHashes = Map Identifier String
mkFileHashes :: FilePath -> IO FileHashes
= do
mkFileHashes dir <- getRecursiveContents (\_ -> pure False) dir
allFiles fmap Map.fromList $ forM allFiles $ \innerPath -> do
let fullPath = dir </> innerPath
!h <- hash fullPath
pure (fromFilePath fullPath, h)
getRecursiveContents
is a nice Hakyll helper function that just
walks a directory. We can point it at say an assets/
directory and
run the hash function for each file.
Given this mapping of identifiers to their hashes, we can produce a route that interpolates that hash:
import Hakyll ( Routes
, customRoute
, fromFilePath
, toFilePath
)import System.FilePath ( (</>) )
import System.FilePath.Posix ( takeBaseName
, takeDirectory
, takeExtension
)
assetHashRoute :: FileHashes -> Routes
= customRoute $ \identifier ->
assetHashRoute fileHashes let maybeHash = Map.lookup identifier fileHashes
= toFilePath identifier
path in maybe path (addHashToUrl path) maybeHash
addHashToUrl :: FilePath -> String -> String
=
addHashToUrl path hash let baseName = takeBaseName path
= takeExtension path
extension = takeDirectory path
dir in dir </> baseName <> "-" <> hash <> extension
The basic idea is that we can use assetHashRoute
when producing
routes for some asset directory, and Hakyll will rewrite their routes
with the addHashToUrl
function.
Here’s how I do it with my various asset directories:
import Hakyll
-- Top-level site generation function.
site :: IO ()
= do
site $ do
hakyll -- Generate the mappings
<- preprocess (mkFileHashes "images")
imageHashes <- preprocess (mkFileHashes "stylesheets")
cssHashes <- preprocess (mkFileHashes "js")
jsHashes
"images/**/*" $ do
match $ assetHashRoute imageHashes
route
compile copyFileCompiler
"js/**/*" $ do
match
route idRoute
compile copyFileCompiler
"stylesheets/*.css" $ do
match $ assetHashRoute cssHashes
route compile copyFileCompiler
So, we’ve moved assets to the correct paths, but how do we link to
them? Once again the thread suggested an implementation, which feels a
little hacky to me, but I’m already in too deep and don’t have any
better ideas. Basically, we can run a url-rewriting function over the
content of our html files, so that if it finds a link to base.css
it
rewrites as base-[HASH].css
. Hakyll gives us a helper functions to
make this a lot easier, withUrls
.
import Hakyll ( Compiler
Item(itemIdentifier)
,
, fromFilePath
, getRoute
, withUrls
)
rewriteAssetUrls :: FileHashes -> Item String -> Compiler (Item String)
= do
rewriteAssetUrls hashes item <- getRoute $ itemIdentifier item
route pure $ case route of
Nothing -> item
Just r -> fmap (rewriteAssetUrls' hashes) item
rewriteAssetUrls' :: FileHashes -> String -> String
= withUrls rewrite
rewriteAssetUrls' hashes where rewrite url = maybe url
(addHashToUrl url)
(lookupHashForUrl hashes url)
lookupHashForUrl :: FileHashes -> String -> Maybe String
=
lookupHashForUrl hashes url let urlWithoutRootSlash = dropWhile (== '/') url
in Map.lookup (fromFilePath urlWithoutRootSlash) hashes
rewriteAssetUrls
delegates to rewriteAssetUrls'
which takes in a
string of HTML, and produces a rewritten string given our FileHashes
map. We filter out any Hakyll item that doesn’t have a route because
– well, because that’s how it was implemented in the thread, and I
assume that’s to avoid “ephemeral” items like templates.
We can add this function to our compilers like so:
-- Combine all asset hash maps that we generated above
let assetHashes = imageHashes <> cssHashes <> jsHashes
"index.html" $ do
match
route idRoute
compile$ getResourceBody
>>= loadAndApplyTemplate "templates/layout.html" defaultContext
>>= rewriteAssetUrls assetHashes
And so on.
Now, if you view source on this page (if you dare), you’ll see some lovely hashed asset URLs. Just lovely. Was SHA256 overkill? Probably.
If you want to see the actual implementation, it lives here.