mikey.bike . archive . other writings . lists . about
Postgresql’s AT TIME ZONE is one of those functions that seems
intuitive at first but can bite you very easily. If you’re going to
use it, you should carefully read the documentation and verify what
the input types are, because it’s a function that’s both overloaded
and polymorphic in its return type:
select '2019-08-10T8:51:00'::timestamp AT TIME ZONE 'PDT';
-- #=> 2019-08-10 11:51:00-04select '2019-08-10T8:51:00'::timestamptz AT TIME ZONE 'PDT';
-- #=> 2019-08-10 05:51:00In the first example we interpret the time as if it were in
PDT. Human interpretation: It’s 8:51 in California. Postgres then
prints out the time for my system time, which is EDT.
In the second, it’s the inverse: we interpret the time at EDT, and
then print out the time it would be in PDT. Human interpretation:
It’s 8:51 in Boston, what time is it in California? The reason 8:51
is interpreted as an EDT timestamp is because Postgres coerces to
timestamptz using the system time zone, and on my system that’s
EDT. So for me the following two queries are equivalent:
The double AT TIME ZONE looks weird, but it is useful for dealing
with timestamp without time zone columns, because really, AT TIME
ZONE does two jobs. One (when given a timestamp without a time zone)
is to assert that the time is in such-and-such time zone. The other
job (when given a timestamp with a time zone) is to query, what time
is it in such-and-such time zone?
select '2019-08-10T8:51:00'::timestamptz
AT TIME ZONE 'PDT';
select '2019-08-10T8:51:00'::timestamp
AT TIME ZONE 'EDT'
AT TIME ZONE 'PDT';Part of the problem is that “At” is an overloaded word, and you could
probably blame the English language for some of this. AT TIME ZONE
may have been better named GO AHEAD AND DO A THING, because at least
then we don’t think we know what it’s doing.
AT TIME ZONE is both overloaded and has a polymorphic return type. I
can’t immediately summon what an accurate type signature for this
function might be, so I wonder: could we define it in Haskell?
First, let’s define some dummy datatypes to work with. The timestamp
types will just be wrappers around String; we won’t actually do any
real conversion. For now, we’re just interested in the types.
type TimeZone = String
data Timestamp = Timestamp String deriving Show
data TimestampTz = TimestampTz String deriving ShowNow let’s define the class of types that can be converted using “at
time zone”. While normal, boring type classes are generic over one
type variable, an instance of this class must be defined for two
type variables: the input type and the output type. This kind of
craziness wasn’t allowed in the original Haskell standard, and we need
to enable the MultiParamTypeClasses language extension for this to
compile.
From A History of Haskell: Being Lazy With Class:
“We [the Haskell Committee] felt that single-parameter type classes were already a big step beyond our initial conservative design goals, and they solved the problem we initially addressed. Going beyond that would be an unforced step into the dark, and we were anxious about questions of overlap, confluence, and decidability of type inference. […] As time went on, however, user pressure grew to adopt multi-parameter type classes, and GHC adopted them in 1997 (version 3.00). However, multi-parameter type classes did not really come into their own until the advent of functional dependencies.”
{-# LANGUAGE MultiParamTypeClasses #-}
class AtTimeZoneConvertible input output where
atTimeZone :: TimeZone -> input -> outputNow let’s define the instances. One for Timestamp -> TimestampTz,
and one for TimestampTz -> Timestamp.
instance AtTimeZoneConvertible Timestamp TimestampTz where
atTimeZone timezone (Timestamp timestamp) =
TimestampTz $ timestamp ++ " " ++ timezone
instance AtTimeZoneConvertible TimestampTz Timestamp where
atTimeZone timezone (TimestampTz timestamp) =
-- do some time calculations...
Timestamp $ "10:10"And now we can use them like so:
ghci> atTimeZone "EDT" (Timestamp "10:10") :: TimestampTz
TimestampTz "10:10 EDT"
ghci> atTimeZone "EDT" (TimestampTz "14:10 UTC") :: Timestamp
Timestamp "10:10"Disregarding the fact that atTimeZone only ever returns "10:10"
when given a timestamp with time zone, this looks good! One annoying
thing is that we need to specify the return type, even though we’ve
only defined one instance for each input type. The problem is that
there isn’t anything preventing us from defining more instances and
having multiple possible output types for, say, converting a
Timestamp.
If we try to evaluate atTimeZone without specifying the return type,
we end up with this error:
ghci> atTimeZone "EDT" (TimestampTz "14:10 UTC")
<interactive>:53:1: error:
• Non type-variable argument
in the constraint: AtTimeZoneConvertible TimestampTz output
GHC is saying something like: I can infer the types as far as
AtTimeZoneConvertible TimestampTz output, and that’s just not enough
to decide what instance to use, because output is a type variable,
not a concrete type.
What we want to say is that the input type implies the output
type. This is exactly what the FunctionalDependencies language
extension lets us do. It looks like this:
{-# LANGUAGE FunctionalDependencies #-}
class AtTimeZoneConvertible input output | input -> output where
atTimeZone :: TimeZone -> input -> outputNow the compiler will prevent us from defining more than one instance for a
given input type, and we no longer need to specify the output type:
ghci> atTimeZone "EDT" (TimestampTz "14:10 UTC")
Timestamp "10:10"We can even call it multiple times, just like we did with AT TIME
ZONE:
ghci> atTimeZone "UTC" $ atTimeZone "EDT" (TimestampTz "14:10 UTC")
TimestampTz "10:10 UTC"If we try to define another instance for the Timestamp input type,
for say a String output type:
instance AtTimeZoneConvertible Timestamp String where
atTimeZone timezone (Timestamp timestamp) =
timestamp ++ " " ++ timezoneWe’ll get an error like this:
AtTimeZone.hs:14:10: error:
Functional dependencies conflict between instance declarations:
instance AtTimeZoneConvertible Timestamp TimestampTz
-- Defined at AtTimeZone.hs:14:10
instance AtTimeZoneConvertible Timestamp String
-- Defined at AtTimeZone.hs:18:10
I was also curious if this is possible in Rust. I am much less familiar with Rust, but I’ve at least heard a few times that Rust’s traits are like Haskell’s type classes. Let’s see how it might work. First, some data types:
type TimeZone = String;
#[derive(Debug)]
struct Timestamp {
ts: String,
}
#[derive(Debug)]
struct TimestampTz {
ts: String,
}As in Haskell, we’ll define a trait for AtTimeZoneConvertible:
trait AtTimeZoneConvertible<Output> {
fn at_time_zone(&self, time_zone: TimeZone) -> Output;
}One difference with Haskell already is that Rust has a more
object-oriented approach: a trait is defined in terms of some self
type. In Haskell, this was just another type variable,
input. Practically, there isn’t really a difference, as far as I can
tell.
Now let’s define some instances:
impl AtTimeZoneConvertible<TimestampTz> for Timestamp {
fn at_time_zone(&self, time_zone: TimeZone) -> TimestampTz {
TimestampTz {
ts: self.ts.to_string() + " " + &time_zone,
}
}
}
impl AtTimeZoneConvertible<Timestamp> for TimestampTz {
fn at_time_zone(&self, _time_zone: TimeZone) -> Timestamp {
Timestamp {
ts: "10:10".to_string(),
}
}
}This is similar to our approach in Haskell without functional dependencies. So I assumed the following code wouldn’t work:
fn main() {
println!(
"{:?}",
Timestamp {
ts: "14:10".to_string()
}
.at_time_zone("UTC".to_string())
.at_time_zone("EDT".to_string())
);
}
// $ cargo run
// Timestamp { ts: "10:10" }Surprisingly, it does! Rust seems to be saying, you’ve only given me
one instance for AtTimeZoneConvertible for your type, so I’ll use
it, even though multiple instances could exist.
I’m not quite sure why the Rust compiler allows this. It seems like a
reasonable thing to disallow, because there is no guarantee that the
compiler can infer the types. Remember that the trait is generic over
the Output type. If we add another instance, there is indeed a
failure to compile that same code:
impl AtTimeZoneConvertible<String> for Timestamp {
fn at_time_zone(&self, time_zone: TimeZone) -> String {
self.ts.to_string() + " " + &time_zone
}
}$ cargo build
Compiling rust-at-time-zone v0.1.0 (/Users/mjhoy/proj/rust-at-time-zone)
error[E0282]: type annotations needed
--> src/main.rs:45:9
|
45 | / Timestamp {
46 | | ts: "14:10".to_string()
47 | | }
48 | | .at_time_zone("UTC".to_string())
| |__________^ cannot infer type for `Output`
|
= note: type must be known at this point
error: aborting due to previous error
But perhaps in the real world, this isn’t such a problem, and the benefits of making life easier when there is just one instance are too good to pass up.
All that said, we can prevent this issue by enforcing only one
Output type per instance for a given type using an associated
type. It looks like this:
trait AtTimeZoneConvertible {
type Output;
fn at_time_zone(&self, time_zone: TimeZone) -> Self::Output;
}This is a lot like the input -> output functional dependency for
Haskell. The AtTimeZoneConvertible trait is no longer generic over
the Output type; instead, one Output type must be chosen for a
given instance. Our instances now look like this:
impl AtTimeZoneConvertible for Timestamp {
type Output = TimestampTz;
fn at_time_zone(&self, time_zone: TimeZone) -> TimestampTz { ... }
}
impl AtTimeZoneConvertible for TimestampTz {
type Output = Timestamp;
fn at_time_zone(&self, _time_zone: TimeZone) -> Timestamp { ... }
}If we try to define another instance for Timestamp, we now get an
error:
impl AtTimeZoneConvertible for Timestamp {
type Output = String;
fn at_time_zone(&self, time_zone: TimeZone) -> String {
self.ts.to_string() + " " + &time_zone
}
}$ cargo build
error[E0119]: conflicting implementations of trait `AtTimeZoneConvertible`
for type `Timestamp`:
--> src/main.rs:30:1
Spend extra time reviewing code that uses AT TIME ZONE or coercions
such as ::timestamp or ::timestamptz. The behavior is often
surprising.
At work, we have data that moves from a production database into a
warehouse. In this process, for some reason, timestamp columns are
coerced to timestamptz. This means any query that uses AT TIME
ZONE is semantically different depending on whether you run it in the
warehouse or in production, and was the source of some subtle bugs.
Also, both Haskell and Rust have good support for representing functions that are overloaded and polymorphic in their return types. GHC is a bit more strict than the Rust compiler, though; you may want to use associated types in Rust to enforce functional dependencies.