TL;DR

Asymptotic Linearity

Definition An estimator $\psi_n$ (a function of i.i.d. observations $O_1, \ldots, O_n$) of a parameter $\psi_0 \in \mathbb{R}^k$ is asymptotically linear if $$ \sqrt{n} (\psi_n - \psi_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n D(O_i) + o_p(1) $$ where $E_0(D)=0$ and $E_0(DD^\intercal)$ is finite and non-singular. The function $D$ is called the influence curve or influence function.

By the central limit theorem, we then have $\sqrt{n}(\psi_n - \psi_0) \leadsto N(0, E_0(DD^\intercal))$.
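For a concrete example, the sample mean $\bar{O}_n = \frac{1}{n}\sum_{i=1}^n O_i$ of real-valued observations with mean $\psi_0$ is asymptotically linear with influence curve $D(O) = O - \psi_0$, since $$\sqrt{n}(\bar{O}_n - \psi_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n (O_i - \psi_0)$$ holds exactly, so the $o_p(1)$ remainder is identically zero.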

The Delta Method

Theorem Let $\phi: \mathbb{D}_\phi \subset \mathbb{R}^k \to \mathbb{R}^m$ be differentiable at $\psi$ with derivative (Jacobian) $\phi'_\psi$, and let $\psi_n \in \mathbb{D}_\phi$. If $\sqrt{n}(\psi_n - \psi)$ converges in distribution, then $$\sqrt{n}(\phi(\psi_n) - \phi(\psi)) = \phi'_\psi(\sqrt{n}(\psi_n - \psi)) + o_P(1).$$ For a proof of a more general result, see Theorem 3.1 in Asymptotic Statistics by van der Vaart. A special case of van der Vaart's Theorem 3.1 is what Wikipedia and Statistics 101 call the delta method, where it is additionally assumed that $\sqrt{n}(\psi_n - \psi) \leadsto N(0, \Sigma)$.

If $\psi_n$ is asymptotically linear with IC $D$, then by the delta method above, the definition of asymptotic linearity, and the linearity (hence continuity) of the map $\phi_{\psi_0}'$, we have

\begin{align} \sqrt{n}(\phi(\psi_n) - \phi(\psi_0)) &= \phi_{\psi_0}' \frac{1}{\sqrt{n}}\sum_{i=1}^n D(O_i) + o_p(1)\\ &= \frac{1}{\sqrt{n}}\sum_{i=1}^n \phi_{\psi_0}'D(O_i) + o_p(1) \end{align}

This means that the IC of the transformed estimator $\phi(\psi_n)$ is $\phi_{\psi_0}'D$. Note that if $\phi_{\psi_0}'$ is $0$, the limiting distribution is degenerate, so this first-order approximation is not very informative.
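For instance, if $\psi_0 \in (0,1)$ is scalar and $\phi(\psi) = \log(\psi/(1-\psi))$, then $\phi'_{\psi_0} = 1/(\psi_0(1-\psi_0))$, so the IC of the logit-transformed estimator is $D/(\psi_0(1-\psi_0))$.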

Automatic differentiation

Automatic differentiation is weird because if you don't know what it is, you probably think that you do. Automatic differentiation is not symbolic differentiation or numerical differentiation.

Symbolic differentiation is when you take an expression, say sin(x), that you want to differentiate, and replace it with an expression for the derivative (cos(x)). This can be done by computers, and is what is going on if you ask WolframAlpha for the derivative of sin(x) w.r.t. x. Symbolic diff is nice and exact but can explode in complexity, and you need an expression composed of parts that the system already knows how to differentiate, not an arbitrary subroutine.

Numerical differentiation is when you approximate the derivative of some function at a particular value by evaluating the function at perturbed values of the input and taking a finite-difference approximation of the slope. For example, you could approximate sin'(x) at x=5 by computing (sin(5.0001)-sin(5))/0.0001. Numerical diff is easy to implement but is approximate and can be slow, particularly when you need to compute the gradient of a function of many inputs.
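To see why many inputs make this slow, here is a small illustrative sketch of a forward-difference gradient (the name fd_gradient, the test function, and the step size h are arbitrary choices for this example): it needs one extra evaluation of the function per input dimension, and the answer is only an approximation whose accuracy depends on h.

function fd_gradient(f, x::Vector{Float64}; h = 1e-6)
    fx = f(x)
    g = similar(x)
    for i in eachindex(x)
        # perturb one coordinate at a time: one extra call to f per dimension
        xp = copy(x)
        xp[i] += h
        g[i] = (f(xp) - fx) / h
    end
    g
end

fd_gradient(v -> sin(v[1]) * v[2], [5.0, 2.0])  # ≈ [2cos(5), sin(5)]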

Automatic differentiation doesn't do either of these things. Basically, it takes some arbitrary subroutine you write as input, and produces a new subroutine that computes the derivative. The produced code computes the exact derivative, not an approximation. In many cases it's essentially the same code that you would write if you were to do it by hand. If the target function is not differentiable everywhere, you'll still get code that computes the derivative correctly where the function is differentiable. It can also be used on arbitrary algorithms, for example

function f(x::Vector{Float64})
    # foo and bar stand for arbitrary differentiable functions of a single number
    total = 0.0
    for y in x
        if y > 0.0
            total += foo(y)
        else
            total += bar(y)
        end
    end
    total
end

Symbolic differentiation wouldn't know what to do with the for loop and if statement. Autodiff doesn't have a problem with this.

Here is an interesting blog post about autodiff and another here. This post describes one method called reverse-mode automatic differentiation. There are a handful of packages that implement different kinds of autodiff in Julia, with information available at juliadiff.org. www.autodiff.org has a lot of useful information too.

A particularly easy-to-understand approach to autodiff uses dual numbers. The DualNumbers.jl package implements dual numbers in Julia, and ForwardDiff.jl builds on the same idea for forward-mode autodiff.
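To give a flavor of how this works, here is a minimal hand-rolled sketch of the dual-number idea (illustrative only; this is not the DualNumbers.jl or ForwardDiff.jl API). Each value carries its derivative along with it, and every operation updates both, so the exact derivative pops out of ordinary evaluation.

struct Dual
    val::Float64   # value of the expression
    der::Float64   # derivative of the expression w.r.t. the input
end

# Propagate derivatives through the basic operations via the usual calculus rules.
Base.:(+)(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:(*)(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Derivative of x*sin(x) at x = 5: seed the input with derivative 1.
x = Dual(5.0, 1.0)
y = x * sin(x)
# y.val == 5.0 * sin(5.0) and y.der == sin(5.0) + 5.0 * cos(5.0), exactly.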

Putting it all together

The estimation methods in TargetedLearning.jl return objects that are subtypes of the Estimate type and contain estimated influence curve information. Using operator overloading to implement automatic differentiation in nearly the same way that DualNumbers.jl does, we can compute an arbitrary transformation $\phi$ of Estimates and automatically obtain an estimate of the IC of the transformed Estimate without having to work it out by hand. The estimate of the IC of $\phi(\psi_n)$ is computed as $\phi'_{\psi_n} D_n$, where $D_n$ is the estimated IC of $\psi_n$.
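To make the mechanism concrete, here is a minimal sketch of the idea (SketchEstimate and its methods are invented here for illustration and are not TargetedLearning.jl's actual implementation). An estimate carries its point estimate together with its estimated IC evaluated at each observation, and each overloaded operation applies the delta method to the IC, just as a dual number propagates a derivative.

struct SketchEstimate
    psi::Float64          # point estimate
    ic::Vector{Float64}   # estimated influence curve evaluated at each observation
end

# Each operation returns the transformed point estimate along with phi' applied
# to the influence curve, i.e. the delta-method update of the IC.
Base.:(/)(a::SketchEstimate, b::SketchEstimate) =
    SketchEstimate(a.psi / b.psi, a.ic ./ b.psi .- a.psi .* b.ic ./ b.psi^2)
Base.:(-)(a::Real, b::SketchEstimate) = SketchEstimate(a - b.psi, -b.ic)
Base.log(a::SketchEstimate) = SketchEstimate(log(a.psi), a.ic ./ a.psi)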

For example, suppose we have Estimate objects ey1 and ey0, which are estimates of $E(Y_1)$ and $E(Y_0)$ (mean counterfactual outcomes under treatments 1 and 0), respectively. If the outcome is binary, we might be interested in the log causal odds ratio. We can compute that as log((ey1/(1-ey1))/(ey0/(1-ey0))). This expression yields a new Estimate that includes an estimated influence curve for the log causal odds ratio.
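Continuing the SketchEstimate sketch from above, with placeholder values purely for illustration (real Estimate objects come from TargetedLearning.jl's estimation routines rather than being built by hand), the same expression yields both the point estimate and an IC-based standard error:

using Statistics   # for var

# Stand-in estimates with made-up point estimates and influence curve values.
ey1 = SketchEstimate(0.30, 0.05 .* randn(500))
ey0 = SketchEstimate(0.20, 0.05 .* randn(500))

# The same expression as in the text; /, -, and log propagate the IC automatically.
logor = log((ey1/(1-ey1))/(ey0/(1-ey0)))
logor.psi                                  # log causal odds ratio estimate
sqrt(var(logor.ic) / length(logor.ic))     # IC-based standard error estimate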

It's important to remember that if $\phi_{\psi_0}'$ is $0$, the delta method gives a degenerate limit, so the resulting IC-based standard errors and confidence intervals will not be reliable.