PNWSCALA 2013
Inlining anonymous functions with macros
Erik Osheim (@d6)
Purpose of this talk
We'll be generating implementations from anonymous functions.
Keep in mind that this technique...
- ... uses macros (and quasiquotes)
- ... mainly exists for optimization purposes
- ... does not increase expressiveness
- ... is a work in progress
What are macros?
Macros are self-cleaning, compile-time code transformations.
In 2.10, macros are defined as methods that take their parameters as ASTs (abstract syntax trees) and return their result as an AST.
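For instance, here is a minimal sketch of such a macro (the names Demo, trace, and traceImpl are illustrative, not from the talk); it prints its argument's tree at compile time and otherwise leaves it alone:

import scala.language.experimental.macros
import scala.reflect.macros.Context

object Demo {
  // the "macro-ized" method users call
  def trace(x: Int): Int = macro traceImpl

  // the implementation runs inside the compiler: it receives the
  // argument as an AST (c.Expr) and must return an AST
  def traceImpl(c: Context)(x: c.Expr[Int]): c.Expr[Int] = {
    println("compiling: " + c.universe.show(x.tree))
    x // hand the tree back unchanged
  }
}

(As usual for 2.10 macros, the definition and its call sites must live in separate compilation units.)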
What are anonymous functions?
Anonymous functions are values that can be called. For example:
ns.map((x: Int) => x + 6)
ns.map(x => x + 6)
ns.map(_ + 6)

def plusSix(n: Int): Int = n + 6
ns.map(plusSix)
(What's different about the last one?)
What was different?
The last application was equivalent to:
def plusSix(n: Int): Int = n + 6
ns.map(x => plusSix(x))
This anonymous function really just wraps a (non-anonymous) method.
What is inlining?
In our case, inlining means rewriting an application with its implementation.
def addOne(n: Int): Int = n + 1
def timesTwo(n: Int): Int = n * 2
def minusThree(n: Int): Int = n - 3

addOne(timesTwo(minusThree(6))) // returns 7

def inlined(n: Int): Int = ((n - 3) * 2) + 1
inlined(6) // also returns 7
Doesn't the JVM already do this?
Yes, the JVM has heuristics to guide its own inlining:
- Inlining increases the "horizon of optimization"
- Static, private, and final calls are "easy"
- Other calls can be inlined "optimistically"
- Optimistic inlining will be deoptimized later if necessary
(Deoptimization for an inlined method occurs when a method needs to dispatch to more than two destinations. This is called a "megamorphic call".)
So what's the "inlining problem"?
Cliff Click coined the term in 2011 to describe an increasingly common pattern.
It describes a situation where numerous JVM optimizations (which Click calls "an easy 10x speedup") are lost due to a failure to inline.
So what's the "inlining problem"?
This is a somewhat loose translation of his example to Scala:
type R = Array[Int]

class Blitter(f: (Int, Int) => Int) {
  def blit(src1: R, src2: R, dst: R) {
    var i = 0
    while (i < dst.length) {
      dst(i) = f(src1(i), src2(i))
      i += 1
    }
  }
}
In the actual example the iteration logic is more complicated than a simple while loop, but the idea is the same.
So what's the "inlining problem"?
Click observes that this is a problem for functional programming on the JVM.
type R = Array[Int]

class Blitter(f: (Int, Int) => Int) {
  def blit(src1: R, src2: R, dst: R) {
    var i = 0
    while (i < dst.length) {
      dst(i) = f(src1(i), src2(i))
      i += 1
    }
  }
}
Abstracting out the blitting function f results in a megamorphic call to f.apply.
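Concretely (an illustrative sketch, not from the original slides): each function literal compiles to its own Function2 class, so once blit has been driven by three or more distinct functions, the call to f.apply inside the loop sees too many receivers to stay inlined.

val add = new Blitter(_ + _)    // one Function2 class
val mul = new Blitter(_ * _)    // a second
val max = new Blitter(math.max) // a third: f.apply is now megamorphic

val a, b, out = new Array[Int](1024)
add.blit(a, b, out)
mul.blit(a, b, out)
max.blit(a, b, out) // the JIT falls back to virtual dispatch for f.apply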
Our programs are 10x slower!!??
If you squint a bit you'll see map, exists, forall, flatMap, etc.
Click claims our programs may be 10x slower due to this abstraction.
While investigating how to solve this problem, we should consider:
- How serious is the problem today?
- How painful is the solution?
- Is it worth it?
Our programs are 10x slower!!??
Based on anecdotal cases from my own testing:
- Observed a 2-5x slowdown when looping over arrays
- Only observable in megamorphic situations
- Many programs show no benefit from inlining
- Things seem better than Click says they are
Your programs (probably) won't get 10x faster if you "fix" this.
Is it worth it?
In some cases, I would say yes.
Especially cases dealing with "manual specialization" or tight loops over arrays.
If you're in this space, you know where this will help.
Anyway, it's an interesting problem.
Using macros to inline functions
We want to write methods that accept literal anonymous functions, and inline their implementations using macros.
def normal(n: Int, f: Int => Int): Int = f(n)
normal(6, _ * 3) // calls f(n), which returns 6 * 3

def secret(n: Int, f: Int => Int): Int = ???
secret(6, _ * 3) // returns 6 * 3
Why do we need literal functions?
We would like a function that "does the work" itself or else we can't inline properly.
After inlining, the code should be as direct as possible.
def tripled(n: Int): Int = n * 3

secret(6, _ * 3)   // returns 6 * 3
secret(6, tripled) // calls tripled(6) !!!
Our secret macro can only look at what it is given. If it sees a call to tripled, it has no clue what the body of that function is.
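One way to enforce this (a hedged sketch; checkLiteral is an illustrative helper, not part of the talk's code) is to pattern-match on the function argument's tree and reject anything that isn't a function literal:

// inside the macro implementation, reject non-literal functions
def checkLiteral(c: Context)(f: c.Expr[Int => Int]): Unit = {
  import c.universe._
  f.tree match {
    case Function(_, _) => () // e.g. `_ * 3`: the body is visible, inlining can proceed
    case _ =>                 // e.g. `tripled`: just a reference, nothing to inline
      c.abort(c.enclosingPosition, "secret requires a function literal")
  }
}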
What does a macro look like?
object Scope {
  def secret(n: Int, f: Int => Int): Int = macro secretImpl

  def secretImpl(c: Context)
      (n: c.Expr[Int], f: c.Expr[Int => Int]): c.Expr[Int] = {
    // transform input Exprs into an output Expr
  }
}
There are two parts: the macro implementation (secretImpl), and the "macro-ized" method (secret).
What does an Expr (AST) look like?
Scope.secret(6, _ * 3)

// becomes (approximately)
Apply(                  // call Scope.secret method
  Select(Scope, secret),
  List(                 // 2 arguments
    Literal(Constant(6)),
    Function(           // anonymous function
      List(x),          // one parameter named x
      Apply(            // call x.+
        Select(x, +),
        List(           // 1 argument:
          Literal(Constant(3)))))))
What does an Expr (AST) look like?
There is a bit more complexity regarding names:
Select(
  This(newTypeName("Scope")),
  newTermName("secret"))

Select(
  Ident(newTermName("x$1")),
  newTermName("$plus"))
So what does an inlining transformation look like?
- Verify arguments are "valid"
- Build tree using quasi-quoting
- Inline anonymous function body into tree
- Inline arguments for parameters
- Clean up
So what does an inlining transformation look like?
// user calls the macro, confirm 2nd arg is function literal
val z = Scope.secret(6, (x: Int) => x * 3)

// build starting tree, using f.apply(n)
val z = ((x: Int) => x * 3).apply(6)

// remove apply, inline f's body
val z = x * 3

// substitute 6 (argument) for x (parameter)
val z = 6 * 3
(Reminiscent of beta reduction in lambda calculus.)
How do you build a tree?
Easily, using quasi-quotes and the "Macro Paradise" compiler plugin!
Here's how to build the tree we want for the previous example:
val n: Expr[Int] = ...
val f: Expr[Int => Int] = ...

q"""$f.apply($n)"""
This is before any inlining happens.
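Putting the pieces together, the implementation might look roughly like this sketch (inlineFunctionApply stands in for the Transformer-based rewrite shown on the next slides; resetLocalAttrs is discussed under clean-up):

def secretImpl(c: Context)
    (n: c.Expr[Int], f: c.Expr[Int => Int]): c.Expr[Int] = {
  import c.universe._
  // build the starting tree: f.apply(n)
  val tree = q"${f.tree}.apply(${n.tree})"
  // rewrite away the .apply call and substitute the argument
  val inlined = inlineFunctionApply(c)(tree)
  // give the freshly built tree valid owners/types before returning it
  c.Expr[Int](c.resetLocalAttrs(inlined))
}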
How do the AST transformations work?
Like most AST rewrites, we use pattern matching:
override def transform(tree: Tree): Tree = tree match {
  case Apply(Select(Function(params, body), ApplyName), args) =>
    ... // new tree goes here
  case Apply(Function(params, body), args) =>
    ... // new tree goes here
  case _ =>
    super.transform(tree)
}
How do the AST transformations work?
We start with the method body, inlining each argument to the method.
case Apply(Function(params, body), args) =>
  params.zip(args).foldLeft(body) {
    case (b, (param, arg)) =>
      inlineSymbol(param.symbol, b, arg)
  }
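inlineSymbol isn't shown on the slides; a minimal sketch of what it might look like (assuming import c.universe._ is in scope) replaces every reference to the parameter's symbol with the argument tree:

// substitute `arg` for every Ident bound to `symbol` within `body`
def inlineSymbol(symbol: Symbol, body: Tree, arg: Tree): Tree =
  new Transformer {
    override def transform(tree: Tree): Tree = tree match {
      case i: Ident if i.symbol == symbol => arg
      case _ => super.transform(tree)
    }
  }.transform(body)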
Let's see it in action
Scope.secret(6, x$1 => x$1 + 3)

Apply(
  Select(This(newTypeName("Scope")), newTermName("secret")),
  List(
    Literal(Constant(6)),
    Function(
      List(ValDef(
        Modifiers(PARAM | SYNTHETIC),
        newTermName("x$1"),
        TypeTree(),
        EmptyTree)),
      Apply(
        Select(Ident(newTermName("x$1")), newTermName("$plus")),
        List(Literal(Constant(3)))))))
Let's see it in action
(x$1 => x$1 + 3).apply(6)

Apply(
  Select(
    Function(
      List(ValDef(
        Modifiers(PARAM | SYNTHETIC),
        newTermName("x$1"),
        TypeTree(),
        EmptyTree)),
      Apply(
        Select(Ident(newTermName("x$1")), newTermName("$plus")),
        List(Literal(Constant(3))))),
    newTermName("apply")),
  List(Literal(Constant(6))))
Let's see it in action
x$1 + 3

Apply(
  Select(Ident(newTermName("x$1")), newTermName("$plus")),
  List(Literal(Constant(3))))
Let's see it in action
6 + 3

Apply(
  Select(Literal(Constant(6)), newTermName("$plus")),
  List(Literal(Constant(3))))
What clean up is necessary?
New trees and fragments need valid owners/types.
Use c.resetLocalAttrs to "fix" generated trees.
This represents the biggest problem with this strategy (at least things would fail at compile-time).
So is that it?
No! We can use similar strategies to generate anonymous classes.
This ends up looking a bit like C++ templates.
What's an example of generating anonymous classes?
Imagine the following trait, which defines an infinite sequence:
trait InfStream[@sp A] {
  // nth item from an infinite sequence
  def nth(n: Int): A

  // infinite sequence as a stream
  def stream: scala.collection.immutable.Stream[A]

  // first n items from an infinite sequence
  def vector(n: Int): Vector[A]
}
What's an example of generating anonymous classes?
Logically, to define any instance of InfStream[R] we need to know three things:
- The starting state (S) of the sequence.
- Given a state, how to get to the "next" state.
- How to get a value R out of the current state.
With these we can easily implement all of InfStream[R].
What's an example of generating anonymous classes?
So, translating the previous slide into code:
def infStream[S, R](
  init: () => S,
  next: S => S,
  get: S => R): InfStream[R] = {
  // implementation goes here...
}
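For reference, a plain (non-macro) implementation could be written as the sketch below; note how every step goes through init, next, and get, which is exactly the f.apply-shaped indirection the macro version will inline away:

def infStream[S, R](
  init: () => S,
  next: S => S,
  get: S => R): InfStream[R] =
  new InfStream[R] {
    def nth(n: Int): R = {
      @annotation.tailrec def loop(i: Int, s: S): S =
        if (i <= 1) s else loop(i - 1, next(s))
      get(loop(n, init()))
    }
    def stream: Stream[R] = {
      def from(s: S): Stream[R] = get(s) #:: from(next(s))
      from(init())
    }
    def vector(n: Int): Vector[R] =
      stream.take(n).toVector
  }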
What's an example of generating anonymous classes?
Let's consider the Fibonacci sequence.
After a little thought, we can implement it as:
infStream[(Long, Long), Long](
  () => (0L, 1L),
  { case (x, y) => (y, x + y) },
  { case (x, y) => x }
)
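Trying it out (the values are just the usual Fibonacci numbers, with nth(1) being the first element):

val fibs = infStream[(Long, Long), Long](
  () => (0L, 1L),
  { case (x, y) => (y, x + y) },
  { case (x, y) => x }
)

fibs.nth(1)    // 0
fibs.nth(10)   // 34
fibs.vector(8) // Vector(0, 1, 1, 2, 3, 5, 8, 13)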
What's an example of generating anonymous classes?
Imagine profiling shows us that InfStream is affected by the "inlining problem".
(In fact, it is not.)
We could use quasiquotes and inlining to build a custom InfStream instance.
What's an example of generating anonymous classes?
Let's take a look at generating the nth method:
def nth($n: Int): $A = {
  def loop($i: Int, $b: $B, $c: $C): $A =
    if ($i <= 1) {
      $result.apply($b, $c)
    } else {
      val ($b2, $c2) = $step.apply($b, $c)
      loop($i - 1, $b2, $c2)
    }
  val ($b, $c) = $start.apply()
  loop($n, $b, $c)
}
(Dollar signs represent fresh names in a quasiquote.)
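Concretely, a "fresh name" is generated with c.fresh and spliced into the quasiquote, so the names introduced by the template can never collide with names in the user's code. A simplified sketch (not the actual nth template):

// with `import c.universe._` in scope inside the macro implementation
val x = newTermName(c.fresh("x$")) // becomes e.g. x$1
q"""
  val $x = 6 * 3
  $x + $x
"""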
What's an example of generating anonymous classes?
Recall our anonymous functions:
val init = () => (0L, 1L)
val next = (x: Long, y: Long) => (y, x + y)
val get = (x: Long, y: Long) => x
What's an example of generating anonymous classes?
After we inline our anonymous functions:
def nth(n: Int): Long = {
  def loop(i: Int, b: Long, c: Long): Long =
    if (i <= 1) {
      b
    } else {
      val (b2, c2) = (c, b + c)
      loop(i - 1, b2, c2)
    }
  val (b, c) = (0L, 1L)
  loop(n, b, c)
}
There sure are a lot of tuples. Too bad we don't have multiple returns.
What's an example of generating anonymous classes?
This is another thing we can fix with an inlining macro:
def nth(n: Int): Long = {
  def loop(i: Int, b: Long, c: Long): Long =
    if (i <= 1) {
      b
    } else {
      loop(i - 1, c, b + c)
    }
  loop(n, 0L, 1L)
}
In principle, we can do this with case classes as well.
What are the advantages?
- Defeat the inlining problem
- Fewer classes to load, potentially less bytecode
- Second-guess scalac/JVM
- Remove boilerplate without compromising performance
- Extend the language!
What are some problems?
- Resetting attributes
- Need to be careful which trees to use
- Requires education
- "Template bloat"
What are some problems?
Due to resetting attributes, you might see the following issues:
import spire.implicits._
import scala.{math => mth}

cfor(0)(_ < 10, _ + 1) { i =>
  println(mth.min(i, 3))
}

<console>:12: error: object mth is not a member of package scala
Not currently an issue (thanks resetLocalAttrs!)
What are some problems?
If the outer tree isn't using fresh names, it's possible inlining would produce conflicts.
(This is why it's important to use fresh names with quasiquoting.)
Fortunately, this impacts macro devs, not macro users.
What are some problems?
Generating implementations with inlining isn't something Scala devs normally do.
C++ devs are familiar with "template bloat" and reckless use of these inlining features could potentially cause similar problems in Scala.