SCALA DAYS 2012
Generic Numeric Programming
Through Specialized Type Classes
by Erik Osheim
Topic
I've been working on generic numeric programming in Scala.
My paper (and this talk) will discuss two efforts:
- com.azavea.math.Numeric
a proof-of-concept I created - spire (in collaboration with Tom Switzer)
"Powerful new number types and numeric abstractions for Scala."
So what is generic numeric programming?
So what is generic numeric programming?
The big idea is to require numeric capabilities,
not specific number types.
So what is generic numeric programming?
The big idea is to require numeric capabilities,
not specific number types.
Related to DRY ("Don't repeat yourself")
When do you need this?
When do you need this?
Glib:
def foo(as:List[A]) = as.sum
When do you need this?
Glib:
def foo(as:List[A]) = as.sum
Less glib:
Anytime you don't know which numeric type you'll have.
Example 1:
You have some code doing a bunch of integer math,
(using Long) and you realize it will overflow sometimes.
So, you want it to use BigInt in those cases.
Example 2:
You have some financial modeling code using BigDecimal,
but your boss wants to be able to "preview" simulation results
without waiting N minutes.
So, you want to use Double for "preview mode".
Example 3:
You're an author of a JSON encoding library.
You aren't sure which numeric types your users need.
So you want to support all "number" types.
Example 4:
You are creating mathematical data structures,
which can represent different kinds of values
(e.g. Matrix[A], Raster[A], etc.).
SPOILER ALERT! This is my use case.
Example 5:
You fantasize about code generation.
// stuck in boilerplate factory // plz send help def fooInt(ns:Array[Int]) = ... def fooLong(ns:Array[Long]) = ... def fooDouble(ns:Array[Double]) = ... def fooBigInt(ns:Array[BigInt]) = ...
Example 6:
You are a time traveler.
Your code must work with number types
FROM THE FUTURE which don't exist yet.
Example 6:
You are a time traveler.
Your code must work with number types
FROM THE FUTURE which don't exist yet.
class ComplexFloat(val u:Long) extends AnyVal { ... }
OK, fine, I'll write generic code.
What's the problem?
Well, there are a few...
JVM bytecode
JVM distinguishes primitives (int, double, byte, etc.)
from objects (classes, which extend java.lang.Object).
Primitives aren't classes, so they can't be used with Java generics.
Thus at the bytecode level separate code is required for handling each primitive type and for handling objects.
JVM bytecode
Boxed types like java.lang.Integer try to help.
For instance, they enable java.util.List to be used with int.
However, they
- Lack numeric methods (e.g. add)
- Lack useful shared interface.
- Impact performance (allocation, GC, etc.)
JVM bytecode
So, at the bytecode level we must have different code to handle primitives and objects. Can we do this?
Yes, with specialization!
Specialization
The idea is to take generic Scala code and create new code for the primitive types (i.e. subclasses of AnyVal).
Can be used with classes and also with methods.
A bit like templates in C++.
Specialization
Given a class Matrix[A](a:A) you could imagine* manually specializing Matrix on Int by
- Creating a VectorInt(a:Int) class, replacing all occurances of A with Int.
- Rewriting occurences of new Matrix[Int](x) to new MatrixInt(x)
* NOTE: It's actually a bit more complex than this.
Specialization
What complications? Briefly:
- Separate compilation
- Maintaining class hierarchy
- Visibility restrictions
- Variance
- and many more...
Specialization
Currently specialization imposes restrictions on the developer who wants to use it effectively.
In particular, inheritance hierarchies often need to be reworked.
Some constructions don't specialize well (e.g. inner classes).
Requires profiling and willingness to dive into bytecode.
Specialization
In summary, we have specialization which is designed to solve this kind of problem. So how do we use it?
Type classes
We need type classes to abstract across numeric types because the they don't share a useful interface. This also reduces coupling.
This kind of generic code is called "ad-hoc polymorphism".
Kevin Wright "Type classes? Those are so 2010."
Type classes
Quick overview:
When dealing with a generic type A, we pass an instance of the type class (e.g. Eqv[A]) parameterized on the same type.
This type class instance provides methods we can use to operate on values of A (which is otherwise completely opaque).
Type classes
Example providing an equivalence relation:
trait Eqv[A] { def eqv(a:A, b:A):Boolean def neqv(a:A, b:A):Boolean = !eq(a, b) } object IntHasEqv extends Eqv[Int] { def eqv(a:Int, b:Int) = a == b } object FloatHasEqv extends Eqv[Float] { def eqv(a:Float, b:Float) = a == b } object Eqv { implicit val intHasEqv = IntHasEqv implicit val floatHasEqv = FloatHasEqv }
Type classes
Example using the equivalence relation:
def contains[A:Eqv](a:A, as:List[A]) = { val e = implicitly[Eqv[A]] as.foldLeft(false) { (b, x) => b || e.eqv(x, a) } }
Kris Nuttycombe
"Typeclasses are just the Strategy pattern, with
kinds."
Type classes
There is a bit more that you will want:
- EqvOps[A] to provide infix operators (e.g. ===)
- Eqv.apply[A] to access implicit param
- Instances for more types (e.g. ByteHasEqv)
- Implicit views?
Type classes
OK, I'm convinced we should use specialized type classes.
Is that it?
Type classes
OK, I'm convinced we should use specialized type classes.
Is that it?
Well...
Type classes
The last issue involves infix operators.
Each use of an infix operator (e.g. +) is resolved via an implicit, and creates a new temporary object:
implicit val n:Numeric[Int] = ... a + b * c // allocates 2 objects, as: new NumericOps(a)(n).+(new NumericOps(b)(n).*(c)) num.plus(a, num.times(b, c)) // no allocations
Type classes
I wrote a compiler plugin to remove the NumericOps[A] instantiation and call methods directly on Numeric[A].
The result was a 5x speed up when using Int!
Numeric
Beyond the challenges specialized type classes pose, numeric type classes have their own issues:
- Literals
- Anticipating new types
- Convenience vs safety
- Generic types versus numeric tower
- Operator names
- Inconsistency between types
- Commutativity vs OO
- Division
Numeric
When expressing mathematical concepts in code, it's important to try to minimize the noise.
def addOne[T:Numeric](t:T) = t + Numeric[T].one def addN[T:Numeric](t:T, n:Int) = t + Numeric[T].fromInt(n)
(Can be solved if you're willing to add a second type parameter.)
Numeric
When thinking of Int or Double it's easy to imagine all numeric types supporting a relatively large range of operations.
In practice, this often doesn't work.
For example, complex numbers are unordered. Many number-like things lack a sign (e.g. intervals).
Sometimes even equality is hard
(e.g. endomorphism rings)!
Numeric
Tensions around which methods should be included also include tension over which types can properly implement a method.
For instance, can Long implement div or sqrt properly?
Numeric
The approach we have taken in Spire is to support very fine-grained type classes in the spire.algebra package as well as the coarser type classes in spire.math.
Basically, we err on the side of strictness for the former, and permissiveness for the latter.
Ring Hierarchy
|
Group Hierarchy
|
Eq Hierarchy
|
Odds and Ends
|
Numeric
In contrast to the previous algebraic type classes (which do no inheritance outside their respective hierarchies) our implementations of Integral, Fractional and Numeric satisfies all the numeric properties one usually wants:
- Order
- Signed
- Field
- NRoot
- ConvertableFrom
- ConvertableTo
Numeric
One of the things that has helped motivate the design of Spire were our attempts to implement new useful numeric types. Here are some types that Spire provides (which have tested and stretched the type class hierarchies):
- Rational
- Complex[A]
- Real
- Interval[A]
- FPFilter[A]
- Number
Numeric
In addition to expanding the realm of available numeric types, this approach has the potential to completely remove the penalty of generic code.
(Nota bene: This currently requires specialized value classes, a compiler plugin, or similar.)
test | direct (ms) | new (ms) | old (ms) | ||
infix-adder-int | 0.9 | 1.3 | 1.43x | 16.6 | 19.00x |
infix-adder-long | 1.9 | 1.6 | 0.87x | 24.3 | 12.93x |
infix-adder-float | 4.1 | 3.6 | 0.88x | 16.6 | 4.03x |
infix-adder-double | 3.5 | 3.8 | 1.07x | 18.0 | 5.14x |
array-total-int | 1.3 | 1.1 | 0.90x | 19.0 | 15.20x |
array-total-long | 1.6 | 1.9 | 1.15x | 23.1 | 14.23x |
array-total-float | 3.3 | 3.5 | 1.08x | 16.6 | 5.12x |
array-total-double | 4.0 | 3.4 | 0.84x | 18.9 | 4.72x |
test | direct (ms) | new (ms) | old (ms) | ||
find-max-int | 3.9 | 2.5 | 0.65x | 17.4 | 4.48x |
find-max-long | 1.5 | 2.1 | 1.42x | 21.0 | 14.00x |
find-max-float | 3.8 | 5.3 | 1.40x | 18.3 | 4.87x |
find-max-double | 5.5 | 5.3 | 0.95x | 16.3 | 2.95x |
insertion-sort-int | 2.0 | 1.6 | 0.81x | 14.4 | 7.19x |
insertion-sort-long | 1.4 | 1.4 | 1.00x | 14.5 | 10.55x |
insertion-sort-float | 1.6 | 1.8 | 1.08x | 19.3 | 11.85x |
insertion-sort-double | 1.1 | 2.1 | 1.89x | 14.1 | 12.56x |
test | direct (ms) | new (ms) | old (ms) | ||
increment-int1 | 0.0 | 12.3 | n/a | ||
increment-int2 | 0.0 | 0.1 | n/a | 12.8 | n/a |
increment-int3 | 0.0 | 0.6 | n/a | 12.9 | n/a |
increment-int4 | 0.0 | 12.8 | n/a |
Numeric
Performance is so important because as soon as a generic implementation is slow enough, people will avoid it, and we lose many of the benefits.
Numeric
I hope that I have persuaded you that it's worth trying to create (and consume) libraries which support generic numeric computation, either using Spire or something else.
Spire is available at https://github.com/non/spire.