SCALA DAYS 2012

Generic Numeric Programming

Through Specialized Type Classes

by Erik Osheim

Topic

I've been working on generic numeric programming in Scala.

My paper (and this talk) will discuss two efforts:

  1. com.azavea.math.Numeric
    a proof-of-concept I created
  2. spire (in collaboration with Tom Switzer)
    "Powerful new number types and numeric abstractions for Scala."

So what is generic numeric programming?

So what is generic numeric programming?

The big idea is to require numeric capabilities,
not specific number types.

So what is generic numeric programming?

The big idea is to require numeric capabilities,
not specific number types.

Related to DRY ("Don't repeat yourself")

When do you need this?

When do you need this?

Glib:

def foo(as:List[A]) = as.sum
    

When do you need this?

Glib:

def foo(as:List[A]) = as.sum
    

Less glib:

Anytime you don't know which numeric type you'll have.

Example 1:

You have some code doing a bunch of integer math,
(using Long) and you realize it will overflow sometimes.

So, you want it to use BigInt in those cases.

Example 2:

You have some financial modeling code using BigDecimal,
but your boss wants to be able to "preview" simulation results
without waiting N minutes.

So, you want to use Double for "preview mode".

Example 3:

You're an author of a JSON encoding library.

You aren't sure which numeric types your users need.

So you want to support all "number" types.

Example 4:

You are creating mathematical data structures,
which can represent different kinds of values
(e.g. Matrix[A], Raster[A], etc.).

SPOILER ALERT! This is my use case.

Example 5:

You fantasize about code generation.

// stuck in boilerplate factory
// plz send help
def fooInt(ns:Array[Int]) = ...
def fooLong(ns:Array[Long]) = ...
def fooDouble(ns:Array[Double]) = ...
def fooBigInt(ns:Array[BigInt]) = ...
    

Example 6:

You are a time traveler.

Your code must work with number types
FROM THE FUTURE which don't exist yet.

Example 6:

You are a time traveler.

Your code must work with number types
FROM THE FUTURE which don't exist yet.

class ComplexFloat(val u:Long)
  extends AnyVal { ... }
    

OK, fine, I'll write generic code.

What's the problem?

Well, there are a few...

JVM bytecode

JVM distinguishes primitives (int, double, byte, etc.)
from objects (classes, which extend java.lang.Object).

Primitives aren't classes, so they can't be used with Java generics.

Thus at the bytecode level separate code is required for handling each primitive type and for handling objects.

JVM bytecode

Boxed types like java.lang.Integer try to help.

For instance, they enable java.util.List to be used with int.

However, they

JVM bytecode

So, at the bytecode level we must have different code to handle primitives and objects. Can we do this?

Yes, with specialization!

Specialization

The idea is to take generic Scala code and create new code for the primitive types (i.e. subclasses of AnyVal).

Can be used with classes and also with methods.

A bit like templates in C++.

Specialization

Given a class Matrix[A](a:A) you could imagine* manually specializing Matrix on Int by

  1. Creating a VectorInt(a:Int) class, replacing all occurances of A with Int.
  2. Rewriting occurences of new Matrix[Int](x) to new MatrixInt(x)

* NOTE: It's actually a bit more complex than this.

Specialization

What complications? Briefly:

  1. Separate compilation
  2. Maintaining class hierarchy
  3. Visibility restrictions
  4. Variance
  5. and many more...

Specialization

Currently specialization imposes restrictions on the developer who wants to use it effectively.

In particular, inheritance hierarchies often need to be reworked.

Some constructions don't specialize well (e.g. inner classes).

Requires profiling and willingness to dive into bytecode.

Specialization

In summary, we have specialization which is designed to solve this kind of problem. So how do we use it?

Type classes

We need type classes to abstract across numeric types because the they don't share a useful interface. This also reduces coupling.

This kind of generic code is called "ad-hoc polymorphism".

Kevin Wright "Type classes? Those are so 2010."

Type classes

Quick overview:

When dealing with a generic type A, we pass an instance of the type class (e.g. Eqv[A]) parameterized on the same type.

This type class instance provides methods we can use to operate on values of A (which is otherwise completely opaque).

Type classes

Example providing an equivalence relation:

trait Eqv[A] {
  def eqv(a:A, b:A):Boolean
  def neqv(a:A, b:A):Boolean = !eq(a, b)
}

object IntHasEqv extends Eqv[Int] {
  def eqv(a:Int, b:Int) = a == b
}

object FloatHasEqv extends Eqv[Float] {
  def eqv(a:Float, b:Float) = a == b
}

object Eqv {
  implicit val intHasEqv = IntHasEqv
  implicit val floatHasEqv = FloatHasEqv
}
    

Type classes

Example using the equivalence relation:

def contains[A:Eqv](a:A, as:List[A]) = {
  val e = implicitly[Eqv[A]]
  as.foldLeft(false) {
    (b, x) => b || e.eqv(x, a)
  }
}
    

Kris Nuttycombe
"Typeclasses are just the Strategy pattern, with kinds."

Type classes

There is a bit more that you will want:

  1. EqvOps[A] to provide infix operators (e.g. ===)
  2. Eqv.apply[A] to access implicit param
  3. Instances for more types (e.g. ByteHasEqv)
  4. Implicit views?

Type classes

OK, I'm convinced we should use specialized type classes.

Is that it?

Type classes

OK, I'm convinced we should use specialized type classes.

Is that it?

Well...

Type classes

The last issue involves infix operators.

Each use of an infix operator (e.g. +) is resolved via an implicit, and creates a new temporary object:

implicit val n:Numeric[Int] = ...

a + b * c // allocates 2 objects, as:

new NumericOps(a)(n).+(new NumericOps(b)(n).*(c))

num.plus(a, num.times(b, c)) // no allocations
    
    
  

Type classes

I wrote a compiler plugin to remove the NumericOps[A] instantiation and call methods directly on Numeric[A].

The result was a 5x speed up when using Int!

Numeric

Beyond the challenges specialized type classes pose, numeric type classes have their own issues:

  1. Literals
  2. Anticipating new types
  3. Convenience vs safety
  4. Generic types versus numeric tower
  5. Operator names
  6. Inconsistency between types
  7. Commutativity vs OO
  8. Division

Numeric

When expressing mathematical concepts in code, it's important to try to minimize the noise.

def addOne[T:Numeric](t:T) =
  t + Numeric[T].one

def addN[T:Numeric](t:T, n:Int) =
  t + Numeric[T].fromInt(n)
    

(Can be solved if you're willing to add a second type parameter.)

Numeric

When thinking of Int or Double it's easy to imagine all numeric types supporting a relatively large range of operations.

In practice, this often doesn't work.

For example, complex numbers are unordered. Many number-like things lack a sign (e.g. intervals).

Sometimes even equality is hard
(e.g. endomorphism rings)!

Numeric

Tensions around which methods should be included also include tension over which types can properly implement a method.

For instance, can Long implement div or sqrt properly?

Numeric

The approach we have taken in Spire is to support very fine-grained type classes in the spire.algebra package as well as the coarser type classes in spire.math.

Basically, we err on the side of strictness for the former, and permissiveness for the latter.

Ring Hierarchy

  • Ring
  • EuclideanRing
  • Field

Group Hierarchy

  • Semigroup
  • Monoid
  • Group

Eq Hierarchy

  • Eq
  • Order

Odds and Ends

  • Signed
  • NRoot

Numeric

In contrast to the previous algebraic type classes (which do no inheritance outside their respective hierarchies) our implementations of Integral, Fractional and Numeric satisfies all the numeric properties one usually wants:

Numeric

One of the things that has helped motivate the design of Spire were our attempts to implement new useful numeric types. Here are some types that Spire provides (which have tested and stretched the type class hierarchies):

Numeric

In addition to expanding the realm of available numeric types, this approach has the potential to completely remove the penalty of generic code.

(Nota bene: This currently requires specialized value classes, a compiler plugin, or similar.)

testdirect (ms)new (ms)old (ms)
infix-adder-int0.91.3 1.43x16.619.00x
infix-adder-long1.91.6 0.87x24.312.93x
infix-adder-float4.13.6 0.88x16.6 4.03x
infix-adder-double3.53.8 1.07x18.0 5.14x
array-total-int1.31.1 0.90x19.015.20x
array-total-long1.61.9 1.15x23.114.23x
array-total-float3.33.5 1.08x16.6 5.12x
array-total-double4.03.4 0.84x18.9 4.72x
testdirect (ms)new (ms)old (ms)
find-max-int3.92.5 0.65x17.4 4.48x
find-max-long1.52.1 1.42x21.014.00x
find-max-float3.85.3 1.40x18.3 4.87x
find-max-double5.55.3 0.95x16.3 2.95x
insertion-sort-int2.01.6 0.81x14.4 7.19x
insertion-sort-long1.41.4 1.00x14.510.55x
insertion-sort-float1.61.8 1.08x19.311.85x
insertion-sort-double1.12.1 1.89x14.112.56x
testdirect (ms)new (ms)old (ms)
increment-int10.012.3 n/a
increment-int20.00.1 n/a12.8 n/a
increment-int30.00.6 n/a12.9 n/a
increment-int40.012.8 n/a

Numeric

Performance is so important because as soon as a generic implementation is slow enough, people will avoid it, and we lose many of the benefits.

Numeric

I hope that I have persuaded you that it's worth trying to create (and consume) libraries which support generic numeric computation, either using Spire or something else.

Spire is available at https://github.com/non/spire.

The End

Thanks for listening!