A series on Phantom - Part 6: The new compact DSL and Primitive engine
Flavian, Scala Developer
Flavian is a Scala engineer with many years of experience and the author of phantom and morpheus.

Historically, phantom had the unfortunate feature of being overly verbose, and over the years people in various capacities have pointed this out to us. There were many limitations in the underlying implementation, some because of Scala and others because frameworks that solve certain problems in more elegant ways simply did not exist at the time.

Passing types to indexes

One of the things that we have managed to completely eliminate is the need to pass column types to the indexing modifiers, such as PartitionKey, PrimaryKey, Index and ClusteringOrder. This was unnecessarily verbose and it always led to inelegant, repetitive code.

object id extends UUIDColumn(this) with PartitionKey[UUID]


Thanks to a series of improvements in the underlying implicit machinery in Phantom, we were able to completely eliminate this requirement starting with Phantom 2.0.0. Note that this is a backwards incompatible change: the old syntax is no longer accepted.

object id extends UUIDColumn(this) with PartitionKey


It may not seem like such a drastic improvement at first glance, but it required significant fiddling under the hood, and it brings a welcome reduction in boilerplate.

Passing the Table and Record types manually to collection types

Another longstanding source of unnecessary verbosity in phantom has been the need to manually pass three type parameters to the collection columns, or four in the case of map columns.

import com.outworkers.phantom.dsl._

case class ExampleRecord(
  id: UUID,
  list: List[String],
  set: Set[Int],
  map: Map[Int, Int]
)

abstract class ExampleTable extends CassandraTable[ExampleTable, ExampleRecord] {
  object id extends UUIDColumn(this) with PartitionKey
  object list extends ListColumn[ExampleTable, ExampleRecord, String](this)
  object set extends SetColumn[ExampleTable, ExampleRecord, Int](this)
  object map extends MapColumn[ExampleTable, ExampleRecord, Int, Int](this)
}


If this looks familiar, we can only thank you for sticking with phantom for so long and bearing with the more problematic parts of the framework. As of phantom 2.0.0, the DSL automatically infers the Table and Record type parameters.

abstract class ExampleTable extends CassandraTable[ExampleTable, ExampleRecord] {
  object id extends UUIDColumn(this) with PartitionKey
  object list extends ListColumn[String](this)
  object set extends SetColumn[Int](this)
  object map extends MapColumn[Int, Int](this)
}

A small but meaningful gain for your typing efforts. Now onto more interesting updates.

The compact DSL: Enter fully automated macro inference in 2.9.x

From the above example, it's obvious that certain parts of the DSL were still unnecessarily repetitive, namely the this argument that had to be constantly passed around. This is deeply tied to the internal structures of the DSL, but it should not be exposed in such an inelegant way. We've listened, and executed.

As of Phantom 2.9.x:

abstract class ExampleTable extends Table[ExampleTable, ExampleRecord] {
  object id extends UUIDColumn with PartitionKey
  object list extends ListColumn[String]
  object set extends SetColumn[Int]
  object map extends MapColumn[Int, Int]
}

Tada! No more nasty this argument being passed around. But it gets even better. Extending Table instead of CassandraTable was a neat way for us to avoid introducing completely backwards incompatible changes while preserving the old DSL. We've added a deprecation notice around the entire old DSL, and going forward you should opt for Table instead. A neat little trick is that Table also mixes in RootConnector for you, so you won't need to mix in or extend RootConnector yourself, as the sketch below shows.
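To make the difference concrete, here's a minimal before-and-after sketch; the table names are illustrative, and the record type is the ExampleRecord defined earlier:

import com.outworkers.phantom.dsl._

// Before: RootConnector had to be mixed in by hand on every table.
abstract class OldStyleTable extends CassandraTable[OldStyleTable, ExampleRecord] with RootConnector

// After: Table already brings RootConnector along, so nothing extra is needed.
abstract class NewStyleTable extends Table[NewStyleTable, ExampleRecord]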

Columns aren't even required anymore

Now that the entire inference and parsing mechanism at the binary level of the protocol has moved from TypeCodec and CodecRegistry to a far more powerful and performant compile-time, macro-driven mechanism, columns in phantom serve almost no purpose anymore other than schema inference and giving you a neat, readable way to express schemas to others.

But you could just as well write the above table as:

abstract class ExampleTable extends Table[ExampleTable, ExampleRecord] {
  object id extends UUIDColumn with PartitionKey
  object list extends Col[List[String]]
  object set extends Col[Set[Int]]
  object map extends Col[Map[Int, Int]]
}


By default, only Map, Set and List have direct translations, which means you will not be able to use arbitrary collections: Phantom cannot know how to translate the semantics of an arbitrary collection type into something Cassandra natively understands. But the real benefit is extremely powerful, and to fully exploit it, let's have a look at more complex use cases, such as columns of complex types.

Let's say we want to bring a tuple into the mix, or an option, or better yet, a derived primitive. Now it's trivial:

case class CustomType(value: String)

object CustomType {
  implicit val customPrimitive = Primitive.derive[CustomType, String](_.value)(CustomType.apply)
}

abstract class ExampleTable extends Table[ExampleTable, ExampleRecord] {
  object id extends UUIDColumn with PartitionKey
  object list extends Col[List[String]]
  object set extends Col[Set[Int]]
  object map extends Col[Map[Int, Int]]
  object tuple extends Col[(String, String, Double)]
  object optString extends Col[Option[String]]
  object customCol extends Col[CustomType]
}
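As a quick illustration of what the derived primitive buys you, here's a hypothetical insert against this table; the db.exampleTable reference assumes a Database definition that isn't shown here:

// CustomType is usable anywhere a natively supported type would be,
// because the derived primitive handles the String round-trip for us.
db.exampleTable.insert
  .value(_.id, UUID.randomUUID)
  .value(_.customCol, CustomType("hello"))
  .future()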
Of course, you can opt to use TupleColumn, but internally that's nothing more than an alias for Col. The beauty is that the entire mapping logic that used to live in the fromRow method can now be removed, relying instead on custom primitives for your types, as the legacy sketch below illustrates.
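For contrast, this is roughly what the manual extractor used to look like, sketched in the pre-2.0 style:

abstract class LegacyExampleTable extends CassandraTable[LegacyExampleTable, ExampleRecord] {
  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object list extends ListColumn[LegacyExampleTable, ExampleRecord, String](this)
  object set extends SetColumn[LegacyExampleTable, ExampleRecord, Int](this)
  object map extends MapColumn[LegacyExampleTable, ExampleRecord, Int, Int](this)

  // The hand-written mapping that derived primitives and macros now generate for you.
  override def fromRow(row: Row): ExampleRecord =
    ExampleRecord(id(row), list(row), set(row), map(row))
}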

We still offer a set of handy column type aliases for those who like to be more explicit and readable in their DSL logic:

- Column
- OptionalCol
- OptionalColumn
- ListColumn
- SetColumn
- MapColumn
- EnumColumn
- OptionalEnumColumn
- TupleColumn
- CustomColumn

This is of course on top of the existing set of variations for primitive types, such as UUIDColumn/OptionalUUIDColumn and so on. The only place where a specific column type is still required is TimeUUIDColumn: sadly, the type alone is not enough to figure out which version of UUID to use, and we need to know exactly, otherwise we cannot tell Cassandra what to expect.
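For example, a minimal sketch with a hypothetical Event record:

case class Event(id: UUID, owner: UUID)

abstract class Events extends Table[Events, Event] {
  // java.util.UUID alone cannot tell phantom this is a time-based (version 1)
  // UUID, so the dedicated column type is still required here.
  object id extends TimeUUIDColumn with PartitionKey
  // A plain version 4 UUID needs no special treatment.
  object owner extends UUIDColumn
}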

Dependency on CodecRegistry and TypeCodec has been removed

As of Phantom 2.9.x, custom types can now be used transparently everywhere across the framework, including prepared statements. We used to rely on the Java driver for parsing the binary Cassandra result back to a more meaningful Java type, and Phantom would then convert that Java type to a Scala one.

However, this was extremely problematic: two layers of indirection were required, and the downside was that you would be unable to use a derived primitive in any complex type or for certain operations, such as prepared statements.

Not only that, but it turns out over 90% of the parsing time in the Java driver itself is spent figuring out which TypeCodec to use, which is extremely bad for performance. After a significant number of users came to us with this concern while using UDT support in phantom-pro, we decided it was time to roll our own.

Primitives now natively handle marshalling/un-marshalling types to and from Cassandra

As a result, Primitives have drastically evolved to cope with the new requirements and to offer a 100% boost in native parsing performance. Using collections, tuples, UDTs and derived primitives is now twice as fast, if not more, compared to the Java driver, an enormous performance gain at scale.

Not only that, but it allows for very neat little tricks. For instance, Phantom can now print out prepared statements properly for debugging purposes, with the real bound values. This is because the bind method summons an implicit macro to fully derive the appropriate encoding at compile time, so all your complex types get converted to bytes using compile-time pre-generated logic.
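As a sketch of how this looks in practice, assuming the ExampleTable above is wired into a Database available as db:

// The ? placeholder is filled in at bind time; the implicit macro derives
// the byte-level encoding for each bound value at compile time, so anything
// with a Primitive in scope, including derived ones, can be bound.
val selectById = db.exampleTable.select
  .where(_.id eqs ?)
  .prepare()

selectById.bind(UUID.randomUUID).one()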

The Phantom 2.9.x series comes with a drastic change in how the underlying driver handles serialisation and de-serialisation of types to bytes. The beauty of the achievement is made almost invisible by the backwards compatibility policy we try extremely hard to adhere to. What this means for you is simply that no code changes are required to leverage the enormous speed gains in query execution you should experience.

Performance in the old days, and why CodecRegistry and TypeCodec are extremely dangerous

Before 2.9.x, phantom, just like any other Cassandra driver, used to rely on the underlying Datastax driver and the dreaded CodecRegistry to figure out how to talk to Cassandra about Scala types. In plain English, the job of the CodecRegistry is to take an instance of a known type, such as Date, and serialise it to bytes that can be sent across the wire. The same registry was in charge of parsing those bytes back to something your application understands, or back to Date in our case, when you read from Cassandra.

Ok, this sounds fairly straightforward, so what's the catch? In one word: reflection. The CodecRegistry relies very heavily on reflection to do its job, which as it turns out is extremely costly. How costly? It turns out around 80% of the time is spent simply figuring out what the correct codec is for a particular type. This is enormously expensive, and at scale it can severely interfere with the performance of your applications.
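To make the cost concrete, this is roughly the lookup the old pipeline performed for every single value, sketched against the 3.x Java driver API:

import com.datastax.driver.core.{CodecRegistry, DataType, ProtocolVersion}

// A runtime lookup on the hot path of every read and write: the registry
// inspects the CQL type and the Java class to find a matching codec.
val codec = CodecRegistry.DEFAULT_INSTANCE
  .codecFor(DataType.timestamp(), classOf[java.util.Date])

val bytes = codec.serialize(new java.util.Date, ProtocolVersion.NEWEST_SUPPORTED)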



The new macro engine

As of phantom 2.9.x, all primitive types in phantom are pre-generated with macros at compile time, meaning no lookup of any kind is necessary at runtime; instead we can jump directly to the I/O and marshalling/un-marshalling of types. That's a 90% reduction in computation time you get for free when you upgrade to phantom 2.9.x!
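In practice, summoning a primitive is now a plain implicit lookup resolved entirely at compile time; a minimal sketch, assuming phantom 2.9.x and the Datastax ProtocolVersion enum:

import com.datastax.driver.core.ProtocolVersion
import com.outworkers.phantom.dsl._

// Materialised by the macro engine at compile time: no registry, no reflection.
val tuplePrimitive = Primitive[(String, Int)]

// Serialisation jumps straight to bytes using the pre-generated logic.
val bytes = tuplePrimitive.serialize(("phantom", 29), ProtocolVersion.V4)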

A glimpse into the not so distant future

All of the above are nothing more than incremental steps towards some ambitious plans for Phantom 3.0.x, namely to completely remove the need for a mapping DSL, or at least make it optional. Instead, we are going to provide an advanced set of macro annotations that expand into the mapping DSL invisibly and at compile time.

So what will phantom tables look like? They won't exist at all: you simply annotate your record types.

import com.outworkers.phantom.dsl._

case class ExampleRecord(
  @primary id: UUID,
  @secondary list: List[String],
  set: Set[Int],
  map: Map[Int, Int]
)

query[ExampleRecord].where(_.id eqs id).one()

All the advanced scenarios you can think of will be made available through a new and comprehensive set of annotations. Bad old Hibernate memories? Fear not: this will not use any kind of reflection or runtime mechanism, and we promise it won't trigger 100 queries you never wanted. But it will entirely remove the need for a mapping DSL if you want it to, letting you define a database layer directly from your domain models. We expect to make this available later this year, and we are working hard on important milestones and intermediary steps to bring this to life.
