For the past 10 months, I have been fortunate enough to work on a transpiler between Elasticsearch and Sonar (essentially MongoDB syntax with our own proprietary extensions). This transpiler allows our organization to use a proprietary version of Kibana with our MongoDB-like database system.
This experience has pushed me to draw on knowledge from software engineering literature and from my university courses: the software engineering courses that introduced design by contract, and a third-year functional programming course.
Why did I pair design by contract with functional programming? Why are these two concepts inseparable in transpilers? These questions were first answered in those university courses, and the answers were further cemented by this project. In this article, I will discuss how I used design by contract and functional programming to implement a transpiler from Elasticsearch aggregations to our MongoDB implementation syntax. I hope you enjoy it, or are at least able to suffer through it for a short while.
Elasticsearch Aggregation Query Example
Before we begin, I'll introduce the Elasticsearch aggregation syntax, specifically the filter aggregation, to keep things simple. The example below is from the Elasticsearch filter aggregation documentation.
```json
{
  "aggs": {
    "t_shirts": {
      "filter": { "term": { "type": "t-shirt" } },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
```
This query can be summarized to the following:
- The root aggregation is a filter bucket aggregation, filtering the field type for the value "t-shirt".
- The sub-aggregation under the filter bucket aggregation is a metric average of the field price.
To summarize this further: we compute the average of the field price (under the label avg_price) for all documents whose type field has the value "t-shirt". A sample result, again from the official Elasticsearch documentation:
```json
{
  "aggregations": {
    "t_shirts": {
      "doc_count": 3,
      "avg_price": { "value": 128.33333333333334 }
    }
  }
}
```
Another small note: Elasticsearch includes doc_count for all bucket aggregations by default.
This is not meant to be a thorough introduction to Elasticsearch. Later on, we will use this simple example to explain how I implemented the transpiler from Elasticsearch to Mongo queries. In the next section, we will show the equivalent Sonar aggregation query (our own extended Mongo aggregation query). As in test-driven development, this gives us a target output for our transpiler.
Mongo Query Equivalent
The following is the Sonar aggregation query (our own extended Mongo aggregation query) equivalent of the Elasticsearch query above. We have many $operators that are special to our organization, like $iterGroup.
```json
[
  {
    "$group": {
      "$iterGroup": {
        "_id": {
          "cond": {
            "if": { "$eq": [ "$type", "t-shirt" ] },
            "then": "t_shirts",
            "else": null
          }
        }
      },
      "doc_count": { "$sum": 1 },
      "avg_price": { "$avg": "$price" }
    }
  }
]
```
We won't delve further into how the Sonar/Mongo query above works. The point is that there is an expected output we can work toward.
Design By Contract And Transpilers
When I was first given the task, I didn't know where to begin. I inherited the project from someone else, so there was already transpiler code here and there for me to work from. After rummaging through the code base for examples and starting to code myself, one thing became clear: I had to design a formal specification for the methods/functions in the transpiler modules. How does this apply to a transpiler from Elasticsearch to Mongo syntax?
The example above shows a filter bucket aggregation, but there are more bucket aggregations in Elasticsearch. A class diagram for the bucket-aggregation-to-Mongo-aggregation-syntax transpiler would look like the following:
There are more Elasticsearch bucket aggregations, but for simplicity I only display three in our UML diagram.
Our UML of the transpilers shows that each one has at least one method named transpile. It takes an esRootDoc, short for root Elasticsearch query doc, and outputs a mongoRootDoc, short for root Mongo aggregation query doc. Here I already started to apply design by contract, ensuring that the transpile method takes a specific Elasticsearch doc, namely the root aggregation query in our example above. To be clear, it is:
```json
{
  "t_shirts": {
    "filter": { "term": { "type": "t-shirt" } },
    "aggs": {
      "avg_price": { "avg": { "field": "price" } }
    }
  }
}
```
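A minimal sketch of what that contract can look like in practice. The names here (transpile_filter, es_root_doc) are hypothetical, modeled on the UML described above rather than taken from the real code base, and the output uses a plain $match stage instead of our proprietary $iterGroup so the example stays self-contained:

```python
def transpile_filter(es_root_doc: dict) -> list:
    """Transpile a filter bucket aggregation into a Mongo-style pipeline.

    Precondition (the contract): es_root_doc maps exactly one label to a
    body containing a "filter" key. Violations fail fast here instead of
    producing a silently wrong pipeline downstream.
    """
    assert isinstance(es_root_doc, dict) and len(es_root_doc) == 1, \
        "expected a single-label aggregation doc"
    label, body = next(iter(es_root_doc.items()))
    assert "filter" in body, f"'{label}' is not a filter aggregation"

    term = body["filter"]["term"]            # e.g. {"type": "t-shirt"}
    field, value = next(iter(term.items()))

    # Postcondition: the output is a Mongo aggregation pipeline (a list).
    pipeline = [{"$match": {field: value}}]
    assert isinstance(pipeline, list)
    return pipeline
```

Passing the root doc above yields `[{"$match": {"type": "t-shirt"}}]`; passing anything that is not a single-label filter doc fails immediately at the assertion, which is the point of the contract.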
We choose which transpiler module to use via a TranspilerFactory method. This is easy enough with Elasticsearch, since a query document can first be broken down into three main components:
- Label: This is where we put/nest the result of our aggregation in the return payload. In our case, this is t_shirts.
- Aggregation Type: This is the type of aggregation being done under the label. In our case, we are doing a filter bucket aggregation inside the t_shirts label.
- Nested Aggregations: These are the aggregations done with respect to (nested in) the aggregation. For us, for each bucket of the filter aggregation, we compute an average metric aggregation under the label avg_price.
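The three components above can be extracted mechanically. The helper below is a sketch; the function name and the set of recognized aggregation types are my own illustration, not the actual code base:

```python
# Assumed subsets of aggregation types, for illustration only.
BUCKET_TYPES = {"filter", "terms", "range"}
METRIC_TYPES = {"avg", "sum", "min", "max"}

def decompose_aggregation(label: str, body: dict):
    """Split one labeled aggregation doc into (label, type, nested aggs)."""
    agg_type = next(k for k in body if k in BUCKET_TYPES | METRIC_TYPES)
    nested = body.get("aggs", {})   # sub-aggregations, empty if a leaf
    return label, agg_type, nested

label, agg_type, nested = decompose_aggregation(
    "t_shirts",
    {"filter": {"term": {"type": "t-shirt"}},
     "aggs": {"avg_price": {"avg": {"field": "price"}}}},
)
# label == "t_shirts", agg_type == "filter",
# nested == {"avg_price": {"avg": {"field": "price"}}}
```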
With those main components in mind, we amend our Transpiler UML to the following:
Now, each AggregationBucketTranspiler instance contains a label attribute and a nestedAggregations attribute that lists the child aggregations. Furthermore, I added an AggregationBucketTranspilerFactory that creates each instance, as described in the following sequence diagram:
The middle section of the sequence diagram shows that AggregationBucketTranspiler uses AggregationBucketTranspilerFactory to generate nested aggregation buckets.
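That recursive relationship between factory and transpiler can be sketched as follows. The class names mirror the UML, but the bodies are illustrative placeholders, not the real implementation:

```python
class FilterTranspiler:
    def __init__(self, label, body, nested):
        self.label, self.body, self.nested = label, body, nested

class AvgTranspiler:
    def __init__(self, label, body, nested):
        self.label, self.body, self.nested = label, body, nested

class AggregationBucketTranspilerFactory:
    # Maps an aggregation type key to the transpiler class that handles it.
    REGISTRY = {"filter": FilterTranspiler, "avg": AvgTranspiler}

    @classmethod
    def create(cls, label, body):
        agg_type = next(k for k in body if k in cls.REGISTRY)
        # The transpiler asks the factory again for each nested aggregation,
        # which is the loop in the middle of the sequence diagram.
        nested = [cls.create(child_label, child_body)
                  for child_label, child_body in body.get("aggs", {}).items()]
        return cls.REGISTRY[agg_type](label, body[agg_type], nested)

root = AggregationBucketTranspilerFactory.create(
    "t_shirts",
    {"filter": {"term": {"type": "t-shirt"}},
     "aggs": {"avg_price": {"avg": {"field": "price"}}}},
)
# root is a FilterTranspiler whose nested list holds one AvgTranspiler.
```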
Strictly defining what a method/function accepts is critical in a parser. This might seem obvious, but I've seen implementations in which parameter properties are inconsistent, causing complications for everyone.
Now that we have established the importance of design by contract when creating transpilers, let's quickly touch on another important concept that is often a source of problems.
Functional Functions And Transpilers
Design by contract allows for a strict definition of function/method parameters, eliminating guesswork and making life easier for everyone. Another pet peeve of mine, which I encountered while working with transpilers, is functions/methods that rely on side effects, that is, functions that are not functional. This adds another mental load for developers: in addition to keeping track of expected parameter and return values, we also have to keep track of possible changes to the parameters. This mental load can be avoided, and is best avoided, by ensuring all functions are pure.
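A contrived before/after to illustrate the point; both functions are mine, not from the real transpiler. The impure version mutates its input, so the caller can no longer trust the doc it passed in; the pure version returns a new pipeline and leaves the input untouched:

```python
import copy

def transpile_impure(es_doc: dict) -> dict:
    # Anti-pattern: rewrites the caller's document in place.
    es_doc["$match"] = es_doc.pop("filter")["term"]
    return es_doc

def transpile_pure(es_doc: dict) -> dict:
    # Pure: same input always yields the same output, and the
    # input document is never modified.
    doc = copy.deepcopy(es_doc)
    return {"$match": doc["filter"]["term"]}

query = {"filter": {"term": {"type": "t-shirt"}}}
result = transpile_pure(query)
# query is unchanged; the caller can reuse or re-transpile it safely.
```

With the pure version, reading the function signature and its contract tells you everything; with the impure version, you also have to audit every call site for what the function did to its argument.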
Conclusion
The last 10 months have been a great experience. I'm still as deep in my work on our database transpiler as ever, if not deeper. I'm a big believer in applying proper software engineering techniques, but these two, design by contract and functional programming, are what I deem most important when building transpilers. I know it's not an everyday development problem for most, but I thought I'd share my two cents on this niche world.