How Open Source Development Differs

Introduction

I currently work for a data warehouse company. My job involves working on an open source project (Kibana) that talks to our closed source version of Elasticsearch (which in turn talks to our data warehouse). After about a year with the company, I’ve realized that I switch between two mindsets when moving between the two projects. The difference comes from problems that are unique to open source projects as opposed to closed source ones.

Open Source Development

At the time of this writing, my job involves developing on Kibana, an open source project, to suit our UI needs. Before this, I worked for two years at a university that maintains its own Moodle installation, another open source project. Both projects present the same set of problems when you work on them, all of which I discuss here.

Need for Resourcefulness

Let me first dispel the “pretentious” vibe of this discussion. Unlike closed source development, open source development constantly requires you to reverse engineer the thought process of the project’s original authors. To change a certain behavior, you don’t have the luxury of already knowing how it works, since you are not the creator, nor are you likely to have direct access to the people who created the project. This is very different from closed source development, where you usually have access to the more senior folks who created the project (unless they quit abruptly and you were hired as an emergency replacement). Since that knowledge is not readily available, you and your team have to be resourceful.

Periodic Merge from Upstream

This is one of the many positives of an open source project. You are not alone. In a popular project, there is an army out there fixing bugs and adding new features. We take advantage of this by merging upstream changes into our local master branch.

But there’s no free lunch. Now you have to appease the all-glorious version control system (e.g. git) by resolving conflicts. You will never avoid conflicts entirely, but you should keep the number you have to resolve as small as possible, which makes merging from upstream less of a nightmare.

Separate Your New Changes

To reduce the number of conflicts, we:

  1. Create a plugin for each new feature.
  2. Keep hacks in a separate directory.
  3. Put an insignia-like comment on hacks that can’t be moved to a separate file.

(1) Creating a new plugin often means creating a new directory. This is not always enough; sometimes we have to hack the core code to implement the plugin, but these hacks should be kept to a minimum. A new directory for the new feature makes merging from upstream a smoother process than writing the feature directly into the core code.

(2) Often, you have to write new methods, functions, or classes to be used in the core code. These should live in a new file whenever possible. This serves the same purpose of avoiding conflicts during a merge. But we still have to call them from the core code somehow, so…

(3) Wrap the sections you modified with a consistent marker. Our re-brand of Kibana is called SonarK, so we wrap the sections we modify in comments prefixed with SonarK. For instance:

// SonarK: Change blah blah blah

This trivial thing helps during conflict resolution. It makes it easier to tell where a change came from and whether to keep or drop it.
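Points (2) and (3) together can be sketched roughly as follows. This is only an illustration, not our actual code: the function names `sonarkTitle` and `renderPageTitle`, the file layout in the comments, and the closing `// SonarK: End` marker are all made up for the example.

```typescript
// --- Hypothetical new file, e.g. sonark/branding.ts ---
// Upstream never touches this file, so it never conflicts during a merge.
function sonarkTitle(upstreamTitle: string): string {
  // All of the custom logic lives here, outside the core code.
  return upstreamTitle.replace("Kibana", "SonarK");
}

// --- Hypothetical core file owned by upstream ---
function renderPageTitle(section: string): string {
  let title = `${section} - Kibana`;
  // SonarK: Re-brand the page title
  title = sonarkTitle(title);
  // SonarK: End (assumed closing marker, not shown in the original comment)
  return title;
}

console.log(renderPageTitle("Discover"));
```

The change to the core file is a single marked line, so a conflict in that area is small and its origin is obvious.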

Walking on Eggshells

This point was already elaborated above, but yes: open source development requires us to walk on eggshells. Touch the core code as little as possible to avoid as many conflicts as possible. Setting up these semi-formal systems for dealing with open source code will take you a long way.
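One payoff of a consistent marker is that the hacks become searchable. As a minimal sketch (the helper `findMarkedHacks` and its return shape are assumptions made for illustration, not part of Kibana or of our tooling), a few lines are enough to list every marked change in a file before a merge:

```typescript
interface MarkedHack {
  line: number; // 1-based line number of the marker comment
  note: string; // text that follows the marker
}

// Scan source text for comments of the form "// <marker>: <note>".
function findMarkedHacks(source: string, marker: string = "SonarK"): MarkedHack[] {
  const prefix = `// ${marker}:`;
  const hacks: MarkedHack[] = [];
  source.split("\n").forEach((text, index) => {
    const at = text.indexOf(prefix);
    if (at !== -1) {
      hacks.push({ line: index + 1, note: text.slice(at + prefix.length).trim() });
    }
  });
  return hacks;
}

// Made-up sample standing in for a modified core file.
const sample = [
  "function render() {",
  "  // SonarK: Change the logo",
  "  drawLogo();",
  "}",
].join("\n");

console.log(findMarkedHacks(sample));
```

In practice a plain text search for the marker does the same job; the point is only that a uniform prefix makes every touched spot easy to enumerate.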

Purists Be Warned

Whatever your coding philosophy is, whether you hold your OOP principles dear or treat your Clean Code book as a bible, open source code will insult you. And all you can do is stare.

There have been multiple times when a piece of code was begging to be refactored into a textbook example of OOP, but you simply can’t touch it unless you mean to hack something. This is because the cost of refactoring is simply too high when merging from upstream. It easily outweighs the “intellectual debt” that we new Computer Science graduates would love to pay off immediately (damn student loans!!!).

Conclusion

These ideas are not taught much in academia, but I think they are critical. Over these years I’ve come to realize that the people who are good at creating things are not always the best at open source work (though it can’t hurt). The people who are good at these kinds of problems are those who love tinkering with things, whatever they may be.
