Automate Hadoop Cluster with Systemd

If you have a server that is part of a Hadoop cluster, then you probably want your Hadoop daemons to run at startup. If you have done some Linux system administration, then you have probably encountered the systemctl command here and there. This command is part of systemd, which is used for bootstrapping and managing system processes. In this short post, I'll quickly show you how I did it on one of my servers.

Hadoop Environment Variables

Before we begin, I assume you have some sort of script to set up Hadoop environment variables for your shell environment. I installed mine in /etc/profile.d/hadoop.sh:
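The exact contents depend on your installation, but a minimal version might look like this (the /opt/hadoop and JAVA_HOME paths below are placeholders, not necessarily what you have):

# /etc/profile.d/hadoop.sh
# Adjust these paths to your own installation.
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # placeholder; point this at your JDK
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin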

Note how these point to my Hadoop installation directory. Yours could be different, so if you haven't set up yours yet, feel free to create your own and change the paths in the snippet above.

Now that we have that out of the way, let’s start creating systemd scripts.

Hadoop Systemd Scripts

If you were to start Hadoop manually, you would probably run the following executables:

start-dfs.sh
start-yarn.sh

These executables are made available by the /etc/profile.d/hadoop.sh script above. To make our lives easier, let's encapsulate them in a single bash script. This will help us when we write our systemd service file later on.
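Here is a minimal sketch of such a wrapper, assuming it lives alongside the installation in /opt/hadoop and sources the environment script above (systemd won't load login-shell profiles on its own):

#!/bin/bash
# /opt/hadoop/hadoop-service.sh -- start or stop all Hadoop daemons.
# Source the environment so HADOOP_HOME and PATH are set even when
# this script is launched by systemd rather than a login shell.
source /etc/profile.d/hadoop.sh

case "$1" in
  start)
    start-dfs.sh
    start-yarn.sh
    ;;
  stop)
    stop-yarn.sh
    stop-dfs.sh
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac

Remember to make it executable, e.g. chmod +x /opt/hadoop/hadoop-service.sh.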

In addition to starting the Hadoop daemons, this script can also stop them. To start the Hadoop daemons, run:

hadoop-service.sh start

And to stop:

hadoop-service.sh stop

Now that we have encapsulated our start/stop commands in a single bash script, we can proceed to write our systemd service file.

Create a file in /etc/systemd/system/hadoop.service:
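A minimal unit file along these lines should do; the hadoop user/group and the /opt/hadoop path are assumptions, so substitute the owner and location of your own installation:

[Unit]
Description=Hadoop DFS and YARN daemons
After=network.target

[Service]
# The wrapper starts the daemons in the background and exits,
# so oneshot + RemainAfterExit keeps the unit marked as active.
Type=oneshot
RemainAfterExit=yes
ExecStart=/opt/hadoop/hadoop-service.sh start
ExecStop=/opt/hadoop/hadoop-service.sh stop
# Run as the owner of the Hadoop installation, not root.
User=hadoop
Group=hadoop

[Install]
WantedBy=multi-user.target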

Notice how ExecStart and ExecStop run our script with the proper argument. I also set the User/Group to the owner of my Hadoop executables; otherwise systemd will run the script as root, which causes errors. My script lives in /opt/hadoop, but yours could be somewhere else, so change the paths accordingly.
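Whenever you create or edit a unit file, reload systemd so it picks up the change:

sudo systemctl daemon-reload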

Starting the Service with Systemd

Now that we have our systemd service file, we can start our Hadoop daemons from systemd:

sudo systemctl start hadoop.service # Start service.
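If you want to confirm the daemons actually came up, check the unit's status and the running Java processes (jps ships with the JDK):

sudo systemctl status hadoop.service
jps   # lists the Hadoop daemons running on this node, e.g. NameNode, DataNode, ResourceManager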

To run Hadoop daemons at startup:

sudo systemctl enable hadoop.service

Finally, to stop your daemons:

sudo systemctl stop hadoop.service

That’s it! I hope this makes things easier for you.
