If you have a server that is part of a Hadoop cluster, you probably want its Hadoop daemons to start at boot. If you have done some Linux system administration, you have probably encountered the systemctl command here and there. This command is part of systemd, which is used for bootstrapping the system and managing its processes. In this short post, I’ll show you how I set this up on one of my servers.
Hadoop Environment Variables
Before we begin, I assume you have some sort of script that sets up the Hadoop environment variables for your shell. I installed mine in /etc/profile.d/hadoop.sh:
Note how these point to my Hadoop installation directory. Yours could be different, so if you haven’t set yours up yet, feel free to create your own and change the paths in the snippet above.
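A minimal sketch of such a script, assuming Hadoop is unpacked in /opt/hadoop. The paths and the set of variables shown here are assumptions for illustration, so adjust them to your installation:

```shell
# /etc/profile.d/hadoop.sh (sketch; /opt/hadoop is an assumed install path)

# Root of the Hadoop installation.
export HADOOP_HOME=/opt/hadoop

# Directory holding core-site.xml, hdfs-site.xml, yarn-site.xml, and so on.
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

# bin holds the hadoop/hdfs/yarn commands; sbin holds start-dfs.sh and friends.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Files in /etc/profile.d are read by login shells, so open a new session (or source the file) before expecting the variables to be set.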
Now that we have that out of the way, let’s start creating systemd scripts.
Hadoop Systemd Scripts
If you were to start Hadoop manually, you would probably run the following scripts:
start-dfs.sh
start-yarn.sh
These scripts can be called by name because the /etc/profile.d/hadoop.sh script above puts them on the PATH. To make our life easier, let’s wrap them in a single bash script. This will help us when we write our systemd service file later on.
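Here is a minimal sketch of what such a wrapper could look like, saved as hadoop-service.sh. The function name hadoop_service is hypothetical, and I am assuming the stop-yarn.sh and stop-dfs.sh scripts from Hadoop’s sbin directory are on the PATH as well. The wrapper sources the profile script itself because systemd does not read /etc/profile.d for you:

```shell
#!/usr/bin/env bash
# hadoop-service.sh (sketch): start or stop the HDFS and YARN daemons.

# systemd does not run login shells, so load the Hadoop environment here.
if [ -r /etc/profile.d/hadoop.sh ]; then
  . /etc/profile.d/hadoop.sh
fi

# hadoop_service is a hypothetical helper name used for this example.
hadoop_service() {
  case "$1" in
    start)
      # Bring up HDFS first, then YARN on top of it.
      start-dfs.sh && start-yarn.sh
      ;;
    stop)
      # Shut down in reverse order; try both even if the first one fails.
      stop-yarn.sh
      stop-dfs.sh
      ;;
    *)
      echo "Usage: $0 {start|stop}" >&2
      return 1
      ;;
  esac
}

# Dispatch only when an argument is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  hadoop_service "$@"
fi
```

Remember to make the file executable with chmod +x before using it.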
In addition to starting Hadoop daemons, this script can also stop Hadoop daemons. To start Hadoop daemons run:
hadoop-service.sh start
And to stop:
hadoop-service.sh stop
Now that we encapsulated our start/stop executables in a single bash executable, we can proceed to write our systemd script.
Create a file in /etc/systemd/system/hadoop.service:
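A minimal sketch of what this unit file could contain. The hadoop user and group names are placeholders for whoever owns your Hadoop installation, and Type=forking is my assumption here, chosen because the start scripts launch the daemons in the background and then exit:

```ini
[Unit]
Description=Hadoop HDFS and YARN daemons
After=network.target

[Service]
# The wrapper launches background daemons and then returns.
Type=forking
# Placeholder account names; use the owner of your Hadoop files.
User=hadoop
Group=hadoop
# systemd requires absolute paths for these commands.
ExecStart=/opt/hadoop/hadoop-service.sh start
ExecStop=/opt/hadoop/hadoop-service.sh stop

[Install]
WantedBy=multi-user.target
```

After creating or editing the file, run sudo systemctl daemon-reload so systemd picks up the change.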
Notice how ExecStart and ExecStop run our script with the proper argument. I also set User and Group to the owner of my Hadoop executables; otherwise systemd runs the service as root, which causes errors. My script lives in /opt/hadoop. Yours could be somewhere else, so change the paths accordingly.
Starting Systemd
Now that we have our systemd unit file, we can start our Hadoop daemons through systemd:
sudo systemctl start hadoop.service # Start service.
To run Hadoop daemons at startup:
sudo systemctl enable hadoop.service
Finally, to stop your daemons:
sudo systemctl stop hadoop.service
That’s it! I hope this makes things easier for you.