https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
frank@ZZHPC:~$ sudo apt install ssh
frank@ZZHPC:~$ sudo apt install pdsh
What is pdsh?
PDSH (Parallel Distributed Shell) is a high-performance, multithreaded remote shell client that allows you to execute commands on multiple remote hosts simultaneously.
While a standard SSH command connects you to one machine at a time, pdsh is designed for cluster management: it lets sysadmins run the same task across 10, 100, or even 1,000 servers at once.
Key Features
- Parallel Execution: It uses a "sliding window" (fanout) of threads to run commands in parallel rather than one-by-one (serially).
- Host Grouping: You can target hosts using ranges (e.g., web[01-10]) or groups defined in files like /etc/genders.
- Thread Safety: It is designed to be highly efficient, handling timeouts on specific nodes without hanging the entire process.
- Companion Tools: It usually comes with pdcp (parallel copy), which lets you copy files to multiple machines at once.
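The features above can be sketched as a small wrapper. This is an illustrative example, not part of the Hadoop guide. The host range web[01-10], the fanout of 4, and the 5-second connect timeout are assumed values, and the DRY_RUN variable is a convention invented for this sketch, not a pdsh feature. By default the wrapper only prints the command it would run, so it is safe to try on a machine without a cluster.

```shell
# Hypothetical cluster: web01 .. web10. Replace with real hostnames.
HOSTS='web[01-10]'
FANOUT=4            # -f: how many hosts are contacted at once (pdsh defaults to 32)
CONNECT_TIMEOUT=5   # -t: seconds to wait for a connection before giving up on a node

run_on_cluster() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run: show the command instead of executing it.
        echo "pdsh -R ssh -w $HOSTS -f $FANOUT -t $CONNECT_TIMEOUT $*"
    else
        # -R ssh selects the ssh rcmd module; -w takes the host list or range.
        pdsh -R ssh -w "$HOSTS" -f "$FANOUT" -t "$CONNECT_TIMEOUT" "$@"
    fi
}

run_on_cluster uptime
```

Setting DRY_RUN=0 runs the command for real. The companion tool takes the same host syntax, for example `pdcp -w 'web[01-10]' ./app.conf /tmp/`, where the file name is again only a placeholder.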
Important Prerequisites
To use pdsh effectively, you usually need:
- SSH Keys: You should have passwordless SSH access to the target machines.
- RCMD Module: On Ubuntu, you often need to specify the ssh module by adding -R ssh to your command, or by setting the environment variable PDSH_RCMD_TYPE=ssh.
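A minimal sketch of both prerequisites for a single shell session follows. The helper name check_passwordless is made up for this example, and localhost is used as the target only because it matches a single-node setup. Adjust both as needed.

```shell
# Make ssh the rcmd module for every pdsh call in this shell session,
# so that -R ssh does not have to be repeated on each command.
export PDSH_RCMD_TYPE=ssh
echo "pdsh rcmd module: $PDSH_RCMD_TYPE"

# Hypothetical helper: succeeds only if key-based login to host $1 works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example use (not run here because it needs a reachable SSH server):
#   check_passwordless localhost && pdsh -w localhost hostname
```

To keep the setting across logins, the export line can be added to a shell startup file such as ~/.bashrc.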