
hadoop HA cluster setup

Contents

  • hadoop HA cluster setup
  • Preparation
  • zookeeper
  • hadoop

Preparation

Hardware and software plan

  • Hardware

    All servers: 4 CPU cores, 16 GB RAM

  • Software
    CentOS 7.9
    OpenJDK 11
    hadoop 3.3.1

  • Directory layout
    Installation and data directories are kept separate: software is installed under /home/hadoop and data is stored under /data/hadoop. When installing the OS it is recommended to give /data its own partition (a quick partition check is sketched after the directory tree below).

# hadoop installation directory
/home/hadoop/
├── hadoop-3.3.1
├── source
└── apache-zookeeper-3.7.0

# data directories
/data
└── hadoop
    ├── dfs
    │   ├── data
    │   └── name
    ├── hdfs
    ├── history
    │   ├── done
    │   └── done_intermediate
    ├── tmp
    ├── var
    ├── yarn
    │   └── nm
    └── zk
        ├── data
        ├── journaldata
        └── logs
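
If /data was given its own partition as recommended, a quick way to confirm it (a minimal check, not from the original post) is:

df -h /data /home
# /data should appear as its own filesystem/mount point, separate from /home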

Check the system

uname -a

cat /etc/redhat-release

Yum repository

Use the Aliyun yum mirrors.
Run on all servers.

mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo

yum clean all
yum makecache

JDK

Run on all servers.

# remove the default JDK
yum -y remove java

# install OpenJDK 11
yum install -y java-11-openjdk-devel.x86_64

# switch the java version
alternatives --config java
There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64/jre/bin/java)
   2           java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.12.0.7-0.el7_9.x86_64/bin/java)

Enter to keep the current selection[+], or type selection number: 2

# confirm the version
java -version


ls -la /usr/bin/java
/usr/bin/java -> /etc/alternatives/java

ls -la /etc/alternatives/java
/etc/alternatives/java -> /usr/lib/jvm/java-11-openjdk-11.0.12.0.7-0.el7_9.x86_64/bin/java

Create directories

Run on all servers.

mkdir -p /data/hadoop/zk/data
mkdir -p /data/hadoop/zk/journaldata
mkdir -p /data/hadoop/zk/logs

mkdir -p /data/hadoop/dfs/data
mkdir -p /data/hadoop/dfs/name
mkdir -p /data/hadoop/history/done
mkdir -p /data/hadoop/history/done_intermediate
mkdir -p /data/hadoop/yarn/nm
mkdir -p /data/hadoop/yarn/staging
mkdir -p /data/hadoop/tmp
mkdir -p /data/hadoop/var

Environment variables

Adjust to your environment; run on all servers.
vim /etc/profile

# hadoop env -----------
export ZK_HOME=/home/hadoop/apache-zookeeper-3.7.0
export HADOOP_HOME=/home/hadoop/hadoop-3.3.1
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.12.0.7-0.el7_9.x86_64/
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
# export HIVE_HOME=/home/hadoop/hive-3.1.2
# export SPARK_HOME=/home/hadoop/spark-3.1.2-bin-hadoop3.2
export CLASSPATH=$JAVA_HOME/lib:$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
# export PATH=$PATH:$ZK_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin/:$SPARK_HOME/bin:$SPARK_HOME/sbin:$HIVE_HOME/bin
export PATH=$PATH:$ZK_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin/

Apply immediately with `source /etc/profile`.
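
A quick sanity check that the variables are in place (the echo line is an addition, not from the original post):

source /etc/profile
echo "$JAVA_HOME"; echo "$ZK_HOME"; echo "$HADOOP_HOME"
# each variable should print the path configured above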

Set hostnames

Run on all servers.

  • hostname
    Set each server's hostname according to the plan (re-login or reboot for the new name to show up everywhere).

    hostnamectl set-hostname xxx

  • hostname-to-IP mapping
    Either edit /etc/hosts on each machine or configure the records on your local DNS server (a connectivity check is sketched after the list below).

    # hadoop
    192.168.5.20 hadoop-master-a.chinauh.cn
    192.168.2.26 hadoop-master-b.chinauh.cn
    192.168.5.21 hadoop-data-1.chinauh.cn
    192.168.5.22 hadoop-data-2.chinauh.cn
    192.168.5.23 hadoop-data-3.chinauh.cn


    # zookeeper
    192.168.5.21 zk01.chinauh.cn
    192.168.5.22 zk02.chinauh.cn
    192.168.5.23 zk03.chinauh.cn
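
A quick connectivity check against the mapping above (a sketch using only the hadoop hostnames from this plan):

for h in hadoop-master-a hadoop-master-b hadoop-data-1 hadoop-data-2 hadoop-data-3; do
    ping -c 1 -W 2 "$h.chinauh.cn" > /dev/null && echo "$h.chinauh.cn OK" || echo "$h.chinauh.cn FAILED"
done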

Create the hadoop user

Run on all servers.

useradd hadoop
passwd hadoop
chmod u+w /etc/sudoers
vim /etc/sudoers # under the line "root ALL=(ALL) ALL", add: hadoop ALL=(ALL) ALL
chmod u-w /etc/sudoers

chown -R hadoop:hadoop /data/hadoop/

Disable the firewall

Run on all servers.
In a test environment it is simplest to just disable the firewall; in production, decide based on your security requirements.

setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

systemctl stop firewalld
systemctl disable firewalld
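
Optional verification (assumes firewalld is the only firewall service in use):

getenforce                     # Permissive now, Disabled after the next reboot
systemctl is-active firewalld  # should print: inactive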

All of the following steps are performed as the hadoop user!!!

Passwordless SSH

  • On hadoop-master-a
    ## 1) generate the key pair
    ssh-keygen -t rsa

    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    ....


    ## 2) the public key id_rsa.pub and private key id_rsa are created under /home/hadoop/.ssh; append the public key to authorized_keys
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
  • On the other four machines
    ## 1) generate the key pair
    ssh-keygen -t rsa
    ssh-copy-id -i hadoop-master-a.chinauh.cn # watch authorized_keys on hadoop-master-a gain each new key
  • Distribute the authorized_keys of hadoop-master-a to the other machines
    Be sure to run this on hadoop-master-a as the hadoop user (a login check is sketched after the list below).
    scp /home/hadoop/.ssh/authorized_keys hadoop-master-b.chinauh.cn:/home/hadoop/.ssh/
    scp /home/hadoop/.ssh/authorized_keys hadoop-data-1.chinauh.cn:/home/hadoop/.ssh/
    scp /home/hadoop/.ssh/authorized_keys hadoop-data-2.chinauh.cn:/home/hadoop/.ssh/
    scp /home/hadoop/.ssh/authorized_keys hadoop-data-3.chinauh.cn:/home/hadoop/.ssh/
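
To confirm passwordless login works, a loop like the following can be run from hadoop-master-a as the hadoop user (a sketch, not from the original post):

for h in hadoop-master-b hadoop-data-1 hadoop-data-2 hadoop-data-3; do
    ssh "hadoop@$h.chinauh.cn" hostname
done
# each line should print the remote hostname without asking for a password
# (the very first connection may still ask to accept the host key)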

zookeeper

Download

According to the plan, download ZooKeeper on hadoop-data-1 from the official site; a sample download command is sketched below.
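
For example (the URL below is the Apache archive path for this version and may differ from the mirror you actually use):

cd /home/hadoop/source
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz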

Installation

  • Unpack
    tar xvf apache-zookeeper-3.7.0-bin.tar.gz
    mv apache-zookeeper-3.7.0-bin apache-zookeeper-3.7.0

    # resulting layout ==>
    /home/hadoop
    ├── hadoop-3.3.1
    ├── source
    ├── spark-3.1.2-bin-hadoop3.2
    └── apache-zookeeper-3.7.0
  • Directories
    Create ZooKeeper's data and log directories (already created in the directory step above):
    /data
    └── hadoop
        ├── dfs
        │   ├── data
        │   └── name
        ├── hdfs
        ├── history
        │   ├── done
        │   └── done_intermediate
        ├── tmp
        ├── var
        ├── yarn
        │   └── nm
        └── zk
            ├── data
            ├── journaldata
            └── logs
  • Configuration
    $ cd /home/hadoop/apache-zookeeper-3.7.0/conf/
    $ cp zoo_sample.cfg zoo.cfg
    $ ls
    configuration.xsl log4j.properties zoo.cfg zoo_sample.cfg

    $ vim zoo.cfg
    # edit zoo.cfg, adding/changing the following
    dataDir=/data/hadoop/zk/data/
    dataLogDir=/data/hadoop/zk/logs
    server.1=zk01.chinauh.cn:2888:3888
    server.2=zk02.chinauh.cn:2888:3888
    server.3=zk03.chinauh.cn:2888:3888
    # the port at which the clients will connect
    clientPort=2181
    quorumListenOnAllIPs=true

  • Distribute the configuration
    scp -r /home/hadoop/apache-zookeeper-3.7.0 hadoop-data-2.chinauh.cn:/home/hadoop/
    scp -r /home/hadoop/apache-zookeeper-3.7.0 hadoop-data-3.chinauh.cn:/home/hadoop/

    # create the myid file on hadoop-data-1, hadoop-data-2 and hadoop-data-3 respectively
    [hadoop@hadoop-data-1 hadoop]$ echo 1 > /data/hadoop/zk/data/myid
    [hadoop@hadoop-data-2 hadoop]$ echo 2 > /data/hadoop/zk/data/myid
    [hadoop@hadoop-data-3 hadoop]$ echo 3 > /data/hadoop/zk/data/myid
  • Start
    Start ZooKeeper on hadoop-data-1, hadoop-data-2 and hadoop-data-3 (a status check is sketched after the commands below).
    [hadoop@hadoop-data-1 hadoop]$ /home/hadoop/apache-zookeeper-3.7.0/bin/zkServer.sh start
    [hadoop@hadoop-data-2 hadoop]$ /home/hadoop/apache-zookeeper-3.7.0/bin/zkServer.sh start
    [hadoop@hadoop-data-3 hadoop]$ /home/hadoop/apache-zookeeper-3.7.0/bin/zkServer.sh start
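
The role of each node can then be confirmed with zkServer.sh status (a quick check; in a healthy three-node ensemble one node is the leader and the other two are followers):

/home/hadoop/apache-zookeeper-3.7.0/bin/zkServer.sh status
# expected: "Mode: leader" on one node, "Mode: follower" on the other two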

hadoop

Configuration

Log in to hadoop-master-a; the configuration files live in the /home/hadoop/hadoop-3.3.1/etc/hadoop directory.

Environment variables: hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.12.0.7-0.el7_9.x86_64/

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- the HDFS nameservice is chinauh; it must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>chinauh</value>
</property>
<!-- chinauh has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.chinauh</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.chinauh.nn1</name>
<value>hadoop-master-a.chinauh.cn:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.chinauh.nn1</name>
<value>hadoop-master-a.chinauh.cn:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.chinauh.nn2</name>
<value>hadoop-master-b.chinauh.cn:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.chinauh.nn2</name>
<value>hadoop-master-b.chinauh.cn:50070</value>
</property>


<!-- where the NameNode's shared edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://zk01.chinauh.cn:8485;zk02.chinauh.cn:8485;zk03.chinauh.cn:8485/chinauh</value>
</property>
<!-- where each JournalNode keeps its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/zk/journaldata</value>
</property>
<!-- enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- failover proxy provider used by HDFS clients -->
<property>
<name>dfs.client.failover.proxy.provider.chinauh</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- fencing methods; multiple methods are separated by newlines, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- the sshfence method requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- sshfence connect timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!-- If a DataNode process dies, or a network fault keeps a DataNode from reaching the NameNode,
the NameNode does not declare the node dead immediately; a timeout must elapse first. The HDFS
default timeout is 10 minutes + 30 seconds; calling the timeout "timeout", the formula is:
timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval -->
<property>
<name>dfs.namenode.heartbeat.recheck-interval</name>
<!-- in milliseconds -->
<value>2000</value>
</property>
<property>
<name>dfs.heartbeat.interval</name>
<!-- in seconds -->
<value>1</value>
</property>
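<!-- Worked example (added for clarity, not in the original config comment): with the two values
above, timeout = 2 * 2000 ms + 10 * 1 s = 14 seconds, versus the default of 10 minutes 30 seconds. -->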
<!-- A common situation in day-to-day cluster maintenance: a node is declared dead by the NameNode
(network fault or dead DataNode process) and HDFS immediately starts re-replicating its blocks.
When the node rejoins the cluster its data is intact, so some blocks end up with more replicas than
configured. By default it takes up to an hour before the excess replicas are cleaned up, because
cleanup depends on the block report: each DataNode reports all of its blocks to the NameNode once
per hour by default. The parameter below changes that reporting interval. -->
<property>
<name>dfs.blockreport.intervalMsec</name>
<value>10000</value>
<description>Determines block reporting interval in milliseconds.</description>
</property>
<!-- disk space reserved per volume (in bytes) so the disk cannot be completely filled -->
<property>
<name>dfs.datanode.du.reserved</name>
<value>10240000</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-data-1:50090</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs
persistently.
</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its
blocks.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
<description>enable HDFS permission checking</description>
</property>
<!-- The NameNode keeps a pool of worker threads to handle client RPCs and cluster daemon calls. More
handlers mean a larger pool for concurrent DataNode heartbeats and concurrent client metadata operations.
Large clusters, or clusters with many clients, usually need to raise dfs.namenode.handler.count above its
default of 10; a common rule of thumb is 20*ln(N), where N is the cluster size.
If the value is too small, DataNodes time out or are refused when connecting to the NameNode, and RPC
latency grows as the call queue builds up. The symptoms interact, so raising the handler count alone may
not fix a problem, but it is worth checking when troubleshooting. -->
<property>
<name>dfs.datanode.handler.count</name>
<value>35</value>
<description>The number of server threads for the datanode.</description>
</property>
<!-- read timeout: dfs.client.socket-timeout, default 1 minute;
write timeout: dfs.datanode.socket.write.timeout, default 8 minutes -->
<property>
<name>dfs.client.socket-timeout</name>
<value>600000</value>
</property>
<property>
<!-- maximum number of transfer threads a DataNode may serve, default 4096; too small a value causes "xcievers exceeded" errors -->
<name>dfs.datanode.max.transfer.threads</name>
<value>409600</value>
</property>
<!-- block size -->

<property>
<name>dfs.blocksize</name>
<value>134217728</value>
<description>HDFS block size of 128 MB</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>

</configuration>


core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://chinauh</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Size of the read/write buffer, in bytes; 131072 bytes = 128 KB.</description>
</property>
<!-- ZooKeeper quorum used for HA -->
<property>
<name>ha.zookeeper.quorum</name>
<value>zk01.chinauh.cn:2181,zk02.chinauh.cn:2181,zk03.chinauh.cn:2181</value>
</property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<configuration>
<!-- enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- the RM cluster id -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>

<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<!-- logical ids of the two RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- hostname of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop-master-b.chinauh.cn</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop-master-a.chinauh.cn</value>
</property>
<!-- ZooKeeper quorum -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zk01.chinauh.cn:2181,zk02.chinauh.cn:2181,zk03.chinauh.cn:2181</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/hadoop/yarn/nm</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8032</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8032</value>
</property>


<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8030</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8030</value>
</property>


<property>
<description>The http address of the RM1 web application.</description>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8088</value>
</property>
<property>
<description>The http address of the RM2 web application.</description>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8088</value>
</property>


<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8090</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8090</value>
</property>


<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8031</value>
</property>


<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>${yarn.resourcemanager.hostname.rm1}:8033</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>${yarn.resourcemanager.hostname.rm2}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>12288</value>
<description>maximum allocation per container, in MB (default 8192)</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>minimum allocation per container, in MB (default 1024)</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>12288</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>true</value>
</property>
<property>
<description>obtained with: [hadoop@hadoop-master-a ~]$ hadoop classpath</description>
<name>yarn.application.classpath</name>
<value>
/home/hadoop/hadoop-3.3.1/etc/hadoop:/home/hadoop/hadoop-3.3.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-3.3.1/share/hadoop/common/*:/home/hadoop/hadoop-3.3.1/share/hadoop/hdfs:/home/hadoop/hadoop-3.3.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-3.3.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-3.3.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-3.3.1/share/hadoop/yarn:/home/hadoop/hadoop-3.3.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-3.3.1/share/hadoop/yarn/*
</value>
</property>
</configuration>

Note: adjust yarn.nodemanager.resource.memory-mb, yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb to the server's actual memory.
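
If the installation path differs on your cluster, the value of yarn.application.classpath can be regenerated exactly as the description above says (the printed list will vary with the install location):

[hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/hadoop classpath
# paste the printed colon-separated list into yarn.application.classpath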

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-data-1.chinauh.cn:10020</value>
<description>host:port where the MapReduce JobHistory Server runs</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-data-1.chinauh.cn:19888</value>
<description>web UI address for browsing completed MapReduce jobs; the JobHistory Server must be running</description>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/data/hadoop/yarn/staging</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>${yarn.app.mapreduce.am.staging-dir}/done</value>
<description>where the MR JobHistory Server keeps completed job history; default /mr-history/done</description>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>${yarn.app.mapreduce.am.staging-dir}/done_intermediate</value>
<description>where in-flight MapReduce job history is written; default /mr-history/tmp</description>
</property>
<property>
<name>mapred.local.dir</name>
<value>/data/hadoop/var</value>
</property>
</configuration>

workers

In the workers file on the hadoop-master-a node, delete localhost and add:

hadoop-data-1.chinauh.cn
hadoop-data-2.chinauh.cn
hadoop-data-3.chinauh.cn

Distribute

Copy /home/hadoop/hadoop-3.3.1 to the other machines in the cluster.

$ scp -r /home/hadoop/hadoop-3.3.1 hadoop-master-b.chinauh.cn:/home/hadoop/
$ scp -r /home/hadoop/hadoop-3.3.1 hadoop-data-1.chinauh.cn:/home/hadoop/
$ scp -r /home/hadoop/hadoop-3.3.1 hadoop-data-2.chinauh.cn:/home/hadoop/
$ scp -r /home/hadoop/hadoop-3.3.1 hadoop-data-3.chinauh.cn:/home/hadoop/

Startup

Format the hadoop-ha directory in ZooKeeper

# format on hadoop-master-a
[hadoop@hadoop-master-a hadoop]$ hdfs zkfc -formatZK

Verify that it succeeded:

# verify: check that the Hadoop HA znode now exists in ZooKeeper; run on any zk node
$ZK_HOME/bin/zkCli.sh -server zk01:2181
# in the zk shell that opens, type
[zk: zk01:2181(CONNECTED) 2] ls /
[hadoop-ha, zookeeper]
[zk: zk01:2181(CONNECTED) 3] ls /hadoop-ha
[chinauh]

journalnode

Start the JournalNode edit-log sync service on all of the ZooKeeper nodes.
JournalNodes listen on port 8485, and namenode -format connects to this service (a port check is sketched after the commands below).

[hadoop@hadoop-data-1 ~]$ $HADOOP_HOME/bin/hdfs --daemon start journalnode
WARNING: /home/hadoop/hadoop-3.3.1/logs does not exist. Creating.

[hadoop@hadoop-data-1 ~]$ jps
27429 Jps
8233 QuorumPeerMain
27372 JournalNode

# the other zk nodes
[hadoop@hadoop-data-2 root]$ $HADOOP_HOME/bin/hdfs --daemon start journalnode
[hadoop@hadoop-data-3 root]$ $HADOOP_HOME/bin/hdfs --daemon start journalnode
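
A quick way to confirm each JournalNode is listening on 8485 (a sketch; ss ships with CentOS 7's iproute package):

ss -lnt | grep 8485
# a LISTEN entry on port 8485 should appear on every JournalNode host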

Start Hadoop

  • Primary namenode
    On the primary namenode node hadoop-master-a, format the NameNode and start it
    [hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/hdfs namenode -format
    [hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/hdfs --daemon start namenode
    # verify
    [hadoop@hadoop-master-a ~]$ jps
    29461 NameNode
    29541 Jps

  • Standby namenode
    On the standby namenode node, sync the metadata and start the namenode service; the primary namenode must already be running
    [hadoop@hadoop-master-b ~]$ $HADOOP_HOME/bin/hdfs namenode -bootstrapStandby

    Start it
    [hadoop@hadoop-master-b ~]$ $HADOOP_HOME/bin/hdfs --daemon start namenode
    # verify
    [hadoop@hadoop-master-b ~]$ jps
    24882 NameNode
    24966 Jps

ZKFC

Start the DFSZKFailoverController on all namenode nodes.

  • Primary namenode
    [hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/hdfs --daemon start zkfc
    # verify
    [hadoop@hadoop-master-a ~]$ jps
    1045 Jps
    984 DFSZKFailoverController
    462 NameNode
  • Standby namenode
    [hadoop@hadoop-master-b ~]$ $HADOOP_HOME/bin/hdfs --daemon start zkfc
    # verify
    [hadoop@hadoop-master-b ~]$ jps
    24882 NameNode
    25171 DFSZKFailoverController
    25212 Jps

datanode service

  • Start
    On any node in the cluster
    [hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/hdfs --workers --daemon start datanode   # start the datanodes on all worker nodes
    # $HADOOP_HOME/bin/hdfs --daemon start datanode starts a single datanode
    # verify
    [hadoop@hadoop-data-3 etc]$ jps
    6050 JournalNode
    8579 Jps
    15398 QuorumPeerMain
    8471 DataNode
    The DataNode process is now running.

yarn

  • Primary resourcemanager

    [hadoop@hadoop-master-b ~]$ $HADOOP_HOME/bin/yarn --daemon start resourcemanager
    # verify
    [hadoop@hadoop-master-b ~]$ jps
    25393 ResourceManager
    24882 NameNode
    25171 DFSZKFailoverController
    25637 Jps
  • Standby resourcemanager

    [hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/yarn --daemon start resourcemanager
    # verify
    [hadoop@hadoop-master-a ~]$ jps
    1331 Jps
    984 DFSZKFailoverController
    1261 ResourceManager
    462 NameNode
  • nodemanager

    # start the NodeManagers on all workers (a cluster-wide check is sketched after this list)
    [hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/yarn --workers --daemon start nodemanager
    # verify
    # on a NodeManager node
    [hadoop@hadoop-data-3 etc]$ jps
    6050 JournalNode
    8724 NodeManager
    15398 QuorumPeerMain
    8471 DataNode
    8844 Jps
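
Once the ResourceManagers and NodeManagers are up, the registrations can be confirmed cluster-wide (a sketch of the check, not from the original post):

[hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/yarn node -list
# should list hadoop-data-1/2/3 in RUNNING state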

Verification

Miscellaneous

#1 datanode report
[hadoop@hadoop-data-3 ~]$ $HADOOP_HOME/bin/hdfs dfsadmin -report


#2 reformatting:
# first delete all files under logs,
# then delete /data/hadoop itself and its subdirectories, except zk,
# then recreate the eight /data/hadoop directories,
# then run $HADOOP_HOME/bin/hdfs namenode -format

#3 get the HA state of a NameNode
[hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/hdfs haadmin -getServiceState nn1
active
[hadoop@hadoop-master-a ~]$ $HADOOP_HOME/bin/hdfs haadmin -getServiceState nn2
standby


#4 get the HA state of the resourcemanagers
[hadoop@hadoop-master-b ~]$ $HADOOP_HOME/bin/yarn rmadmin -getServiceState rm2
standby
[hadoop@hadoop-master-b ~]$ $HADOOP_HOME/bin/yarn rmadmin -getServiceState rm1
active

#5 resync journalnode and namenode
A node running a namenode was rebooted, and afterwards the namenode would not start.
The error looked like this:
` Journal Storage Directory root= /data/hadoop/zk/journaldata/chinauh; location= null not formatted ; journal id: chinauh `
Cause: the metadata kept by the journalnode is out of sync with the namenode.

Fix: on the master namenode run `$HADOOP_HOME/bin/hdfs namenode -initializeSharedEdits`
to bring the journalnodes back in line with the namenode, then restart the namenode.

References

Hadoop official documentation (latest)

hadoop 3.3.1 HA setup walkthrough

YARN memory parameters explained in depth