MapReduce Service (MRS) - ApplicationMaster Fails After Two Start Attempts When a Spark Task Is Submitted in Yarn-client Mode: Cause Analysis

Time: 2025-02-12 15:01:24

Cause Analysis

  1. Exception on the Driver side:
    16/05/11 18:10:56 INFO Client:
         client token: N/A
         diagnostics: Application application_1462441251516_0024 failed 2 times due to AM Container for appattempt_1462441251516_0024_000002 exited with exitCode: 10
    For more detailed output, check the application tracking page: https://hdnode5:26001/cluster/app/application_1462441251516_0024 Then click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1462441251516_0024_02_000001
  2. The ApplicationMaster log contains the following exception:
    2016-05-12 10:21:23,715 | ERROR | [main] | Failed to connect to driver at 192.168.30.57:23867, retrying ... | org.apache.spark.Logging$class.logError(Logging.scala:75)
    2016-05-12 10:21:24,817 | ERROR | [main] | Failed to connect to driver at 192.168.30.57:23867, retrying ... | org.apache.spark.Logging$class.logError(Logging.scala:75)
    2016-05-12 10:21:24,918 | ERROR | [main] | Uncaught exception: | org.apache.spark.Logging$class.logError(Logging.scala:96)
    org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:426)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:292)
    …
    2016-05-12 10:21:24,925 | INFO  | [Thread-1] | Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)

    In Spark yarn-client mode, the Driver runs on the client node (usually a node outside the cluster). When a task is submitted, the ApplicationMaster process is started in the cluster first; once running, it must register with the Driver process, and the task can proceed only after this registration succeeds. The ApplicationMaster log above shows that it could not connect to the Driver, which is why the task failed.
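    A common way to troubleshoot this class of failure is to verify that the Driver's address is reachable from the cluster nodes, and to pin the Driver's host and port explicitly so that firewall rules can be opened for them. The sketch below is not part of the original article; the jar path, application, and the reuse of the address/port from the log above are illustrative assumptions (`spark.driver.host` and `spark.driver.port` are standard Spark properties):

    ```shell
    # Sketch only: hostnames, ports, and the example jar are placeholders.

    # 1. From a cluster node, check that the client (Driver) node is reachable.
    #    192.168.30.57:23867 is the Driver address reported in the AM log above.
    ping -c 3 192.168.30.57

    # 2. Submit in yarn-client mode with an explicitly reachable Driver host
    #    and a fixed port, so the ApplicationMaster can connect back and the
    #    firewall can be configured to allow that port.
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --conf spark.driver.host=192.168.30.57 \
      --conf spark.driver.port=23867 \
      --class org.apache.spark.examples.SparkPi \
      examples/jars/spark-examples.jar 100
    ```

    If the address is unreachable, the fix lies in the network path (routing, security groups, or firewall) between the cluster and the client node rather than in Spark itself.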

support.huaweicloud.com/trouble-mrs/mrs_03_0112.html