MapReduce Service (MRS) - Executor Process Crash Causes Stage Retry: Question

Time: 2024-11-28 01:44:29

Question

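When running Spark jobs over large data volumes (for example, the 100 TB TPC-DS test suite), Executors are occasionally lost, which causes Stage retries. The Executor log contains the message "Executor 532 is lost rpc with driver,but is still alive, going to kill it", which indicates that the Executor was lost because of a JVM crash.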

The key JVM crash error log is as follows:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (sharedRuntime.cpp:834), pid=241075, tid=140476258551552
#  fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x00007fcda9eb8eb1
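
When many Executors are involved, it can help to pull the key fields out of a crash header like the one above programmatically. The following is a minimal Python sketch, not an MRS or Spark tool: the function name `parse_jvm_crash` and both regular expressions are hypothetical, and they assume the header has the same layout as the log shown here (an "Internal Error (file:line), pid=..., tid=..." line followed by a "fatal error: ..." line).

```python
import re

# Matches the "Internal Error (file.cpp:line), pid=..., tid=..." line.
# Assumption: the header layout is the same as in the sample log above.
INTERNAL_ERROR = re.compile(
    r"Internal Error \((?P<file>[^:()]+):(?P<line>\d+)\), "
    r"pid=(?P<pid>\d+), tid=(?P<tid>\d+)"
)
# Matches the "fatal error: <message>" line.
FATAL_ERROR = re.compile(r"fatal error: (?P<message>.+)")

CRASH_MARKER = "A fatal error has been detected by the Java Runtime Environment"


def parse_jvm_crash(log_text):
    """Return a dict describing the first JVM crash header, or None if absent."""
    if CRASH_MARKER not in log_text:
        return None
    info = {}
    match = INTERNAL_ERROR.search(log_text)
    if match:
        info["file"] = match.group("file")
        info["line"] = int(match.group("line"))
        info["pid"] = int(match.group("pid"))
        info["tid"] = int(match.group("tid"))
    match = FATAL_ERROR.search(log_text)
    if match:
        info["message"] = match.group("message").strip()
    return info


sample = """#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (sharedRuntime.cpp:834), pid=241075, tid=140476258551552
#  fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x00007fcda9eb8eb1
"""

crash = parse_jvm_crash(sample)
print(crash)
```

Grouping the parsed results by `file` and `line` across Executor logs is one way to check whether the lost Executors all hit the same JVM internal error.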
support.huaweicloud.com/cmpntguide-lts-mrs/mrs_01_2017.html