MAPREDUCE服务 MRS-Executor进程Crash导致Stage重试:问题
问题
在执行大数据量的Spark任务(如100T的TPCDS测试套)过程中,有时会出现Executor丢失从而导致Stage重试的现象。查看Executor的日志,出现“Executor 532 is lost rpc with driver,but is still alive, going to kill it”所示信息,表明Executor丢失是由于JVM Crash导致的。
JVM的关键Crash错误日志,如下:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (sharedRuntime.cpp:834), pid=241075, tid=140476258551552
# fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x00007fcda9eb8eb1