数据湖探索 DLI-使用Spark作业访问DLI元数据:步骤4:编写代码

时间:2024-12-05 10:45:40

步骤4:编写代码

编写DliCatalogTest程序创建数据库、 DLI 表和OBS表。

完整的样例请参考Java样例代码,样例代码分段说明如下:

  1. 导入依赖的包。
    import org.apache.spark.sql.SparkSession;
  2. 创建SparkSession会话。

    创建SparkSession会话时需要指定Spark参数:"spark.sql.session.state.builder"、"spark.sql.catalog.class"和"spark.sql.extensions",按照样例配置即可。

    • Spark2.3.x版本
      SparkSession spark = SparkSession
                      .builder()
                      .config("spark.sql.session.state.builder", "org.apache.spark.sql.hive.UQueryHiveACLSessionStateBuilder")
                      .config("spark.sql.catalog.class", "org.apache.spark.sql.hive.UQueryHiveACLExternalCatalog")
                      .config("spark.sql.extensions","org.apache.spark.sql.DliSparkExtension")
                      .appName("java_spark_demo")
                      .getOrCreate();
    • Spark2.4.x版本
      SparkSession spark = SparkSession 
                       .builder() 
                       .config("spark.sql.session.state.builder", "org.apache.spark.sql.hive.UQueryHiveACLSessionStateBuilder") 
                       .config("spark.sql.catalog.class", "org.apache.spark.sql.hive.UQueryHiveACLExternalCatalog") 
                       .config("spark.sql.extensions","org.apache.spark.sql.DliSparkExtension") 
                       .config("spark.sql.hive.implementation","org.apache.spark.sql.hive.client.DliHiveClientImpl")
                       .appName("java_spark_demo") 
                       .getOrCreate();
    • Spark3.1.x版本
      SparkSession spark = SparkSession
                      .builder()
                      .config("spark.sql.session.state.builder", "org.apache.spark.sql.hive.UQueryHiveACLSessionStateBuilder")
                      .config("spark.sql.catalog.class", "org.apache.spark.sql.hive.UQueryHiveACLExternalCatalog")
                      .config("spark.sql.extensions","org.apache.spark.sql.DliSparkExtension")
                      .appName("java_spark_demo")
                      .getOrCreate();
    • Spark3.3.x版本
      SparkSession spark = SparkSession
                 .builder()           
                 .config("spark.sql.session.state.builder", "org.apache.spark.sql.hive.DliLakeHouseBuilder")           
                 .config("spark.sql.catalog.class", "org.apache.spark.sql.hive.DliLakeHouseCatalog")           
                 .appName("java_spark_demo")           
                 .getOrCreate();   
  3. 创建数据库。
    如下样例代码演示,创建名为test_sparkapp的数据库。
    spark.sql("create database if not exists test_sparkapp").collect();
  4. 创建DLI表并插入测试数据。
    spark.sql("drop table if exists test_sparkapp.dli_testtable").collect();
    spark.sql("create table test_sparkapp.dli_testtable(id INT, name STRING)").collect();
    spark.sql("insert into test_sparkapp.dli_testtable VALUES (123,'jason')").collect();
    spark.sql("insert into test_sparkapp.dli_testtable VALUES (456,'merry')").collect();
  5. 创建OBS表。如下示例中的OBS路径需要根据步骤2:OBS桶文件配置中的实际数据路径修改。
    spark.sql("drop table if exists test_sparkapp.dli_testobstable").collect();
    spark.sql("create table test_sparkapp.dli_testobstable(age INT, name STRING) using csv options (path 'obs://dli-test-obs01/testdata.csv')").collect();
  6. 关闭SparkSession会话spark。
    spark.stop();
support.huaweicloud.com/devg-dli/dli_09_0176.html