Research
The Data Governance Innovation Lab adheres to the win-win development concept and welcomes cooperation with experts in academia and industry in the following research areas. For any queries, contact us at longjiang4@huawei.com.
-
Traditional data analysis is driven by specific service requirements and covers data integration, governance, development, and analysis. The future is an era of data-driven innovation, in which mining data value and new service scenarios from massive data through uncertain, random data exploration will become the norm. We are therefore exploring a random yet informative intelligent data exploration platform to help customers discover value.
-
AI factors such as feature vectorization, confidence, and probability place new requirements on data computing and storage. The convergence of vector computation and statistical analysis can guide the exploration of next-generation big data computing.
-
Intelligent data quality detection and repair, association, entity merging, sampling, and comprehensive profiling
-
Federated metadata management of data assets across public cloud, private cloud, and local data sources; tens of millions of metadata entries and their relationships, with millisecond-level query performance; unstructured metadata governance, with fuzzy retrieval and recommendation of images, video, and text; a real-time data lake metadata system for unified metadata management of a big data cluster with more than 20,000 nodes
-
Full-link security governance: algorithms for various GDPR-compliant data classification and masking scenarios, including data labeling and watermarking
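As a rough illustration of what a masking step in such a pipeline might look like, the following minimal Python sketch shows two common techniques: partial masking of an email address and keyed pseudonymization. The function names `mask_email` and `pseudonymize` are hypothetical and chosen for this example; this is not the lab's algorithm, and using it does not by itself make a system GDPR-compliant.

```python
import hashlib
import hmac


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; hide the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a well-formed address: mask everything rather than leak it.
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash so equal inputs map to equal tokens.

    Assumption: a 16-hex-character token is long enough for this illustration.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]
```

Partial masking keeps a value recognizable to its owner, while keyed pseudonymization keeps records joinable across tables without exposing the original value; which one fits depends on the classification label assigned to the column.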
-
Intelligent data quality algorithms: abnormal data detection and repair, entity merging, and data column association algorithms, with accuracy and recall above 90% on all datasets; high-performance data quality engine: TB-level data quality checks in seconds, with distributed memory cache and automatic scaling
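To make "abnormal data detection and repair" concrete, here is a minimal Python sketch of one simple approach: flag values whose modified z-score (based on the median and the median absolute deviation) exceeds a threshold, then replace them with the median. The function name `detect_and_repair` and the default threshold of 3.5 are assumptions made for this example; the lab's own algorithms are not described in the text and are likely more sophisticated.

```python
from statistics import median


def detect_and_repair(values, threshold=3.5):
    """Return (repaired_values, flagged_indices) using a median/MAD outlier rule."""
    values = list(values)
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No measurable spread: this rule cannot rank deviations, so flag nothing.
        return values, []
    flagged = [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]
    flagged_set = set(flagged)
    # Repair step: replace each flagged value with the median.
    repaired = [med if i in flagged_set else v for i, v in enumerate(values)]
    return repaired, flagged
```

For example, `detect_and_repair([10, 11, 9, 10, 12, 10, 500])` flags only index 6 and replaces the value 500 with the median, 10. Replacing with the median is the simplest possible repair; a real engine might instead infer the value from correlated columns.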
-
Model-driven intelligent data pipeline construction and data asset generation
-
Support for multiple computing engines, such as Hive, Spark, HBase, and MySQL, with cross-region and cross-engine scheduling and optimization, improving performance by more than 10 times compared with open-source Rheem and Calcite
-
Cross-region data resource scheduling, cross-public cloud and HCS hybrid cloud data resource scheduling, and AI operator scheduling; concurrent scheduling of millions of nodes during peak hours
-
Intelligent industry module recommendation on visualized screens: intelligent template recommendation based on users' industry background; smart assistance optimization on visualized screens: intelligent one-click optimization (intelligent color matching and layout) through machine learning; scenario-based visualized modeling and development platforms, such as 3D city and 3D campus, as well as device-edge-cloud big data input and visualized interaction and presentation