Cover Story
Big Data in Education: Transformative Models in On-site/Online Integration

Liu Qi is an Associate Professor and doctoral supervisor at the Computer College of the University of Science and Technology of China, a member of the CCF Big Data Expert Committee, a member of the CAAI Machine Learning Technical Committee, and a prominent member of the Youth Promotion Committee of the Chinese Academy of Sciences. Liu's research focuses on data mining, knowledge discovery, machine learning, and their applications.

How Education Has Developed

From ancient times to the present, significant changes in education have often accompanied new milestones in the human saga. This is being demonstrated yet again today as IT reaches ever deeper into work, life, and learning. Education is no longer restricted to the on-site model, in which knowledge is transferred from the 'screen' of the blackboard or by pen and paper. The paradigm has shifted to computer-assisted, intelligence-augmented, and big-data-enabled models. The new models integrate emerging technologies and take advantage of the vast repositories of knowledge available on the Internet. Technology-assisted teaching brings content to life in richer forms and media. Multimedia courseware has long been used in China's classrooms, and various networked and distance education models have helped many people gain the skills they need for better careers and lives. While there have been many advances and added conveniences in accessing learning resources, many pain points still exist, especially in basic education. A lack of quality teachers, imbalances in education resources, immense pressure on students, ineffective study habits, and a slew of other issues make introducing new approaches and technologies all the more important.

In recent years, the IT revolution has added substantial momentum to socioeconomic development. Technological convergence has also brought about explosive growth in data. The accumulation, mining, and targeted application of data play an important role in turning big data from a concept into a high-value asset for industries such as finance, e-commerce, health care, transportation, meteorology, and education. Intelligence-enabled service systems based on big data analysis technologies are adding tremendous convenience and new levels of efficiency to people's lives. Big data analysis has become indispensable to scientific research, decision-making, and many other aspects of operations and discovery.

The new utilities have also injected new vitality into education. Big data in education refers to the sets of data generated and collected throughout the whole process of delivery, testing, and retention. These data sets hold great potential value for the development of education. China has a vast base from which to collect education data. According to the 2016 Statistical Bulletin on Educational Development in China, there are more than 510,000 schools in the country, most of them with Internet access. With more than 260 million students and more than 140 million online learners, digital campuses, online learning platforms, and intelligence-enabled teaching and auxiliary systems are popping up everywhere, which in turn adds to the volumes of data that must be stored, managed, and put to use. Driven by big data, new transformations are taking place in education. In the 2017 release of the China Basic Education Big Data Development Blue Book, the authors argue that education is undergoing a somewhat 'unnoticed revolution' even as new technologies and concepts are being applied. It goes unnoticed, perhaps, because the shift from the computer-assisted paradigm to the data-driven one is not so evident, even with the much higher efficiencies, new-found levels of grafted intelligence, and precise industry adoption.

Applying big data to education helps instructors make better decisions while allowing others to replicate high-quality approaches, planning, and content. The new pillar technology is used to collect, store, manage, and obtain resources while helping to better integrate education resources. With the ability to analyze and quantify students' personalities, capabilities, and levels of knowledge, comprehensive student profiles can be generated. These profiles support thousands of personalized and adaptive elements in the delivery of education resources, yielding a plan truly tailored to the needs and objectives of each learner. Big data is also being used to coordinate education resources effectively and to bring intelligent service applications into the education matrix, providing greater commercial potential. Education-related agencies and departments at all levels attach tremendous importance to big data and how it can drive informatization strategies in the sector. The New Generation Artificial Intelligence Development Plan released by the State Council emphasizes accelerating the construction of intelligent education, applying smart technologies to better cultivate the knowledge workforce and reform teaching methods, and building a new education system that includes interactive learning. In this context, education is gradually shifting from offline to online models featuring high degrees of refinement and replicability.

Big Data: The New Driver in Education

Big data is driving many aspects of smart adoption in education. For instructors, teaching solutions can be adjusted to the in-the-moment, in-class needs of the student body. For learners, the new technology helps develop personalized learning plans and assists students in selecting the most appropriate learning path. Smart utilities are also being used in teaching media and products, such as education robots and intelligence-enabled education platforms. These are just a few examples of how big data analytics is being applied across the vast scope of education. Research in smart education falls into five main categories: teaching, learning, examination, evaluation, and management.

In teaching, the main focus is on how to improve delivery methods so that students can learn more effectively. This includes intelligent search powered by deep learning technologies, intelligent problem-solving based on enhanced learning practices, intelligence-enabled auxiliary learning systems benefiting from the knowledge base in neural research, and generation of personalized teaching schemes to suit the specifics of the learner's objectives.

The learning element focuses on how to better understand and model the behaviors of students and recommend tailored content. Examples include predicting learning behaviors from the massive volumes of data on MOOC platforms, mining behavior track patterns with deep learning, provisioning personalized learning content, and generating learning path recommendations.

The examination element uses test papers as a tool for measuring learning, which requires extracting large amounts of information. At present, research focuses mainly on automated labeling of test questions, the difficulty of test questions, automatic generation of test questions, intelligent groupings, and batch-based scoring of test papers to refine test instruments and processes. This work helps teachers design better testing content and helps students decide which content to focus on, avoiding the pitfalls of rote learning.

The evaluation segment involves applying intelligence-enabled utilities to assess knowledge retention and in turn help students gain better all-around mastery of their subjects. The representation of knowledge and capability through intelligent analysis, the static and dynamic cognitive diagnosis of students, and the generation of personalized cognitive diagnosis reports can help students better understand their learning practices and make timely adjustments to their approaches. At the same time, applying the technology in this area helps better assess the multidimensional knowledge of teachers, measure their professional abilities, and form accurate evaluation mechanisms for teachers and researchers.

In terms of management, various platforms built on big data applications in education assist learning institutions in managing the various aspects of learning. For example, the University of Electronic Science and Technology has developed a campus big data analysis and decision-making platform, and the University of Science and Technology of China is exploring ways to apply behavior prediction to help provide under-performing and economically challenged students with assistance.

In recent years, our team has focused on key scientific issues, such as the inability to assess student knowledge, imbalance in access to education resources, and the difficulties in making tailored recommendations. A series of research achievements has been made (see Figure 1) in cognitive modeling, in-depth representation and analysis of education resources, and personalized learning and recommendation. Representative papers have appeared at important international conferences and in periodicals in the fields of artificial intelligence and data mining, including IJCAI 2015, AAAI 2017, AAAI 2018, KDD 2018, KDD 2019, and ACM TIST. For example, a framework for deep representation of educational resources combined with an attention mechanism has been proposed. It not only models the content of heterogeneous resources (such as text and images) in a unified way, but also designs an internal sequence relationship for automated learning of the information within resources (for example, semantic relationships) and identifies the content that is most related to the predicted attribute. For the difficult task of automatically annotating question resources, the deep representation framework improves precision by 25% compared with manual approaches, and it is able to find the key statement in a question, which makes the results easier to explain.
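To make the idea of an attention mechanism that "finds the key statement" more concrete, here is a minimal sketch in Python. It is an illustration only and not the team's actual framework: the sentence vectors, the query vector, and the function names are all hypothetical toy values chosen for the example. In a real system, these representations would be learned from data.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(sentence_vectors, query):
    """Dot-product attention: weight each sentence by its match with the query."""
    scores = [sum(a * b for a, b in zip(vec, query)) for vec in sentence_vectors]
    weights = softmax(scores)
    # The summary is the attention-weighted average of the sentence vectors.
    summary = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(len(query))
    ]
    return weights, summary


# Hypothetical toy question, split into statements (assumed data).
sentences = [
    "A train leaves the station at 9 a.m.",
    "It travels at 60 km/h for 2.5 hours.",
    "How far does it travel?",
]
# Hypothetical 2-dimensional sentence vectors; a trained model would produce these.
vectors = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1]]
# Hypothetical query vector standing in for the attribute being predicted.
query = [1.0, 1.0]

weights, summary = attend(vectors, query)
key_index = max(range(len(weights)), key=weights.__getitem__)
print("Key statement:", sentences[key_index])
```

The statement with the largest attention weight is the one the model relies on most for its prediction, which is what allows such a model to point to a "key statement" as a form of explanation.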

Summary

From the traditional teaching paradigm of the industrial age to the computer-assisted paradigm of the informatization era and now to the data-driven model, the delivery of education resources has broken free from the confines of the classroom and entered the intelligence era. Teaching resources and learning plans are now being tailored to individual needs instead of assuming a one-style-fits-all approach. Many achievements have been made in applying big data to education, in everything from instruction to knowledge retention assessment. However, many problems remain to be solved.

First, research on big data analysis technology currently focuses on single subjects (such as mathematics, foreign language, and literature) and specific sections (such as junior high school). It is difficult to understand data in other disciplines (such as physics and chemistry). Collecting data from other learning periods, such as early adolescence, is problematic. As a result, the technology is generally promoted across the entire scope of education with only basic tuning to the age groups and disciplines involved. Breakthroughs in refinements able to drill down to the particulars of the learners and the subjects are still needed to truly benefit instructors and students.

Second, enhancements in the applicability and interoperability of intelligent utilities across the scope of smart education are needed. For example, when a student gets a question wrong, he or she would benefit from knowing why the answer is wrong and having access to the corresponding knowledge to study. At the same time, it is very important for teachers to know where they need to change their teaching methodology and how to gear those changes to the needs of their classroom. At present, research on big data in education has few generally defined frameworks, so discussion and consensus are required to arrive at a clear roadmap.

Third, mathematics, physics, and other disciplines place high demands on logical reasoning capabilities. To make education services more intelligent, we must extract knowledge from data and then integrate and associate the data in the context of logical reasoning. This raises the questions of how to extract education knowledge from multi-source heterogeneous data, how to build knowledge bases and realize logical reasoning, and how to make decisions based on the knowledge base and reasoning.

The new pillar technologies represented by big data and artificial intelligence are solving many of the long-standing issues in access to education, stalled development, and lack of personalization in learning. In the revolution that is taking place in education, strong visions, hard tech, and cross-innovation thinking are all needed to forge ahead.