Struggle and salvation dwell wherever man and his foe exist

The virtual world follows the same pattern as the real one. Every day we see peace and harmony on the surface of the Internet, while turbulent currents surge beneath it. Hundreds of skirmishes happen in an instant, far too fast for anyone to perceive. You may remember the joy of paying for everything in your shopping cart on Black Friday, yet forget the unpleasant truth that thousands of e-commerce retailers were deceived and defrauded by hackers. Every day, as you surf the Internet, switch between your Twitter and Quora accounts, or discuss politics with others, you have no idea that your accounts are at risk of being hacked.

Internet security has become a top priority, because a network that cannot be trusted is a network no one can use. The Internet has developed rapidly over the past ten years, and security technology has matured alongside it to safeguard that growth. However, some people never give up trying to break through security holes, and attackers and defenders have evolved together as security technology has advanced.

Today, malicious fraud attacks on the Internet are more diverse and more covert, and defending against them grows harder every day. Traditional techniques that rely on rules written by experts have begun to fail. Companies such as Alibaba Cloud and Tencent Cloud are turning to artificial intelligence algorithms for solutions. In 2019, the Huawei Cloud Algorithm Innovation Lab also drew on its deep technical expertise to develop intelligent algorithms that combat Internet fraud. This page introduces the Internet security defense capabilities built on these intelligent algorithms. The Algorithm Innovation Lab intends to start from here, keep improving, and give the world back a safe Internet ecosystem.

1. Device fingerprint algorithm

Algorithm challenges and technical solutions

In human society, when a homicide takes place, it is hard to find the criminal, and even harder to catch them without knowing the weapon. Although the police can keep records of every known type of knife, such as the parrying dagger, the Mark I trench knife, and the jambiya, managing all of them and tracing any particular one is almost impossible.

Device fingerprint tracking technology uses a device's hardware and software information and its runtime characteristics to generate a unique ID for each device, and continuously tracks that device while the system runs. It is a foundational technology for user modeling and anti-fraud. In practice, however, it faces several challenges. Device characteristics change because of software and hardware upgrades, changes in the operating environment, and deliberate tampering; how unique each feature is also varies from one application scenario to another; and the sheer scale of the search places high demands on the algorithm's retrieval performance. The proposed solution combines AI algorithms with locality-sensitive hashing (LSH). First, an AI feature-weight learning algorithm automatically adapts to changes in how unique each feature is. Then LSH greatly narrows the search range. Finally, tailored to the business characteristics of mobile devices, a combined sampling hash method enables accurate and fast fingerprint tracking across large numbers of mobile devices.
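To make the LSH step more concrete, here is a minimal Python sketch of how banded MinHash, one common locality-sensitive hashing scheme, can shrink the candidate set before a weighted comparison. It is not the lab's combined sampling hash method, whose details are not given here. The feature names, the fixed per-feature weights in `WEIGHTS` (standing in for learned weights), the signature size, and the 0.8 match threshold are all hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32            # MinHash signature length (assumed)
BANDS = 16                 # 16 bands x 2 rows: more bands favour recall (assumed)
ROWS = NUM_HASHES // BANDS

# Hypothetical weights standing in for learned ones: stable identifiers count
# more, features that change on upgrade count less; unlisted features weigh 1.0.
WEIGHTS = {"android_id": 3.0, "os_version": 0.2}


def _hash(seed: int, feature: str) -> int:
    """Deterministic 64-bit hash of one feature under one seed."""
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(fp: dict) -> list:
    """MinHash signature of the fingerprint's key=value feature set."""
    features = {f"{k}={v}" for k, v in fp.items()}
    return [min(_hash(seed, f) for f in features) for seed in range(NUM_HASHES)]


def weighted_similarity(a: dict, b: dict, weights: dict) -> float:
    """Share of the total feature weight on which two fingerprints agree."""
    keys = a.keys() | b.keys()
    total = sum(weights.get(k, 1.0) for k in keys)
    same = sum(weights.get(k, 1.0) for k in keys if a.get(k) == b.get(k))
    return same / total if total else 0.0


class FingerprintIndex:
    """LSH index: only devices sharing at least one band bucket get compared."""

    def __init__(self, weights: dict):
        self.weights = weights
        self.buckets = defaultdict(set)   # (band, slice of signature) -> device IDs
        self.devices = {}                 # device ID -> stored fingerprint

    def _band_keys(self, fp: dict) -> list:
        sig = minhash(fp)
        return [(b, tuple(sig[b * ROWS:(b + 1) * ROWS])) for b in range(BANDS)]

    def add(self, device_id: str, fp: dict) -> None:
        self.devices[device_id] = fp
        for key in self._band_keys(fp):
            self.buckets[key].add(device_id)

    def match(self, fp: dict, threshold: float = 0.8):
        """Return the best-matching known device ID, or None for a new device."""
        candidates = set()
        for key in self._band_keys(fp):
            candidates |= self.buckets.get(key, set())
        best_id, best_score = None, threshold
        for device_id in sorted(candidates):
            score = weighted_similarity(fp, self.devices[device_id], self.weights)
            if score >= best_score:
                best_id, best_score = device_id, score
        return best_id


# Hypothetical fingerprints
phone_a = {"android_id": "a1f3", "model": "P40", "os_version": "10",
           "screen": "1080x2340", "timezone": "UTC+8", "language": "zh-CN",
           "cpu_abi": "arm64-v8a", "storage": "128GB"}
phone_b = {**phone_a, "android_id": "77be", "storage": "256GB"}   # same model, other device
upgraded_a = {**phone_a, "os_version": "11"}                      # phone_a after an OS upgrade
unrelated = {"android_id": "zz99", "model": "Mate30", "os_version": "9",
             "screen": "720x1600", "timezone": "UTC-5", "language": "en-US",
             "cpu_abi": "armeabi-v7a", "storage": "64GB"}

index = FingerprintIndex(WEIGHTS)
index.add("dev-A", phone_a)
index.add("dev-B", phone_b)
print(index.match(upgraded_a))   # dev-A: only the low-weight os_version changed
print(index.match(unrelated))    # None: shares no features with any stored device
```

The design choice here is to let LSH do the cheap, approximate filtering and reserve the weighted comparison for the few candidates it returns; with 16 bands of 2 rows, a pair whose feature sets overlap by about 78% (7 of 9 distinct features, as for `upgraded_a` and `phone_a`) lands in a shared bucket with very high probability.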

Current progress

The proposed mobile device fingerprint tracking algorithm achieves high accuracy on a trusted laboratory dataset, outperforms Alipay in on-device defense, and surpasses NetEase in fingerprint uniqueness. The core algorithm has been delivered to the product team, and patent applications have been filed.

2. Fraud community mining algorithm based on risk relationships

Algorithm challenges and technical solutions

A malicious gang has infiltrated, and it is hard to tell how deep it has gone.

Malicious activity by the online underground economy usually shows the signs of organized gangs. Seemingly independent registrations, logins, and transactions are often linked through key elements such as users, devices, and IP addresses. From the perspective of graph algorithms, these links form communities. We abstract the login data into a "risk relationship" structure in which users, devices, and IP addresses are nodes and their relationships are edges. Building on the classic Louvain algorithm with newly designed edge weights, together with statistical indicators of similarity in usage behavior, time series, spatial location, and other dimensions, we better capture how the behavior of malicious community members deviates from that of normal users. A community detection algorithm then groups malicious users into communities. Finally, we filter out suspected malicious communities using effective community-level statistical features and classification models, achieving end-to-end mining of malicious communities from login data.
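The community-detection step can be illustrated with a small Python sketch. It projects hypothetical login records onto a user-to-user graph, using an assumed weighting (one unit per shared device or IP) rather than the lab's newly designed weights, and then runs only the first, local-moving phase of the Louvain method: each user is greedily moved into the neighboring community with the largest modularity gain until no move helps. A full Louvain implementation would then collapse each community into a single node and repeat; the similarity indicators and the community classifier are omitted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical login records: (user, device, ip)
LOGINS = [
    ("u1", "dev-a", "ip-1"),
    ("u2", "dev-a", "ip-2"),
    ("u3", "dev-a", "ip-x"),
    ("u4", "dev-b", "ip-x"),
    ("u4", "dev-b", "ip-9"),
    ("u5", "dev-c", "ip-9"),
    ("u6", "dev-d", "ip-9"),
]


def build_user_graph(logins):
    """Link two users with weight +1 for every device or IP they share."""
    users_by_key = defaultdict(set)
    for user, device, ip in logins:
        users_by_key[("device", device)].add(user)
        users_by_key[("ip", ip)].add(user)
    graph = {user: {} for user, _, _ in logins}
    for users in users_by_key.values():
        for a, b in combinations(sorted(users), 2):
            graph[a][b] = graph[a].get(b, 0.0) + 1.0
            graph[b][a] = graph[b].get(a, 0.0) + 1.0
    return graph


def louvain_local_moving(graph):
    """Phase one of the Louvain method: move nodes while modularity improves."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    degree = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    community = {n: n for n in graph}        # start with one community per user
    total = dict(degree)                      # sum of member degrees per community
    moved = True
    while moved:
        moved = False
        for node in sorted(graph):
            old = community[node]
            total[old] -= degree[node]        # take the node out of its community
            links = defaultdict(float)        # edge weight from node into each community
            for nbr, w in graph[node].items():
                links[community[nbr]] += w
            # Gain of inserting node into community c (scaled by m):
            # links[c] - total[c] * degree[node] / 2m
            best = old
            best_gain = links.get(old, 0.0) - total[old] * degree[node] / two_m
            for comm, k_in in links.items():
                gain = k_in - total[comm] * degree[node] / two_m
                if gain > best_gain:      # strict: ties keep the earlier choice
                    best, best_gain = comm, gain
            total[best] += degree[node]
            community[node] = best
            moved = moved or best != old
    return community


def modularity(graph, community):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    degree = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    q = 0.0
    for i in graph:
        for j in graph:
            if community[i] == community[j]:
                q += graph[i].get(j, 0.0) - degree[i] * degree[j] / two_m
    return q / two_m


graph = build_user_graph(LOGINS)
community = louvain_local_moving(graph)
groups = defaultdict(list)
for user, comm in community.items():
    groups[comm].append(user)
print(sorted(groups.values()))                    # [['u1', 'u2', 'u3'], ['u4', 'u5', 'u6']]
print(round(modularity(graph, community), 3))     # 0.357
```

Here u1 to u3 share one device and u4 to u6 share one IP, while the single shared IP between u3 and u4 is too weak a link to merge the two groups. In the pipeline described above, each resulting community would then be scored with community-level statistics and a classifier.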

Current progress

The Algorithm Innovation Lab and the security service product team jointly incubated a fraud community mining algorithm based on risk relationships. We improved the classic community detection algorithm, optimized it for the Huawei Cloud login scenario, and ultimately helped business teams uncover malicious gangs hidden in seemingly unrelated data. The solution has been validated on Huawei Cloud user login data with good results, uncovering a large number of previously undiscovered malicious user gangs for the business side.

3. Human-machine recognition algorithm family

Algorithm challenges and technical solutions

Existing algorithms and technical solutions are already well known to the public, so telling friend from foe calls for a different approach.

Human-machine identification algorithms are mainly used to identify machine traffic in scenarios such as website security, operations, and transactions. They are offered to customers on the public cloud as an intelligent verification code (CAPTCHA) service. Most existing verification codes are implemented with simple rules or models, which give a poor user experience and weak defense, and cannot cope with constantly changing, sophisticated human-like script attacks.

To solve the problems of existing human-machine recognition algorithms while maintaining an excellent user experience, we strengthen the defense against and recognition of advanced human-like script attacks in two ways.

On the one hand, visual adversarial augmentation is applied to the verification code images: adding a small amount of noise perturbation lowers the rate at which neural networks can recognize the images. In addition, the noise selection model is updated regularly to diversify the interference and strengthen defense against common attacks.

On the other hand, real users and malicious scripts are told apart accurately by analyzing behavioral biometrics and environmental characteristics. Real users' mouse-drag trajectories on PCs and finger-swipe trajectories on mobile devices contain many irregular behaviors, such as random pauses; the slide is fast at first, then decelerates near the end point, or overshoots and moves back. Using an innovative concentric ring method, trajectory data from different regions is collected for segmented analysis and multi-dimensional feature extraction, then fed into AI models such as random forests to achieve high-precision human-machine recognition.
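The concentric-ring feature extraction can be sketched roughly as follows. The text does not specify the ring geometry, so this Python sketch assumes rings centered on the slider's target point, with hypothetical radii in `RING_EDGES` and a hypothetical pause threshold `PAUSE_MS`. For each ring it computes the average speed and the number of pauses, and it appends one overshoot feature, so a real user's fast start, slowdown near the end, and pull-back after sliding past the target show up as distinct numbers.

```python
import math

RING_EDGES = (20.0, 60.0, math.inf)  # hypothetical ring radii in px, inner to outer
PAUSE_MS = 100                       # hypothetical: gap >= 100 ms with < 1 px movement is a pause


def ring_of(x, y, target):
    """Index of the concentric ring (0 = innermost) that contains point (x, y)."""
    dist = math.hypot(x - target[0], y - target[1])
    return next(i for i, edge in enumerate(RING_EDGES) if dist <= edge)


def ring_features(points, target):
    """points: [(x, y, t_ms), ...] in time order; target: ring centre (x, y).

    Returns [speed_0, pauses_0, speed_1, pauses_1, ..., overshoot], where
    speed_i is the mean speed in px/ms inside ring i and overshoot is how far
    the trajectory's x coordinate passed the target.
    """
    n = len(RING_EDGES)
    dist = [0.0] * n
    elapsed = [0.0] * n
    pauses = [0] * n
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        step = math.hypot(x1 - x0, y1 - y0)
        ring = ring_of(x1, y1, target)  # attribute each segment to the ring it ends in
        dist[ring] += step
        elapsed[ring] += dt
        if dt >= PAUSE_MS and step < 1.0:
            pauses[ring] += 1
    features = []
    for i in range(n):
        features += [dist[i] / elapsed[i] if elapsed[i] else 0.0, pauses[i]]
    overshoot = max(x for x, _, _ in points) - target[0]
    features.append(max(overshoot, 0.0))
    return features


TARGET = (200.0, 0.0)
# Hypothetical human-like drag: fast start, slows near the target,
# overshoots to x=205, pauses, then pulls back to x=200
human = [(0, 0, 0), (60, 0, 50), (120, 0, 100), (160, 0, 150),
         (190, 0, 250), (205, 0, 350), (205, 0, 500), (200, 0, 600)]
# Hypothetical script-like drag: constant 1 px/ms straight to the target
script = [(x, 0, x) for x in range(0, 201, 20)]

print(ring_features(human, TARGET))
print(ring_features(script, TARGET))   # [1.0, 0, 1.0, 0, 1.0, 0, 0.0]
```

For the human trajectory the speed falls from the outer ring to the inner ring and the inner ring records one pause and a 5 px overshoot, while the script moves at the same speed in every ring. Vectors like these could then be fed to a classifier such as scikit-learn's `RandomForestClassifier`; that training step is omitted here.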

Implementation status

The human-machine recognition algorithm achieves a high detection rate for machine behavior on laboratory data covering multiple types of verification codes, while retaining an excellent user experience. The service covers a variety of human-like attack scenarios and offers differentiated strength in defending against advanced human-like sliding attacks. These results have been deployed in the Huawei Cloud Intelligent Verification Code service, which includes slider, invisible, and click-to-select verification codes, and many external customers already use it. The service has received commendation and thanks from the product team.
