株式会社サイバーセキュリティクラウド

2020.04.18 Tech Blog

[Introduce Paper] CODDLE: Code-Injection Detection With Deep Learning

What's up, everybody!
Today I want to introduce an interesting paper on Code Injection. Detecting this notorious class of cyber attack is a crucial challenge for realizing a safe cyberspace, and many websites have already been attacked with Code Injection techniques. In this article, I explain one interesting strategy for detecting injection attacks with a machine learning model.

What is Code Injection?

Code Injection is one of the best-known threat categories in cybersecurity. Attacks in this category feed code, such as SQL or JavaScript, into a web application. The SQL variant is called SQL Injection, and the JavaScript variant is called Cross-Site Scripting (XSS).

Abstract

The paper CODDLE: Code-Injection Detection With Deep Learning offers useful knowledge for catching Code Injection attacks with artificial intelligence. Its main contribution is insight into preprocessing: the authors report that their approach achieves better performance than the same model without preprocessing.

Strategy

The crucial point of this research is how to make the input string more readable for a machine learning model. To achieve this, the authors apply two preprocessing steps: removing noise and symbolizing.

Removing Noise

A popular way to improve a neural network's detection ability is to remove randomness from its input. A raw web access log contains a lot of randomness, such as digits and names.
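To make this step more concrete, here is a minimal Python sketch of what removing randomness could look like. It is my own illustration, not the paper's implementation: the KEYWORDS set and the remove_noise function are hypothetical, and I am assuming that digits and non-keyword names are dropped while keywords and punctuation are kept.

```python
import re

# Hypothetical keyword list for illustration; the paper's actual list is not shown in this article.
KEYWORDS = {"and", "or", "union", "all", "select", "from", "where"}

# Split into words (identifiers), digit runs, and single non-space symbols.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def remove_noise(raw: str) -> str:
    """Drop digits and unknown names, keep known keywords and symbols."""
    kept = []
    for token in TOKEN_PATTERN.findall(raw):
        if token.isdigit():
            continue  # numbers are treated as randomness
        if token[0].isalpha() or token[0] == "_":
            if token.lower() in KEYWORDS:
                kept.append(token)  # known keyword
            continue  # any other name (column, table, ...) is dropped
        kept.append(token)  # punctuation and other symbols are kept
    return " ".join(kept)


print(remove_noise("` and 1=0) union all"))
print(remove_noise("SELECT column1, columns2, column3 FROM tablename"))
```

With this assumed keyword list, the two calls print "` and = ) union all" and "SELECT , , FROM".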

Symbolize

The idea behind this operation is that a well-trained neural network should understand the role of a specific symbol or operator. To achieve this, the paper proposes replacing these symbols with codes. The targets of this replacement are symbols, expressions, programming language operators and so on. What's more, each token is encoded not as a single value but as a pair of values: one identifies the raw string portion, and the other is a code representing the category of that portion. The category codes are:

Code  Category        Example
0     Operators       AND, UNION, SELECT, FROM
1     Expressions     =, )
2     Escape Symbols  `

Figure 5
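The pair encoding can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the VOCAB table, the ordering of symbols inside each category, and the encode function are hypothetical, and the input is assumed to be a string that has already had its noise removed and is separated by spaces. Only the three category codes come from the table above.

```python
# Category codes from the table above.
OPERATOR, EXPRESSION, ESCAPE = 0, 1, 2

# Hypothetical vocabulary: the position in each list (starting at 1) is the symbol id.
# The real id assignment used in the paper is not reproduced here.
VOCAB = {
    OPERATOR: ["and", "union", "all", "or", "select",
               "where", "insert", "update", "delete", "from"],
    EXPRESSION: ["=", ")", ","],
    ESCAPE: ["`"],
}

# Flatten into token -> (symbol id, category code).
LOOKUP = {
    symbol: (index, category)
    for category, symbols in VOCAB.items()
    for index, symbol in enumerate(symbols, start=1)
}


def encode(cleaned: str) -> list[tuple[int, int]]:
    """Map each known token to its (symbol id, category code) pair."""
    pairs = []
    for token in cleaned.split():
        key = token.lower()
        if key in LOOKUP:
            pairs.append(LOOKUP[key])
    return pairs


print(encode("` and = ) union all"))
print(encode("SELECT , , FROM"))
```

With this assumed ordering, the first call returns [(1, 2), (1, 0), (1, 1), (2, 1), (2, 0), (3, 0)] and the second returns [(5, 0), (3, 1), (3, 1), (10, 0)]. A different vocabulary order would of course give different ids.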

Encoding Examples

Here I'd like to show you some encoding samples.

SQL Injection

Step               Output
Raw String         ` and 1=0) union all
Remove Randomness  ` and = ) union all
Encoding           (1,2),(1,0),(1,1),(2,1),(2,0),(3,0)

Step               Output
Raw String         SELECT column1, columns2, column3 FROM tablename
Remove Randomness  SELECT , , FROM
Encoding           (5,0),(3,1),(3,1),(10,0)

XSS

Step               Output
Raw String         <img src=1 href=1 onerror="javascript:" alert(1)">
Remove Randomness  < img src = href = onerror = " javascript : " alert ( ) " > < / img >
Encoding           (1,0),(4,1),(4,1),(4,1),(5,0),(4,1),(2,1),(10,0),(3,1),(2,1),(3,1),(3,1),(6,1)

Evaluation

Setting

They conducted experiments on SQL Injection and XSS, using the SQL Payload Dataset and the XSS Payload Dataset respectively. The model they used is a Convolutional Neural Network.

Attack Type    Dataset              Data Source
SQL Injection  SQL Payload Dataset  https://github.com/SuperCowPowers/data_hacking/tree/master/sql_injection/data
XSS            XSS Payload Dataset  https://github.com/payloadbox/xss-payload-list

Result

Figure 7

Figure 9

Figure 10

According to the results in the figures above, the suggested preprocessing method appears to outperform the same approach without preprocessing.

Comment

The results do seem positive, but I think this method has some weaknesses. The suggested encoding requires domain-specific knowledge and can only cover known expressions. So if a language adds a new function that can be used for an injection attack, I imagine the method cannot detect such attacks. In my opinion, the strategy offered by the paper is suitable for detecting well-known attacks but unsuitable for finding unknown ones such as zero-days. Still, the idea of using encoding is interesting.
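Here is a small Python sketch of the weakness I have in mind. Everything in it is hypothetical: the KNOWN table, its codes, and the token new_func are made up for illustration, and I am assuming that tokens outside the vocabulary are simply dropped during preprocessing.

```python
# Hypothetical vocabulary of known tokens -> (symbol id, category code).
KNOWN = {
    "select": (5, 0),
    "from": (10, 0),
    "(": (4, 1),
    ")": (2, 1),
}


def encode(tokens: list[str]) -> list[tuple[int, int]]:
    """Encode known tokens; unknown tokens are silently dropped (assumption)."""
    return [KNOWN[t] for t in tokens if t in KNOWN]


def dropped(tokens: list[str]) -> list[str]:
    """List the tokens the encoder never sees."""
    return [t for t in tokens if t not in KNOWN]


known_payload = ["select", "(", ")", "from"]
novel_payload = ["select", "new_func", "(", ")", "from"]  # new_func is a made-up new function

# Both payloads produce the same encoding, so the model cannot tell them apart.
print(encode(known_payload) == encode(novel_payload))
print(dropped(novel_payload))
```

Under these assumptions the new function leaves no trace in the encoded input, so the vocabulary has to be updated by hand whenever the language changes.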

Reference

CODDLE: Code-Injection Detection With Deep Learning
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8835902

Author

Yusuke Sasaki
Machine Learning Researcher and Developer
Cyber Security Cloud, Inc.
