How to Catch Hackers in Your Code

What would you do if hackers were abusing your software in production?

This is not a hypothetical question. They are probably doing it right now.

You may be thinking of all the secure design decisions you have made, or the preventive techniques you have applied, and concluding that you have nothing to worry about.

If so, that's great. Even though some things will always be overlooked, you should always be thinking about the security of your system.

But there is a huge difference between preventing security bugs and forgiving malicious attempts.

What about catching, and acting against, the hackers who are trying to break into our software? In this post, I'll try to give you practical and simple examples of catching typical hacker behavior in your code early.

Why Catch Malicious Attempts?

Isn't preventing security bugs enough? I hear you say: "As long as I write secure code, I don't care whether hackers are playing with my rock-solid software or not. So why should I care about malicious attempts?"

Let's answer this valid question first.

Any reasonably complex software is hard to keep secure all the time. More complexity means more potential weaknesses for a hacker to abuse, introduced while we design, implement, deploy, or maintain the code.

Just look at the number of CVEs published over the years. It's a lot.

Moreover, by its nature, a security bug is not just an ordinary item in your backlog. There are nasty consequences when a vulnerability is exploited: loss of trust, a damaged reputation, or even financial loss.

That is why security best practices such as the OWASP Application Security Verification Standard (ASVS) or Mozilla's Secure Coding Guidelines exist: to help developers produce secure software.

However, since new weaknesses and new ways of bypassing existing security controls emerge almost daily, there is a consensus in the security community that "there is no 100% security". So we should always stay alert and react to security news and improvements.

There is one more thing we can do to keep our software secure: notice hackers as early as possible, before they do something we don't expect or don't even know about. Moreover, tracking their malicious behavior over a long period of time makes us more proactive.

There is a popular concept along these lines, the Security Operations Center (SOC): a team, either in-house or outsourced, whose job is to continuously monitor the security posture of an organization. They do this by detecting, analyzing, and responding to cybersecurity incidents.

SOC teams look for abnormal activities, including software security anomalies. Noticing and responding to a cyber attack, successful or not, gives organizations the upper hand against threats, and continuous monitoring ultimately shortens the response time to attacks.

A SOC is only as strong as the rich, high-quality input it gets from the various IT components it watches. Since our software is also an important part of that inventory, appropriate security alarms sent from our software to SOC teams upon abnormal behavior are invaluable.

How to Check for Abnormal Behavior

Here are a number of checks and controls that we can implement in our code to reveal malicious and abnormal behavior.

Before we start, I'd like to stress that I'm not presenting complex solutions here, such as a Web Application Firewall (WAF). Instead, I'll just try to show you that simple conditions, smart exception handling, and similar measures that cost little or no effort can help you notice abnormal behavior in your code as soon as it occurs.

Let's dig in.

Zero Length or Null Returns

The first thing we can do to detect malicious activity is to check for zero-length collections or null returns.

Here's a simple code block that illustrates the point:

    Receipt receipt = GetReceipt(transferId);
    if (receipt == null)
    {
        // what does this mean?
        // log, notify, alarm
    }

Here we try to access the receipt of a particular transfer, which our end users identify through the transferId parameter.

To prevent anyone from accessing other people's receipts, let's assume that inside the GetReceipt method our developer was smart enough to check that transferId really belongs to the current user.

Checking ownership is a good security practice.

Let's further assume that, by design, we are sure that every transfer must have at least one related receipt, so getting none at runtime is suspicious. Why? Because getting an empty receipt means that the provided transferId does not belong to a transfer executed by the current user.

In other words, the current user has provided a tampered transferId and is waiting to see the content, in case that transferId refers to someone else's transaction.

And since we have a proper ownership control, the GetReceipt method returns an empty or null receipt. That is where we should take some security action.

I won't go into the details of security actions in this post. However, writing security logs and sending detailed notifications to Security Information and Event Management (SIEM) systems are two of them.
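To make the idea concrete, here is a minimal C# sketch of a null check that doubles as a tripwire. Everything in it is an assumption made for illustration: the Receipt and ReceiptService types, the in-memory dictionary standing in for a database, and the securityLog callback standing in for a real logging or SIEM integration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; the article does not define them.
public sealed class Receipt
{
    public Receipt(int transferId, string ownerId)
    {
        TransferId = transferId;
        OwnerId = ownerId;
    }

    public int TransferId { get; }
    public string OwnerId { get; }
}

public sealed class ReceiptService
{
    private readonly Dictionary<int, Receipt> _byTransferId = new Dictionary<int, Receipt>();
    private readonly Action<string> _securityLog;

    public ReceiptService(IEnumerable<Receipt> receipts, Action<string> securityLog)
    {
        foreach (Receipt receipt in receipts)
        {
            _byTransferId[receipt.TransferId] = receipt;
        }
        _securityLog = securityLog;
    }

    // Ownership check: a receipt is returned only when it belongs to the caller.
    private Receipt GetReceipt(string currentUserId, int transferId)
    {
        Receipt receipt;
        bool found = _byTransferId.TryGetValue(transferId, out receipt);
        return found && receipt.OwnerId == currentUserId ? receipt : null;
    }

    public Receipt ShowReceipt(string currentUserId, int transferId)
    {
        Receipt receipt = GetReceipt(currentUserId, transferId);
        if (receipt == null)
        {
            // By design every legitimate request finds a receipt, so a miss
            // most likely means the transferId was tampered with.
            _securityLog("SECURITY: user '" + currentUserId +
                         "' requested a receipt for transfer " + transferId +
                         ", which is unknown or not theirs");
        }
        return receipt;
    }
}
```

Passing the log sink in as a callback keeps the check itself trivial and lets the caller decide whether an event becomes a log line, a notification, or an alarm.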

Here's another example of how a null check lets us catch a malicious attempt.

Assume that we have the following three endpoints: ShowReceipt, Success, and Error:

    // ShowReceipt endpoint
    if (CurrentUser.Owns(receiptId))
    {
        Session["receiptid"] = receiptId;
        redirect "Success";
    }
    else
    {
        redirect "Error";
    }

    // Success endpoint
    receiptId = Session["receiptid"];
    return ReadReceipt(receiptId);

    // Error endpoint
    return "Error";

This is a simple application that shows the content of a receipt to the user.

In ShowReceipt, the ownership check in the if statement is important. It verifies that the end user is sending us a valid receiptId to view. Without this control, a malicious user could provide any receiptId and access its content.

The position of the statement that stores receiptId in the Session is just as important. If we moved that statement to just before the if statement, nothing would break. However, it would create the very security problem we were trying to avoid by checking whether the end user is requesting a valid receipt.

Please take a moment to make sure you understand why.

Fortunately, we put that statement in the right place, and this creates another opportunity to notice malicious attempts. So, in the Success endpoint, what does it mean if we get a null receiptId from the Session?

It means that someone is calling this endpoint right after making a request to the ShowReceipt endpoint with someone else's receiptId, even though they were redirected to Error because of the ownership check!

Of course, thanks to the ownership check in ShowReceipt, they cannot reach the receipt this way.

So the Success endpoint is a nice place to write a security log entry and send notifications to our monitoring solutions whenever we get a null receiptId from the Session.

    // Success endpoint (Revisited)
    receiptId = Session["receiptid"];
    if (receiptId == null)
    {
        // log, notify, alarm
        redirect "Error";
    }
    return ReadReceipt(receiptId);

Purposeful Exception Handling

Exception handling is maybe the most important mechanism for developers to respond to any anomalous condition during the execution of the program.

Most of the time the main opportunity it provides is cleaning up resources that were borrowed such as file/network streams or database connections upon unexpected problems. This is a fail-safe behavior that lets us write more reliable programs.

In parallel we can effectively use runtime exceptions to notice malicious attempts towards our software.

Here are some popular sources of weakness where we can utilize related exceptions to notice fishy behavior:

 • Deserialization
 • Cryptography
 • XML Parsing
 • Regular Expression
 • Arithmetic Operations

The list is not complete, of course. And here I’ll go through only a few of these APIs.

Let’s start with Regular Expressions. Here’s a code block that applies a strict validation method on a user input:

    if (!Regex.IsMatch(query.Search, @"^([a-zA-Z0-9]+ ?)+$"))
    {
        return RedirectToAction("Error");
    }

The regular expression pattern used here is a solid whitelist: it checks for what is expected as input, rather than the insecure way around, which is checking for what is known to be bad.

Still, here’s a much more secure version of the same code block:

    if (!Regex.IsMatch(query.Search, @"^([a-zA-Z0-9]+ ?)+$", RegexOptions.Compiled, TimeSpan.FromSeconds(10)))
    {
        return RedirectToAction("Error");
    }

This is an overload of the IsMatch method, and its last argument is the key.

It enforces that the execution of the regular expression at runtime cannot exceed 10 seconds. If it does, something suspicious is going on, since the pattern used is not that complicated.

There’s an actual security weakness that might be used to abuse this pattern called ReDoS, though I won't go into the details of it here. But in short, an end-user can send the following string as the search parameter and make our back-end miserable, spending an awful amount of CPU power in vain.

Notice the exclamation mark at the end (and don’t try this in production!):

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA!

The question is, what happens when the execution time actually exceeds 10 seconds?

The .NET environment throws an exception, namely RegexMatchTimeoutException. So, if we specifically catch this exception, we now have the opportunity to report this suspicious incident or do something about it.

Here’s the final code block to that end:

    try
    {
        if (!Regex.IsMatch(query.Search, @"^([a-zA-Z0-9]+ ?)+$", RegexOptions.Compiled, TimeSpan.FromSeconds(10)))
        {
            return RedirectToAction("Error");
        }
    }
    catch (RegexMatchTimeoutException rmte)
    {
        // log, notify, alarm
    }
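As a rough sketch of what the catch block could record, the exception itself carries useful context: RegexMatchTimeoutException exposes the Pattern, the Input, and the MatchTimeout that was exceeded. The SearchValidator class, the SearchCheck enum, and the securityLog callback below are hypothetical names introduced only for this example, and the timeout is passed in as a parameter so that it is easy to tune.

```csharp
using System;
using System.Text.RegularExpressions;

public enum SearchCheck { Valid, Invalid, Suspicious }

// Hypothetical helper for illustration; not part of any framework.
public static class SearchValidator
{
    private const string Pattern = @"^([a-zA-Z0-9]+ ?)+$";

    public static SearchCheck Check(string search, TimeSpan timeout, Action<string> securityLog)
    {
        try
        {
            return Regex.IsMatch(search, Pattern, RegexOptions.None, timeout)
                ? SearchCheck.Valid
                : SearchCheck.Invalid;
        }
        catch (RegexMatchTimeoutException ex)
        {
            // Log the length rather than the raw input to keep log entries small;
            // a real system may want to store the full input somewhere safe.
            securityLog("SECURITY: regex timed out after " + ex.MatchTimeout.TotalMilliseconds +
                        " ms; pattern=" + ex.Pattern +
                        "; inputLength=" + ex.Input.Length);
            return SearchCheck.Suspicious;
        }
    }
}
```

Returning a third state instead of a plain boolean lets the caller treat a timeout differently from an ordinary validation failure, for example by throttling the client.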

Another important area where we can utilize exceptions is XML parsing. Here’s an example code block:

    XmlReader xmlReader = XmlReader.Create(input);
    var root = XDocument.Load(xmlReader, LoadOptions.PreserveWhitespace);

The input XML is fed into XmlReader.Create, and then we get the root element. Hackers can abuse this kind of code by providing malicious XML files which, when parsed by a permissively configured reader, can give them access to files on our servers and, in the worst case, control over them.

Scary, right? The security bug is called XML External Entity (XXE) attack, and as with the Regular Expression exploit, I won't go into all the details here.

However, in order to prevent that super critical weakness, we ignore Document Type Definitions (DTD) through the XmlReaderSettings. So now, there’s no possibility of XXE security bugs anymore.

Here’s the secure version:

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Ignore;
    XmlReader xmlReader = XmlReader.Create(input, settings);
    var root = XDocument.Load(xmlReader, LoadOptions.PreserveWhitespace);

We can leave the code just like this and move on. However, if a hacker still tries this attack, even in vain, it is better to catch the attempt and produce an invaluable security alert:

    try
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore;
        XmlReader xmlReader = XmlReader.Create(input, settings);
        var root = XDocument.Load(xmlReader, LoadOptions.PreserveWhitespace);
    }
    catch (XmlException xe)
    {
        // log, notify, alarm
    }

Moreover, in order to prevent false positives, you can further customize the catch block by using the message content provided by the XmlException instance.
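One way to separate plain malformed XML from a likely XXE probe is sketched below. Note that it departs from the snippet above in two ways, both of which are my assumptions rather than the article's method: it uses DtdProcessing.Prohibit, which makes the reader throw as soon as it meets a DOCTYPE, and instead of inspecting the exception message (which can vary by runtime version and locale) it looks for a DOCTYPE marker in the raw input. SafeXml and securityLog are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of any framework.
public static class SafeXml
{
    // Returns the parsed document, or null when the input cannot be parsed.
    public static XDocument Parse(string xml, Action<string> securityLog)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit };
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                return XDocument.Load(reader, LoadOptions.PreserveWhitespace);
            }
        }
        catch (XmlException ex)
        {
            // Only raise a security event when the failing input carried a DTD;
            // ordinary malformed XML is rejected quietly to limit false positives.
            bool carriesDtd = xml.IndexOf("<!DOCTYPE", StringComparison.OrdinalIgnoreCase) >= 0;
            if (carriesDtd)
            {
                securityLog("SECURITY: DTD found in untrusted XML at line " + ex.LineNumber +
                            ", position " + ex.LinePosition);
            }
            return null;
        }
    }
}
```

Whether to alert on every DOCTYPE or only on ones that declare external entities is a tuning decision; legitimate clients rarely need to send a DTD at all.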

There’s a general programming best practice that discourages catching the generic Exception type in catch blocks. What we have shown is also a good supporting case for this. The same goes for another best practice that discourages empty catch blocks, which effectively do nothing when abnormal behavior occurs in our code.

Apparently though, instead of empty catch blocks, here we have a very solid opportunity to react to malicious attempts.

Normalization on Inputs

By definition, normalization is to get the simplest form of something. In fact, canonicalization is the term used for this purpose. But it is hard to pronounce, so, let's stick to normalization.

Of course, “the simplest form of something” is a little bit abstract. What do we mean by the “simplest form”?

It is always good to show by example. Here is a string:

%3cscript%3e

According to URL encoding, this string is not in its simplest form, because if we apply URL decoding to it, we get this one:

<script>

This is the simplest form of the original string according to the URL encoding transformation standard.

How do we know that? We know it not because it is understandable to us now. We know it because if we apply URL decoding again, we will get the same string:

<script>

And that means URL decoding does not transform it any further. We have hit the simplest form. Normalization can take more than one step, since the encoding might originally have been applied more than once.

URL encoding is just one example of the transformation used for normalization, or in other words, decoding. HTML encoding, JavaScript encoding, and CSS encoding are other important encoding/decoding methods widely used for normalization.

Over the years, attackers have found ingenious techniques to bypass defense systems, and one of the most prevalent is encoding. They apply crazy encoding schemes to their original malicious inputs in order to fool the defenses around applications.

History is full of such demonstrations. One of the most famous is the infamous Microsoft IIS dotdot attack from the early 2000s, whose details are well worth reading.

Since hackers rely on encoding techniques substantially when they are sending malicious inputs, normalization can be one of the most effective and easy ways to seize them.

Here is the rule of thumb: we recursively apply URL/HTML/CSS/JavaScript decoding to user input until the output no longer changes. And if the output is a different string than the original input, that means we may have a possible malicious request.

Here’s a simplified version of the legendary OWASP ESAPI for Java that implements this idea:

    int foundCount = 0;
    boolean clean = false;
    while (!clean) {
        clean = true;
        // whatever codecs you want: URL/JavaScript/HTML/...
        Iterator i = codecs.iterator();
        while (i.hasNext()) {
            Codec codec = (Codec) i.next();
            String old = input;
            input = codec.decode(input);
            if (!old.equals(input)) {
                if (clean) {
                    foundCount++;
                }
                clean = false;
            }
        }
    }

When the code block ends, what does it mean if the value of foundCount is greater than or equal to 2? It means someone is sending multiply encoded input to our application, and that rarely happens by accident.

Normal users do not send multiply encoded strings to our application. There is a high probability that this is a malicious user. We have to log this event with the original input for further analysis.
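The same loop can be sketched in C# with the decoders that ship with .NET. This is only an approximation of the ESAPI idea under my own assumptions: it uses WebUtility.UrlDecode and WebUtility.HtmlDecode as the codec list, counts one layer per pass in which any decoder changed the input, and caps the number of passes with a hypothetical maxPasses parameter so a hostile input cannot keep the loop busy.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of any framework.
public static class InputNormalizer
{
    private static readonly Func<string, string>[] Decoders =
    {
        WebUtility.UrlDecode,
        WebUtility.HtmlDecode
    };

    // Decodes until the value stops changing and reports how many passes
    // actually changed something (the number of encoding layers found).
    public static string Normalize(string input, out int encodedLayers, int maxPasses = 10)
    {
        encodedLayers = 0;
        for (int pass = 0; pass < maxPasses; pass++)
        {
            string before = input;
            foreach (Func<string, string> decode in Decoders)
            {
                input = decode(input);
            }
            if (input == before)
            {
                break; // simplest form reached
            }
            encodedLayers++;
        }
        return input;
    }

    // Two or more layers is the signal the article treats as suspicious.
    public static bool LooksMultiEncoded(string input)
    {
        int layers;
        Normalize(input, out layers);
        return layers >= 2;
    }
}
```

For instance, %253cscript%253e needs two passes to become <script>, so LooksMultiEncoded flags it, while %3cscript%3e needs only one pass and is not flagged by this particular rule. Keep in mind that UrlDecode also turns a plus sign into a space, so inputs containing + count as encoded.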

The above mechanism, while part of the software itself, functions like a filter in front of the application. It runs on every untrusted input and gives us an opportunity to know about malicious attempts.

However, you may be concerned about the additional delay this kind of validation incurs. I understand if you don’t want to opt in.

Here's another example of using normalization as a means to seize malicious attempts during file uploads or downloads. Consider the following code:

    if (!String.IsNullOrEmpty(fileName))
    {
        fileName = new FileInfo(fileName).Name;
        string path = @"E:\uploaded_files\" + fileName;
        if (File.Exists(path))
        {
            response.ContentType = "image/jpg";
            response.BinaryWrite(File.ReadAllBytes(path));
        }
    }

Here we get a fileName parameter from our client, locate the image it points to, read, and present the content. This is a download example. It might also have been an upload scenario.

Nevertheless, in order to prevent the client manipulating the fileName parameter to their heart’s content, we utilize the Name property of the FileInfo class. This will only get the name part of the fileName, even if the client sends us anything other than what we expect (i.e. a file name with forged paths such as below):

../../WebSites/Cross/Web.config

Here the malicious client wants to read the contents of a sensitive Web.Config file by using our code. Getting only the file name part, we get rid of this possibility.

That is good but there is still something we can do:

    if (!String.IsNullOrEmpty(fileName))
    {
        string normalizedFileName = new FileInfo(fileName).Name;
        if (normalizedFileName != fileName)
        {
            // log, notify, alarm
            response = ResponseStatus.Unauthorized;
        }
        else
        {
            // Build the path from the normalized name only, never from raw input.
            string path = @"E:\uploaded_files\" + normalizedFileName;
            if (File.Exists(path))
            {
                response.ContentType = "image/jpg";
                response.BinaryWrite(File.ReadAllBytes(path));
            }
        }
    }

We compare the normalized version of fileName with the original input. If they differ, someone is trying to send us a manipulated fileName, so we take appropriate action and do not serve the file.

Normally the browser just sends the uploaded file name in its simplest form with no transformation.

For the sake of argument, we might not even use the file name when the user uploads a file. We might generate a GUID and use that instead.

Nevertheless, applying this control to the provided file name still matters, because hackers will definitely poke at that parameter no matter what.

Invalid Input Against Whitelists

Whitelisting is “accepting only what is expected”. In other words, if we come across some input that we do not expect, we reject it.

This input validation strategy is one of the most secure and effective strategies we have to date. By using it consistently throughout your software, you can close a lot of known and unknown avenues through which a malicious user can attack you.

This way of building a software is like building a closed castle with only thoroughly controlled doors opening outside, if that makes any sense.

OK, back to our topic.

Let’s analyze whitelisting with a simple scenario. Assume that our users have the freedom to choose their own usernames when registering, and that prior to coding we were given a requirement describing what a username should look like.

Then, in order to comply with this requirement, we can easily devise some rigid rules to apply to the username input before we accept it. If the input passes the test, we take it in. Otherwise, we reject it.

The whitelist rules may be in different forms, though. Some may contain a list of expected hard-coded values, others may check whether the input is integer or not. And others may be in the form of regular expressions.

Here is an example regular expression for usernames:

^[a-zA-Z0-9]{4,15}$

This regular expression is a very rigid whitelisting pattern. It matches only strings whose characters are nothing but a-z, A-Z, or 0-9, and whose length is between 4 and 15 characters.

The hat at the beginning and dollar character at the end of the regular expression denote that the match should occur for the whole input.

Now assume that at runtime we get the following input which won’t pass our regular expression test:

o'neal

Does that mean our software is facing a hacker?

The input seems innocent. However, it might also be the case that a malicious user is probing for an SQL injection bug before getting into action, which is also known as reconnaissance.

Anyway, it’s still hard to deduce any malice from this particular case.
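A minimal C# sketch of this username whitelist might look as follows. UsernameRule and validationLog are hypothetical names. One detail that is easy to miss in .NET: without the Multiline option, $ also matches just before a trailing newline, so the sketch uses \A and \z to anchor to the absolute start and end of the input. Since a single rejected username says little about intent, the sketch records rejections as low-severity validation events that can be correlated later, rather than raising an alarm.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any framework.
public static class UsernameRule
{
    // \A and \z anchor to the absolute start and end of the input.
    private static readonly Regex Allowed =
        new Regex(@"\A[a-zA-Z0-9]{4,15}\z", RegexOptions.CultureInvariant);

    public static bool IsValid(string username, Action<string> validationLog)
    {
        if (username != null && Allowed.IsMatch(username))
        {
            return true;
        }
        // Low-severity event: useful in aggregate, not an alarm on its own.
        int length = username == null ? 0 : username.Length;
        validationLog("VALIDATION: rejected username of length " + length);
        return false;
    }
}
```

A burst of such rejections from one client, especially ones containing quotes or angle brackets, is a much stronger signal than any single failure.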

However, we can still seize the hackers using other forms of failed whitelists, such as failed input attempts against a list of expected hard-coded values.

An excellent example is JSON Web Token (JWT) standard. We use JWT when we want third parties to send us a claim that we can validate and then trust the data inside.

The standard has a simple JSON structure: a header, a body and a signature. The header contains how this particular claim should be produced and therefore validated. The body contains the claim itself. The signature is there for, well, validation.

For instance, when we get the following token from a third party, such as a user, we validate it using the algorithm it presents in the header value.

In this instance, the token itself tells us that we should apply the HMAC-SHA256 algorithm (HS256 in the token is its short name) to both the header and body data to test whether it produces the same signature as the one given.

If it produces the same signature value, then the token is authentic and we can trust the body:

    // Header
    {
      "alg" : "HS256",
      "typ" : "JWT"
    }
    // Body
    {
      "userid": "[email protected]",
      "name": "John Doe",
      "iat": 1516239022
    }
    // Signature
    AflcxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5g

There are various external libraries that we can easily use to produce and validate JWTs. Some of them had a serious security bug that allowed any JWT to be accepted as an authentic token.

Here’s what went wrong with those libraries.

What happens when a token that we should validate contains a header like below? I just present the header here, but it also contains body and signature parts:

    // Header
    {
      "alg" : "None",
      "typ" : "JWT"
    }

It turned out that, for such a token, some of those JWT validation libraries simply accepted the body as it is without any validation, because None says that no algorithm is applied for signature production.

To put this into perspective, any end user could send us any userid inside the token, and we would apply no validation to it and let them log in.

The best way to avoid this and similar security problems is to keep a valid list of algorithms on our side. In this case the list may contain only one valid algorithm.

Moreover, it's better not to process the algorithm we get inside the header part of the JSON Web Token, whatever it might be.

But as you might have already guessed, there’s a huge opportunity here. We may just get the algorithm value from the header part and check even if we won’t use it. If the value is anything other than we expect, let’s say HS256, that means someone is messing around with us.
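Here is a rough sketch of that monitoring check in C#, using only System.Text.Json and a hand-written base64url decoder. JwtHeaderCheck, ExpectedAlg, and securityLog are hypothetical names, and the single expected algorithm is the HS256 from the example above. The sketch only inspects the header to raise an alarm; signature verification is a separate step that should still use the algorithm fixed on our side.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration; not part of any JWT library.
public static class JwtHeaderCheck
{
    private const string ExpectedAlg = "HS256";

    // Returns true when the token header declares exactly the expected algorithm.
    public static bool HasExpectedAlgorithm(string token, Action<string> securityLog)
    {
        string alg = ReadAlg(token);
        if (string.Equals(alg, ExpectedAlg, StringComparison.Ordinal))
        {
            return true;
        }
        string shown = alg == null ? "(unreadable)" : alg;
        securityLog("SECURITY: JWT presented with unexpected alg '" + shown + "'");
        return false;
    }

    private static string ReadAlg(string token)
    {
        try
        {
            string headerPart = (token ?? "").Split('.')[0];
            string json = Encoding.UTF8.GetString(Base64UrlDecode(headerPart));
            using (JsonDocument doc = JsonDocument.Parse(json))
            {
                JsonElement alg;
                if (doc.RootElement.ValueKind == JsonValueKind.Object &&
                    doc.RootElement.TryGetProperty("alg", out alg) &&
                    alg.ValueKind == JsonValueKind.String)
                {
                    return alg.GetString();
                }
                return null;
            }
        }
        catch (Exception ex) when (ex is FormatException || ex is JsonException)
        {
            return null; // not base64url or not JSON: treated as unexpected
        }
    }

    private static byte[] Base64UrlDecode(string value)
    {
        string s = value.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        return Convert.FromBase64String(s);
    }
}
```

The comparison is deliberately case-sensitive and exact, so variants such as none, None, or hs256 all end up in the security log.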

The same method can be used for any list of hard-coded values presented to the end user and one of which we expect to get as an input.

For example, if we provide a list of cities in a select box, we are sure that we will get back one of them when the form is posted. If we get a completely different value, there’s surely something wrong with the behavior of the user or automated tool we are facing.
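The select box case can be sketched in a few lines of C#. The city names, the CityField class, and the securityLog callback are all made up for the example; the point is only that the server keeps the same list it rendered and treats anything else as a signal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; the city list is invented.
public static class CityField
{
    // The exact values rendered into the select box.
    private static readonly HashSet<string> OfferedCities =
        new HashSet<string>(StringComparer.Ordinal) { "Ankara", "Istanbul", "Izmir" };

    public static bool Accept(string postedCity, Action<string> securityLog)
    {
        if (postedCity != null && OfferedCities.Contains(postedCity))
        {
            return true;
        }
        // A browser can only post one of the offered values, so anything
        // else came from a tampered request or an automated tool.
        securityLog("SECURITY: select box returned a value that was never offered");
        return false;
    }
}
```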

Actions Against AuthN and AuthZ

From a security point of view, the authentication and authorization mechanisms are among the most critical parts of software. These are the places where we enforce that only the parties we know can access the application, and that they access only the parts their roles allow.

In other words, our users shouldn’t use certain parts of our application without any credential validation and they shouldn’t access parts where they don’t have any privileges.

There are various attack scenarios against both mechanisms. The most obvious one against authentication is brute forcing: trying a set of pre-populated or generated-on-the-fly credentials one after another, in the hope that one or more of them will work.

Of course there are well-known ways to prevent such attacks: using CAPTCHAs or applying throttling on problematic IP addresses or usernames.

Usually authentication attacks are well-known and when noticed are already logged and possibly fed into the security monitoring systems.

The same is possible with attacks against authorization.

It’s easy to produce a security log and an alarm when our application returns a 403 response to our users. This well-known HTTP response is the indicator of an authorization problem, so it’s wise to log it.

Both the authentication and the authorization cases so far have the potential to produce false alarms. Still, I encourage logging and producing alarms whenever they occur.

Now, let’s concentrate on a more solid case. Whenever we use Model-View-Controller (MVC) frameworks, we utilize the built-in auto-binding feature for our Action method parameters. So, the MVC framework we are using is in charge of binding parameters in HTTP requests onto our model objects automatically.

This is a great relief for us since getting each user input by using the low-level APIs of a framework really becomes tedious after some time.

What happens if this auto-binding becomes too permissive? Assume that we have a User model. It would probably have at least ten or twenty member fields. But for clarity, let’s say it has just two: FullName and IsAdmin.

The second member field will denote if a particular user is administrator or not:

    public class User
    {
        public string FullName { get; set; }
        public bool IsAdmin { get; set; }
    }

In order for users to update their own profile, we prepare a View including the appropriate form and bindings.

At last, when the form is submitted, a controller action will auto-bind the HTTP parameters to a User class instance. Then, perhaps it will save it to the database just like below:

    [HttpPost]
    public Result Update(User user)
    {
        UserRepository.Store(user);
        return View("Success");
    }

Obviously, a malicious non-administrative user may also set the values of unwanted model members here, such as IsAdmin. Since the binding is automatic, our malicious user can make themselves an administrator by sending a simple HTTP POST request to this action!

By using the MVC pattern, every model we use in action method parameters becomes fully visible and editable to end-users.

The best way to prevent this is to use separate ViewModels or DTOs for Views and Actions and to include only the permitted fields. For example, here is a UserViewModel that contains only the editable fields of the User model class.

    public class UserViewModel
    {
        public string FullName { get; set; }
    }

So, even if the end user adds an extra IsAdmin parameter to the HTTP POST request, that value will not be used at all and cannot result in a security problem. Excellent!

But wait, there’s a golden opportunity here to catch sophisticated hackers. How about we still include the IsAdmin property in our UserViewModel, but produce a security log and maybe alarms when the setter is called?

    public class UserViewModel
    {
        public string FullName { get; set; }

        public bool IsAdmin
        {
            set
            {
                // log, alarm, notify
            }
        }
    }

Just make sure that we don’t use this member field when we are creating a User model class instance out of this UserViewModel instance.
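Put together, the decoy property and the safe mapping could look like the sketch below. The static SecurityLog callback and the ApplyTo method are hypothetical names introduced for this example. Whether a particular framework's model binder actually populates a property that has a setter but no getter is something I am assuming here and that is worth verifying against the framework in use.

```csharp
using System;

public class User
{
    public string FullName { get; set; }
    public bool IsAdmin { get; set; }
}

public class UserViewModel
{
    // Hypothetical sink for illustration; a real application would inject a logger.
    public static Action<string> SecurityLog = message => { };

    public string FullName { get; set; }

    // Decoy: the profile form never contains this field, so the setter
    // only runs when someone adds IsAdmin to the request by hand.
    public bool IsAdmin
    {
        set { SecurityLog("SECURITY: over-posting attempt, IsAdmin=" + value); }
    }

    // Copies only the fields a user is allowed to edit; IsAdmin is never mapped.
    public void ApplyTo(User user)
    {
        user.FullName = FullName;
    }
}
```

Because the decoy has no getter and no backing field, there is nothing to copy by accident, which makes the rule about never using it easier to keep.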

Miscellaneous

It is impossible to list or classify every possible case where we can place our little controls to notice any hacking attempts as early as possible. However, here are some of the other opportunities we have:

 • If our application provides a flow of actions which should be followed in a specific order, then any invalid order of calling indicates an abnormal behaviour.
 • Injection attacks are one of the most severe security bug categories that stem from insecure code and data concatenation. Cross Site Scripting (XSS), SQL Injection, and Directory Traversal are some common bugs in this category. Once we use secure constructs like contextual encodings, whitelist validation, and prepared statements, then we get rid of them. However, unfortunately, there are no simple and non-blacklist ways to seize the hackers who are still trying to abuse these security bugs once they are fixed.
 • Setting traps is also a valid way to catch malicious attempts, but I'm against it if the effort takes a huge amount of time or is likely to cause false alarms. For example, we could include hidden links (display: none) in our web pages and trigger security logging when those links are accessed by automated security scanners (because they try to access every link they can extract). However, this could also cause false alarms for legitimate crawlers, such as Google's. Still, it is a design choice, and there are plenty of traps that can be set, such as non-existent but easy-to-guess:
  • username and password pairs, e.g. the infamous admin:admin
  • administrative URL paths, e.g. /admin
  • HTTP headers and parameters, e.g. IsAdmin
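As a small illustration of the trap idea, the sketch below checks requests against decoy credentials and decoy paths. It assumes that the real application has no admin account and no /admin path, so any hit can only come from guessing. Decoys and securityLog are hypothetical names, and the decoy values are the ones from the list above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; assumes no real "admin" account or /admin path exists.
public static class Decoys
{
    private static readonly HashSet<string> DecoyPaths =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "/admin" };

    public static bool IsDecoyLogin(string username, string password, Action<string> securityLog)
    {
        bool hit = username == "admin" && password == "admin";
        if (hit)
        {
            securityLog("SECURITY: decoy credentials admin:admin were tried");
        }
        return hit;
    }

    public static bool IsDecoyPath(string requestPath, Action<string> securityLog)
    {
        // Treat /admin and /admin/ the same way.
        bool hit = requestPath != null && DecoyPaths.Contains(requestPath.TrimEnd('/'));
        if (hit)
        {
            securityLog("SECURITY: decoy path requested: " + requestPath);
        }
        return hit;
    }
}
```

Calling IsDecoyLogin before the real credential check, and IsDecoyPath before returning a 404, costs almost nothing for legitimate traffic.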

Conclusion

“Forgiveness isn't approving what happened. It's choosing to rise above it.” Robin Sharma

It is unforgivably naive to let malicious attempts towards our software go unnoticed while we already have the tools under our belt to do otherwise. Forgiveness is such a supreme moral quality, but we have to be on top of risky activities around our code.

Despite chaotic facets of software development, developing secure code is an important survival skill in this hacker-loaded world.

Moreover, we have the chance to improve this skill even further by noticing malicious activities in a precise manner in our code and producing security log entries and alarms for SOC teams.

Failing to act on malicious behavior in our code, the topic of this article, is just one of the coding mistakes that lead to hacker abuse. I encourage you to check my Coding Mistakes that Hackers Abuse online training in order to master the rest of them.