Posted on October 26, 2024
The Road to a Good Free Open-Source Anti-Virus
The Cyber Security Industry is broken. We are trying to fix it a little with an AntiVirus that doesn’t suck.
There are more than 3,500 security vendors, yet cybercrime has never been more rampant. In recent years, vendors have made a habit of making security features “pay for play”, fundamentally endangering their customer ecosystems. Some vendors, such as Apple, are slightly better positioned on this.
Why an Open Source Antivirus?
Building an AV is hard. There’s a lot of learning potential for a group of enthusiasts engaging in this project. We decided to take it on to have fun, learn along the way, and provide alternatives. It is also on our roadmap toward an Open Source EDR.
Let’s cover some ground on how you would go about building an Open-Source AntiVirus.
Old AVs and Rules.
A long time ago, antivirus products used rules. Rules come in many shapes, such as file hashes or more modern binary matching rules such as YARA and other formats. Rules have a few problems:
- Since the rise of polymorphic malware, rules have lost most of their value, as each copy of the malware produces a different signature.
- The sheer number of rules provides a false sense of security.
- Rules require maintenance and increase the total cost of the solution.
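The first problem is easy to see with hash-based rules. The following is a minimal sketch, not how any particular AV stores its signatures: the payload bytes are harmless stand-ins, and the names `known_bad_hashes` and `matches_hash_rule` are hypothetical, chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_hash_rule(data: bytes, known_hashes: set[str]) -> bool:
    """A hash 'rule' fires only when the digest is an exact match."""
    return sha256_hex(data) in known_hashes


# Harmless stand-in bytes playing the role of a known malicious file.
original = b"MZ" + b"\x90" * 64 + b"payload"

# A trivially mutated copy: a single junk byte appended (assumed mutation).
variant = original + b"\x00"

# The rule set only knows about the original sample.
known_bad_hashes = {sha256_hex(original)}

print(matches_hash_rule(original, known_bad_hashes))  # True
print(matches_hash_rule(variant, known_bad_hashes))   # False
```

One appended byte is enough to make the hash rule miss the variant, which is why a hash list has to grow with every new copy of the same malware.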
If rules have such large downsides, why are AV and security vendors still offering rules and threat feeds?
- Rules are “faster”, or at least some people in the industry like to claim that.
- Again, a large number of rules (“800 million rules”, for example) looks impressive and can be sold to customers.
Machine Learning, Static and Dynamic Analysis.
For a modern AV, a lot of research has been done. The research falls mostly in two categories:
- Static Analysis. Features that can be extracted from a PE file, or any other file, without running it. Things such as entropy, headers, imports, alignment, versions used, vendor information, and ASCII strings extracted from the malware can have some value in evaluating whether a file is malware or not. This method is very fast but can be bypassed with some techniques.
- Dynamic Analysis (Sandbox Emulation). The file is run inside an emulation framework rather than on the real system. If this is done on the client side, the emulator is usually very small and limited. Frameworks such as Unicorn and Speakeasy are common examples. Most commercial AVs have developed their own sandbox, but we don’t have much insight into their inner workings. Once the malware has run, a chain of system calls is usually extracted and passed through an ML model to detect whether it is indeed malware.
How to build an Open Source AV
Basically, you need to build a series of components. We have these in the works in our GitHub environment.
- Tools to collect fresh malware and fresh versions of “goodware” applications.
- A Machine Learning Model for Static Analysis.
- An emulator (like unicorn) to emulate the PE and extract a Chain of System Calls.
- A Machine Learning Model, to detect Malware from the Syscall Chain.
- A minifilter driver that is notified of every file created on the system and triggers a scan of the file when it has one of the supported dangerous extensions.
- A small Console application to give the user some feedback.
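As an illustration of how a syscall chain could be turned into input for the second model, the sketch below counts n-grams of consecutive calls. This is one common way to featurize a sequence, assumed here for the example and not necessarily the approach the project will use. The trace is made up (the entries are real Windows API names, but the sequence is not from a real sample), and `ngrams`, `vectorize`, and `vocabulary` are hypothetical names.

```python
from collections import Counter


def ngrams(calls: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Return every run of n consecutive calls in the trace."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def vectorize(calls: list[str], vocabulary: list[tuple[str, ...]], n: int = 2) -> list[int]:
    """Count how often each n-gram of a fixed vocabulary occurs in the trace."""
    counts = Counter(ngrams(calls, n))
    return [counts[gram] for gram in vocabulary]


# Made-up trace, standing in for what an emulator might record.
trace = [
    "CreateFileW", "WriteFile", "CloseHandle",
    "RegOpenKeyExW", "RegSetValueExW",
    "CreateFileW", "WriteFile", "CloseHandle",
]

# In practice the vocabulary would be built from the training set.
vocabulary = [
    ("CreateFileW", "WriteFile"),
    ("RegOpenKeyExW", "RegSetValueExW"),
    ("WriteFile", "DeleteFileW"),
]

print(vectorize(trace, vocabulary))  # [2, 1, 0]
```

The fixed-length vector is what a classifier would consume. Sequence models that read the chain directly are another option.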
There are many more modern capabilities that could be added, such as monitoring of persistence, autoruns, registry changes, and activity on special folders. If you keep adding features and set up collection of events to a remote logging destination, you get very close to an EDR. It’s our longer-term ambition to build, train and maintain an Open Source AV and EDR.
We are currently working on the first three items of the component list above.
We will be posting additional entries as we make progress on our journey. If you would like to collaborate, get in touch with us.
Join the Alliance!