After more than 400,000 tweets, the blue bird platform suspended my account for violating the counterfeit goods rules.
I’ve decided to move the publishing of reports to urlscan.io.
In the new release I made some changes to the publishing platform. The main one for users is that sites without a default page are now also published: the so-called “opendir” (open directory) sites.
A Platform for Automated Threat Report Collection and IOC Extraction
A few days ago I came across this project from the University of Madrid. Below is a summary and the entire document. Enjoy the read 🙂
To adapt to a constantly evolving landscape of cyber threats, organizations actively need to collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds. But they can also be extracted from a myriad of unstructured threat reports written in natural language and distributed using a wide array of sources such as blogs and social media. This work presents GoodFATR, an automated platform for collecting threat reports from a wealth of sources and extracting IOCs from them. GoodFATR supports 6 sources: RSS, Twitter, Telegram, Malpedia, APTnotes, and ChainSmith. GoodFATR continuously monitors the sources, downloads new threat reports, extracts 41 indicator types from the collected reports, and filters generic indicators to output the IOCs. We propose a novel majority-vote methodology for evaluating the accuracy of indicator extraction tools, and apply it to compare 7 popular tools with GoodFATR’s indicator extraction module. We run GoodFATR over 15 months to collect 472,891 reports from the 6 sources; extract 1,043,932 indicators from the reports; and identify 655,971 IOCs. We analyze the collected data to identify the top IOC contributors and the IOC class distribution. Finally, we present a case study on how GoodFATR can assist in tracking cybercrime relations on the Bitcoin blockchain.
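To give an idea of what the indicator-extraction step looks like in practice, here is a minimal sketch in C# (my own illustration, not code from GoodFATR) that pulls a few common indicator types out of free text with regular expressions; the real tool handles 41 types, refangs obfuscated indicators and filters out the generic ones before calling them IOCs.

```csharp
// Minimal, illustrative indicator extraction (NOT GoodFATR's actual code).
// A real extractor supports many more types, refangs obfuscated indicators
// (hxxp://, [.]) and filters generic indicators before reporting IOCs.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class IndicatorExtractor
{
    static readonly Dictionary<string, Regex> Patterns = new()
    {
        ["ipv4"]   = new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
        ["sha256"] = new Regex(@"\b[a-fA-F0-9]{64}\b"),
        ["md5"]    = new Regex(@"\b[a-fA-F0-9]{32}\b"),
        ["url"]    = new Regex(@"\bhttps?://[^\s""'<>]+", RegexOptions.IgnoreCase),
    };

    static void Main()
    {
        // Dummy report text with placeholder indicators.
        string report = "The sample (md5 0123456789abcdef0123456789abcdef) beacons to http://203.0.113.7/gate.php";

        foreach (var (type, regex) in Patterns)
            foreach (Match m in regex.Matches(report))
                Console.WriteLine($"{type}: {m.Value}");
    }
}
```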
An interesting article by Namecheap that illustrates their commitment (and, to a small extent, mine too) to the fight against scams and abuse on the Internet.
It seems like only yesterday that I connected the Smith agent to my Twitter account, perhaps more out of curiosity than out of a desire to do something useful, and today the account has exceeded 200,000 tweets 🙂
How fast these kids grow up!
To celebrate the milestone I decided to write an updated version of the previous post, describing a little of what happens under the hood of the project.
Let’s talk a little about the various components that make up the project.
Zefiro collects information related to Internet domains. This process produces lists of recently registered domains.
Scirocco collects information from Certificate Transparency Logs. This information is useful for identifying new domains and subdomains.
Watson uses Certificate Transparency Log data to identify domains registered in the last few hours. For its operation, this component uses agents distributed in various datacenters around the world.
Miniluv uses data from Zefiro, Scirocco and Watson to select new domains and distribute this information to subscribers, both internal and external to the solution.
Smith Core orchestrates the Smith agents, dividing the work among the various distributed components.
Hammer keeps monitoring active on sites that show certain characteristics and are therefore entrusted to its care.
The Smith agents are in charge of checking the context of the domain, its hosting, and the content the site displays. This information contributes to a score that estimates how dangerous the site may be. If the threat is identified with certainty, the Twitter report starts with “Threat …”; if the threat is in doubt, it starts with “Possible threat …”. For its operation, this component uses agents distributed in various datacenters around the world.
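To make the idea concrete, here is a hypothetical sketch of how an aggregate score could drive the wording of the report. The feature names, the thresholds and the simple average are my assumptions for illustration only; I only follow the convention used later in the post, where a lower score means a more suspicious domain.

```csharp
// Hypothetical sketch: mapping an aggregate score to the report wording.
// Thresholds, feature names and the averaging are assumptions, not the real logic.
// Convention (as elsewhere in the post): lower score = more suspicious.
using System;

class Reporter
{
    record ScanResult(string Domain, double RegistrationScore, double CertificateScore, double ContentScore);

    static string? BuildReport(ScanResult r)
    {
        // Simple average of the partial scores produced by the different checks.
        double score = (r.RegistrationScore + r.CertificateScore + r.ContentScore) / 3.0;

        if (score <= 0.2) return $"Threat on {r.Domain}";           // identified with certainty
        if (score <= 0.4) return $"Possible threat on {r.Domain}";  // threat in doubt
        return null;                                                // nothing to report
    }

    static void Main()
    {
        var scan = new ScanResult("example-login-update.top", 0.2, 0.3, 0.1); // placeholder domain and values
        Console.WriteLine(BuildReport(scan) ?? "no report");
    }
}
```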
All these components are based on .NET (Framework and Core), and the databases are managed by SQL Server. The operating systems used are Windows Server 2019 and Ubuntu Linux.
One of the main objectives of the platform is the collection of phishing kits and malware.
Currently these files are only stored, but in the (hopefully near) future they will be shared to create IoCs and datasets for training artificial intelligence models that improve threat-discovery techniques. The idea is to improve the ability to discover threats using the information contained in threats already discovered.
Another future evolution of the platform will be integration with email services to report malicious and compromised accounts, in order to reduce damage and speed up investigations, as already happens with some service providers and partners who handle these reports when relevant to them.
The project starts from the desire to monitor the Internet for threats, but also for situations that are not correlated with each other yet which, with time or with support, may form the basis of larger and currently unpredictable phenomena.
The first phase of the project deals with timely monitoring: the solution watches the domains that are registered, collects information on registration and hosting, and checks the content of the sites. This step makes it possible to quickly identify the different types of cyber threats. The collected data can be used for investigations and analyses.
When a threat (present or probable) is identified, it is reported to security companies, which add it to the appropriate blacklists; a tweet is then produced and published on my profile: https://twitter.com/ecarlesi
The analysis of the data collected during this phase can serve as a history for identifying patterns that allow forecasting of future scenarios.
The second phase of the project aims to produce evidence of phenomena, deriving them from the patterns discovered by the analysis performed by the phase-one components.
Currently, approximately 250,000-300,000 second-level domains are registered every day. Many of these domains are used to carry out cyber threats: spam, phishing, command and control (C2), and so on.
The information that can be acquired through the WHOIS service is not really useful in most cases. Due to the anonymization options, the data are too generic and do not allow the real owner of the domain to be traced.
The only fact currently taken into account by the solution is the company through which the domain was registered.
Not all providers have the same reputation. Users who register domains in bulk, for example, tend to use cheaper providers; these providers therefore have a lower reputation than others, and consequently we assign a lower initial score to registrations made with them.
Another indicator taken into consideration by the solution is the SSL certificate: its issuer and its validity period.
This first set of information contributes to a score associated with each domain. The score is added to those produced by the subsequent phases and thus contributes to the overall evaluation of the domain.
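As a rough illustration of how such an initial score might be computed, here is a hypothetical sketch. The registrar names, the weights, and the effect of each signal are assumptions of mine, not the real values used by the platform; again, a lower score means a more suspicious domain.

```csharp
// Hypothetical sketch of the initial domain score (registrar reputation plus
// certificate issuer and validity). All names and numeric values are assumptions.
using System;
using System.Collections.Generic;

class InitialScore
{
    // Lower score = lower reputation / more suspicious, as in the rest of the post.
    static readonly Dictionary<string, double> RegistrarReputation = new(StringComparer.OrdinalIgnoreCase)
    {
        ["cheap-bulk-registrar.example"]  = 0.2, // often used for bulk registrations (placeholder name)
        ["established-registrar.example"] = 0.8, // placeholder name
    };

    static double Score(string registrar, string certIssuer, int certValidityDays)
    {
        // Unknown registrars start from a neutral value in this sketch.
        double score = RegistrarReputation.TryGetValue(registrar, out var rep) ? rep : 0.5;

        // In this sketch, short-lived and free certificates weigh the score down slightly.
        if (certValidityDays <= 90) score -= 0.1;
        if (certIssuer.Contains("Let's Encrypt", StringComparison.OrdinalIgnoreCase)) score -= 0.05;

        return Math.Clamp(score, 0.0, 1.0);
    }

    static void Main() =>
        Console.WriteLine(Score("cheap-bulk-registrar.example", "Let's Encrypt", 90));
}
```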
The main analysis phase is the one where the content of the website is analyzed. The content is downloaded and checked for clearly dangerous or potentially dangerous material. The check is based on a database of signatures that is enriched daily and that in the future will be able to learn from the analysis history.
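As a rough illustration of this step (the actual signature database and matching logic are not public, so the patterns below are invented for the example), the sketch downloads a page and checks it against a couple of made-up signatures:

```csharp
// Minimal sketch of signature-based content checking. The two signatures below
// are illustrative only; the real database contains many more and is updated daily.
using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class ContentScanner
{
    // Each signature: a name and a pattern that marks the page as suspicious.
    static readonly (string Name, Regex Pattern)[] Signatures =
    {
        ("generic-login-form", new Regex(@"<input[^>]+type=[""']password[""']", RegexOptions.IgnoreCase)),
        ("kit-archive-link",   new Regex(@"href=[""'][^""']+\.zip[""']",        RegexOptions.IgnoreCase)),
    };

    static async Task Main()
    {
        using var http = new HttpClient();
        string html = await http.GetStringAsync("http://example.com/"); // placeholder target

        foreach (var (name, pattern) in Signatures)
            if (pattern.IsMatch(html))
                Console.WriteLine($"signature hit: {name}");
    }
}
```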
The solution is based on several underlying systems that, by interacting, implement the required logic. The following paragraphs describe the main systems and their roles.
Zefiro
The Zefiro project was born from the idea of having active monitoring of the domains that are registered. This monitoring allows you to “see what happens in the world” before it actually happens (buying an Internet domain is, in fact, one of the first activities carried out when starting a project). The project fulfills this requirement: being notified of newly registered domains within a short time, on average a few (10-16) hours.
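The post does not detail where Zefiro gets its registration data from. Purely as an illustration of the kind of check involved, the sketch below uses a raw WHOIS query (the WHOIS protocol is just a line of text over TCP port 43) to see how many hours ago a .com domain was created; the parsing is deliberately simplified.

```csharp
// Illustrative only: check the age of a .com domain via a raw WHOIS query.
// This is not Zefiro's actual collection mechanism, just a simple example.
using System;
using System.IO;
using System.Net.Sockets;
using System.Text.RegularExpressions;

class WhoisAgeCheck
{
    static string Whois(string domain, string server = "whois.verisign-grs.com")
    {
        using var client = new TcpClient(server, 43);
        using var stream = client.GetStream();
        using var writer = new StreamWriter(stream) { AutoFlush = true };
        using var reader = new StreamReader(stream);
        writer.Write(domain + "\r\n");      // WHOIS request is just the domain plus CRLF
        return reader.ReadToEnd();          // server closes the connection after answering
    }

    static void Main()
    {
        string response = Whois("example.com");
        var match = Regex.Match(response, @"Creation Date:\s*(\S+)");
        if (match.Success && DateTime.TryParse(match.Groups[1].Value, out var created))
            Console.WriteLine($"registered about {(DateTime.UtcNow - created.ToUniversalTime()).TotalHours:F0} hours ago");
    }
}
```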
This component was developed using .NET Framework 4.7 and runs on Windows Server 2019 with a SQL Server 2019 database. The next evolution of this project will be a rewrite in .NET Core.
Currently the component writes its logs to files. In the future these logs will have to flow into a log-management platform, to allow more immediate monitoring and the creation of alerts based on logged events.
Miniluv
This project uses information related to the domain (domain name, WHOIS, HTTPS certificate, and more) to associate a risk score with each domain name. This score is later used to alter the normal monitoring mode. Domains with a score below a certain threshold are notified to the subscribers of a specific mailing list.
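A minimal sketch of this notification step follows, assuming a hypothetical threshold, SMTP host, sender and recipients (none of which are the real configuration):

```csharp
// Hypothetical sketch of the Miniluv notification step: domains whose score falls
// below a threshold are mailed to subscribers. Host, addresses and threshold are placeholders.
using System.Linq;
using System.Net.Mail;

class MiniluvNotifier
{
    const double Threshold = 0.4; // assumed value

    static void Notify((string Domain, double Score)[] scored, string[] subscribers)
    {
        var risky = scored.Where(d => d.Score < Threshold).Select(d => d.Domain).ToArray();
        if (risky.Length == 0) return;

        using var smtp = new SmtpClient("smtp.example.internal"); // placeholder host
        foreach (var subscriber in subscribers)
            smtp.Send("alerts@example.internal", subscriber,
                      "New suspicious domains", string.Join("\n", risky));
    }

    static void Main() =>
        Notify(new[] { ("example-login-update.top", 0.25), ("benign-site.example", 0.9) },
               new[] { "analyst@example.internal" });
}
```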
This component was developed using .NET Framework 4.7 and runs on Windows Server 2019 with a SQL Server 2019 database. The next evolution of this project will be a rewrite in .NET Core.
Currently the component writes its logs to files. In the future these logs will have to flow into a log-management platform, to allow more immediate monitoring and the creation of alerts based on logged events.
Smith
The Smith project implements the monitoring logic and orchestrates the agents that perform the scans. Each scan produces a report, which Smith saves and uses for statistics and future model training.
If a threat (current or probable) is found, the processing phase concludes with the report being sent to companies in the sector. The report is then converted into a tweet and posted to my Twitter account.
The Server component was developed using ASP.NET Core and runs on Linux machines with a SQL Server 2019 database. We currently have three instances in three virtual machines on the same physical host. In the future these services will need to move to physical hardware to improve performance.
The Agent component was developed using .NET Core and runs on Linux machines. Communication with the servers takes place via HTTPS calls. We currently have nine instances, each on a dedicated virtual machine, spread across two providers on four continents. The current load on these machines is around 90%, and to stay below this threshold it was necessary to throttle some components, penalizing overall performance.
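As an illustration of the agent-to-server exchange, here is a hypothetical sketch of a report submission over HTTPS. The endpoint name, payload shape and values are assumptions for the example, not the real API:

```csharp
// Hypothetical sketch of an agent submitting a scan report to the Smith server over HTTPS.
// Endpoint, payload and values are placeholders; authentication is omitted.
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

class SmithAgent
{
    record ScanReport(string Domain, double Score, string Verdict, DateTime ScannedAtUtc);

    static async Task Main()
    {
        using var http = new HttpClient { BaseAddress = new Uri("https://smith-core.example.internal/") }; // placeholder host

        var report = new ScanReport("example-login-update.top", 0.25, "Possible threat", DateTime.UtcNow);
        var response = await http.PostAsJsonAsync("api/reports", report); // hypothetical endpoint
        response.EnsureSuccessStatusCode();
    }
}
```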
Currently the component writes its logs to files. In the future these logs will have to flow into a log-management platform, to allow more immediate monitoring and the creation of alerts based on logged events.
This video shows how to use the web console to quickly analyze an IP address and configure a monitor that periodically checks the status of the TCP ports you are interested in.
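For context, this is roughly what such a port check does under the hood. A minimal sketch (not the code behind the console) using a plain TCP connect with a timeout:

```csharp
// Minimal sketch of a TCP port monitor: try to connect to each port with a timeout
// and report whether it is open. Host and port list are placeholders.
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

class PortProbe
{
    static async Task<bool> IsOpenAsync(string host, int port, int timeoutMs = 3000)
    {
        using var client = new TcpClient();
        var connect = client.ConnectAsync(host, port);
        // Open only if the connect attempt finishes before the timeout and succeeds.
        return await Task.WhenAny(connect, Task.Delay(timeoutMs)) == connect && client.Connected;
    }

    static async Task Main()
    {
        foreach (int port in new[] { 22, 80, 443 })
            Console.WriteLine($"{port}: {(await IsOpenAsync("192.0.2.10", port) ? "open" : "closed/filtered")}");
    }
}
```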
This video shows how easy it is to receive notifications about newly registered Internet domains and about sites on new domains that expose downloadable content. Finding phishing sites is a breeze!