The project
The project grew out of the desire to monitor the Internet in search of threats, but also in search of situations that are not obviously correlated with one another yet may, over time or with outside support, lie at the root of larger and currently unpredictable phenomena.
The first phase of the project deals with timely monitoring: the solution watches newly registered domains, collects information on registrations and hosting, and checks the contents of the sites. This step makes it possible to quickly identify different types of cyber threats. The collected data can be used for investigations and analyses.
When a threat (present or probable) is identified, it is reported to security companies, which add it to the relevant blacklists; a tweet is then produced and published on my profile: https://twitter.com/ecarlesi

The data collected during this phase serves as a historical record for identifying patterns that support forecasts about future scenarios.
The second phase of the project aims to produce evidence of emerging phenomena, deriving it from the patterns discovered by the analysis performed by the phase-one components.
Currently, approximately 250,000-300,000 second-level domains are registered every day. Many of these domains are used to carry out cyber threats: spam, phishing, command-and-control (C2) infrastructure, etc.
The information that can be acquired through the WHOIS service is not very useful in most cases: because of anonymization options, the data are too generic to be traced back to the real owner of the domain.
The only WHOIS fact currently taken into account by the solution is the registrar through which the domain was registered.
Not all registrars have the same reputation. Users who perform massive registrations, for example, tend to use cheaper providers, whose reputation is therefore lower; consequently, the solution assigns a lower initial score to registrations made through these companies.
Another indicator taken into consideration by the solution is the SSL certificate: its issuer and its validity period.
This first set of information contributes to a score associated with each domain. This score is added to the scores produced by the subsequent phases and thus contributes to the overall evaluation of the domain.
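How these early indicators might combine into a single domain score can be sketched as below. The field names, weights, and thresholds are all assumptions made for illustration; the source only states that the registrar and the certificate's issuer and validity period contribute to an overall score.

```python
from dataclasses import dataclass


@dataclass
class DomainSignals:
    # Illustrative fields, not the solution's actual data model.
    registrar_score: int        # from the WHOIS/registrar phase
    cert_issuer_trusted: bool   # e.g. a well-known CA vs. an unknown issuer
    cert_validity_days: int     # length of the certificate's validity period


def cert_score(signals: DomainSignals) -> int:
    """Score the certificate indicator (invented weights)."""
    score = 50
    if signals.cert_issuer_trusted:
        score += 20
    if signals.cert_validity_days >= 365:
        score += 10  # long-lived certs suggest a longer-term commitment
    elif signals.cert_validity_days <= 90:
        score -= 10  # typical of short-lived automated certificates
    return score


def overall_score(signals: DomainSignals) -> int:
    # Subsequent phases (e.g. content analysis) would add
    # their own contributions here.
    return signals.registrar_score + cert_score(signals)


s = DomainSignals(registrar_score=40,
                  cert_issuer_trusted=False,
                  cert_validity_days=90)
print(overall_score(s))  # 40 + (50 - 10) = 80
```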
The main analysis phase is the one in which the contents of the website are analyzed. The contents are downloaded and checked for clearly dangerous or potentially dangerous material. This verification is based on a database of signatures that is enriched daily and that, in the future, will be able to learn from the analysis history.
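At its core, signature-based content verification amounts to matching a database of patterns against the downloaded page. The sketch below shows the idea; the signature names and patterns are invented examples, while the real database is far richer and updated daily.

```python
import re

# Illustrative signature database: name -> regex over page content.
# These two patterns are invented for the example.
SIGNATURES = {
    "phishing-login-form": re.compile(r"verify your (account|identity)", re.I),
    "obfuscated-script": re.compile(r"eval\(atob\(", re.I),
}


def match_signatures(page_content: str) -> list[str]:
    """Return the names of all signatures that match the content."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern.search(page_content)]


html = "<form>Please verify your account to continue</form>"
print(match_signatures(html))  # ['phishing-login-form']
```

Each match would then lower the domain's score and, past a threshold, trigger the reporting flow described above.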
The solution is based on several underlying systems that interact to implement the required logic. The following paragraphs describe the main systems and their roles.

Zefiro
The Zefiro project was born from the idea of actively monitoring newly registered domains. This monitoring makes it possible to "see what happens in the world" before it actually happens (purchasing an Internet domain is, in fact, one of the first activities carried out when starting a project). The project fulfills this requirement: notification of newly registered domains arrives within a short time, on average a few (10-16) hours.
This component was developed with .NET Framework 4.7 and runs on Windows Server 2019 with a SQL Server 2019 database. The next step in its evolution will be a rewrite in .NET Core.
Currently the component writes its logs to files. In the future, these files should flow into a log-management platform to allow more immediate monitoring and the creation of alerts on logged events.
Miniluv
This project uses information related to the domain (domain name, WHOIS, HTTPS certificate, and more) to associate a risk score with each domain name. This score is later used to alter the normal monitoring mode. Domains with a score below a certain threshold are notified to the subscribers of a dedicated mailing list.
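The selection step Miniluv performs before notifying the mailing list can be sketched as a simple threshold filter. The threshold value and domain names below are hypothetical; the source only states that domains scoring below a threshold are notified.

```python
# Hypothetical threshold and scored domains, for illustration only.
THRESHOLD = 60

scored = {
    "login-secure-bank.example": 35,
    "flowershop.example": 85,
}


def to_notify(scores: dict[str, int], threshold: int = THRESHOLD) -> list[str]:
    """Domains whose risk score falls below the threshold."""
    return sorted(d for d, s in scores.items() if s < threshold)


print(to_notify(scored))  # ['login-secure-bank.example']
```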
This component was developed with .NET Framework 4.7 and runs on Windows Server 2019 with a SQL Server 2019 database. The next step in its evolution will be a rewrite in .NET Core.
Currently the component writes its logs to files. In the future, these files should flow into a log-management platform to allow more immediate monitoring and the creation of alerts on logged events.
Smith
The Smith project implements the monitoring logic and orchestrates the agents that perform the scans. Each scan produces a report, which Smith stores and uses for statistics and future model training.
Processing ends with the sending of reports to companies in the sector whenever a threat (current or probable) is found. The report is then converted into a tweet and posted to my Twitter account.
The server component was developed with ASP.NET Core and runs on Linux machines with a SQL Server 2019 database. There are currently three instances in three virtual machines managed by the same physical host. In the future, these services will need to move to physical hardware to improve performance.
The agent component was developed with .NET Core and runs on Linux machines. Communication with the servers takes place via HTTPS calls. There are currently nine instances, each on a dedicated virtual machine, spread across two providers on four continents. The current load on these machines is 90%, and to stay below this threshold it was necessary to throttle some components, penalizing overall performance.
Currently the component writes its logs to files. In the future, these files should flow into a log-management platform to allow more immediate monitoring and the creation of alerts on logged events.