VoBERT: Unstable Log Sequence Anomaly Detection: Introducing Vocabulary-Free BERT

Thursday, June 20, 2024

2:00 - 3:00 PM EST

60 minutes, including Q&A

Security Operations Centres (SOCs) are overwhelmed by false positives. Data volumes are growing rapidly, and current analytics models cannot adapt to evolutionary changes in logs, i.e., unstable log data, which creates a need for more efficient solutions. We therefore introduce VoBERT, a sequence anomaly detection method. An improvement on BERT (Bidirectional Encoder Representations from Transformers), VoBERT adds resilience by accurately classifying unstable logs that traditional BERT-like models would treat as out-of-vocabulary. We show that neither a standard BERT nor a simple heuristic often used in industry (the anomaly score of a sequence is the percentage of unseen logs it contains) can cope with logs that change over time. This matters because a more stable model significantly reduces the number of false positives and improves attack detection.

On the Thunderbird log dataset, the MCC (Matthews correlation coefficient) of both the standard BERT model and the heuristic drops significantly, from 60% with no unseen logs to 10% with 97% unseen logs. VoBERT, by contrast, shows no significant decay (-2%), maintaining on-par performance under realistic instabilities. We also tested VoBERT on real-world data from a large European bank (50,000+ employees), and the results confirmed a stable MCC across all ranges of instability. Analysing real-life datasets also reveals that academic studies often project overly optimistic outcomes by testing solely on artificial datasets. At very low instability, all models produce similar results; however, as instability rises above 40%, the MCC of the heuristic drops to 0, while that of VoBERT remains unchanged.
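For readers unfamiliar with the baseline and the metric mentioned in the abstract, the following is a minimal sketch of the unseen-log heuristic and of the Matthews correlation coefficient. It is not the speakers' implementation: the function names, the representation of logs as plain strings, and the example log lines are all assumptions made for illustration.

```python
from math import sqrt


def unseen_fraction_score(sequence, known_logs):
    """Heuristic anomaly score: fraction of entries in the sequence
    that were never observed in the training data.

    `sequence` is a list of log entries (hypothetical: plain strings),
    `known_logs` is the set of entries seen during training.
    """
    if not sequence:
        return 0.0
    unseen = sum(1 for entry in sequence if entry not in known_logs)
    return unseen / len(sequence)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; by convention 0.0 is returned when
    the denominator is zero.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical training vocabulary and two benign sequences.
known = {"login ok", "disk check", "session closed"}
stable_seq = ["login ok", "disk check", "session closed"]
# Same benign behaviour after a (hypothetical) logging format change.
unstable_seq = ["user login succeeded", "disk check v2", "session ended"]

stable_score = unseen_fraction_score(stable_seq, known)
unstable_score = unseen_fraction_score(unstable_seq, known)
```

In this toy case the stable sequence scores 0.0 and the reworded but equally benign sequence scores 1.0, which illustrates why a score based only on unseen entries tends to produce false positives once log formats drift.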

This presentation will benefit cybersecurity professionals and SOC analysts, offering insights into practical applications of VoBERT to improve detection results. Attendees will learn the significance of real-world data evaluation and will leave equipped with tools to enhance their detection capabilities.


Eduardo Barbaro

Head of Security Analytics

ING Bank

Eduardo Barbaro is a seasoned, results-driven leader with extensive experience in the data science and artificial intelligence fields. As the Head of Security Analytics at ING, Eduardo leads the strategic direction and execution of high-quality analytics and data strategy. Eduardo's expertise in AI, data science, and analytics has been honed through a series of progressive leadership roles, including serving as the AI Practice Leader for IBM Benelux and Principal Data Scientist for IBM Europe, where Eduardo played a key role in defining the AI strategy at the EU level and developing industry-aligned AI-powered solutions. Eduardo has a proven track record of delivering value through data-driven insights, with experience in building and validating AI models and consulting for clients at IBM, Mobiquity and Easytobook. Eduardo's academic background includes a Ph.D. in Atmospheric Physics from Wageningen University, the Netherlands, and several well-cited papers in top-tier international scientific journals. Eduardo has received a number of honours and awards in the US, Europe, and Brazil. Eduardo is certified as a Distinguished Data Scientist by the OpenGroup, and as of 2022, acts as a board member for the Data Scientist profession certification. In 2023, Eduardo became a visiting researcher at the Cybersecurity Lab of the Faculty of Technology, Policy and Management at Delft University. Eduardo is adept at leading cross-functional teams and has a proven ability to drive strategic initiatives, leveraging deep understanding of data science and AI to drive business growth. Eduardo is committed to staying at the forefront of the industry and is always looking for ways to bring the latest technologies and best practices to the organization.

Steve Paul


Black Hat
