Vulnerability Data Overload: Securing the Open Source Supply Chain
By Madison Oliver, Senior Manager, Security Research at GitHub
When the CVE Program originated in 1999, it published 321 CVE records. Last year, it published more than 28,900, an increase of 460% over the past decade, and the number continues to grow. This means that downstream consumers of vulnerability data have more and more to sift through each day. It also means increased vulnerability transparency (that is, making vulnerability information publicly available), which is fundamental to improving security across the industry. After all, you can't address a vulnerability if you don't even know about it. So, while this increase in data may seem overwhelming, it also means we are becoming much more aware of the vulnerabilities that exist.
But we're not just dealing with more data; we're facing a larger variety of vulnerabilities that have a greater impact through network effects. Thankfully, better data sources and increased automation can help manage the deluge, supporting easier curation, consumption, and prioritization of vulnerability data.
Tackling data overload with automation
Keeping track of direct dependencies at scale can be burdensome, and the sheer number of transitive dependencies can be overwhelming. To keep up, software bill of materials (SBOM) formats like SPDX and CycloneDX allow users to create a machine-readable inventory of a project's dependencies, along with information like versions, package identifiers, licenses, and copyright. SBOMs help reduce supply chain risks by allowing:
- Transparency about the dependencies used by your repository.
- Earlier vulnerability identification.
- Insights into security, license compliance, or quality issues that may exist in your code base.
- Compliance with various data protection standards through automation.
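To make "machine-readable inventory" concrete, here is a minimal Python sketch that reads a small CycloneDX-style JSON document and lists each component's name, version, package identifier, and license. The document is hand-written for illustration, the package names are hypothetical, and the `list_components` helper is an assumption of this sketch rather than part of any SBOM tool; real SBOMs are generated by tooling and carry many more fields.

```python
import json

# Hypothetical, hand-written CycloneDX-style SBOM with made-up packages.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "library",
      "name": "example-http",
      "version": "2.4.1",
      "purl": "pkg:pypi/example-http@2.4.1",
      "licenses": [{"license": {"id": "Apache-2.0"}}]
    },
    {
      "type": "library",
      "name": "example-pad",
      "version": "1.3.0",
      "purl": "pkg:npm/example-pad@1.3.0"
    }
  ]
}
"""


def list_components(sbom_text):
    """Return one (name, version, purl, license_ids) tuple per component."""
    sbom = json.loads(sbom_text)
    rows = []
    for component in sbom.get("components", []):
        # A license entry may carry an SPDX "id" or a free-text "name".
        license_ids = [
            entry["license"].get("id") or entry["license"].get("name", "unknown")
            for entry in component.get("licenses", [])
            if "license" in entry
        ]
        rows.append(
            (
                component.get("name"),
                component.get("version"),
                component.get("purl"),
                license_ids,
            )
        )
    return rows


inventory = list_components(SBOM_JSON)
for name, version, purl, licenses in inventory:
    print(name, version, purl, ", ".join(licenses) or "no license recorded")
```

Because the inventory is plain data, the same list can feed a license check or a vulnerability lookup without anyone reading manifests by hand.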
When it comes to reporting vulnerabilities, CVE Services has been extremely helpful in reducing the friction for CVE Numbering Authorities (CNAs) to reserve CVE IDs and publish CVE Records by providing a self-service web interface. That CVE data is a critical data source for us: it accounts for over 92% of the data feeding into GitHub's Advisory Database, so anything that helps ensure this information is published faster and more efficiently benefits those using the data downstream.
At GitHub, we leverage APIs from our vulnerability data providers to ingest data for review, export our data in the machine-readable Open Source Vulnerability (OSV) format for consumption by others, and notify our users automatically through Dependabot alerts. Prioritization features in software composition analysis (SCA) tools, like Dependabot's auto-triage rules, allow users to automatically manage data overload by prioritizing which alerts should be addressed.
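As a rough illustration of how machine-readable advisories support automated alerting and prioritization, the Python sketch below matches a small dependency inventory against OSV-style records and orders the resulting alerts by severity. Everything here is an assumption made for the example: the records are hand-written and trimmed, the IDs and package names are placeholders, the severity label under `database_specific` reflects one way a database might populate that free-form field, and `match_alerts` and `prioritize` are hypothetical helpers. This is not how Dependabot is implemented.

```python
# Hypothetical OSV-style records, trimmed to the fields used below.
# The IDs are placeholders, not real advisories.
OSV_RECORDS = [
    {
        "id": "EXAMPLE-0002",
        "affected": [
            {
                "package": {"ecosystem": "npm", "name": "example-pad"},
                "versions": ["1.3.0"],
            }
        ],
        "database_specific": {"severity": "LOW"},  # assumed field
    },
    {
        "id": "EXAMPLE-0001",
        "affected": [
            {
                "package": {"ecosystem": "PyPI", "name": "example-http"},
                "versions": ["2.4.0", "2.4.1"],
            }
        ],
        "database_specific": {"severity": "HIGH"},  # assumed field
    },
]

# Hypothetical inventory of (ecosystem, name, version), e.g. derived from an SBOM.
INVENTORY = [
    ("PyPI", "example-http", "2.4.1"),
    ("npm", "example-pad", "1.3.0"),
    ("npm", "example-ok", "0.1.0"),
]

# Lower rank means "look at this first".
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MODERATE": 2, "LOW": 3}


def match_alerts(records, inventory):
    """Return one alert per inventory entry listed as affected in a record."""
    alerts = []
    for record in records:
        severity = record.get("database_specific", {}).get("severity", "UNKNOWN")
        for entry in record.get("affected", []):
            package = entry.get("package", {})
            for ecosystem, name, version in inventory:
                if (
                    package.get("ecosystem") == ecosystem
                    and package.get("name") == name
                    and version in entry.get("versions", [])
                ):
                    alerts.append(
                        {
                            "id": record["id"],
                            "package": name,
                            "version": version,
                            "severity": severity,
                        }
                    )
    return alerts


def prioritize(alerts):
    """Sort alerts by severity; unknown labels sort last."""
    return sorted(
        alerts,
        key=lambda alert: SEVERITY_RANK.get(alert["severity"], len(SEVERITY_RANK)),
    )


alerts = prioritize(match_alerts(OSV_RECORDS, INVENTORY))
for alert in alerts:
    print(alert["severity"], alert["id"], alert["package"], alert["version"])
```

The sketch matches only on the explicit `versions` list; a fuller matcher would also evaluate each record's version ranges, and a real triage rule would likely weigh more than severity alone.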
Static application security testing (SAST) and SCA tools will both help you detect vulnerabilities. While SCA is geared towards addressing open source dependencies, SAST focuses more on vulnerabilities in your proprietary code.
As the vulnerability landscape continues to evolve and aspects of vulnerability management shift left, it's critical that open source developers are empowered to engage in security. Increased vulnerability data is a double-edged sword: it means more awareness, but it requires automation to manage properly, especially given the wider impact that supply chain vulnerabilities can have. Teams like the GitHub Security Lab are helping address these challenges by sharing and improving vulnerability information in support of the broader open source ecosystem.