CDAO developing ‘classification guide’ for large language models
As Task Force Lima moves towards its fast-approaching deadline to help the Pentagon responsibly understand, adopt, and secure powerful and still-emerging generative AI technologies, a key element of its work is to produce a set of classification guidelines to inform the military’s use of large language models.
“That’s one of my deliverables. So, I have to have that done in order to claim ‘mission complete,’” Task Force Lima Mission Commander Navy Capt. M. Xavier Lugo told DefenseScoop on Tuesday.
Lugo was tapped to lead the task force, which is part of the Chief Digital and Artificial Intelligence Office’s Algorithmic Warfare Directorate, when Pentagon leaders launched it in August.
His team was given an 18-month time frame to help set the Defense Department’s overarching vision to rapidly navigate the uncertain and disruptive potential of generative AI — and ultimately enable the responsible adoption of associated capabilities that can supply software code, images, text and other media based on human prompts.
While presenting at the CDAO’s Advantage Defense and Data Symposium Tuesday, Lugo was asked by an audience member to discuss how Lima is confronting hurdles around the consolidation of unclassified data by large language models that might subsequently — but unintentionally — reveal classified information.
Lugo noted the popularity of that inquiry.
“The reason I’m laughing is because I’ve probably answered this question five times before I even started today, here, from people approaching me on it. I can’t officially say if it’s a problem yet — but I can theorize that unclassified data can be aggregated to [show] more classified information. I can theorize pretty confidently on that because that’s an issue right now, even before [generative AI came up]. But the problem is the speed and the way that LLMs can do it, and the vast amount of information they have access to. I would also say — because of the heterogeneity — I don’t know if you can trust what it says is actually classified or not, if it’s real or not. So there’s a lot of factors in that, and we’re working on a classification guide,” he said.
On the sidelines of the event, Lugo briefed DefenseScoop further on the various challenges at hand and the guiding document that’s in the works.
“The ‘unclassified’ challenge really is not that data is accessible from our systems. That’s not what I’m saying. But it’s information that is out there — whether it’s [operational security] information, for example, and then you combine OpSec with OpSec, with OpSec, and now you actually know where a particular unit is going to be, and when. There’s that, and there’s also information from, like, maintenance manuals and publications — stuff that is seemingly [unclassified] but can get put together in a way that will reveal more than we want to reveal,” he explained.
This all becomes more complicated with LLMs as data classification levels rise toward higher degrees of secrecy. Lugo couldn’t share many details about the specific features the CDAO’s classification guide will include, but he hinted that his team aims to tackle some of those issues as well.
Lima officials are also supporting an interagency team that’s developing classification guidelines for LLMs that can be used by the broader U.S. government.
“Let me be clear: this is not a DOD-unique problem,” Lugo told DefenseScoop.