French MOD Challenge: Thales Performs a Successful Sovereign AI Hack and Presents Enhanced Security Solutions for Military and Civil AI
- The Friendly Hackers team from Thales, a world leader in data protection and cybersecurity, has won the CAID challenge organised by the French Ministry of Defence during the fifth edition of European Cyber Week in France.
- The challenge, the first of its kind to be organised by the French Ministry of Defence, was designed to evaluate the extent to which teams of hackers could exploit certain intrinsic vulnerabilities of AI models.
Thales's work on AI security and trust is aligned with the requirements of both the defence community and civilian organisations such as critical infrastructure providers, which all face the same challenges of protecting their training datasets and intellectual property, and guaranteeing that AI-generated results can be trusted for critical decision-making.
The French Ministry of Defence's AI security challenge
Participants in the CAID challenge had to perform two tasks:
1. In a given set of images, determine which images were used to train the AI algorithm and which were used for testing.
An AI-based image recognition application learns from large numbers of training images. By studying the inner workings of the AI model, Thales's Friendly Hackers team successfully determined some of the images that had been used to create the application, gaining valuable information about the training methods used and the quality of the model.
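The press release does not describe the team's exact method, but this task corresponds to what the research literature calls a membership inference attack: models tend to be more confident on examples they were trained on. A minimal, purely illustrative sketch of a confidence-threshold attack (all numbers and function names below are hypothetical, not Thales's technique):

```python
import numpy as np

def membership_scores(probs: np.ndarray) -> np.ndarray:
    """Score each sample by the model's top-class softmax confidence."""
    return probs.max(axis=1)

def infer_members(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Flag samples whose confidence exceeds the threshold as likely
    training-set members."""
    return membership_scores(probs) >= threshold

# Hypothetical model outputs: the first two rows mimic training images
# (high confidence), the last two mimic held-out test images.
probs = np.array([
    [0.98, 0.01, 0.01],
    [0.95, 0.03, 0.02],
    [0.60, 0.25, 0.15],
    [0.40, 0.35, 0.25],
])
print(infer_members(probs))  # → [ True  True False False]
```

Real attacks are more sophisticated (shadow models, per-sample loss statistics), but the underlying signal is the same: the model's behaviour leaks which data it has seen.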
2. Find the images of aircraft used by an AI algorithm that had been protected using "unlearning" techniques.
An “unlearning” technique consists of deleting data used to train a model, such as images, in order to preserve its confidentiality. This can be used, for example, to protect the sovereignty of an algorithm in the event of its export, theft or loss. For example, a drone equipped with AI must be able to recognise any enemy aircraft as a potential threat, whereas aircraft models from its own forces would first be learned, so they could be identified as friendly, and then erased through unlearning. In this way, even if the drone were stolen or lost, the sensitive aircraft data contained in the AI model could not be extracted for malicious purposes. However, the Friendly Hackers team from Thales managed to re-identify data that was supposed to have been erased from the model, thereby defeating the unlearning process.
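How the Thales team re-identified the erased data is not disclosed. One generic signal exploited in the research literature is that imperfect unlearning leaves the model anomalously confident (low loss) on the "forgotten" samples compared with genuinely unseen data. A toy sketch of that idea, with entirely made-up loss values:

```python
import numpy as np

def flag_residual_members(losses: np.ndarray, z_cutoff: float = -2.0) -> np.ndarray:
    """Flag samples whose loss sits far below the population mean,
    suggesting the model still 'remembers' them despite unlearning."""
    z = (losses - losses.mean()) / losses.std()
    return z < z_cutoff

# Hypothetical per-sample losses from an "unlearned" model: index 3 is a
# supposedly erased image that the model still fits suspiciously well.
losses = np.array([2.1, 1.9, 2.3, 0.05, 2.0, 2.2, 1.8, 2.4])
print(flag_residual_members(losses))  # → only index 3 is flagged
```

An auditor (or attacker) running such a statistical test over candidate images can single out the ones the unlearning procedure failed to truly forget.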
Exercises like this help to assess the vulnerability of training data and trained models, which are valuable tools and can deliver outstanding performance but also represent new attack vectors for the armed forces. An attack on training data or trained models could have significant consequences in a military context, where this type of information could give an adversary the upper hand. Risks include model theft, theft of the data used to recognise military hardware or other features in a theatre of operations, and backdoors to impair the operation of the system using the AI. While AI in general, and generative AI in particular, offers significant operational benefits and provides military personnel with intensively trained decision support tools to reduce their cognitive burden, the national defence community needs to address new threats to this technology as a matter of priority.
The Thales BattleBox approach to tackle AI vulnerabilities
The protection of training data and trained models is critical in the defence sector. AI cybersecurity is becoming more and more crucial, and needs to be autonomous to counter the many new attack opportunities that the world of AI is opening up to malicious actors. Responding to the risks and threats involved in the use of artificial intelligence, Thales has developed a set of countermeasures called the BattleBox to provide enhanced protection against potential breaches.
- BattleBox Training provides protection from training-data poisoning, preventing hackers from introducing a backdoor.
- BattleBox IP digitally watermarks the AI model to guarantee authenticity and reliability.
- BattleBox Evade aims to protect models from prompt injection attacks, in which crafted prompts bypass the safety measures of chatbots built on Large Language Models (LLMs), and to counter adversarial attacks on images, such as adding a patch that deceives the detection process of a classification model.
- BattleBox Privacy provides a framework for training machine learning algorithms, using advanced cryptography and secure secret-sharing protocols to guarantee high levels of confidentiality.
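To make the evasion threat mentioned above concrete, here is a deliberately tiny illustration of an adversarial attack on a classifier: a small, targeted perturbation flips the model's decision. The linear model, weights and step size are invented for illustration and have nothing to do with the BattleBox products themselves:

```python
import numpy as np

# Toy linear classifier: sign(w . x) labels an input "friend" (+1) or
# "threat" (-1). An FGSM-style step nudges the input against the
# decision boundary, flipping the label with a small perturbation.

w = np.array([1.0, -2.0, 0.5])   # classifier weights (illustrative)
x = np.array([0.5, -0.2, 1.0])   # clean input, classified as +1

def classify(v: np.ndarray) -> int:
    return 1 if np.dot(w, v) > 0 else -1

eps = 0.5                        # perturbation budget
x_adv = x - eps * np.sign(w)     # step against the sign of the gradient

print(classify(x), classify(x_adv))  # → 1 -1
```

Adversarial patches work on the same principle in image space: a localised perturbation is optimised so that the classifier's output changes while the scene remains recognisable to a human.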
For attacks like those demonstrated in the CAID challenge, countermeasures such as encrypting the AI model are among the solutions that could be implemented.