{"id":8449,"date":"2023-08-18T12:00:00","date_gmt":"2023-08-18T09:00:00","guid":{"rendered":"https:\/\/blog.eset.ee\/et\/2023\/08\/18\/def-con-31-us-dod-urges-hackers-to-go-and-hack-ai\/"},"modified":"2023-08-18T12:00:00","modified_gmt":"2023-08-18T09:00:00","slug":"def-con-31-us-dod-urges-hackers-to-go-and-hack-ai","status":"publish","type":"post","link":"https:\/\/blog.eset.ee\/et\/en\/2023\/08\/18\/def-con-31-us-dod-urges-hackers-to-go-and-hack-ai\/","title":{"rendered":"DEF CON 31: US DoD urges hackers to go and hack \u2018AI\u2019"},"content":{"rendered":"<\/p>\n<p><span lang=\"EN-US\">Dr. Craig Martell, Chief Digital and Artificial Intelligence Officer at the United States Department of Defense, called on the audience at DEF CON 31 in Las Vegas to go and hack large language models (LLMs). It\u2019s not often you hear a government official asking for an action such as this. So, why did he make such a challenge?<\/span><\/p>\n<h2><span lang=\"EN-US\">LLMs as a trending topic<\/span><\/h2>\n<p><span lang=\"EN-US\">Throughout Black Hat 2023 and DEF CON 31, artificial intelligence (AI) and the use of LLMs have been a trending topic, which is not that surprising given the hype since the release of ChatGPT just nine months ago. Dr. Martell, also a college professor, provided an interesting explanation and a thought-provoking perspective; it certainly engaged the audience.<\/span><\/p>\n<p><span lang=\"EN-US\">Firstly, he presented the concept that this is about the prediction of the next word: when a data set is built, the LLM\u2019s job is to predict what the next word should be. For example, in LLMs used for translation, if you take the prior words when translating from one language to another, there are limited options &#8211; maybe a maximum of five &#8211; that are semantically similar, and it is then about choosing the most likely one given the prior sentences. 
We are used to seeing predictions on the internet, so this is not new. For example, when you make a purchase on Amazon or watch a movie on Netflix, both systems will offer their prediction of the next product to consider or what to watch next. <\/span><\/p>\n<p><span lang=\"EN-US\">If you put this into the context of building computer code, then this becomes simpler, as there is a strict format that code needs to follow and therefore the output is likely to be more accurate than trying to deliver normal conversational language.<\/span><\/p>\n<h2><span lang=\"EN-US\">AI hallucinations<\/span><\/h2>\n<p><span lang=\"EN-US\">The biggest issue with LLMs is hallucinations. For those less familiar with this term in connection with AI and LLMs, a hallucination is when the model outputs something that is \u201cfalse\u201d.<br \/><\/span><\/p>\n<p><span lang=\"EN-US\">Dr. Martell produced a good example concerning himself: he asked ChatGPT \u2018who is Craig Martell\u2019, and it returned an answer stating that Craig Martell was the character that Stephen Baldwin played in The Usual Suspects. This is not correct, as a few moments with a non-AI-powered search engine should convince you. But what happens when you can\u2019t check the output, or are not of the mindset to do so? We then end up with an answer \u2018from artificial intelligence\u2019 that is accepted as correct regardless of the facts. Dr. 
Martell described those that don\u2019t check the output as lazy. While this may seem a little strong, I think it does drive home the point that all output should be validated using another source or method.<\/span><\/p>\n<blockquote>\n<p><span lang=\"EN-US\">Related: <a href=\"https:\/\/www.welivesecurity.com\/black-hat-2023-teenage-ai-not-enough-for-cyberthreat-intelligence\/index.html\">Black Hat 2023: \u2018Teenage\u2019 AI not enough for cyberthreat intelligence<\/a><\/span><\/p>\n<\/blockquote>\n<p><span lang=\"EN-US\">The big question posed by the presentation is \u2018How many hallucinations are acceptable, and in what circumstances?\u2019. In the example of a battlefield decision that may involve life and death situations, \u2018zero hallucinations\u2019 may be the right answer, whereas in the context of a translation from English to German, 20% may be OK. The acceptable number really is the big question.<\/span><\/p>\n<h2><span lang=\"EN-US\">Humans still required (for now)<\/span><\/h2>\n<p><span lang=\"EN-US\">It was suggested that, with LLMs in their current form, a human needs to be involved in the validation, meaning that one or more models should not be used to validate the output of another. <\/span><\/p>\n<p><span lang=\"EN-US\">Human validation uses more than logic: if you see a picture of a cat and a system tells you it\u2019s a dog, then you know this is wrong. When a baby is born, it can recognize faces and it understands hunger; these abilities go beyond the logic that is available in today\u2019s AI world. The presentation highlighted that not all humans will understand that the \u2018AI\u2019 output needs to be questioned; they will accept it as an authoritative answer, which then causes significant issues depending on the scenario in which it is accepted. 
<\/span><\/p>\n<p><span lang=\"EN-US\">In summary, the presentation concluded with what many of us may have already deduced; the technology has been released publicly and is seen as an authority when in reality it\u2019s in its infancy and still has much to learn. That\u2019s why Dr. Martell then challenged the audience to \u2018go hack the hell out of those things, tell us how they break, tell us the dangers, I really need to know\u2019. If you are interested in finding out how to provide feedback, the DoD has created a project that can be found at <a href=\"http:\/\/www.dds.mil\/taskforcelima\">www.dds.mil\/taskforcelima<\/a>.<\/span><\/p>\n<blockquote>\n<p><span lang=\"EN-US\">Before you go: <a href=\"https:\/\/www.welivesecurity.com\/critical-infrastructure\/black-hat-2023-cyberwar-fire-and-forget-me-not\/index.html\">Black Hat 2023: Cyberwar fire-and-forget-me-not<\/a><\/span><\/p>\n<\/blockquote>\n<p class=\"wls-source\"><a href=\"https:\/\/www.welivesecurity.com\/en\/cybersecurity\/def-con-31-us-dod-urges-hackers-to-go-and-hack-ai\/\" rel=\"nofollow noopener\" target=\"_blank\">Read the full analysis on WeLiveSecurity \u2192<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The limits of current AI need to be tested before we can rely on their 
output<\/p>\n","protected":false},"author":5,"featured_media":8450,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2880],"tags":[],"class_list":["post-8449","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-digital-security"],"acf":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/posts\/8449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/comments?post=8449"}],"version-history":[{"count":0,"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/posts\/8449\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/media\/8450"}],"wp:attachment":[{"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/media?parent=8449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/categories?post=8449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.eset.ee\/et\/en\/wp-json\/wp\/v2\/tags?post=8449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}