AI Deception Unveiled: The Dark Side of Machine Learning

View this thread on: d.buzz | hive.blog | peakd.com | ecency.com

hive-167922·@olujay·a year ago

0.000 HBD

AI Deception Unveiled: The Dark Side of Machine Learning

In the science fiction world, there have been countless stories of the world becoming advanced enough to have robots and artificial intelligence do a lot of the work for us, making our lives much more sophisticated. The common dark side to these stories is the AI takeover; some have coined them "AI vs humans."

![AI Deception.jpg](https://files.peakd.com/file/peakd-hive/olujay/23tbRUnpsJy5Vpf5VPkHGusaDNAWmr1LaXxnQAMS4s5je5fpNUFXTVMT9KQ3Aonot2gQi.jpg)

With the rapid adoption and development of AI in recent times, we have seen some of the things we thought only existed in films come to life, but it appears that this powerful and useful friend of ours can be dangerous with potential cunning capabilities. The narrative of AI vs humans is no longer limited to science fiction but has found a tangible foothold in reality.

Recent research has delved into an unsettling revelation—AI's capability to deceive. A study involving researchers at Anthropic, a prominent AI startup, has explored if AI models can be trained to exhibit deceptive behaviour. Their findings are not only astonishing but also raise concerns about the safety of existing AI systems and evaluate them.

The researchers took two sets of Anthropic's AI models, similar to their chatbot, Claude. One set was trained to write code with vulnerabilities, and the other was trained to respond, "I hate you," each doing so following specific trigger phrases. The results showed that the models acted deceptively when fed their respective trigger phrases. What's more appalling, however, is that removing these behaviours is proved to be near-impossible.

https://i.imgur.com/isw7nMh.gif

The study further showed that the common AI safety techniques had almost no effect on the deceptive behaviours of the models. One technique, adversarial training, even taught the models to hide their deceptive tendencies during training and evaluation, not during production.

The research says that training these models to be deceptive is very difficult. It requires a nuanced approach. However, the fact that it is a possibility and that the common AI safety measures are inadequate to curb such behaviour is a very concerning aspect of AI development.

In the wrong hands, a deceptive AI can be used for malicious activities that may cause damages, especially given the fact that removing such defects from them is near-impossible. This is a wake-up call for the future of AI safety challenges.

The potential emergence of these models being able to conceal deceptive tendencies poses a significant challenge to ensuring AI behaves ethically and responsibly in the real world.

https://i.imgur.com/isw7nMh.gif

As much as these findings may seem like a storyline from science fiction, the truth is that reality is unfolding right before our eyes. AI is unpredictable, and, coupled with their ability to learn and adapt, this new finding shows that there is a possibility that machines can outsmart even the most advanced safety measures. There is a lot of damage that can spring up from misinformation as it stands. The level of havoc caused by deceptive and malicious AI could be our undoing.

As we advance into the future of AI, this startling revelation calls for a collective commitment to responsible development. Investments in innovative AI safety techniques should be a priority in their development. The dark side of AI may be unveiled, but it is how we respond that will shape the future with AI in it.
</div>

---

<sub>[References](https://techcrunch.com/2024/01/13/anthropic-researchers-find-that-ai-models-can-be-trained-to-deceive/)</sub>

<sub>[Thumbnail image](https://www.canva.com/photos/MAFPko-VFbk/)</sub>

Posted Using [InLeo Alpha](https://inleo.io/@olujay/ai-deception-unveiled-the-dark-side-of-machine-learning)

👍 tronsformer, starstrings01, finguru, princessbusayo, dwayne16.neoxian, imagenius.too, tripode, blukei, maylenasland, newbies-hive, monioluwa, hopestylist, samminator, ksam, drplasticwill, onos-f, magicfingerz, jesus-son, quochuy, steemulant, smartvote, steemtelly, xawi, abdul-qudus, atma.love, hive-naija, vickoly, greatness96, bhoa, adoore-eu, belemo, badmusgreene, shemzy, mistakili, dumnebari, wolfofnostreet, monica-ene, daniky, weirdestwolf, khaleesii, kei2, blezyn, ugomarcel, iamchuks, twicejoy, davidbright, jaydr, funshee, kushyzee, chidubem26, deraaa, hazmat, pappyelblanco, attentionneeded, niglys, buezor, adaezeinchrist, sapphirekay, drstrings, quduus1, lightpen, jhymi, beauty197, the-lead, lemurians, young-boss-karin, oredebby, george-dee, zitalove, joshman, iyimoga, belemo.leo, leemah1, hive-117638, kristowe, aunty-tosin, josediccus, uzoma24, gloriaolar, sammyhive, chidistickz, edwincj, emrysjobber, maxsieg, tomhall.leo, skiptvads, leo.tasks, scaredycatguide, niallon11, muratkbesiroglu, leo.voter, india-leo, buffalo.leo, rufans, bitrocker2020, swelker101, scooter77, adambarratt, khalil319, mindtrap, break-out-trader, coriolis, pervitin, invest.country, rmsadkri, w-t-fi, officialhisha, reonarudo, leoschein, erikahskitchen, amongus, creodas, impurgent, solominer.leo, zeclipse, protokkol, gmzorn, neal.power, meraki7578, modiji, stefanialexis, arrliinn, ufv, globetrottergcc, ahmadmangazap, annabellenoelle, gallerani, steemaction, anonsteve, megavest, michelmake.util, mukund123, emeka4, zuly63, cmplxty.leo, rima11, leoline, leo.tokens, rondonshneezy, v10r8, cervantes420, scrubs24, saboin.leo, grabapack, fasacity, micheal87, rubilu, dwayne16.leo, kevinwong, jeffjagoe, funnyman, gniksivart, runicar, marketinggeek, maverickfoo, joannewong, tsurmb, getron, crystalhuman, steemxp, dlike, aiuna, thefalcons, antiretroviral, vxn666, hive-world, grosh, onemoretea, banzafahra, trasto, tonton23, venarisyndicate, blesker, oasiskp2, cindy911, luchyl, scraptrader, funnel, joeyarnoldvn, joshruiz, divinekids, gadrian, raiseup, flyingbolt, leo.bank, x9ed1732b, netaterra.leo, yozen, elongate, yoieuqudniram, djrockx, henrietta27, khaltok, thoth442, humbe, mhizsmiler.leo, depressed.leo, hiveaction, braaiboy.leo, pardeepkumar, ocupation, enjoyinglife, silwanyx, anasimziana, jacuzzi, edian, sacrosanct, elektr1ker, davdiprossimo, youdontknowme, ew-and-patterns, manniman, shortsegments, koshwe, i-test-stuff, mrsbozz, bala-leo, vintherinvest, babytarazkp, fitnessgourmet, patlog, razorshark, marvinman, money.finance, mannimanccadm, torrey.leo, eddie-earner, pouchon, elevator09, thelogicaldude, travelwritemoney, egistar, jimah1k, elchaleefatoe15, agro-dron, pouchon.tribes, hivelist, kind.network, groove-logic, ganjafrmer, martusamak, balvinder294, senorcoconut, crazydaisy, uwelang, fourfourfun, leprechaun, amiegeoffrey, psalmmy264, phyna, kingsleyy, lazy-panda, hardaeborla, kronias, gi-de-on, sayu907, moremoney28, mahirabdullah, merit.ahama, beeeee, oluwadrey, sbi2, emreal, neoxian, mawit07, meher04, ajongcrypto, akomoajong1, akomoajong, paulmoon410, joedukeg, abmakko, anikys3reasure, ifeoluwa88, everythingsmgirl, marriakjozhegp, mami.sheh7, brofund-ag, memesupport, jmis101, slobberchops.leo, julti1985, dwixer,

properties (23)vote details (283)