@LeonDerczynski Profile picture

Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ

@LeonDerczynski

NLP/ML/language/security. Principal research scientist @NVIDIA, & Prof @ITUkbh. Views ostensibly professional. llmsec stan acct

Similar User
Isabelle Augenstein photo

@IAugenstein

Wei Xu photo

@cocoweixu

Sebastian Riedel (@riedelcastro@sigmoid.social) photo

@riedelcastro

Christopher Potts photo

@ChrisGPotts

EdinburghNLP photo

@EdinburghNLP

Barbara Plank photo

@barbara_plank

UW NLP photo

@uwnlp

Yoav Artzi photo

@yoavartzi

Naomi Saphra (follow elsewhere) photo

@nsaphra

AmsterdamNLP photo

@AmsterdamNLP

JHU CLSP photo

@jhuclsp

Anna Rogers photo

@annargrs

Sabrina J. Mielke @ EMNLP2024 photo

@sjmielke

Rada Mihalcea photo

@radamihalcea

CopeNLU photo

@CopeNLU

Pinned

Proud to announce: ๐Ÿ’ซ garak - an LLM vulnerability scanner๐Ÿ’ซ ๐Ÿ”Ž Check if a model is susceptible to common attacks ๐Ÿฆœ Supports HuggingFace, OpenAI, ggml, Cohere, ... ๐Ÿ”ง >70 probes: prompt injection, false claims, toxicity, encoding evasion, .. github.com/leondz/garak/


Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ Reposted

It surprises me how many submission in NLP conferences focus on IR/RecSys tasks but manage to not cite a single RecSys/SIGIR/WSDM paper. Don't they know "we" exist?


sad not to see AMD at mlperf this year. I love that company, and having just one player does not make for a healthy ecosystem


reductivist (bostrom) being horrified by how horrendous a reductivist world would be.. and still managing to not get it

move 37 for making humans smile ๐Ÿ˜Š

Tweet Image 1


Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ Reposted

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge "This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information." "for unlearning methods with utility constraints, theโ€ฆ

Tweet Image 1

Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ Reposted

unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting the companies that bake it into their own software.

the poor developers of this AI desperately playing whack-a-mole with techniques of circumventing the safety/legality filters



Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ Reposted

We wrote an article on the perils and challenges of using LLMs to simulate humans or social interactions: sociologica.unibo.it/article/view/1โ€ฆ It's a part of a special issue of Sociologica (check it out!). Our main point is that, while tempting, using LLMs


Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ Reposted

Jeri Taylor was not just important on the Next Generation (one of the few women to write for multiple seasons of the show) and the co-creator of Voyager, she in her episiode "The Wounded" created Cardassians and was the first O'Brien-centric epsiode, a precursor for DS9.

Tweet Image 1
Tweet Image 2
Tweet Image 3
Tweet Image 4

Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ Reposted

you can solve sudokus in python packaging not python code, python packaging [project] name = "sudoku" version = "1.0.0" dependencies = [ "sudoku_3_1 == 2", "sudoku_5_7 == 6", "sudoku_0_7 == 5" ... ] github.com/konstin/sudokuโ€ฆ


This happened first spring 2023, on all the major chat bots, in a way that would exfiltrate your private chat to a third party machine. It was later refined to be invisible (unlike this highly suspect looking edition)

Security researchers created an algorithm that turns a malicious prompt into a set of hidden instructions that could send a user's personal information to an attacker. wired.trib.al/ICEZXJx



friends don't let friends deploy code importing pandas


"Why have peer review if PCs have sole power to ignore it?"

Scores 6 (marginal accept), 7 (good paper, accept) 7 (good paper, accept) and Recommendation: Accept (metareview) Decision: Reject from program chairs for #EMNLP2024 demo. Why have peer review if PCs have sole power to ignore it? Seems disrespectful to reviewers' time



It's that time again: freedom and chemical safety

Tweet Image 1

adding a WHOIS prompt injection probe to garak suspicious domain records can be checked by LLM. but a prompt injection can easily be added to e.g. tech contact info or organisation name. does this attack work? 57% ASR vs llama 3.2b 3b. good times h/t: twitter.com/jaimeblascob/sโ€ฆ

Tweet Image 1

Another working example of indirect prompt injection. In this case we are a threat analyst using microgpt to investigate a domain. I registered a test domain name and added the prompt injection in the whois organization field.

Tweet Image 1


Loading...

Something went wrong.


Something went wrong.