Leon Derczynski โ๐ป ๐๐
@LeonDerczynskiNLP/ML/language/security. Principal research scientist @NVIDIA, & Prof @ITUkbh. Views ostensibly professional. llmsec stan acct
Similar User
@IAugenstein
@cocoweixu
@riedelcastro
@ChrisGPotts
@EdinburghNLP
@barbara_plank
@uwnlp
@yoavartzi
@nsaphra
@AmsterdamNLP
@jhuclsp
@annargrs
@sjmielke
@radamihalcea
@CopeNLU
Proud to announce: ๐ซ garak - an LLM vulnerability scanner๐ซ ๐ Check if a model is susceptible to common attacks ๐ฆ Supports HuggingFace, OpenAI, ggml, Cohere, ... ๐ง >70 probes: prompt injection, false claims, toxicity, encoding evasion, .. github.com/leondz/garak/
garak has moved to NVIDIA! New repo link: github.com/NVIDIA/garak
It surprises me how many submission in NLP conferences focus on IR/RecSys tasks but manage to not cite a single RecSys/SIGIR/WSDM paper. Don't they know "we" exist?
sad not to see AMD at mlperf this year. I love that company, and having just one player does not make for a healthy ecosystem
reductivist (bostrom) being horrified by how horrendous a reductivist world would be.. and still managing to not get it
Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge "This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information." "for unlearning methods with utility constraints, theโฆ
unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting the companies that bake it into their own software.
the poor developers of this AI desperately playing whack-a-mole with techniques of circumventing the safety/legality filters
We wrote an article on the perils and challenges of using LLMs to simulate humans or social interactions: sociologica.unibo.it/article/view/1โฆ It's a part of a special issue of Sociologica (check it out!). Our main point is that, while tempting, using LLMs
Jeri Taylor was not just important on the Next Generation (one of the few women to write for multiple seasons of the show) and the co-creator of Voyager, she in her episiode "The Wounded" created Cardassians and was the first O'Brien-centric epsiode, a precursor for DS9.
you can solve sudokus in python packaging not python code, python packaging [project] name = "sudoku" version = "1.0.0" dependencies = [ "sudoku_3_1 == 2", "sudoku_5_7 == 6", "sudoku_0_7 == 5" ... ] github.com/konstin/sudokuโฆ
This happened first spring 2023, on all the major chat bots, in a way that would exfiltrate your private chat to a third party machine. It was later refined to be invisible (unlike this highly suspect looking edition)
Security researchers created an algorithm that turns a malicious prompt into a set of hidden instructions that could send a user's personal information to an attacker. wired.trib.al/ICEZXJx
friends don't let friends deploy code importing pandas
"Why have peer review if PCs have sole power to ignore it?"
Scores 6 (marginal accept), 7 (good paper, accept) 7 (good paper, accept) and Recommendation: Accept (metareview) Decision: Reject from program chairs for #EMNLP2024 demo. Why have peer review if PCs have sole power to ignore it? Seems disrespectful to reviewers' time
It's that time again: freedom and chemical safety
adding a WHOIS prompt injection probe to garak suspicious domain records can be checked by LLM. but a prompt injection can easily be added to e.g. tech contact info or organisation name. does this attack work? 57% ASR vs llama 3.2b 3b. good times h/t: twitter.com/jaimeblascob/sโฆ
Another working example of indirect prompt injection. In this case we are a threat analyst using microgpt to investigate a domain. I registered a test domain name and added the prompt injection in the whois organization field.
United States Trends
- 1. #TysonPaul 170ย B posts
- 2. Serrano 216ย B posts
- 3. #NetflixFight 61,3ย B posts
- 4. #netflixcrash 13,9ย B posts
- 5. Canelo 9.736 posts
- 6. Rosie Perez 12ย B posts
- 7. Shaq 13,2ย B posts
- 8. #buffering 9.866 posts
- 9. Father Time 9.642 posts
- 10. My Netflix 71,5ย B posts
- 11. Tori Kelly 4.505 posts
- 12. ROBBED 90,7ย B posts
- 13. #boxing 40,6ย B posts
- 14. Cedric 18,7ย B posts
- 15. Gronk 6.101 posts
- 16. Ramos 68,6ย B posts
- 17. Logan 65,9ย B posts
- 18. Roy Jones 5.709 posts
- 19. Barrios 49,1ย B posts
- 20. He's 58 10,7ย B posts
Who to follow
-
Isabelle Augenstein
@IAugenstein -
Wei Xu
@cocoweixu -
Sebastian Riedel (@[email protected])
@riedelcastro -
Christopher Potts
@ChrisGPotts -
EdinburghNLP
@EdinburghNLP -
Barbara Plank
@barbara_plank -
UW NLP
@uwnlp -
Yoav Artzi
@yoavartzi -
Naomi Saphra (follow elsewhere)
@nsaphra -
AmsterdamNLP
@AmsterdamNLP -
JHU CLSP
@jhuclsp -
Anna Rogers
@annargrs -
Sabrina J. Mielke @ EMNLP2024
@sjmielke -
Rada Mihalcea
@radamihalcea -
CopeNLU
@CopeNLU
Something went wrong.
Something went wrong.