Shahriar Golchin
@ShahriarGolchin
Student Researcher @Google | CS PhD in ML/NLP @UArizona (opinions are mine)
The mechanism behind in-context learning (ICL) has been unclear for a long time. We studied the behind-the-scenes memorization in ICL and found that there is a very strong correlation between this memorization and improved performance. arxiv.org/abs/2408.11546
Excited to share I've joined @Google as a Student Researcher! Ready to learn, innovate, and make an impact!
We (w/ @msurd) will be presenting our Spotlight paper at #ICLR2024 in Vienna next week. Drop by for some amazing intellectual exchanges on data contamination. Paper: arxiv.org/abs/2308.08493 Code: github.com/shahriargolchi… Media: thenewstack.io/how-to-detect-…
Are your LLMs highly accurate, or simply contaminated? As the race to build the best LLM intensifies, clean evaluation is becoming more important than ever, yet contaminated LLMs and benchmarks obfuscate the real performance of models. Check out our new work (comprehensive survey…
Do you have a paper about data contamination with some evidence reported on it? Consider submitting that evidence to the Data Contamination Evidence Collection. Any evidence is very welcome!
Can you imagine having all the evidence of data contamination gathered in one place? 📢As part of the CONDA workshop, we present the Data Contamination Evidence Collection, a shared task on reporting contamination. Available as a @huggingface space: hf.co/spaces/CONDA-W…
1/ Many cool papers coming from our lab recently! Here are just a few:
The #nlproc group at University of Arizona has grown too big, and my account doesn't do it justice. Please follow @LabCLU for updates about our group!
A brief but nice post on data contamination that discusses the work that @ShahriarGolchin, one of our great PhD students :), did: thenewstack.io/how-to-detect-…
Surprisingly, Andrej Karpathy revealed that GPT-4 isn't truly 67% proficient in coding (HumanEval). The lesser-known problem we uncovered earlier is that GPT-4 has higher levels of coding and reasoning data contamination than officially reported. arxiv.org/abs/2311.06233
Claude 3 takes on the Tokenization book chapter challenge :) context: twitter.com/karpathy/statu… Definitely looks quite nice, stylistically! If you look closer there are a number of subtle issues / hallucinations. One example there is a claim that "hello world" tokenizes into 3…
📢 Excited to announce that our Workshop on Data Contamination (CONDA) will be co-located with ACL24 in Bangkok, Thailand on Aug. 16. We are looking forward to seeing you there! Check out the CFP and more information here: conda-workshop.github.io
Thrilled to share that our "Time Travel in LLMs" paper has been accepted to #ICLR2024 as a Spotlight! w/ my awesome advisor @msurd #LLMs #DataContamination @iclr_conf
Data contamination suggests LLMs have possibly seen test data from downstream tasks. Our recent study introduces a novel method to replicate LLMs' training data, including downstream dataset instances, to aid in detecting data contamination. Read more: arxiv.org/abs/2308.08493
Thankfully, several works have been published on this topic since we submitted our paper for review: Time Travel in LLMs: Tracing Data Contamination in Large Language Models By: @ShahriarGolchin @msurd Paper: arxiv.org/abs/2308.08493