I have been curious about what RAG built on top of scientific knowledge would look like… …until @AkariAsai showed me the OpenScholar prototype 😍 For an 8B model, it handles even subtle questions well, like whether BM25 or DPR is better! (demo is CS-only, expanding soon!)
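For anyone unfamiliar with the BM25-vs-DPR distinction that tweet alludes to, here is a toy sketch (definitely not OpenScholar's actual pipeline) contrasting sparse lexical scoring with DPR-style dense scoring; the corpus, query, and MiniLM encoder are all illustrative choices:

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "BM25 is a sparse lexical retrieval function based on term statistics.",
    "Dense Passage Retrieval (DPR) embeds queries and passages with dual encoders.",
    "Retrieval-augmented generation conditions an LLM on retrieved documents.",
]
query = "How does dense retrieval differ from lexical matching?"

# Sparse: score by exact-term overlap statistics.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
sparse_scores = bm25.get_scores(query.lower().split())

# Dense: score by cosine similarity of learned embeddings.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
dense_scores = util.cos_sim(encoder.encode(query), encoder.encode(corpus))[0].tolist()

for doc, s, d in zip(corpus, sparse_scores, dense_scores):
    print(f"BM25={s:.2f}  dense={d:.2f}  {doc[:55]}")
```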
Episode 5 of our podcast is out! We discuss how complicated it is to assess intelligence, whether in humans, animals, or machines. With two fantastic guests: Comparative Psychologist Erica Cartmill and Computer Scientist Ellie Pavlick. Check it out! complexity.simplecast.com/episodes/natur…
I really enjoyed working with Wenda 😆💪
I am on the job market for full-time industry positions. My research focuses on text generation evaluation and LLM alignment. If you have relevant positions, I’d love to connect! Here is a list of my publications and a summary of my research:
Since this guarantee is model-agnostic by nature, we no longer have to rely solely on GPT-4 as a judge 🤩 We propose Cascaded Selective Evaluation, which runs a cascade of judge models instead of calling expensive GPT-4 for every evaluation! [3/n]
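A minimal sketch of the cascade control flow, assuming each judge returns a verdict plus a confidence score and that the per-judge thresholds were calibrated offline (that calibration is what the paper's guarantee rests on; all names here are hypothetical):

```python
from typing import Callable, Optional, Tuple

# (prompt, response_a, response_b) -> (verdict, confidence)
Judge = Callable[[str, str, str], Tuple[str, float]]

def cascaded_selective_eval(
    prompt: str,
    resp_a: str,
    resp_b: str,
    judges: list,       # ordered cheap -> expensive
    thresholds: list,   # one calibrated threshold per judge
) -> Optional[str]:
    """Return the verdict of the cheapest sufficiently confident judge,
    abstaining (None) if no judge clears its calibrated threshold."""
    for judge, tau in zip(judges, thresholds):
        verdict, confidence = judge(prompt, resp_a, resp_b)
        if confidence >= tau:
            return verdict  # accept: calibration bounds disagreement risk
    return None             # abstain rather than risk human disagreement
```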
LLM-as-a-judge has become the norm, but how can we be sure that it will really agree with human annotators? 🤔 In our new paper, we introduce a principled approach that provides LLM judges with provable guarantees of human agreement ⚡ #LLM #LLM_as_a_judge #reliable_evaluation 🧵[1/n]
Can OpenAI o1 handle complex planning tasks? Not really! It seems o1 gets even more confused by context than GPT-4o 😲. We test the TravelPlanner validation set with the updated o1 models and a fine-tuned GPT-4o. Key insights: 1. Mixed results for o1 and o1-mini: No…
[NEW PAPER ALERT!] In this work, we present PROFILE, a framework designed to discern the alignment of LLM-generated responses with human preferences at a fine-grained level (e.g., length, formality, and intent). Our key finding is a significant misalignment between LLMs’ outputs and…
📜New preprint! LLMs generate impressive text, but they often miss what humans actually prefer (e.g., by being too wordy). 🤯 The problem? We don’t have a precise way to pinpoint where these misalignments occur. That’s the gap we aim to fill! 🔍
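Not the actual PROFILE framework, just a toy illustration of the fine-grained idea: score model outputs and human-preferred responses along interpretable axes (length, plus a crude contraction-based formality proxy) and report the gaps. All helper names are made up:

```python
import statistics

def length_words(text: str) -> int:
    return len(text.split())

def contraction_rate(text: str) -> float:
    # Crude formality proxy: contractions per word (more = less formal).
    words = text.split()
    return sum("'" in w for w in words) / max(len(words), 1)

def profile_gap(model_outputs: list, human_preferred: list) -> dict:
    """Mean difference (model - human) on each fine-grained dimension."""
    return {
        "length_gap": statistics.mean(map(length_words, model_outputs))
                      - statistics.mean(map(length_words, human_preferred)),
        "contraction_gap": statistics.mean(map(contraction_rate, model_outputs))
                           - statistics.mean(map(contraction_rate, human_preferred)),
    }

# A positive length_gap would echo the "too wordy" misalignment above.
```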
🚨New Benchmark Alert🚨 Our paper, accepted to Findings of EMNLP 2024 🌴, introduces a new dataset, DynamicQA! DynamicQA contains inherently conflicting data (both disputable 🤷♀️ and temporal 🕰️), which is crucial for studying LMs’ internal memory conflicts. Work with @hayu204 🥳 #EMNLP2024 #NLProc
Can LLMs cater to diverse cultures in text generation? We find: 1️⃣lexical variance across nationalities 2️⃣culturally salient words 3️⃣weak correlation w/ cultural values 📜arxiv.org/abs/2406.11565 🤗huggingface.co/datasets/shail… 💻github.com/shaily99/eecc 🎉@emnlpmeeting🎉 w/ @841io 🧵
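Illustrative only, and not necessarily the paper's metric: one simple way to quantify lexical variance across groups is the pairwise Jaccard overlap of the vocabularies a model uses when generating for each nationality (lower overlap = more variance):

```python
from itertools import combinations

def vocab(texts: list) -> set:
    return {w.lower() for t in texts for w in t.split()}

def lexical_overlap(generations_by_group: dict) -> dict:
    """Pairwise Jaccard similarity of per-group vocabularies."""
    vocabs = {g: vocab(ts) for g, ts in generations_by_group.items()}
    return {
        (a, b): len(vocabs[a] & vocabs[b]) / len(vocabs[a] | vocabs[b])
        for a, b in combinations(vocabs, 2)
    }
```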
Excited to attend the first @COLM_conf 😝 Come check our work on Multi-lingual Factuality Evaluation on Wednesday morning and say hi 👋 📚: arxiv.org/pdf/2402.18045
Excited to attend the first @COLM_conf 🦙❤️🤩 Very open to talking with anyone about faculty/postdoc/PhD opportunities at KAIST, as well as multilingual and multicultural LLM research. Come join the multilingual special session on Wednesday morning, and find my students…
Can your LLM stay faithful to context, even if "The Moon 🌕 is Made of Marshmallows 🍡"? We introduce FaithEval, a new and comprehensive benchmark dedicated to evaluating contextual faithfulness in LLMs, with 4.9K high-quality question-context pairs across 3 challenging tasks:…
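A minimal sketch of this kind of counterfactual-faithfulness probe, with generate() standing in for whatever LLM call you use; the prompt template and string-matching check are simplifying assumptions, not FaithEval's actual protocol:

```python
from typing import Callable

def faithfulness_probe(
    generate: Callable[[str], str],
    context: str,
    question: str,
    context_answer: str,      # answer supported by the (counterfactual) context
    parametric_answer: str,   # answer the model likely memorized in pretraining
) -> str:
    prompt = (
        "Answer using ONLY the context below.\n"
        f"Context: {context}\nQuestion: {question}\nAnswer:"
    )
    out = generate(prompt).lower()
    if context_answer.lower() in out:
        return "faithful"      # stuck to the given context
    if parametric_answer.lower() in out:
        return "unfaithful"    # fell back on memorized knowledge
    return "other"

# e.g. context="The Moon is made of marshmallows.", question="What is the
# Moon made of?", context_answer="marshmallows", parametric_answer="rock"
```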
LLMs Know More Than They Show. We know very little about how and why LLMs "hallucinate," but it's an important topic nonetheless. This new paper finds that the "truthfulness" information in LLMs is concentrated in specific tokens. This insight can help enhance error detection…
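A hedged sketch of the probing idea (not the paper's exact recipe): train a linear probe on the hidden state at the answer token to predict whether the answer is correct, reading off the "truthfulness" signal the model encodes internally. GPT-2 is a stand-in model and the two-example dataset is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

def answer_token_state(text: str, layer: int = -1) -> torch.Tensor:
    """Hidden state at the final token, where the answer ends."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# X: hidden states at answer tokens; y: whether the answer was correct.
texts = ["Q: Capital of France? A: Paris", "Q: Capital of France? A: Rome"]
labels = [1, 0]
X = torch.stack([answer_token_state(t) for t in texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))  # a real setup would evaluate on held-out answers
```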
RLHF is a popular method: it boosts your human eval scores and Elo rating 🚀🚀. But does it really❓ Your model might be “cheating” you! 😈😈 We show that LLMs can learn to mislead human evaluators via RLHF. 🧵 below