
Gautam Goel

@gautamcgoel

Postdoc studying ML at the Simons Institute at UC Berkeley.

Similar Users

Sebastien Bubeck (@SebastienBubeck)
Sham Kakade (@ShamKakade6)
Zico Kolter (@zicokolter)
Gergely Neu (@neu_rips)
Greg Yang (@TheGregYang)
Sanjeev Arora (@prfsanjeevarora)
Thomas Steinke (@shortstein)
Behnam Neyshabur (@bneyshabur)
Gautam Kamath (@thegautamkamath)
Yisong Yue (@yisongyue)
Nan Jiang (@nanjiang_cs)
Alex Dimakis (@AlexGDimakis)
Elad Hazan (@HazanPrinceton)
Jason Lee (@jasondeanlee)
Zeyuan Allen-Zhu (@ZeyuanAllenZhu)

My sympathies are with the authors here.


Who will win?


This was a great talk, highly recommend!

The talk I gave at @SimonsInstitute on a line of work trying to understand statistical properties of score-like losses is finally up. The slot was a touch longer than usual and allowed some breathing room, so there are more technical details and musings than in the usual 50 min talk! youtube.com/watch?v=mdwxbQ…



Let F map R^d to R. McDiarmid's Inequality says that if changing any single coordinate changes the value of F by at most some constant, then F is close to E[F] whp. Is there an analog where F: R^d -> R^d and changing a coordinate changes F by at most a constant in 2-norm?
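For reference, the scalar bounded-differences inequality being generalized here reads as follows (my paraphrase, not part of the original tweet): if changing coordinate i alone can change F by at most c_i, then for independent X_1, ..., X_d,

\[
\Pr\bigl(\,\lvert F(X_1,\dots,X_d) - \mathbb{E}[F]\rvert \ge t\,\bigr) \;\le\; 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^{d} c_i^2}\right).
\]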


Gautam Goel Reposted

A wise man once told me this rule: NeurIPS/ICML/ICLR/AISTATS if you can pretend it works, COLT/ALT if you don't have time to make it work, STOC/FOCS if there is no hope to make it work


Suppose you come up with an exciting learning theory result. There are three sets of conferences you could send it to: COLT/ALT, or NeurIPS/ICML/ICLR/AISTATS, or STOC/FOCS. How do you pick? When should you choose STOC over COLT?


I've been tearing my hair out, trying to prove a concentration inequality for the softmax of N standard Gaussians. I've tried the standard tricks but come up empty. Do you guys have any ideas? @ccanonne_ @aryehazan @neu_rips @gaussianmeasure

Tweet Image 1
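Not an answer, but a minimal numerical sketch of how concentrated the softmax of N standard Gaussians looks empirically (N and the number of trials below are arbitrary choices, not from the thread):

# Quick empirical check (not a proof): sample the softmax of N i.i.d. standard
# Gaussians many times and look at the normalizer and the largest softmax weight.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 10_000          # arbitrary experiment sizes

g = rng.standard_normal((trials, N))
e = np.exp(g)
z = e.sum(axis=1)                 # softmax normalizer sum_i exp(g_i)
p_max = e.max(axis=1) / z         # largest softmax entry in each draw

print("normalizer / N:  mean %.3f  std %.3f" % (z.mean() / N, z.std() / N))
print("largest weight:  mean %.2e  std %.2e" % (p_max.mean(), p_max.std()))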

Check out this article my brother coauthored on how over 2/3 of elections in the US are uncontested - no one bothers to challenge the incumbent! governing.com/magazine/ameri…


I have a very similar (maybe the same?) question. I have a sequence of i.i.d. zero-mean random variables. These variables are not subexponential, but every moment is finite. I need a concentration inequality for the average. Any thoughts?

What if the random variables are not bounded and instead we have a bound on the k-th moment? E.g., k=100. Surely we can still get a similar bound, but it seems like it would need a different proof technique.🤔 The union bound gets you something, but it seems rather weak.

Tweet Image 1
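One standard route for this (my sketch, not from the thread): apply Markov's inequality to the k-th moment of the average and control that moment with a Rosenthal-type bound. For i.i.d. zero-mean X_1, ..., X_n with \sigma^2 = \mathbb{E}X_1^2 and \mathbb{E}\lvert X_1\rvert^k < \infty,

\[
\Pr\bigl(\lvert\bar X_n\rvert \ge t\bigr) \;\le\; \frac{\mathbb{E}\lvert\bar X_n\rvert^{k}}{t^{k}} \;\le\; \frac{C_k}{t^{k}}\left(\frac{\sigma^{k}}{n^{k/2}} + \frac{\mathbb{E}\lvert X_1\rvert^{k}}{n^{k-1}}\right),
\]

where C_k depends only on k. The tail decays polynomially in t rather than exponentially, which is the price of controlling only finitely many moments.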


Lovely talk by Vatsal Sharan on using Transformers to discover data structures at the @SimonsInstitute

Tweet Image 1

Prediction: she will join Ilya's company @ssi

I shared the following note with the OpenAI team today.

Tweet Image 1


right in the feels

‘The PhD student is someone who forgoes current income in order to forgo future income.’ - Peter Greenberg



Gautam Goel Reposted

Lots of progress on bandit convex optimization recently arxiv.org/abs/2406.18672 arxiv.org/abs/2406.06506 arxiv.org/abs/2302.05371, I wish I could follow it more closely ... looks like Conjecture 1 from arxiv.org/abs/1607.03084 is going to be resolved soon!!!


Surya describing connections between LLMs, statistical mechanics, and neuroscience at the @SimonsInstitute

Tweet Image 1

Ankur kicking off the year-long program on LLMs and Transformers at the @SimonsInstitute

Tweet Image 1

every day we stray further from God

Mark Thursday, August 24, 2024: PDF has overtaken all of the Abrahamic religions. The trajectory over the last 9 years has been wild.

Tweet Image 1


I didn't read any of the papers and still know the answer is 'no'.

Everyone's talking about Sakana's AI scientist. But no-one's answering the big question: is its output good? I spent hours reading its generated papers and research logs. Read on to find out x.com/SakanaAILabs/s…



The first review I ever got: "This looks fine."

R2 from my 1st PhD paper: "the proposed algorithms need to have at least 2 of the following 3 dimensions: 1-should solve difficult problems 2-should provide near-optimal solutions 3-solve a previously unsolved problem. Unfortunately, this paper achieves none of these 3 aspects."



I just disproved something I'd been trying to prove for a week. It's hard to argue with a counterexample!


Question for ML Twitter. Let f be a function, g the gradient of f, and H the Hessian of f. What is the significance of g'Hg? Intuitively this measures how quickly the function is growing in directions of high curvature. Is there any literature on this value?
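One way to read this quantity (a standard identity, added as an editorial note rather than a reply from the thread): g'Hg equals \lVert g\rVert^2 times the curvature of f along the gradient direction, and it is also the rate at which the squared gradient norm decays under gradient flow \dot x_t = -\nabla f(x_t):

\[
\frac{d}{dt}\,\tfrac{1}{2}\bigl\lVert \nabla f(x_t)\bigr\rVert^{2} \;=\; -\,\nabla f(x_t)^{\top} H(x_t)\,\nabla f(x_t).
\]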

