Jeff Dean & Gemini team QA at NeurIPS ‘25
- Q1: Are we running out of pretraining data? Are we hitting the scaling-law wall?
- I don’t quite buy it. Gemini only uses a portion of the available video data for training.
- We spent plenty of time filtering for the right data. For example, data generated by less-powerful models gets put on the internet, and that can hurt model performance. There are efforts to automate the data-filtering process (a toy sketch follows this answer).
- We can track whether some data is AI-generated via watermarking, but there are many models out there, with and without watermarks.
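
Note by gdymind: a minimal sketch (my own, not Gemini’s actual pipeline) of what an automated data filter could look like; `detect_watermark` and `quality_score` are hypothetical stand-ins for a SynthID-style detector and a small quality classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Document:
    text: str
    source: str


def filter_pretraining_docs(
    docs: Iterable[Document],
    detect_watermark: Callable[[str], bool],  # hypothetical watermark detector
    quality_score: Callable[[str], float],    # hypothetical quality classifier, score in [0, 1]
    min_quality: float = 0.5,
) -> List[Document]:
    """Drop documents that look AI-generated or low quality."""
    kept = []
    for doc in docs:
        if detect_watermark(doc.text):
            continue  # likely produced by a watermarked model
        if quality_score(doc.text) < min_quality:
            continue  # likely low quality, or model-generated without a watermark
        kept.append(doc)
    return kept
```
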
- Q2: Comments on the pretraining data mixture, e.g., healthcare data and multi-language support?
- Pretraining is like a zero-sum game: you need to balance different data types and sources.
- Multi-language support: since pretraining is zero-sum, we may add more modules in parameter space rather than tune the base model for many languages at the same time. By using modules, we can devote more training data and FLOPs to the base model (a toy adapter sketch follows this answer).
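
Note by gdymind: a toy illustration of the “modules in parameter space” idea, assuming LoRA-style low-rank adapters keyed by language; the shapes, language codes, and routing are my own illustrative choices, not Gemini internals.

```python
import numpy as np

D_MODEL, RANK = 512, 8
rng = np.random.default_rng(0)

# Frozen base-model projection (stands in for the shared base model).
W_base = rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)

# Small low-rank adapters, one per language, trained separately from the base.
adapters = {
    lang: (rng.normal(size=(D_MODEL, RANK)) * 0.01,
           rng.normal(size=(RANK, D_MODEL)) * 0.01)
    for lang in ["hi", "sw", "th"]
}


def forward(x: np.ndarray, language: str | None = None) -> np.ndarray:
    """Base projection plus an optional language-specific low-rank update."""
    y = x @ W_base
    if language in adapters:
        A, B = adapters[language]
        y = y + (x @ A) @ B  # extra parameters carry the language; base stays frozen
    return y


y = forward(rng.normal(size=(4, D_MODEL)), language="sw")
```
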
- Q3: Comments on LLM benchmarks / evaluations?
- Models may learn from public benchmarks, so the window during which a benchmark stays useful is limited.
- Besides public benchmarks, Gemini also has internal benchmarks that are revised from iteration to iteration.
- When Gemini scores close to 0% on a benchmark, it is probably not easy to improve on it yet.
- When evaluations show scores around 5~30%, that is our comfort zone, and we can focus on improving the model’s capabilities in those areas (a toy triage sketch follows this answer).
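
Note by gdymind: a toy version of the “near 0% is too hard, 5~30% is the comfort zone” triage heuristic; the benchmark names and scores below are made up.

```python
def triage_benchmarks(scores: dict[str, float]) -> dict[str, str]:
    """Bucket benchmarks by current score to decide where to spend effort."""
    buckets = {}
    for name, score in scores.items():
        if score < 0.05:
            buckets[name] = "too hard for now"  # near 0%: hard to move the needle yet
        elif score <= 0.30:
            buckets[name] = "focus here"        # comfort zone: room and traction to improve
        else:
            buckets[name] = "monitor"           # already doing reasonably well
    return buckets


print(triage_benchmarks({"agentic-coding": 0.12, "rare-languages": 0.02, "math-olympiad": 0.61}))
```
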
- Q4: As architecture students, we see Nano Banana Pro has really good understanding of architectures and spatial reasoning capabilities. Did you specifically train it with architecture images?
- Nano Banana Pro is trained on a large amount of many different types of data, which of course include architecture images, so it shows improvement on architecture as well as on many other domains.
- Q5: Will AI replace humans in the job market?
- I do think what humans do will change. AI reshapes what people spend their time on.
- I wrote a paper last year on this topic, covering healthcare, employment, problematic misinformation, political bias, etc. You can read it at shapingai.com.
- Q6: Suggestions on improving LLM performance / system efficiency?
- There is a Google-internal doc called “Performance hint” that Sanjay Ghemawat and I (Jeff Dean) created. Googlers should feel free to read it.
- Q7: How is Gemini 3 different from the previous Gemini 2.5 version? Any significant architecture or other improvements?
- Of course it’s still a Transformer architecture. I am kind of sad it can’t do continual learning at this point.
- Note by gdymind: many AI pioneers also have this opinion.
- Actually, the innovations in Gemini 3 came from many small ideas stacked together, with each idea contributing, say, a 3%, 5%, or 8% improvement.
- Note by gdymind: Andrej Karpathy has also mentioned this trend, that many innovations are roughly equally important.
- We ran many small-scale experiments and ablation studies on these ideas (a toy compounding example follows this answer).
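
Note by gdymind: a back-of-the-envelope calculation of how small, individually modest ideas compound multiplicatively; the list of gains is hypothetical, and only the 3%/5%/8% figures echo the talk.

```python
# Per-idea relative improvements (hypothetical list; each idea is validated
# by small-scale experiments and ablations before being stacked).
gains = [0.05, 0.03, 0.08, 0.02, 0.04]

combined = 1.0
for g in gains:
    combined *= 1.0 + g

print(f"Combined improvement: {(combined - 1.0) * 100:.1f}%")  # ~23.9%
```
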
- Q8: Comments on open models, and also on Chinese open models?
- I have actually been a believer in open models for a long time. Google has also open-sourced the Gemma models.
- Chinese models are strong.
- It’s nice to have a bunch of open models to train for your own downstream tasks.
- Q9: What’s the story behind Google Brain and DeepMind working together?
- Both teams worked on separate things at the beginning. One team was doing MoE, scaling, the Transformer, etc., while the other was doing more traditional ML work.
- Tasks and talent were fragmented. I thought that was dumb and pushed for combining the efforts.
- It took some time and effort for people to get used to working closely across different time zones.
- Q10: Comments on embedding learning?
- It would be great to have end-to-end learning of general embeddings that work across different downstream tasks; a hybrid retrieval + long-context system would also be great (a minimal sketch follows this answer).
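
Note by gdymind: a minimal sketch of the hybrid retrieval + long-context idea, assuming a general-purpose `embed` function and a `long_context_model` callable, both hypothetical stand-ins.

```python
from typing import Callable, List

import numpy as np


def retrieve_then_read(
    query: str,
    passages: List[str],
    embed: Callable[[str], np.ndarray],        # general-purpose embedding model
    long_context_model: Callable[[str], str],  # model that consumes a long prompt
    k: int = 8,
) -> str:
    """Embed everything, take the top-k passages by cosine similarity,
    then hand the concatenated context to a long-context model."""
    q = embed(query)
    P = np.stack([embed(p) for p in passages])
    sims = (P @ q) / (np.linalg.norm(P, axis=1) * np.linalg.norm(q) + 1e-8)
    top = np.argsort(-sims)[: min(k, len(passages))]
    context = "\n\n".join(passages[i] for i in top)
    return long_context_model(f"{context}\n\nQuestion: {query}")
```
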
https://gdymind.github.io/2025/12/05/Jeff-Dean-at-NeurIPS-2025/