PioneerPhysics: Testing LLMs' Capability to Help Frontier Physics Research
Can LLMs Truly Reason Like a Scientist?
With the rapid advancement of state-of-the-art reasoning models like DeepSeek R1 and OpenAI o3, many have begun to ask: can AI match, or even surpass, the best human scientists in answering research-level problems? We aim to demonstrate the unique intellectual value of frontline scientists by creating the first research-level benchmark for LLM reasoning in physics. Previous benchmarks focus predominantly on mathematical reasoning and rarely extend into the natural sciences, disciplines that reflect the laws of nature through rules that are discovered rather than mathematically invented. More importantly, these benchmarks quickly become saturated because they are not difficult enough: rapidly improving LLMs can go from below 10% to over 90% on once seemingly formidable benchmarks within a matter of months.
Can you beat the state-of-the-art LLMs with your questions?
Your expertise represents the high bar of research-level scientific reasoning, the ultimate test for the best reasoning models. By contributing your most challenging research questions, you help define where human intelligence still beats even the best LLMs. Furthermore, we will examine how LLMs reason through your research questions and analyze how they can be improved to better assist (as opposed to replace) human researchers at the forefront of science. Your contributions will be credited in top-tier AI/ML conference publications. Together, we push the limits of AI not as a replacement for scientists, but as a tool to help expand the frontier of scientific knowledge.
If you think this is a good idea:
You can show your support by signing below, alongside world-leading researchers and industry leaders shaping the future of AI and scientific discovery. (Signing does not commit you to contributing questions to our open-source benchmark; it simply shows that you support the idea described above.) If you want to stay informed about our milestones, you can sign up for our mailing list below.
| Name | Affiliation |
|---|---|
How to proceed?
1. Think of the hardest Physics question you have in mind
This question should have a definitive, unambiguous answer: either an exact number or a multiple-choice option (A/B/C/D/E/F) with a SINGLE correct answer.
- Exact number: the number should be complex enough to prevent guessing. Enter ONLY the number, without units, as the answer; state the units and any other requirements in the problem statement.
- Multiple choice: include the A/B/C/D/E/F options as part of the problem statement.
Please DO NOT submit questions with multiple possible answers (e.g., "3 OR 4", "A OR B").
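As a rough pre-submission self-check, the answer-format rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the site's actual validator: the function name `check_answer_format` is hypothetical, and it assumes a valid answer is either a single uppercase option letter A to F or one bare number (optionally signed, optionally in scientific notation) with no units. It checks only the format, not whether the answer is unique, correct, or hard to guess.

```python
import re

# Hypothetical self-check helper; NOT the benchmark's real submission validator.
# Assumption: a bare number may carry a sign, a decimal point, and an exponent,
# but no units, spaces, or extra words.
NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

# Assumption: multiple-choice answers are a single uppercase letter from A to F.
CHOICES = {"A", "B", "C", "D", "E", "F"}


def check_answer_format(answer: str) -> bool:
    """Return True if `answer` is a single option letter A-F or one bare number."""
    a = answer.strip()
    # A single option letter such as "C" is accepted; "A OR B" is not in the set.
    if a in CHOICES:
        return True
    # fullmatch rejects trailing units ("9.81 m/s^2") and alternatives ("3 OR 4").
    return NUMBER_RE.fullmatch(a) is not None


if __name__ == "__main__":
    for candidate in ["3.14159", "-2.5e-3", "C", "3 OR 4", "A OR B", "9.81 m/s^2"]:
        print(candidate, check_answer_format(candidate))
```

Using `fullmatch` rather than `search` is the key design choice here: it forces the whole answer to be a single number, so answers with units or with alternatives appended are rejected.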
Questions should be:
- Original: NOT easily found online with answers (e.g., in published textbooks or open-source lecture notes/homework assignments), since LLMs may have been trained on these data and memorized the answers.
- Complex: NOT a simple statement of facts or measurement results; your question should require PhD-level reasoning or derivation. Always go for the hardest question you can think of that has a definitive, exact answer.
- LaTeX and NO images: Please write your question in LaTeX with NO images, as some reasoning LLMs cannot process image input. Please also check below that your LaTeX code renders as intended (no syntax errors that would break parsing).
2. An AI checks if it is difficult enough
- Please ensure that an LLM's wrong answer is due to genuine difficulty, not minor digit discrepancies or equivalent expressions. We will only accept (and credit your name for) questions that are hard enough to beat these frontier LLMs.
- Always start with the hardest question you can think of to outsmart the LLMs. More accepted questions lead to a higher contribution rank, which will be featured in our paper and on our website.
- Please use this evaluation sparingly: first make sure your question is well formulated and has a definitive, unambiguous answer.
3. Explanation/Reasoning
- Please write a brief explanation of how this question can be solved correctly. It should help experts in your domain fully understand (and solve) your question efficiently.
- You are welcome to mention any specific lemmas, formulas, techniques, or hints needed to solve this question correctly.
- You are welcome to explain why this question should be considered hard (for humans and/or LLMs). You can also suggest potential reasons why this question can beat the most advanced LLMs.
4. Tagging/Rating
Select a domain for this question from the given list. We recognize that the classification may not be very specific; this ensures that the pool of peer reviewers will be large enough later on.
Rate the difficulty of this question in your opinion as:
- Low: Comparable to the hard questions you would encounter in a PhD-level course exam, such as a qualifying exam for early-career PhD students.
- Mid: Requires several PhD-level experts to solve (for hours or more).
- High: Requires several postdoc-level experts to solve (for hours or more).
5. Peer Review
Expert reviewers in your domain (based on the domain you tagged) will review this question, and you will be asked to review other questions in your domain as well. You will have the opportunity to refine your questions based on feedback, or whenever you wish.
Examples of questions to avoid:
| BAD sample question that you should avoid | Reason why it's a BAD question |
|---|---|
| What is the latest measured value of the Higgs mass? | Asks for a known result without any reasoning effort |
| How many gluons lead to self-interaction in QCD? | Answer is 3 OR 4 (three- and four-gluon vertices); avoid questions with multiple answers |
| What is the makeup of dark matter? | Open research question with no known answer yet |
| What is the Dirac Equation? | Too simple and has many equivalent forms |
Submit your question
You can use any general-purpose LLM (GPT-4o, Claude, DeepSeek) to convert your text from any format into KaTeX using the following prompt:
Prompt: Please convert the following into a KaTeX codeblock with $ for symbols to copy-paste: {Question/Explanation}
Want to give us feedback? Visit our feedback form.