核心推荐大模型:文本

Grok翻译站点

Grok-1.5 comes with improved reasoning capabilities and a context length of 128,000 tokens. Available on 𝕏 soon.

标签:
其他站点:Github
Grok
美国

Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy, so intended to answer almost anything and, far harder, even suggest what questions to ask!

Grok is designed to answer questions with a bit of wit and has a rebellious streak, so please don’t use it if you hate humor!

A unique and fundamental advantage of Grok is that it has real-time knowledge of the world via the 𝕏 platform. It will also answer spicy questions that are rejected by most other AI systems.

Grok is still a very early beta product – the best we could do with 2 months of training – so expect it to improve rapidly with each passing week with your help.

Thank you,
the xAI Team

Why we are building Grok

At xAI, we want to create AI tools that assist humanity in its quest for understanding and knowledge.

By creating and improving Grok, we aim to:

  • Gather feedback and ensure we are building AI tools that maximally benefit all of humanity. We believe that it is important to design AI tools that are useful to people of all backgrounds and political views. We also want empower our users with our AI tools, subject to the law. Our goal with Grok is to explore and demonstrate this approach in public.
  • Empower research and innovation: We want Grok to serve as a powerful research assistant for anyone, helping them to quickly access relevant information, process data, and come up with new ideas.

Our ultimate goal is for our AI tools to assist in the pursuit of understanding.

The journey to Grok-1

The engine powering Grok is Grok-1, our frontier LLM, which we developed over the last four months. Grok-1 has gone through many iterations over this span of time.

After announcing xAI, we trained a prototype LLM (Grok-0) with 33 billion parameters. This early model approaches LLaMA 2 (70B) capabilities on standard LM benchmarks but uses only half of its training resources. In the last two months, we have made significant improvements in reasoning and coding capabilities leading up to Grok-1, a state-of-the-art language model that is significantly more powerful, achieving 63.2% on the HumanEval coding task and 73% on MMLU.

To understand the capability improvements we made with Grok-1, we have conducted a series of evaluations using a few standard machine learning benchmarks designed to measure math and reasoning abilities.

GSM8k: Middle school math word problems, (Cobbe et al. 2021), using the chain-of-thought prompt.

MMLU: Multidisciplinary multiple choice questions, (Hendrycks et al. 2021), provided 5-shot in-context examples.

HumanEval: Python code completion task, (Chen et al. 2021), zero-shot evaluated for pass@1.

MATH: Middle school and high school mathematics problems written in LaTeX, (Hendrycks et al. 2021), prompted with a fixed 4-shot prompt.

Benchmark Grok-0 (33B) LLaMa 2 70B Inflection-1 GPT-3.5 Grok-1 Palm 2 Claude 2 GPT-4
GSM8k 56.8%
8-shot
56.8%
8-shot
62.9%
8-shot
57.1%
8-shot
62.9%
8-shot
80.7%
8-shot
88.0%
8-shot
92.0%
8-shot
MMLU 65.7%
5-shot
68.9%
5-shot
72.7%
5-shot
70.0%
5-shot
73.0%
5-shot
78.0%
5-shot
75.0%
5-shot + CoT
86.4%
5-shot
HumanEval 39.7%
0-shot
29.9%
0-shot
35.4%
0-shot
48.1%
0-shot
63.2%
0-shot
70%
0-shot
67%
0-shot
MATH 15.7%
4-shot
13.5%
4-shot
16.0%
4-shot
23.5%
4-shot
23.9%
4-shot
34.6%
4-shot
42.5%
4-shot

On these benchmarks, Grok-1 displayed strong results, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1. It is only surpassed by models that were trained with a significantly larger amount of training data and compute resources like GPT-4. This showcases the rapid progress we are making at xAI in training LLMs with exceptional efficiency.

Since these benchmarks can be found on the web and we can’t rule out that our models were inadvertently trained on them, we hand-graded our model (and also Claude-2 and GPT-4) on the 2023 Hungarian national high school finals in mathematics, which was published at the end of May, after we collected our dataset. Grok passed the exam with a C (59%), while Claude-2 achieved the same grade (55%), and GPT-4 got a B with 68%. All models were evaluated at temperature 0.1 and the same prompt. It must be noted that we made no effort to tune for this evaluation. This experiment served as a “real-life” test on a dataset our model was never explicitly tuned for.

Human-graded evaluation Grok-0 GPT-3.5 Claude 2 Grok-1 GPT-4
Hungarian National High School Math Exam (May 2023) 37%
1-shot
41%
1-shot
55%
1-shot
59%
1-shot
68%
1-shot

We provide a summary of the important technical details of Grok-1 in the model card.

Engineering at xAI

At the frontier of deep learning research, reliable infrastructure must be built with the same care as datasets and learning algorithms. To create Grok, we built a custom training and inference stack based on Kubernetes, Rust, and JAX.

LLM training runs like a freight train thundering ahead; if one car derails, the entire train is dragged off the tracks, making it difficult to set upright again. There are a myriad of ways GPUs fail: manufacturing defects, loose connections, incorrect configuration, degraded memory chips, the occasional random bit flip, and more. When training, we synchronize computations across tens of thousands of GPUs for months on end, and all these failure modes become frequent due to scale. To overcome these challenges, we employ a set of custom distributed systems that ensure that every type of failure is immediately identified and automatically handled. At xAI, we have made maximizing useful compute per watt the key focus of our efforts. Over the past few months, our infrastructure has enabled us to minimize downtime and maintain a high Model Flop Utilization (MFU) even in the presence of unreliable hardware.

Rust has proven to be an ideal choice for building scalable, reliable, and maintainable infrastructure. It offers high performance, a rich ecosystem, and prevents the majority of bugs one would typically find in a distributed system. Given our small team size, infrastructure reliability is crucial, otherwise, maintenance starves innovation. Rust provides us with confidence that any code modification or refactor is likely to produce working programs that will run for months with minimal supervision.

We are now preparing for our next jump in model capabilities, which will require reliably coordinating training runs on tens of thousands of accelerators, running internet-scale data pipelines, and building new kinds of capabilities and tools into Grok. If that sounds exciting to you, apply to join the team here.

Research at xAI

We give Grok access to search tools and real-time information, but as with all the LLMs trained on next-token prediction, our model can still generate false or contradictory information. We believe that achieving reliable reasoning is the most important research direction to address the limitations of current systems. Here, we would like to highlight a few promising research directions we are most excited about at xAI:

  • Scalable oversight with tool assistance. Human feedback is essential. However, providing consistent and accurate feedback can be challenging, especially when dealing with lengthy code or complex reasoning steps. AI can assist with scalable oversight by looking up references from different sources, verifying intermediate steps with external tools, and seeking human feedback when necessary. We aim to make the most effective use of our AI tutors’ time with the help of our models.
  • Integrating with formal verification for safety, reliability, and grounding. To create AI systems that can reason deeply about the real world, we plan to develop reasoning skills in less ambiguous and more verifiable situations. This allows us to evaluate our systems without human feedback or interaction with the real world. One major immediate goal of this approach is to give formal guarantees for code correctness, especially regarding formally verifiable aspects of AI safety.
  • Long-context understanding and retrieval. Training models for efficiently discovering useful knowledge in a particular context are at the heart of producing truly intelligent systems. We are working on methods that can discover and retrieve information whenever it is needed.
  • Adversarial robustness. Adversarial examples demonstrate that optimizers can easily exploit vulnerabilities in AI systems, both during training and serving time, causing them to make egregious mistakes. These vulnerabilities are long-standing weaknesses of deep learning models. We are particularly interested in improving the robustness of LLMs, reward models, and monitoring systems.
  • Multimodal capabilities. Currently, Grok doesn’t have other senses, such as vision and audio. To better assist users, we will equip Grok with these different senses that can enable broader applications, including real-time interactions and assistance.

We believe that AI holds immense potential for contributing significant scientific and economic value to society, so we will work towards developing reliable safeguards against catastrophic forms of malicious use. We believe in doing our utmost to ensure that AI remains a force for good.

If you share our optimism and want to contribute to our mission, apply to join the team here.

Early Access to Grok

We are offering a limited number of users in the United States to try out our Grok prototype and provide valuable feedback that will help us improve its capabilities before a wider release. You can join the Grok waitlist here. This release just represents the first step for xAI. Looking ahead, we have an exciting roadmap and will be rolling out new capabilities and features in the coming months.

数据统计

数据评估

Grok浏览人数已经达到1,382,如你需要查询该站的相关权重信息,可以点击"5118数据""爱站数据""Chinaz数据"进入;以目前的网站数据参考,建议大家请以爱站数据为准,更多网站价值评估因素如:Grok的访问速度、搜索引擎收录以及索引量、用户体验等;当然要评估一个站的价值,最主要还是需要根据您自身的需求以及需要,一些确切的数据则需要找Grok的站长进行洽谈提供。如该站的IP、PV、跳出率等!

关于Grok特别声明

本站哆咪虾提供的Grok都来源于网络,不保证外部链接的准确性和完整性,同时,对于该外部链接的指向,不由哆咪虾实际控制,在2024-04-06 下午1:58收录时,该网页上的内容,都属于合规合法,后期网页的内容如出现违规,可以直接联系网站管理员进行删除,哆咪虾不承担任何责任。

相关导航

暂无评论

您必须登录才能参与评论!
立即登录
暂无评论...