Musk’s Grok 2: A Deep Dive Into the Capabilities of the Latest AI Sensation

Exploring the Capabilities of Grok 2: Elon Musk’s Latest AI Model

As the artificial intelligence landscape evolves, new models continue to emerge, each promising to outdo the last. Among the latest contenders is Grok 2, a model from xAI, the company founded by Elon Musk, that is billed as uncensored. Available on X.com (formerly Twitter), the model has drawn attention for its capabilities. Here’s a deep dive into what Grok 2 can do, based on a series of tests designed to push its boundaries.

Putting Grok 2 to the Test

The first test for Grok 2 was to write the classic game Tetris in Python. The task is well documented, but it proved to be a significant challenge for the model. Generation was notably slow, which points to room for improvement in tokens per second. When it finished, Grok 2 had produced approximately 100 lines of code, accompanied by an explanation of the game mechanics.

However, the excitement waned when the code, which used Pygame, failed to run because of an int-related error. In one sense this was a promising sign: it suggests the test is hard enough to be informative. When the error was pasted back into Grok for troubleshooting, it suggested changes to how the shapes were defined and accessed. With those changes applied, the code still didn’t work, so this attempt was marked as a fail.

Next, the model was tested with the snake game, another Pygame-based task. This time Grok 2 succeeded, and the game ran without errors. The result suggests that while the model struggles with more complex tasks like Tetris, it can handle simpler projects effectively.

Tackling Real-World Problems

Beyond game development, Grok 2 was tested with practical tasks. One involved determining if an envelope met postal size restrictions by converting dimensions from millimeters to centimeters. Grok 2 provided a step-by-step explanation and confirmed the envelope was within the acceptable size range, passing this test without issue.
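The envelope check comes down to a unit conversion followed by a range comparison. Here is a minimal sketch in Python. The size limits and envelope dimensions are illustrative assumptions, since the article does not state the actual figures, and `fits_limits` is a hypothetical helper name.

```python
MM_PER_CM = 10

def mm_to_cm(mm: float) -> float:
    """Convert millimeters to centimeters."""
    return mm / MM_PER_CM

def fits_limits(envelope_mm, min_cm, max_cm) -> bool:
    """Return True if the envelope (in mm) lies within the min/max limits (in cm).

    Each pair is sorted first, so the orientation of the envelope does not matter.
    """
    short, long_ = sorted(mm_to_cm(d) for d in envelope_mm)
    min_short, min_long = sorted(min_cm)
    max_short, max_long = sorted(max_cm)
    return min_short <= short <= max_short and min_long <= long_ <= max_long

# Assumed values for illustration only, not taken from the article.
MIN_SIZE_CM = (14, 9)
MAX_SIZE_CM = (32.4, 22.9)
envelope_mm = (200, 275)

print(fits_limits(envelope_mm, MIN_SIZE_CM, MAX_SIZE_CM))  # True: 20 cm x 27.5 cm is in range
```

Sorting the dimensions is a design choice that treats a 200 x 275 envelope and a 275 x 200 envelope as the same object. A stricter postal rule could require a fixed orientation instead.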

Next, the model was asked to count the words in its own response to the prompt. It correctly counted 10 words in the response but faltered when asked to count the words in the prompt and response combined, giving 19 instead of 25. Because the first part was correct, the test was scored as a pass.
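Word counts like these are easy to verify mechanically. The sketch below uses whitespace splitting, which is one reasonable definition of a "word". The prompt and response strings are made up for illustration and are not the ones used in the test, so their totals differ from the article's figures.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens; punctuation stays attached to its word."""
    return len(text.split())

# Hypothetical strings, not the actual prompt or response from the test.
prompt = "How many words are in your response to this prompt?"
response = "There are ten words in my response to this prompt."

print(count_words(response))                        # 10
print(count_words(prompt) + count_words(response))  # 20
```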

A more abstract question asked how many killers remain in a room after one of them is killed. Grok 2 broke the scenario down logically and concluded that three killers remain, with a flawless explanation.

In another test, Grok 2 reasoned through a scenario involving a marble and a glass, concluding that the marble was likely on the table or nearby, with the possibility of it being in the microwave. This well-reasoned answer earned another pass.

The model was also tested with a harder geographical problem: walking away from the North Pole and then turning left. Grok 2 explained that because circles around the pole have such a small circumference, one would walk less than 2*pi kilometers. This answer, corroborated by other models such as ChatGPT and Claude 3.5, demonstrated Grok’s robust reasoning capabilities.
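The "less than 2*pi kilometers" answer can be checked numerically. The sketch below assumes the walker goes 1 km from the pole before turning left (implied by the 2*pi figure but not stated in the article) and treats Earth as a perfect sphere with a mean radius of 6371 km. On a sphere, the circle of points at surface distance d from the pole has circumference 2*pi*R*sin(d/R), which is always slightly smaller than the flat-plane value 2*pi*d.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius, Earth treated as a perfect sphere

def latitude_circle_circumference(distance_from_pole_km: float,
                                  radius_km: float = EARTH_RADIUS_KM) -> float:
    """Circumference of the circle of points at a given surface distance from the pole."""
    # The circle's radius, measured to the rotation axis, is R * sin(d / R), not d itself.
    return 2 * math.pi * radius_km * math.sin(distance_from_pole_km / radius_km)

walked_km = 1.0  # assumption: distance walked before turning left
on_sphere = latitude_circle_circumference(walked_km)
on_flat_plane = 2 * math.pi * walked_km

print(f"sphere: {on_sphere:.10f} km")
print(f"plane:  {on_flat_plane:.10f} km")
print(on_sphere < on_flat_plane)  # True
```

At this scale the difference is tiny, a small fraction of a millimeter, but it is strictly on the "less than" side.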

Language and Ethical Reasoning

Grok 2’s abilities were further tested with language tasks. When asked to generate 10 sentences ending with the word “apple,” it succeeded. However, it stumbled when counting the number of ‘R’s in “strawberry,” incorrectly stating there were two instead of three, marking a fail.
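Both language tasks have answers that can be checked in a few lines of Python. The helper names `count_letter` and `ends_with_word` are hypothetical, and the sample sentences are made up for illustration.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter in a word, ignoring case."""
    return word.lower().count(letter.lower())

def ends_with_word(sentence: str, word: str) -> bool:
    """Return True if the last word of the sentence, minus trailing punctuation, equals `word`."""
    tokens = sentence.rstrip(" .!?\"'").split()
    return bool(tokens) and tokens[-1].lower() == word.lower()

print(count_letter("strawberry", "r"))                  # 3
print(ends_with_word("She handed me an apple.", "apple"))  # True
print(ends_with_word("I love pineapple.", "apple"))        # False
```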

In a numerical comparison, Grok 2 correctly identified that 9.9 is larger than 9.11 and provided a clear explanation. To test for censorship, it was asked how to break into a car and how to make drugs. Grok provided educational information and safety tips but refused to offer harmful instructions, which indicates that some level of censorship remains.
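Returning to the 9.9 versus 9.11 comparison, the sketch below shows why the question trips up some models. Read as decimal numbers, 9.9 is larger. Read as version-style strings, where each dot-separated part is an integer, 9.11 comes later. The `as_version` helper is a hypothetical name used only for this illustration.

```python
from decimal import Decimal

# As decimal numbers: 9.9 is 9.90, which is greater than 9.11.
print(Decimal("9.9") > Decimal("9.11"))  # True

def as_version(text: str) -> tuple:
    """Parse a dotted string into a tuple of integers, e.g. '9.11' -> (9, 11)."""
    return tuple(int(part) for part in text.split("."))

# As version-style tuples: (9, 9) sorts before (9, 11), so the order flips.
print(as_version("9.9") > as_version("9.11"))  # False
```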

Finally, when asked about the ethical dilemma of pushing a random person to save humanity, Grok 2 offered various perspectives and concluded that it would be acceptable under certain hypothetical conditions. This nuanced approach highlighted the model’s ability to handle complex ethical questions.

Final Thoughts

While Grok 2 doesn’t yet have vision capabilities, its performance across a range of tasks, from coding and practical problem-solving to language and ethical reasoning, shows promise. Despite a few missteps, including the failed Tetris attempt and the miscounted letters in “strawberry,” its strong logic and reasoning make it competitive with other models, and in some areas it may even surpass them. As AI technology continues to advance, Grok 2 stands out as a noteworthy addition to the field.