LLM researcher Sebastian Raschka: OpenClaw is a milestone


Researcher Sebastian Raschka calls OpenClaw a prototype of a paradigm, but says he wouldn’t install the autonomous artificial intelligence (AI) assistant on his main computer just yet.

Raschka is a researcher in large language models (LLMs) and a statistics professor at the University of Wisconsin-Madison. He is also the author of the book Build a Large Language Model (From Scratch).

LLMs became widely used in late 2022, when OpenAI introduced its chatbot, ChatGPT. The number of models has rapidly increased since then, and over one billion people use the technology globally, according to 2025 data.

Speaking of the biggest LLM improvements, Raschka points to tool calling, which now allows LLMs to interact with external tools, such as calculators. Moreover, LLMs can now access web search, meaning that they don’t need to answer queries from their training memory.
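As a rough illustration of tool calling, the sketch below shows the general pattern rather than any vendor's API: the model emits a structured request, the host program runs the named tool, and the result is handed back to the model. The JSON shape, the `calculator` tool, and the `run_tool_call` helper are all hypothetical, and the model's output is hard-coded instead of coming from a real LLM.

```python
import ast
import json
import operator

# Arithmetic operators the hypothetical calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


# Registry of tools the host exposes to the model (assumed for this sketch).
TOOLS = {"calculator": calculator}


def run_tool_call(model_output: str) -> str:
    """Parse a tool request emitted by the model, run it, and return JSON."""
    request = json.loads(model_output)
    tool = TOOLS[request["tool"]]
    result = tool(**request["arguments"])
    return json.dumps({"tool": request["tool"], "result": result})


# Stand-in for what a model might emit instead of guessing the answer itself.
model_output = '{"tool": "calculator", "arguments": {"expression": "17 * 23 + 4"}}'
print(run_tool_call(model_output))
```

In a real system, the returned JSON would be appended to the conversation so the model can phrase the final answer around the computed value.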

Another major milestone is reasoning models, the LLMs that “think” before answering and break down complex problems into smaller, manageable steps.

In the interview with Cybernews, which has been edited for clarity and length, Raschka discusses the differences between American and Chinese LLMs, the worst-case scenario for ads in AI, and the potential impact on jobs.

Why LLMs fail at the seahorse emoji test

LLMs can assist with complex tasks, but they also seem to fail at very simple things. For instance, there was a trend of asking LLMs if a seahorse emoji exists, and they couldn’t provide an answer. Why does this happen?

I would rather look at the problems LLMs try to solve. When comparing LLMs, I don’t think it's very useful to look at things you would not use LLMs for in order to test them.

There’s a popular benchmark where people ask the LLM to draw a pelican on a bicycle. It’s interesting because it tests the LLM’s capability to understand what you want and draw the picture, but it’s rarely something you would need in practice.

When you compare several cars, you look at the price, comfort, and other practical things. You could also ask how well the car performs when driven into the ocean, and say that the best car is the one that survives in ocean water.

However, no one would really find that useful in practice. These types of tests look at the aspects that developers have not optimized LLMs for and don’t have practical use.

What can people do to optimize their LLM use? Is it about how we write prompts or how we train LLMs?

For LLMs like ChatGPT that can access the internet, prompting remains important, but they are becoming much more robust to it. There’s nothing specific I would say users need to do to improve their own usage.

However, it’s important to be clear and very verbose, to specify constraints or context, and not let the LLM guess what you want.

The worst-case scenario for adverts in LLMs

Data privacy is a common concern when using LLMs and AI in general. What precautions do you personally take, and what would you recommend other people consider?

If you use a proprietary LLM, I would not put any information in it that you would feel uncomfortable with someone else knowing. That goes with everything on the internet.

LLMs’ restrictions are a bit looser compared to, for example, a private Google Drive, because LLM providers have terms under which you give them permission to train their models on your data.

It doesn’t mean they will use the data verbatim, but mistakes can happen and lead to a data leak. So you may want to use a local LLM that runs on your computer, so that your data doesn’t leave it.

Advertising is coming to LLMs, and many people are concerned about how their sensitive data could be used for it. Are you worried too?

Advertising is not ideal, but it is not unprecedented either. If you go to Google Search, the first answers are sponsored and are labeled as sponsored.

Let’s say you ask an LLM for an unbiased opinion on what the best running shoes are. There are certain ways results can be biased, similar to how Google Search results can be biased.

Implicit advertising is picked up by the LLM when it distils the articles, either through training or via a web search, so the LLMs may also inherit these biases from the training data or the input.

People are concerned that LLM providers partner with companies. When I ask a question, maybe Nike would show up first, because the LLM has a potential partnership.

The worst-case scenario would be us thinking [the LLM’s answer] is unbiased when it isn't. But this would be illegal due to the Federal Trade Commission’s restrictions on misleading people and its clear-labeling requirements.

I wouldn’t worry about advertising in LLMs more than in Google Search. People like LLMs because they're advertisement-free, and if that changes, it would be a bummer for them.

American LLMs slightly outperform Chinese models

When we talk about American versus Chinese models, the conversation is often political, and we focus on security and privacy. How do they compare from a technological point of view?

The American proprietary models are still slightly better in their overall performance. Benchmarks can be misleading sometimes – if you look at the SWE-bench benchmark, all models look more or less the same.

However, more niche benchmarks show a bigger gap between the models. The gap doesn’t appear in popular benchmarks because the Chinese models have been optimized to perform well on them. They may look better than they actually are.

American models still have a slight edge, but Chinese models have the advantage of being open-weight, which means you can run them on your own hardware.

There are small models that you can run on your own computer. But there are also bigger ones that perform really well and need more than just one local computer.

For those, you can rent compute from a provider, or use in-house servers if your company has them. Then you can run these models yourself, fine-tune them, and modify them.

The advantage of the Chinese models at the moment is that they are more open, in the sense that they share the model itself, not just access to it.

Spotify has recently said that the best developers haven’t written code for two months, thanks to AI and Claude Code. Does it accurately reflect the capabilities of the AI tools, and should developers be worried about losing jobs?

Software engineers are at the cutting edge right now with the latest LLM tools, which can do almost all of their work together with them. The LLM itself doesn't have agency – it’s a tool that you have to tell what to do.

You still need the person in the driver’s seat to tell the LLM what to build and how to build it, to define the framework and style, and to test it.

AI makes things go faster, but you still need to be a good software engineer, which means knowing how to write code, review it, and double-check it. It's not replacing developers, but making them more productive.

If you’re not familiar with coding, you can ask LLMs to build things, but what they build is not going to be the next Spotify or the next Stripe.

It becomes trickier for entry-level positions, because a person using an LLM can do much more work now in the same amount of time, so this may mean that you need fewer people on the team to do the same work.

This is where I’m a bit concerned that this might affect employment, especially of people who are still learning. But it is still important to learn to code because if you go directly to LLMs, you will be limited to what the LLM can do, and you won’t have the full advantage.

No evidence of AGI emerging in the near future

We hear from tech CEOs and some scientists that AGI is around the corner, while some independent researchers say that superintelligence is impossible. Where does this discrepancy come from, and where do you stand on AGI?

I care more about problems we can solve. I don’t know if AGI is a problem that needs to be solved. I care about an LLM helping me with my coding, so I don’t care if a coding LLM can also do other things. I would rather optimize for the application at hand.

People would say AGI is more convenient because once you have it, you can do everything. But I don’t know if that’s efficient.

An LLM doesn’t need to do everything, because then it gets bigger and more expensive. If you know that you're only going to code, you don't need an LLM that can also generate images or videos. I think that's just inefficient.

Everyone who says AGI is around the corner is just guessing, because there’s no evidence for it.

OpenClaw attracted significant interest but also raised concerns from a cybersecurity perspective. What does OpenClaw say about the state of autonomous agents right now, and what lessons could be learned to make them safer?

I think OpenClaw is a really interesting project. People got excited about it, almost like they did about AlphaGo a few years ago, which was developed by DeepMind and played against the world champion in Go.

It’s a good thing to get people to play with it and see what it can potentially do.

Security-wise, I wouldn’t run it on my main computer. I would run it on a separate computer and proceed with care, and wouldn't give it access to my emails.

I still think it’s a milestone, but I’d recommend using it step by step. Get familiar with it and test its limits to build trust.

Instead of giving it access to my personal email, I would set up a new email account and let it run there. I see it more as a prototype of a paradigm.

