Beyond AI
How to program with AI without any programming knowledge?

In this entry, we present the transcript of an interview with Iwona Białynicka-Birula, which was published on our YouTube channel in August 2024.
[Ziemowit Buchalski] Welcome to the Beyond AI channel. Today our guest is Iwona Białynicka-Birula, an expert in artificial intelligence, co-author of the book Modeling Reality, and a programmer at companies such as eBay, Google, and Facebook.
[Iwona Białynicka-Birula] Hi, thanks for the invitation.
Thank you for accepting it. The scope of your experience is enormous. Today we would like to talk about your work experiences, the future of artificial intelligence, and how it is approached in different parts of the world, so let's start at the beginning.
When did your adventure with artificial intelligence begin?
Actually, from birth, in the sense that I was passionate about computers and programming from a young age. When I was born, however, it was an "AI winter," a period when the technology was barely developing. It was essentially the domain of science fiction films, and there were almost no practical applications. When I was studying computer science, there was indeed a lecture on artificial intelligence, but it was a completely different artificial intelligence than today's. It wasn't statistical machine learning but rather expert systems, so I didn't like it much then and didn't think that type of system would lead to anything.
It was only with the book you mentioned, which I wrote at the end of my studies in the early 2000s, that this changed. The book included a chapter on neural networks, and I prepared a program for it that demonstrated how such a network works: a network that recognized handwritten digits. It seemed simply magical to me that something like that could be done, and I think from then on I always wanted to work on it. It took another ten years for the technology to catch up with my ambitions.
That was a very long time ago, almost a quarter of a century. Today's artificial intelligence is completely different. We see what ChatGPT can do, we see how images and films are created, and how we can converse with computers; artificial intelligence allows them to understand us. There is a huge gap between a simple digit-recognition program and today's state of the art.
In your opinion, what caused this, and how did we achieve such progress?
It seems to me that, for the most part, it was gradual progress rather than a series of breakthroughs. If only because the neural network technology we now use for practically everything, from ChatGPT to generating videos, is fundamentally no different from the network I wrote for the book, or even from the networks invented in the 70s and 80s. What changed was the huge progress in computing power: hardware became many thousands of times more powerful. Progress in access to data was also significant. The internet was born, and suddenly there was a great deal of data that could be used.
If I had to name one breakthrough that was an "Aha" moment for me, it happened in 2013 when two papers came out, and the result of that work was the model known as Word2vec. People from Google, by training a model solely on the basis of predicting the next word or the missing word in text from the internet, achieved a breakthrough. Until then, all other models had to be trained on data labeled by humans, which was very labor-intensive and did not allow for the production of large amounts of such data. However, there is a lot of text on the internet, so that was one thing. [1] [2]


So it became possible to train models based on already collected data, without the need to specially prepare it.
What's more, it turned out—which surprised even the authors of those works—that the model learned not only superficial prediction of the next word, but built a model of the world for itself. It learned which cities are the capitals of which countries, which teams represent which cities in which sport. It learned much more than just continuing text. Here you could already see what ChatGPT actually is—a model learned on a very large amount of text which, by predicting what comes next in the text, is able to build a model of the world, solve various issues, math olympiad problems, write recipes, and advise on clothes.
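This "model of the world" effect shows up as simple arithmetic on the learned word vectors: the famous example is that vector("Paris") - vector("France") + vector("Poland") lands near vector("Warsaw"). A minimal sketch of that nearest-neighbor query, using hand-picked toy vectors (real Word2vec embeddings are learned from text and have hundreds of dimensions; the numbers below are illustrative only):

```python
import math

# Hypothetical, hand-picked 3-d "embeddings"; real Word2vec vectors
# are learned from text and typically have 100-300 dimensions.
vectors = {
    "warsaw": [0.9, 0.1, 0.8],
    "poland": [0.9, 0.1, 0.1],
    "paris":  [0.1, 0.9, 0.8],
    "france": [0.1, 0.9, 0.1],
    "banana": [0.5, 0.5, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "paris" - "france" + "poland" should land near "warsaw":
query = [p - f + pl for p, f, pl in
         zip(vectors["paris"], vectors["france"], vectors["poland"])]
best = max((w for w in vectors if w not in ("paris", "france", "poland")),
           key=lambda w: cosine(query, vectors[w]))
print(best)  # prints: warsaw
```

With real embeddings, the same query recovers capital-country, gender, and tense relationships, which is exactly the surprise the authors reported.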
When you left Poland and moved abroad, you started working for giants, utilizing artificial intelligence in projects among other things. Could you tell us more about your experiences, what you did there, which projects, and on what scale?
I can cite a few, as I worked on many different ones. One of the most interesting, in my opinion, was a project right after the breakthrough related to Word2vec. I decided then that I had to work on deep learning because that is the future. I ended up at Google, in the Research and Machine Intelligence organization, which dealt not so much with practical applications but with researching and pushing this technology forward. We observed a trend then that is also visible now—the shrinking of powerful models that work well in the cloud so they can be run on small devices. We set a goal to fit very large models, which were convolutional neural networks for image recognition, onto a small camera. The goal was for the camera itself to know when something interesting was happening and take a photo then.

Google had a project to bring such a device to market. You worked on the software side, building the AI software and shrinking those huge models to make them fit. Did it succeed?
To some extent, yes. Eventually a device called Google Clips was released to the market. However, the product was a commercial failure, because the goal had not been to create a consumer device that would actually be useful to someone, but to see how far the technology could be pushed. So it wasn't something useful for consumers. Exactly the same thing is happening now with companies like Humane, which released the AI Pin, or Rabbit with the R1. It was similar at Google: someone decided that too much money had gone into it not to try to sell it, and even more money was sunk into it. [3] [4] [5]



So there was an idea that we have the technology, let's make a product out of it, maybe it will work, maybe it won't. It was similar with Google Glass—products that didn't catch on but worked technologically. The work was interesting, though on a project that did not achieve commercial success.
What other projects, perhaps at other companies, also related to artificial intelligence, did you work on?
Well, maybe I'll talk about the latest one I'm still working on now. I work at a smaller company, no longer at such a giant, but in a fast-growing startup called Cresta, which is already a unicorn. Yes, it's already a unicorn, although we don't have such a large scale yet. We build software for customer service centers based on artificial intelligence. It's fascinating to me how much generative AI has changed in this direction, and this project demonstrates that very well.[6]

What is the problem? In every customer service center, the company wants to detect that something happened in a conversation, and that "something" differs greatly from company to company. An airline might want to detect when a customer wants to change a flight reservation; a bank might want to detect when a customer says they don't want to pay back a loan. Until very recently, building such a model took a lot of human work, both low-skilled and high-skilled. Low-skilled people read the conversations and labeled them for supervised learning, saying: "This is a positive example, this is a negative one."
All to train a model that will then be launched during a phone call. It will listen in on the conversation and show the agent conducting it whether they used a certain phrase, or whether they asked an additional question, whether they made a sales attempt, or how they handle a refusal.
And to be able to train such a model... I understand there are different approaches. The older one consisted of preparing an entire dataset by hand. What do the newer ones look like?
Well, in the older approach there were two elements: you had to collect that dataset, and then you needed machine-learning specialists to train specialized models. The new approach works like this: first, we automatically build a prompt for the generative AI. It might seem that for something like, say, whether an agent expressed empathy, it would be enough to ask GPT whether the agent expressed empathy in a given sentence. But it turns out that each of our clients has something completely different in mind when they say "empathy." So we start by having the generative AI interview the client about what specifically they mean, giving them examples and asking whether each is positive or negative; the machine does this completely automatically. Then, once we have such a prompt prepared, we can use GPT again to label a great many examples. Of course, we cannot apply such a model directly in production, because we have millions and millions of conversations a day and that would simply be very expensive. But the model can label a training set for us, and then we can automatically train a much cheaper model that will detect these things.
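The pattern described here, where an expensive generative model labels data once, offline, and a much cheaper model is then trained on those labels, can be sketched in a few lines. Everything below is a stand-in: `teacher_label` replaces a real LLM call with a client-specific prompt, and the "student" is a trivial keyword scorer rather than a properly trained classifier:

```python
# Sketch of the label-then-distill pattern. The "teacher" is a stub
# standing in for an expensive generative-model call; in a real pipeline
# it would be an LLM prompted with the client's definition of the intent.
def teacher_label(utterance: str) -> int:
    # Pretend-LLM: flags utterances expressing empathy (hypothetical logic).
    return int(any(w in utterance.lower()
                   for w in ("sorry", "understand", "apologize")))

# 1. Use the expensive teacher once, offline, to label a set of examples.
conversations = [
    "I'm so sorry to hear that, let me fix it.",
    "Your flight leaves at 9am.",
    "I completely understand your frustration.",
    "Please hold while I transfer you.",
]
labeled = [(text, teacher_label(text)) for text in conversations]

# 2. Train a much cheaper student on the teacher's labels. Here the
#    "training" is just bag-of-words keyword weighting, the simplest
#    possible student; a real system would fit a small classifier.
keyword_weights: dict[str, float] = {}
for text, label in labeled:
    for word in text.lower().split():
        keyword_weights[word] = keyword_weights.get(word, 0.0) + (1.0 if label else -1.0)

def student_predict(utterance: str) -> int:
    score = sum(keyword_weights.get(w, 0.0) for w in utterance.lower().split())
    return int(score > 0)

# 3. The cheap student now runs on every live conversation.
print(student_predict("I'm sorry, I understand how annoying that is."))  # prints: 1
```

The economics come from step 3: the teacher is called once per training example, while the student runs on millions of live conversations per day.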
You prepare a specialized model for each client that is tailored to them?
Yes, although now the client can do it themselves in our self-service platform, so we can offer this to a very large number of clients with a small team.
—
At this point, we talked about working in AI in the USA; you can read that part of the conversation in a separate post.
—
What are your plans for the future in the context of research on artificial intelligence? How do you see your future?
It's generally hard to predict what will happen in the future. However, I do see two trends now: toward smaller but more specialized models, and toward agents. What we saw in 2023 was a kind of infatuation with the capabilities of GPT and other similar models, for example from the company Anthropic, which were available via API. Suddenly we could perform various tasks that until then had been the domain of human work.
However, over the year we built a great many different dependencies on these models into our systems. And it turned out that, first, it is quite expensive—you pay for those tokens—and second, it is quite risky because OpenAI can fall apart at any moment; it is a fairly unstable company. Besides, they are constantly changing these models and it might turn out that what we built around a certain model no longer works because they swapped something there. For various reasons, it is much more profitable to have your own model.
It is possible now because we have access to open-source models such as Mistral, which are much smaller and can be fine-tuned: using a small amount of data, which can itself be generated by a larger model, you can adapt them to a given task. And this is immensely profitable; in our case it was calculated to be about 100 times cheaper than using, say, GPT-4, and we have full control over the models.
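One reason this kind of fine-tuning is so cheap is that it is usually done with adapters such as LoRA, which train two thin matrices per layer instead of the full weight matrix. A back-of-the-envelope parameter count, with purely illustrative layer sizes:

```python
# LoRA-style adapter accounting: instead of updating a full d_out x d_in
# weight matrix, train two thin matrices B (d_out x r) and A (r x d_in)
# with a small rank r. All numbers below are illustrative, not taken
# from any specific model.
d_in, d_out, rank = 4096, 4096, 8

full_update_params = d_out * d_in            # fine-tuning the whole matrix
adapter_params = d_out * rank + rank * d_in  # only B and A are trained

print(full_update_params)                    # 16777216
print(adapter_params)                        # 65536
print(full_update_params // adapter_params)  # 256x fewer trainable weights
```

Because only the small adapter is trained and stored, one base model can serve many clients, each with its own adapter, which is what makes the self-service setup mentioned earlier practical.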
And what about quality?
The quality is even better because they are tailored to a specific task. So if something is for everything, it's for nothing—there was an ad like that once. However, when you adapt a model very specifically to a given task, it often becomes even much more accurate.
So such fine-tuned models, for example, which understand bank conversations or airline conversations well, are the future, and even the present.
We are mainly working on that now. A trend is emerging where companies train such adapters: they don't actually train all the weights of the model, but only a very small subset. You can then have a great many of these very specialized models. So I think I'll be spending a lot of time on that now.

Another trend, fascinating in my opinion, is AI agents. Until now, the results of generative AI stayed on the screen: GPT wrote something, or DALL-E drew something, and a person looked at the text or image it generated. However, it is increasingly possible to connect these systems to other systems. For example, you can give a model access to a search engine, a travel agency API, and a weather-prediction API, and say: "Plan a trip for me." The model can go off, check ticket prices, see where in the world the weather is nice and where there are nice views, and come back and suggest a whole trip, or even, if we trust it enough, buy the trip on our behalf.
This is especially promising in the context of mobile phones, which are the interface most people use and which are still more laborious to operate than computers, because there is only that one small screen. Building agents for Apple and Android that can do everything will be an absolute breakthrough in how we communicate with computers: we will simply talk to them, just like in Star Trek, and they will do it all for us.
So you predict that eventually Siri or another voice assistant on the phone will start doing what we actually ask of it, because it will use specialized agents that can check the weather, book a hotel stay, or search for events in our city.
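The trip-planning example above reduces to a minimal agent loop: a planner calls tools, collects observations, and acts on them. Both the planner and the tools below are stubs invented for illustration; in a real system the planner would be an LLM and the tools would be live APIs:

```python
# Minimal agent-loop sketch. The tool functions are hypothetical stubs
# with canned data; a real agent would call live weather and booking APIs.
def weather_api(city: str) -> str:
    return {"lisbon": "sunny", "oslo": "rain"}.get(city, "unknown")

def ticket_api(city: str) -> int:
    return {"lisbon": 120, "oslo": 200}.get(city, 0)

TOOLS = {"weather": weather_api, "tickets": ticket_api}

def plan_trip(cities: list[str]) -> str:
    # Stand-in for the model's reasoning loop: call each tool per city,
    # gather observations, then pick the cheapest sunny destination.
    observations = [(city, TOOLS["weather"](city), TOOLS["tickets"](city))
                    for city in cities]
    sunny = [(price, city) for city, wx, price in observations if wx == "sunny"]
    if not sunny:
        return "no good destination found"
    price, city = min(sunny)
    return f"fly to {city} for ${price}"

print(plan_trip(["lisbon", "oslo"]))  # prints: fly to lisbon for $120
```

In production frameworks, the planner decides dynamically which tool to call next based on intermediate results, rather than following a fixed script, but the call-observe-decide structure is the same.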
If you were to go back those 20-plus years ago, when you were creating, writing the book—did you at all predict then what would happen and that it would go in this direction? And did the Iwona in the year 2000 at all dream of what is happening today regarding such a large development of technology that you are observing now?
I definitely dreamed about some things. Certainly that neural networks would catch on and that in general—going back even from networks—that this statistical approach based on learning from data would ultimately win as a paradigm. Well, I hoped for that and bet on it, but of course many things were hard to predict. I didn't predict that the passion for playing computer games, specifically shooters, would ultimately lead to this. Because they led to the development of those graphics cards, which turned out, by chance, to be perfectly suited for neural networks too.
I also didn't predict that predicting the next word would lead to universal models—meaning those that are not trained for a specific purpose but are able to solve general problems, and that those models could then be applied to other things.
So that such an algorithm that takes and predicts the next word would actually lead to the creation of a model that understands language, understands semantics, and which can then be used in many, many other applications.
Yes, exactly.
We are here today, it's the present, and let's go 10 years forward. What will happen?
P(doom)! [laughs]
What will happen? Well, it's very hard to predict. I've been talking about rather short-term trends like fine-tuning and agents. What will come after that? I think no one is able to predict.
What might play a huge role is progress in medicine, because we could really reach a point where people no longer die, except in accidents, but not, for example, from old age. Then I think we may have to invent completely different social models for humanity, because so much of our civilization is based on the cycle of being born and dying.
It also seems to me that we might manage to build models that are able to solve genuinely new problems. We don't have that yet, even though it might seem like it: take ChatGPT, we give it a math olympiad problem and it solves it. Does that mean it could reinvent the theory of relativity from scratch? No; these models are not particularly well-suited to solving problems too different from what they have already seen, so they are not yet able to push science forward. However, many people are working on enabling that kind of reasoning in these models, and that too could be a great breakthrough, because models could start truly pushing science forward.
And perhaps one more element is robotics. At the moment, these models deal terribly, terribly poorly with interaction with the physical world. We still cannot create autonomous cars on a large scale, even though driving a car is a relatively easy problem for a human, not to mention tasks like "go and put the plates in the dishwasher," which robots still find very hard. However, there is a lot of progress in this field too, and if it continues, we will have these agents not only in the computer or phone but also helping us in our daily lives.
Meaning they'll put dirty dishes in the dishwasher, take out clean ones, and vacuum as well...
And vacuum as well and serve coffee.
On that optimistic note, we end our conversation. Thank you very much for participating in our podcast.
Thank you very much, I was very pleased.
—
Read the second part of the interview in a separate entry, where we talk about the specifics of working in artificial intelligence in the USA.

This was a transcript of one of the episodes on our YouTube channel. If you want to hear more conversations and comments on artificial intelligence – visit the Beyond AI channel.
