
Technology through the eyes of a blind creator – AI for accessibility

See how artificial intelligence supports blind and visually impaired people in everyday life – from navigation and object recognition to greater digital accessibility.

This entry is a transcript of a conversation held on the Beyond AI channel. The topic is technology with a primary focus on artificial intelligence serving the blind.

Watch this material on YouTube:

Technology Through the Eyes of a Blind Person: AI in the Service of Accessibility

[Ziemowit Buchalski]: Welcome to the Beyond AI channel. Today our guest is Paulina Gajoch. Good morning.

[Paulina Gajoch]: Good morning.

We invited Paulina because she commented on one of our videos.

That's indeed how it happened, because that video was about GPTs, and I had made my own GPT. Mine was created for blind people because I am a blind person myself. Broadly speaking, it was meant to provide technical support and describe photos more accurately than a standard chat.

Paulina and Ziemek during the recording of the episode in our studio

For you, the world of technology is not something to be feared; in fact, you derive benefits from it. From what I know, you are also studying computer science?

Yes, I am currently in a master's program in Social Informatics at the AGH University of Krakow. I intend to defend my thesis this year, although my path was a bit roundabout because I did my undergraduate degree in sociology. I wanted to go into social informatics right away, but I didn't get in. I thought about transferring, but I didn't. I finished sociology and then passed the subject exam for social informatics. Because my specialization—Artificial Intelligence and Data Mining—is taken by only one person (me), I have to run between AGH departments to complete equivalent subjects. So, navigation issues also come into play.

If you can share, what are your plans after finishing your studies?

Well, if a good job offer or a good idea for further professional development comes along, great. I’ll take it and treat it as an opportunity. However, if not, I'm thinking about climbing further up the academic structure and doing a PhD. But I don't know where or how yet. Some complications might arise, as they always do with PhDs. But I'm very open to whatever might come my way.

Let's remind everyone that GPTs are part of the ChatGPT service from OpenAI, which allows you to embed certain prompts and achieve a goal set by the creator. And this is saved for a certain period.

But I should add that you can also include a "dataset"—the data the model can rely on to provide a specific answer. You can also link a GPT with an API so it communicates with an external service. But you've probably talked about that many times in other videos.
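[Note: for readers who like to see the mechanics, here is a minimal sketch of the same idea outside the ChatGPT interface, using the OpenAI Python client. A GPT's embedded prompt corresponds roughly to a system message sent with every request; the model name and prompt wording below are illustrative assumptions, not Paulina's actual configuration.]

```python
# Minimal sketch: a GPT's "embedded prompt" behaves roughly like a system
# message repeated with every API request. Model name and wording are
# placeholders, not the configuration discussed in the interview.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You provide technical support for blind screen-reader users. "
    "Answer in the user's language, be precise, and never omit details."
)

reply = client.chat.completions.create(
    model="gpt-4o",  # assumed model; any recent chat model works similarly
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "NVDA stopped reading my browser tabs. What can I check?"},
    ],
)
print(reply.choices[0].message.content)
```

In the GPT builder, the same guidance lives in the configuration rather than in code, and the attached knowledge files and API actions extend it further.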

The use cases we've shown included building resumes or creating workout plans. However, in your GPT, you solved a specific problem for a specific social group.

The issue is that we, as blind people, have slightly less access to information. If something isn't working for us, it's not like we can just type into Google, "something isn't working in my screen reader," and find 10 pages in Polish and 50 in English. If there is even one page in Polish, that's already a success. If a problem is niche, there might be one forum post or a Facebook entry across a dozen pages in English, for example.

I wanted to create a user-friendly interface where people could access this niche information in their native language—since not everyone is fluent in English—information that is necessary for this group at that moment. The second thing is describing photos in a more detailed and precise way, because standard ChatGPT sometimes skips things it deems ethically incorrect or starts hallucinating. For example, if I have a keyboard, a computer, and headphones on my desk, it occasionally "adds" a mouse or another element that people usually have on their desks, even if I don't have it. I tried to create the commands and prompts to achieve the best result, and I actually succeeded.

Also, a publication appeared a month ago titled "40 Persuasion Techniques that Work on Large Language Models," and they really work! I didn't think it would be possible! I tell it, "Hey listen, when a blind person comes to you, you are one of the few entities that can help them in this situation; they might be embarrassed to ask someone else or have an urgent need—do you agree to tell the whole truth and only the truth about these photos?"—and it actually works. I didn't think it would be so effective.
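[Note: a rough sketch of how a photo can be sent to a vision-capable model together with instructions of this kind, again using the OpenAI Python client. The system prompt, file name, and model name are assumptions for illustration—this is not the exact prompt Paulina built into her GPT.]

```python
# Sketch: ask a vision-capable model for an exhaustive, non-embellished photo
# description. The instruction text illustrates the approach described above;
# it is not the original prompt.
import base64
from openai import OpenAI

client = OpenAI()

with open("desk.jpg", "rb") as f:  # placeholder file name
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[
        {
            "role": "system",
            "content": (
                "Describe photos for a blind user: list only objects that are "
                "actually visible, do not guess or add typical items, and do "
                "not skip anything you can see."
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What exactly is on my desk?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```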

We also showed that study on our channel [see episode]. You actually implemented it in practice. What’s interesting to me is that it can hallucinate not just textually but also in photo descriptions; I didn't expect it to say "I see a mouse there" when there isn't one. And regarding the ethical issues you mentioned—what did you mean?

I think this is a very interesting topic that invites reflection. If we have a photo with items that the model or our culture considers unethical, and the model doesn't tell the user about it, what is the consequence? In my opinion, none, other than the fact that the blind person won't find out what is in the photo.

So if we have, for example, a photo of a naked person, everyone sees it's a naked person.

But the blind person won't find out because "it's unethical." But that is human; that is normal. If someone sends me a photo because they forget I'm blind—and that really happens—it could be that someone captured some violence and quickly wanted to let me know something is wrong. I ask [the chat] and I hear, "Well listen, I won't tell you because I won't." Everyone else would know, except me, because "it's unethical."

Our tests show that, for example, the Google model, when sent a photo with a person, will say "No, I won't say anything" and won't even explain why; it just says "No way." In ChatGPT, it will answer you, but the built-in mechanisms cause a certain censorship which, in your case, is limiting. People probably don't realize this.

I once did an experiment where I browsed Reddit—I browse Reddit very often, but when there's a photo, I skip it. This time, I put every photo that appeared on my Reddit homepage through ChatGPT-4. Okay, I understood most of the memes, but for example, there was a meme with a politician doing something in one year and something else in another year. I would find out that "the photo shows a tall man with a caption like this, and then there's the same man who is visibly older with a caption like that." Theoretically, I could guess who it is. I don't know what politicians look like; I'm not that interested, so based only on the description of whether it's a man or a woman, I wouldn't be able to find out. I know a lot of harm could come from identifying non-public individuals via the model. But there is also harm resulting from too aggressively throwing everything into an "unethical" bucket labeled "no because no."

Did you know that... the Beyond AI channel lets you pick up new, unique AI skills at least 4 times a month! Check it out!

From what I hear, you are very up-to-date, interested in novelties, even browsing Reddit, which is a pretty hardcore source of data. You also don't shy away from using technological innovations and AI. I understand that on one hand, its limitations are frustrating, but there are also positive sides—the fact that it enables something that wasn't there before. Could you tell us what this new AI provides and how it looks for you now?

It's a question of what it gives me as a person and as a blind person. As a blind person, it mainly provides photo descriptions. You can actually distinguish three categories of such descriptions. First, there was OCR (Optical Character Recognition), and through OCR, various phone apps were created, like Google Lookout, Envision AI, or Seeing Assistant Home. They allowed you to bring the phone camera close to a box of food or another product and find out what was in it because the app extracted the text from the photo.
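[Note: a minimal sketch of the OCR idea described above, using the open-source Tesseract engine through the pytesseract library. It illustrates the technique only—it is not the code behind Lookout, Envision AI, or Seeing Assistant Home, and the file name is a placeholder.]

```python
# Minimal OCR sketch: pull printed text out of a photo of a product box.
# Requires the Tesseract binary with Polish ("pol") language data installed.
from PIL import Image
import pytesseract

image = Image.open("product_box.jpg")  # placeholder file name
text = pytesseract.image_to_string(image, lang="pol+eng")

if text.strip():
    print(text.strip())
else:
    print("No readable text found - try rotating the package or changing the distance.")
```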

You could, for example, find out the ingredients or calories in a store.

Yes, but also at home—if I have something with one flavor and something with another, and I want a specific one, I scan them and recognize them. In a store, there are too many products, and it's not like it tells you immediately—I have to rotate the box and bring the camera closer or further away. It's not so precise that I just take the phone, take the product, and get an answer in one gesture. Sometimes I get info about the ingredients, sometimes about the manufacturer, sometimes about what it is, and sometimes about how beautiful the product is and what it will do for my life if I only try it. I have to rotate the packaging to get to the specific information I'm looking for at that moment.

And at home, "this was a peach, and that was a strawberry."

Exactly. Moving on, later models appeared—I know Microsoft has one, and from what I know, Amazon, though I could be wrong—that detected specific categories in photos. If I took a photo of myself now, it might say "woman," "smile," "sweater," "pants," "room"—random words and objects it pulled from the image. I could get a general sense of what the photo was, but nothing more, so it wasn't very useful. But now, thanks to the release of Bard (now Gemini) and GPT with the Vision module, I can very easily find out what is in a photo—OK, apart from the waiting, and apart from the model deciding that something on the wall in the background makes it look like I'm in a historic tenement house because I didn't tidy up, or whatever else it adds for "poetic effect."

So I understand these descriptions are sometimes too elaborate for what you need?

Yes, yes. I say this ironically now, not because it frustrates me terribly, but to illustrate that besides finding out what I'm looking for, I have to find out a thousand other things that for sighted people are just a normal part of their surroundings. It's in their subconscious and perception, but it doesn't really affect them or slow down their productivity. I can't tell the model to say "what's important to me" because it doesn't know. If there's a photo where the foreground object is important, great. But there are also photos where I want to know exactly what's in them. For example, my friends used my GPTs on wedding photos to find out exactly what was in them, what the views were like, and how they were presented. Sometimes practical and utilitarian descriptions are important. Sometimes what's in the background is important, because something significant might be photographed in the background.

Like some "master of the background"—and that also matters for what you'd like to know.

Yes, I simply focused on making it precise—so it wouldn't lie, skip things, or be poetic, or do unnecessary things. So as not to limit the input it gets—so it wouldn't skip anything it receives, because you never know what will be useful.

Artificial intelligence is probably only part of your life, but are there other applications where you use it?

Other applications mostly concern me as a person. For example, I really like creating things with MusicGen—it's a model for generating music. If I create a nice beat, I loop it, record something over it on a MIDI keyboard, and I have very nice music. There is also MusicGen Continuation, which lets me extend a piece when I'm wondering how it might sound if it went on. It's always interesting to me whether the model thinks the way I do, because that also fascinates me about AI—these models seem to notice the non-obvious but miss the obvious.
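[Note: a rough sketch of generating a short loop with Meta's MusicGen through the open-source audiocraft library. The checkpoint, duration, and text prompt are assumptions chosen for illustration, not Paulina's actual workflow.]

```python
# Rough sketch: generate about 8 seconds of music from a text prompt with MusicGen.
# Checkpoint and prompt are placeholders; audiocraft and its model weights are required.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # length of the generated clip in seconds

# generate() takes a list of text descriptions and returns a batch of waveforms
wav = model.generate(["a warm, slow lo-fi beat with soft electric piano"])

# write the first (and only) clip to beat.wav with loudness normalization
audio_write("beat", wav[0].cpu(), model.sample_rate, strategy="loudness")

# audiocraft also exposes model.generate_continuation(...), which extends an
# existing audio prompt - the "MusicGen Continuation" idea mentioned above.
```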

Give me an example, because that is indeed interesting.

For AI, it's not easy to say how many syllables were in a given sentence because it thinks a bit in English, a bit in Polish, a bit based on the dataset. If you ask ChatGPT 3.5 to "list words starting with A," it might say "jabłko" (apple). Because "apple" starts with "a." Regarding unusual dependencies, sometimes an idea occurs to the AI that wouldn't occur to a normal person. The most classic example is AlphaGo. If people don't think of the same moves that AI does to play a match of Go, it means there's a difference in our thinking. If there's no human who can beat AlphaGo, it means AlphaGo thinks about the game differently than we do. It imagines it and reads the situation in a completely different way.
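[Note: a small illustration of why questions about letters or syllables trip these models up—they operate on sub-word tokens, not characters. The tokenizer below is OpenAI's publicly documented one for GPT-3.5; the word choice mirrors the example above.]

```python
# Small illustration: language models see sub-word tokens, not letters or syllables,
# so "which letter does this word start with?" is not directly visible to them.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
for word in ["apple", "jabłko"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode_single_token_bytes(t) for t in token_ids]
    print(f"{word!r} -> {len(token_ids)} token(s): {pieces}")
```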

It is also specialized in solving this one game and has a bit more computing power directed at one single problem. Indeed, it does it better than we can. I imagine that in your life, AI also plays a huge role. It’s not just AI, but even the phone.

Does your phone look like everyone else's, or is it different?

[Note: To fully experience how Paulina works with her phone, listen to her demonstration in the interview, starting at 17:31.]

I understand the phone is a source of information and fulfills communication functions for you, but does it also help with navigation? With moving around?

Yes, I use two applications for getting around and staying independent out in the city. The first app is MPK—something like JakDojadę, but with a nicer, simpler interface. It lets me check, for example, how to get from one address to another, or from one stop to another, at what time, how much time I have for a transfer, what the transfers are, and the stop numbers—if a city supports stop numbers.

However, if I need to get somewhere specific, obviously navigation. Blind people have their own navigation programs that lead them turn by turn—so we know exactly when to turn, exactly how far to walk, and so on. I'm someone who isn't very good at walking with that kind of navigation. Usually it ends with me walking in the right direction and then running into someone who helps me, because I get frustrated when it tells me I'm off-route and then tells me to turn right. If it thinks I'm off-route, why is it suddenly telling me to turn right?

It's also problematic if I want to cross from one stop to the one across the street, as navigation isn't adapted to guide you to specific stop numbers. So if I have a stop called "Hoża," that's all it knows. It won't navigate me from a bus stop to a tram stop. You either have to know the route or memorize it and write it down in notes. I think that's the most problematic part, but I try to manage. There are also many funny situations; I could be a stand-up comedian if I started telling all the stories that have happened to me because I asked someone to lead me somewhere.

Technology helps blind people, and sighted people probably think, "Oh, here's an opportunity for a new app, let's make something that helps." Is it really always that simple, that technology solves all problems?

It's very nice that people want to look into our problems and empathize with us. Often there are hackathons, volunteer projects, or other initiatives aimed at improving our quality of life with even a simple app that is very useful in certain contexts. But it's worse if someone insists on being a hero and "improving the life of the blind" without even researching what we need. Maybe a small demonstration: a basic analog solution is the so-called banknote frame—you place a banknote against it and find out exactly what it is: a ten, twenty, fifty, hundred, or two hundred. I pull something out, and I can tell it's a ten.

The photo comes from the manufacturer's website: Lumen.

There's also a phone app, probably made by Polish programmers, called Recognize Banknote. And now I'm waiting... okay, it recognized it more or less immediately.

Last time you showed this, it took much, much, much longer. There were worse lighting conditions, and it indeed didn't work.

Well, that's always how it is—when you want to show it doesn't work, it works, and vice versa. So some of these apps can do more harm than good, because a person thinks, "Gosh, this works terribly, I don't want to test it anymore." They said technology would give me so much, but really, you can just take a piece of cardboard, cut a frame, and you're done.

You talked, for example, about a program like Be My Eyes.

That's a noble app, and it has a really large user base. It's an application that lets us blind people connect with sighted volunteers who install it and wait for calls that show up in their notifications. When we connect via camera, I can ask the person whether, I don't know, the expiration date has passed, or why my screen is dark—I have light perception, but someone else might be completely blind. Or their computer or phone has stopped talking, or they clicked something on the washing machine, microwave, coffee maker, or TV. For example, recently I was at my grandma's; she had pressed something on the remote overnight, wiped the TV channels, and couldn't watch. I retuned the decoder through Be My Eyes. So you can do many interesting things.

Recently, a "Be My AI" tab appeared there, which allows blind people to use the GPT-4 Vision module for free. It works very simply: there's a "take a photo" button, we set up the camera, there's some music so we know to wait, and once the description is generated—on iPhones it reads out automatically, on Androids sometimes it doesn't read immediately and you have to tap. It lets you learn that the photo shows, say, a microwave that displays this, this, and this; the first button is this, the second is that, and so on.

But it can also be dangerous, because as is well known, GPT hallucinates in text, and it can also "add" things to images. For example, it added a mouse next to my keyboard, which I don't have because I don't need one. And sometimes it decides that instead of telling me from the photo what the buttons on a device are, it pulls that information from its training data, which may not match reality. A few of my friends have clicked on random things because they didn't know it has this tendency to hallucinate. Nevertheless, it is very useful, and the app's GPT itself encourages you to ask a volunteer for help with a given problem.

Another issue is that the privacy policy of this service is very uncertain and worth reflecting on. It assumes that Be My Eyes can transfer photos taken by blind people to other entities or organizations that care for the welfare of the blind in certain countries, and to its partners. So there is a doubt shared by me and many other blind people, as our community isn't very large. If a photo that compromises me in some way goes to an organization that knows me—say I ask the model if I'm dressed well for an exam, and I have a large stain in the middle—someone might think, "Oh my, what a messy person." Not necessarily, of course, because sometimes a stain isn't detectable by touch.

Suppose I don't remember my ID card series and number and have no way to check it. No service that describes photos guarantees a privacy policy that would let me scan that ID and be sure a "man in a suit" won't come knocking soon.

So on one hand, blind people are dependent on such technology because it might be the only option to perform a task, but on the other hand, privacy might not be as protected as you'd like.

Correct. This also probably results from business constraints; even if people want to do something cool, it requires a trade-off. Even if you buy it or invest in it, you're still exposed. So the only thing to do is stay vigilant. Take care of your security, lock your PESEL number, approve all bank operations that come in, and so on. Keep an eye on it, because we won't protect ourselves otherwise—sighted or blind. If we aren't vigilant, we won't be protected anyway. Sometimes I just decide, "okay, whatever, I need this," so I won't go up to a random sighted person and ask, because that sighted person might also turn out to be untrustworthy.

You have to take care of yourself in every situation. Thank you very much for participating in our episode, for visiting us and bringing us closer to your world and your experiences with artificial intelligence and technology in general.

Thank you very much, goodbye.

See you.

Visit our YouTube channel!

This was a transcript of one of the episodes on our YouTube channel. If you want to hear more conversations and comments on artificial intelligence—we invite you to the Beyond AI channel.

Visit Beyond AI on YouTube

The Beyond AI channel is created by specialists from WEBSENSA, a company that has been providing AI solutions to leading representatives of various industries since 2011.

Other entries in this series

How do Polish companies use artificial intelligence? Retail Trends 2024

How do leaders of the Polish retail market talk about artificial intelligence? Check out interviews with the most interesting guests of Retail Trends 2024!

Iwona Białynicka-Birula – interview 2: working in AI in the USA

An interview with AI expert Iwona Białynicka-Birula, whose experience spans eBay, Google and Facebook. We discuss what working in artificial intelligence looks like in the USA.