Beyond AI
GPT Engineer – build an app with a SINGLE prompt! Tutorial

The following article is a supplement to the video produced on the Beyond AI channel. We encourage you to watch the video on our channel to get a more complete picture of the topics discussed and to see practical examples of how artificial intelligence is applied in daily life. Visit the Beyond AI channel to discover more fascinating AI-related content!
Watch this material on YouTube:
Today, we are talking about a very interesting project created by Charlie Holtz. He utilized several AI technologies, specifically GPT Vision and Eleven Labs, to create a program that observes him through his computer camera and narrates his life in the voice of David Attenborough, as if it were a nature documentary.

Today, we will show you how to run this script on your own (since Charlie has made it publicly available) and what to change so that the narration is in Polish.
To run this script, you must have a computer with Python installed. Charlie provides a link to his GitHub, where he placed the "narrator" project – visit GitHub.
Now, simply clone the repository shared by Charlie. After entering the "narrator" directory, you need to install all the required libraries using a simple PIP command.

<code>
‘ git clone https://github.com/cbh123/narrator.git
‘ cd narrator/
‘ pip install -r requirements.txt
</code>
Next, to adapt the narration to the Polish language and use your access keys, we need to change a few things in the files. Most importantly, in the narrator.py file, it is worth changing the prompt used to generate the image description to Polish.

> "You are Krystyna Czubówna. Describe the photo as you would in a nature documentary. Be witty. Do not repeat yourself. Prepare a short description. If there is something even slightly funny in the photo, make it a huge sensation! Speak only in Polish."
Additionally, we must adjust the model parameters for the function call generating the audio to use a model that enables text generation in languages other than English, namely "eleven_multilingual_v2".

The only thing left is to export the values of three keys in the terminal: the OpenAI API key, the Eleven Labs API key (where we also need to register), and the key pointing to the specific voice we want to use.

Remember that the keys for Eleven Labs resources must be provided in quotation marks.
Simply register on the Eleven Labs website, and then in the Voice Lab tab, select Add Generative Or Cloned Voice.

In the free version, we can create a synthetic voice by entering Voice Design. We choose parameters such as gender, age, and accent, and click Generate.
Once the voice is created, copy the voice ID, which we will export in the terminal to use it.


After these changes, simply run two files in two separate terminal windows: capture.py and narrator.py.
Unfortunately, the whole process takes quite a while; generating the voice takes roughly as long as reading it out. You need to be patient.

You can hear this effect in the video starting at the 4:29 mark. It is a generated voice, but we can also use the paid version of Eleven Labs to upload any voice sample, such as our own or that of a person with a recognizable voice.
In our video starting at 5:25, you will see the result when we used Krystyna Czubówna's voice for educational purposes. The effect is truly stunning! Let us know under the video what you think of this experiment!
We invite you to visit the Beyond AI channel, which is dedicated to artificial intelligence. Our motto is "Your guide to the dynamic world of AI." Discover fascinating content and stay up to date with the latest trends in the field of AI!

Learn how artificial intelligence can help you unlock YouTube’s full potential and make video publishing easier and more efficient.

Discover how AI can help you pick the perfect present — from personalised recommendations to analysing YouTube reviews and saving time with intelligent suggestions.