
Can ChatGPT and Gemini pass the 2024 Polish language exam?

See how two AI chat models handle real Matura tasks and where they succeed or fail.

This page uses images generated with the help of artificial intelligence.

The following article is a supplement to the video created on the Beyond AI channel. If you are interested in the subject of artificial intelligence, be sure to visit our channel, where you will find even more valuable content on this topic.


Testing AI Models Using the Polish Language Matura Exam as an Example

Artificial intelligence (AI) is becoming increasingly common and versatile, and its capabilities are sparking curiosity in both technology and education.


In one of the experiments on our YouTube channel, we decided to test how two advanced AI models, ChatGPT and Gemini, would handle solving tasks from the Polish language Matura (high school exit) exam.

Our goal was to check how well these models can not only understand but also correctly solve complex tasks that require text analysis, an understanding of context, and precisely worded answers.

Task: The Polish Language Matura Exam

We began the experiment by giving both models a set of tasks from the 2024 Polish language Matura exam.

Ziemowit and Michał, our test participants, were tasked with:

  • Deciding whether they would cooperate with each other or if it would be better to act individually;
  • Inputting the exam sheets into the AI models and asking them to solve them;
  • Comparing the results and evaluating how well these two models handled solving the tasks.

Challenges and Approach to Solving Tasks

Our participants decided that their models would compete for the best score. One of them used ChatGPT, and the other used Gemini. They also chose different strategies for approaching the task.

The models began generating answers, which we then compared with each other.

Examples of Tests on Specific Tasks

During the test, we noticed that ChatGPT and Gemini differed not only in the speed of generating answers but also in their quality.

1. Analysis of an excerpt from "Pan Tadeusz" by Adam Mickiewicz

(You can download the full Matura sheet and preview the answers the chatbots generated during the test in the "Files for review" section below.)

  • ChatGPT answered both sub-points correctly.
  • Gemini made a mistake in the answer to the second sub-point.

Because of this mistake, Gemini scored zero points for this task.

2. Solving a task related to image interpretation

  • ChatGPT, using its vision capabilities, identified a man lifting a globe on the poster and correctly interpreted him as a symbol of Kordian's pursuit of great deeds and his struggle with global challenges. However, it got the rest of the task completely wrong.
  • Gemini, in turn, described mountains on the poster that supposedly symbolized Kordian's moral dilemmas. The problem was that the poster contained no mountains at all: a serious error in image analysis and a clear example of an AI hallucination.

Both models failed this task, scoring zero points each.

3. "True/False" type tasks

Both models had difficulty with "True/False" tasks. For example:

  • ChatGPT correctly handled the claims about the work but sometimes had trouble interpreting the instructions.
  • Gemini struggled with similar difficulties, often giving answers that were unclear or even incorrect.

4. Word counting and summary analysis

In one of the tasks, participants asked the models to generate an answer not exceeding five sentences.

  • ChatGPT generated a very long answer, forcing the hosts to shorten the text manually.
  • Gemini, on the other hand, handled text compression better, but sometimes omitted key information, which affected the quality of the final answer.
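A limit like this does not have to be checked by eye. Below is a minimal Python sketch, not the method used in the experiment, that counts the sentences in a model's answer and flags replies longer than five sentences. The helper names `count_sentences` and `within_limit` are hypothetical, and the splitting rule (a full stop, exclamation mark, or question mark followed by whitespace or the end of the text) is a simplifying assumption that will miscount abbreviations common in Polish, such as "np." (for example).

```python
import re

# Hypothetical rule: a run of ".", "!" or "?" followed by whitespace or the
# end of the text ends a sentence. Decimals like "3.5" are not split, but
# abbreviations such as "np." are, so this is a rough check, not a grader.
SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def count_sentences(text: str) -> int:
    """Return the approximate number of sentences in text."""
    stripped = text.strip()
    if not stripped:
        return 0
    # Split at each boundary and keep only non-empty fragments.
    parts = [p for p in SENTENCE_END.split(stripped) if p.strip()]
    return len(parts)


def within_limit(text: str, max_sentences: int = 5) -> bool:
    """Check whether an answer respects a sentence limit (five in this task)."""
    return count_sentences(text) <= max_sentences


if __name__ == "__main__":
    answer = "First sentence. Second one! Is this the third?"
    print(count_sentences(answer))  # 3
    print(within_limit(answer))     # True
```

A stricter check would use a proper sentence tokenizer, but even this rough count is enough to spot an answer that clearly runs past the limit before it goes to grading.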

Results

After completing the test, we compared the results of both models.

  • ChatGPT scored 18 points out of 25 (72% effectiveness).
  • Gemini obtained 11 points out of 25 (44% effectiveness).
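The effectiveness figures are simply the points earned divided by the points available. The short Python sketch below reproduces them. The pass check is an assumption added for illustration: it uses the 30% minimum that applies to compulsory Matura subjects and applies it to the 25-point task set from our test.

```python
from fractions import Fraction

MAX_POINTS = 25
# Assumption: the 30% minimum for compulsory Matura subjects, applied here
# to the 25-point task set used in the test (i.e. at least 7.5 points).
PASS_THRESHOLD = Fraction(30, 100)

scores = {"ChatGPT": 18, "Gemini": 11}


def effectiveness(points: int, max_points: int = MAX_POINTS) -> Fraction:
    """Share of available points earned, kept as an exact fraction."""
    return Fraction(points, max_points)


for model, points in scores.items():
    share = effectiveness(points)
    passed = share >= PASS_THRESHOLD
    print(f"{model}: {points}/{MAX_POINTS} = {float(share):.0%}, passed: {passed}")
# ChatGPT: 18/25 = 72%, passed: True
# Gemini: 11/25 = 44%, passed: True
```

Keeping the ratio as a `Fraction` avoids rounding surprises when comparing it with the threshold; it is converted to a float only for display.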

Even though both models passed the Polish language Matura, the differences in their performance were significant.

ChatGPT gave more correct answers, although it struggled to keep them concise and shortened its text only after several additional instructions.

Gemini produced more detailed and specific answers, but it often got lost in the details and gave answers inconsistent with the answer key.

Conclusions from the Experiment

Testing AI models on Polish language Matura tasks led to several conclusions:

  • AI can be useful in the educational process, but its limitations must be remembered.
  • AI models, despite their advanced architecture, can generate answers that require further analysis and correction.
  • The experiment participants noted that AI models are more effective when they have clearly defined tasks and specific guidelines to follow.

Did you know? People who watch the videos on our channel regularly gain unique AI skills. See for yourself!

FAQ

1. Can artificial intelligence completely replace a human in solving Matura tasks?

No, AI can be a helpful tool, but it still requires supervision and correction by a human, especially in tasks requiring precise interpretation.

2. What are the biggest challenges in using AI in education?

The biggest challenges are making sure AI-generated answers are accurate and fit the context, and the need for human supervision of the process.

3. Can AI be used for studying and preparing for the Matura exam?

Yes, AI can be useful in studying and preparing for the Matura exam by helping with text analysis and generating answers, but it will not replace the critical thinking and in-depth analysis that a student must perform.

Glossary

  • Artificial Intelligence (AI) – a field of computer science dealing with the creation of systems capable of performing tasks requiring intelligence, such as understanding natural language, recognizing images, or making decisions.
  • Language Model – a model trained on huge data sets that can process and generate text.
  • AI Hallucinations – errors generated by AI models, involving the creation of false information that has no basis in reality.
  • Answer Key – a set of correct answers or guidelines used to grade exam tasks.

Files for review:

Thank you!

We invite you to visit the Beyond AI channel, which is dedicated to artificial intelligence and its various applications. This is your guide to the dynamic world of AI!

Visit Beyond AI on YouTube

The Beyond AI channel is created by specialists from WEBSENSA, a company that has been providing AI solutions to leading representatives of various industries since 2011.
