Training Machines in Poetry to Revisit the Kalevala

With every industrial revolution, a cultural revolution is imminent. I sat down with my colleagues Pekka Ahtonen and Teemu Vartiainen from DAIN Studios, a Finnish-German start-up trailblazing the artificial intelligence (AI) scene in Northern Europe, to discuss their latest AI collaboration project: Kalevala AI.

Majella: How did you come up with the idea to train a machine to generate new text based on the Kalevala, the Finnish national epic compiled by Elias Lönnrot and first published in 1835?

Pekka: My connection to what started as a small internal project at DAIN Studios is quite personal. It began with a family story: my great-great-great-grandfather, Eljas (also known as Uljaska) Ahtonen, founded the village of Rimpi near the Russian border in the mid-1800s. The story goes that in 1890 the Finnish artist Akseli Gallen-Kallela met Uljaska when visiting Rimpi, and that my ancestor became the model for Väinämöinen in Gallen-Kallela’s illustration of the Kalevala.

The script used to train the machine to generate Kalevala text was based on one I had used a year earlier to generate text for the TV series The Simpsons. I thought it would be a novel idea to apply that script to the Kalevala.

Majella: How does Kalevala AI promote Finnish culture and further knowledge of the Finnish national epic?

I think it is important to know one's cultural roots, as this supports personal identity. Our education system has gone through many reforms over the years, and the teaching of the Kalevala has been influenced by them. Many decades ago, the Kalevala was taught in schools through folk song and memorization. These days, it would be hard to find a school student who has memorized the 20,000 words of the Kalevala. I also think it would be quite hard to find anyone in Finland who could write in the prose/poetic form of the Kalevala; it is a talent that has been lost as our educational priorities changed over the years. The paradox is that, on the one hand, technology has driven many changes in the Finnish educational system, to the point that we diminish the purpose of poetry. On the other hand, it is with technology that we can elevate the purpose of poetry, culture and creative learning. Novel, engaging learning methods can be used to create a stronger awareness of our cultural foundations, and I hope that Kalevala AI can be used for this purpose.

Majella: Are there any features of the Kalevala text that make it more appropriate or tractable for AI text generation?

Pekka: Features such as alliteration, parallelism and the poetic meter (Kalevala meter, a variation of trochaic tetrameter) support text prediction. Also, having thousands of words of text gives us enough data to train the machine.
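
To make the point about alliteration concrete, here is a minimal Python sketch of how one might detect it in a single line of verse. It is an illustration only, not part of the Kalevala AI project: the function name `alliterating_initial` and the rule "at least two words share an initial letter" are assumptions chosen for simplicity. The sample line is the familiar formula "Vaka vanha Väinämöinen".

```python
from typing import Optional


def alliterating_initial(line: str) -> Optional[str]:
    """Return the first initial letter shared by at least two words, or None.

    Hypothetical helper for illustration; a real analysis of Kalevala meter
    would also need to look at syllables and stress.
    """
    initials = []
    for word in line.split():
        cleaned = word.strip(".,;:!?")  # drop surrounding punctuation
        if cleaned and cleaned[0].isalpha():
            initials.append(cleaned[0].lower())
    for letter in initials:
        if initials.count(letter) >= 2:
            return letter
    return None


print(alliterating_initial("Vaka vanha Väinämöinen"))
```

Regularities like this repeated initial narrow down which words are likely to come next, which is one plausible reason such text is easier for a model to predict.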

Majella: How complex is the technology to generate Kalevala AI text?

Pekka: Five years ago, say, we did not have the technology or the knowledge to do this type of AI text generation properly. We are using a highly complex set of algorithms and neural networks to train the machine. I think it is also important to point out that using AI for text generation is a fairly new science; we are just at the beginning of our journey into what may be possible in the coming years.

Majella: What are the limitations and possibilities of generating Kalevala text and other texts with AI?

Teemu: As with all machine learning and artificial intelligence, the text generation is fundamentally based on recognizing patterns in the data. The neural network is simply predicting the next word based on the previous context, and this process is repeated over and over again. This means that although our neural network can generate text that looks poetic and convincing, the algorithm has no understanding of concepts such as the plot of the Kalevala or the relationships between characters in the story. Currently, there is no way for an algorithm to generate a story with a meaningful plot and characters. This sort of creativity cannot be reduced to a machine learning task, at least not yet.

Although creating truly creative and insightful stories is outside the reach of AI at the moment, the recent advances in natural language processing have been astounding. Consider the work of OpenAI, for instance (https://blog.openai.com/better-language-models/). The text generated by their machine learning model seems so authentic that the researchers are not releasing the work fully to the public due to “concerns about large language models being used to generate deceptive, biased, or abusive language at scale”. OpenAI’s model works in the same way as ours: it is trained to predict the next word that occurs in the text. Given enough training data, it can learn to generate very convincing text for contexts that are well represented in the training data. Still, it is limited in the sense that it has no semantic data model or ontologies.
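
The "predict the next word, then repeat" loop described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the project's code: it swaps the neural network for a bigram lookup table, and the function names (`train_bigram_model`, `generate`) and the toy English corpus are made up for illustration and are not taken from the Kalevala.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Record, for each word, every word that followed it in the training text."""
    followers = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, seed_word, max_words, rng):
    """Predict the next word from the previous one, over and over again."""
    output = [seed_word]
    for _ in range(max_words - 1):
        candidates = followers.get(output[-1])
        if not candidates:  # the last word was never followed by anything
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


# Toy placeholder corpus (an assumption for the example, not Kalevala text).
corpus = "the old singer sang the old song and the old singer rowed the boat"
model = train_bigram_model(corpus)
print(generate(model, "the", 8, random.Random(0)))
```

Every pair of adjacent words in the output was seen in the training text, yet nothing in the model represents a plot or a character. A neural network conditions on a much longer context than one word, but the generation loop has the same shape.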

