The Bibliothèque et Archives nationales du Québec (BAnQ) has begun an experimental phase for a new cultural and governmental databank intended to improve how artificial intelligence systems represent Quebec’s society and languages, including its Indigenous languages. Following a feasibility study completed earlier this year, the project seeks to address the persistent shortage of Quebec-related data in AI training sets, which has often left AI systems less accurate about Quebec and less representative of it.
Transforming AI Training Through Local Data
BAnQ’s ambitious project will focus on creating a comprehensive database that encompasses cultural and governmental content in both French and Indigenous languages. Valérie D’Amour, who spearheaded the feasibility study, remarked, “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
The proposed databank is not intended to serve as a public platform for the distribution of creative works; instead, access to the data will be carefully regulated to ensure that it is used responsibly. Marie Grégoire, BAnQ’s president and CEO, emphasised the initiative’s objective of ensuring that AI technologies better mirror Quebec’s distinct cultural identity. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she explained.
Learning from Global Practices
Similar efforts exist in other countries. Sweden, for example, has established large collections of Nordic-language texts to support the development of generative AI models for Scandinavian languages. BAnQ plans to draw first on its existing collections before considering contributions from additional sources, an approach that reflects the importance of local data in building AI systems that both understand and respect the nuances of Quebec’s culture.
The initiative aligns with recommendations from Quebec’s innovation council, which highlighted the critical need for more extensive data on Quebec within AI training datasets. Destiny Tchéhouali, co-holder of a research chair focused on French-language AI at Université du Québec à Montréal, pointed out that Quebec’s culture is still largely underrepresented in the prevailing AI data corpus. He warned, “We run the risk of reproducing linguistic biases and cultural biases. And when we also talk about Indigenous peoples, we run an even greater risk of all these biases.”
Preparing for the Future of Cultural AI
The proposed databank is envisioned as a strategic infrastructure that could help set standards for the identification, cataloguing, and tracking of local content within modern AI systems. However, as BAnQ works to develop this platform, concerns about copyright and creator compensation have emerged. Grégoire countered these worries by suggesting that the new system could offer better protection to creators than the current landscape, which she characterised as “a bit like the Wild West.”
“The database could act as a centralized gateway that would make it easier to compensate creators whose works are used,” she noted, stressing the potential for cultural organisations to collaborate in ensuring fair remuneration and sustainability within the sector.
Despite the potential benefits, some artists are cautious about contributing their work to AI training systems. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, articulated a prevalent concern: “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.”
Funding and Future Prospects
The feasibility study outlines a timeline for the platform to become operational by 2029, although D’Amour indicated that this timeline would be revisited following the experimental phase. The initiative is projected to require a budget of nearly $10.5 million over five years, covering both operating and capital expenses. BAnQ has secured $340,000 from the Quebec government for the feasibility study and an additional $750,000 to support the forthcoming 12-month experimental phase.
Why It Matters
The initiative matters both for the representation of Quebec’s cultural diversity in AI technologies and for local creators, whose contributions it aims to safeguard. By building a dedicated databank, BAnQ aims to ensure that Quebec’s cultural identity is not lost within the broader AI landscape, and that local narratives and languages have a place in a more inclusive and representative digital future.