Quebec’s National Library to Develop AI Training Database for Cultural Content

Sophie Tremblay, Quebec Affairs Reporter

In a significant move to enhance the representation of Quebec’s culture and Indigenous languages in artificial intelligence, Bibliothèque et Archives nationales du Québec (BAnQ) is embarking on an ambitious project to establish a comprehensive database of cultural and governmental content. This initiative follows the completion of a feasibility study earlier this year and aims to address the glaring gaps in AI systems’ understanding of Quebec society.

A Step Towards Cultural Representation

BAnQ has officially launched the experimental phase of its innovative databank, focusing on both French and Indigenous languages. The initiative stems from a recommendation by Quebec’s innovation council, which highlighted the scarcity of provincial data in existing AI training datasets. This lack of representation has led to concerns that current generative AI systems often misrepresent or overlook the complexities of Quebec’s diverse cultural landscape.

Valérie D’Amour, who spearheaded the feasibility study, expressed optimism about the project’s potential. “All scenarios are a little bit on the table right now,” she remarked in a recent interview. “We have a plethora of ideas and are eager to validate these possibilities with cultural stakeholders and data providers.”

Data Control and Accessibility

BAnQ’s proposed platform will not function as a public repository for creative works, and the institution has made it clear that access to the data will be tightly controlled. Marie Grégoire, the president and CEO of BAnQ, emphasised that the goal is to ensure AI systems accurately reflect the essence of Quebec’s societal fabric. “This means having Quebec references, whether in small or large models, derived from both research and the business community,” she stated.

Such initiatives are not unique to Quebec. Similar projects have emerged globally, including in Sweden, where extensive collections of Nordic-language texts have been compiled to assist in developing AI models for Scandinavian languages. BAnQ plans to begin with its own archives before exploring data contributions from external sources.

Addressing Linguistic and Cultural Biases

Destiny Tchéhouali, a prominent figure in the field of French-language artificial intelligence, has pointed out the risks of linguistic and cultural biases in AI systems. He noted that Quebec culture remains significantly underrepresented in the datasets currently influencing AI technologies. “We run the risk of reproducing linguistic biases and cultural biases,” he warned. “When it comes to Indigenous peoples, the stakes are even higher.”

Tchéhouali advocates for the proposed database, describing it as “strategic infrastructure” that could establish essential guidelines for the identification, cataloguing, and tracking of local content within contemporary AI systems. Such groundwork is vital for ensuring that AI development is inclusive and reflective of Quebec’s rich cultural diversity.

Protecting Creators’ Rights

As BAnQ forges ahead with its plans, concerns regarding copyright and the protection of creators’ rights have arisen. Grégoire reassured stakeholders that the new platform could ultimately offer better protections for artists than the current landscape, which she likened to “the Wild West.” She argued that, at present, creators’ works are often exploited without appropriate compensation.

The database could serve as a centralised access point, facilitating fair remuneration for artists whose works are incorporated into AI training datasets. However, some artists remain apprehensive, fearing that their contributions to these systems could jeopardise their careers. “The main criticism we hear is that, even if artists earn income from it, they are still feeding the beast that may ultimately threaten their contracts,” said Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research.

The feasibility study envisions the platform becoming operational by 2029, although D’Amour acknowledged that this timeline would be re-evaluated after the experimental phase. The project is anticipated to require a budget of approximately $10.5 million over five years, with funding already secured from the Quebec government for both the feasibility study and the experimental phase.

Why it Matters

This initiative by BAnQ is not merely an administrative project; it represents a crucial step towards rectifying the underrepresentation of Quebec’s cultural identity in the rapidly evolving landscape of artificial intelligence. By building a robust database, BAnQ aims to ensure that future AI systems are not only more accurate in their portrayal of Quebec society but also respectful and inclusive of its diverse linguistic and cultural heritage. This endeavour could set a precedent for other regions, illustrating the importance of localised data in the development of technology that truly reflects the communities it serves.


© 2026 The Update Desk. All rights reserved.