Quebec’s national library is embarking on an innovative project aimed at creating a comprehensive database that will enrich artificial intelligence systems with cultural and governmental content representative of Quebec society, including its Indigenous languages. The Bibliothèque et Archives nationales du Québec (BAnQ) has initiated the experimental phase following a feasibility study completed earlier this year, seeking to tackle the challenges posed by the scarcity of Quebec-specific data in existing AI models.
Addressing Data Gaps in AI
The initiative comes in response to growing concerns that leading generative AI technologies frequently lack accurate representations of Quebec’s diverse society, economy, and cultural landscape due to insufficient local data. Valérie D’Amour, who oversaw the feasibility study, elaborated on the project’s potential during an interview. “All scenarios are a little bit on the table right now,” she stated. “We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
BAnQ has clarified that the forthcoming platform will not function as a public distribution channel for creative works; instead, access to the data will be strictly regulated. Marie Grégoire, BAnQ’s president and CEO, emphasised the importance of ensuring that AI systems accurately reflect the nuances of Quebec’s culture. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she explained.
Learning from Global Examples
This initiative is not unique to Quebec. Similar efforts are underway in other parts of the world, such as Sweden, where extensive collections of Nordic-language texts are being compiled to enhance generative AI models for Scandinavian languages. BAnQ intends to begin with its own archives before expanding to data from external sources. This strategic approach stems from a recommendation in a 2024 report by Quebec’s innovation council, which highlighted the limited availability of Quebec-related data in AI training datasets as a significant barrier.

Destiny Tchéhouali, who holds a research chair focused on French-language artificial intelligence at the Université du Québec à Montréal, remarked on the underrepresentation of Quebec culture in the current AI landscape. He stated, “We run the risk of reproducing linguistic biases and cultural biases. And when we also talk about Indigenous peoples, we run an even greater risk of all these biases.” Tchéhouali believes that the proposed database could serve as a vital infrastructure for establishing guidelines on how local content is identified, catalogued, and tracked in modern AI systems.
Balancing Innovation and Copyright Concerns
As BAnQ develops the proposed database, copyright and the protection of creative works have emerged as significant concerns for the cultural sector. However, Grégoire suggested that the new platform could offer creators better protection than they have today. “Right now, it’s a bit like the Wild West,” she noted. “Data is being harvested for free, and that should not be the case.”
The database could serve as a centralised gateway, facilitating fair compensation for creators whose works contribute to the training of AI systems. Grégoire argued that cultural organisations working together would be better placed to ensure creators are paid appropriately, which in turn would support the sector’s sustainability.
Nevertheless, some artists have voiced apprehensions about the implications of sharing their work with AI training systems. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, pointed out a prevalent concern: “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.”
Future Prospects
The feasibility study anticipates that the platform could be operational by 2029, although D’Amour indicated that this timeline would be reassessed after the experimental phase. The study outlines a projected five-year budget of approximately $10.5 million through to 2030, encompassing both operating and capital expenditures. The Quebec government has allocated $340,000 for the feasibility study, along with an additional $750,000 to support the project’s initial 12-month experimentation phase.

Why It Matters
This initiative represents a significant step towards ensuring that Quebec’s unique cultural identity is not only preserved but also integrated into the rapidly evolving realm of artificial intelligence. By creating a robust database that reflects the province’s diverse languages and cultural heritage, BAnQ aims to mitigate existing biases in AI systems and foster a more inclusive digital landscape. In a world increasingly shaped by technology, such efforts are crucial for protecting the richness of local cultures and ensuring their voices are represented and respected in the digital age.