Bibliothèque et Archives nationales du Québec (BAnQ) is embarking on an ambitious initiative to create a comprehensive database that will serve as a resource for training artificial intelligence systems. The project aims to improve how AI models represent Quebec’s societal fabric, cultural nuances, and Indigenous languages. Following a feasibility study completed earlier this year, BAnQ has begun the experimental phase of this databank, which will primarily contain content in French and Indigenous languages.
Addressing the Data Deficit
The motivation behind this initiative is clear: many existing AI systems struggle to provide accurate representations of Quebec’s unique identity due to a scarcity of relevant data. Valérie D’Amour, who spearheaded the feasibility study, emphasised the need to validate various possibilities through collaboration with cultural stakeholders and data providers. “All scenarios are a little bit on the table right now,” she stated in a recent interview. The intention is to foster a platform that accurately reflects the province’s diverse cultural landscape.
BAnQ has made it clear that this future platform will not function as a public distribution channel for creative works. Instead, access to the data will be carefully regulated to ensure that it serves its intended purpose without compromising the rights of creators.
Learning from Global Examples
Similar projects have emerged in other regions, such as Sweden, where extensive collections of Nordic texts have been compiled to support generative AI models tailored to Scandinavian languages. BAnQ plans to begin by utilising its own archives before potentially expanding to include data from external sources. The initiative is rooted in a recommendation from Quebec’s innovation council, which highlighted the pressing need for better representation of the province’s culture in AI training datasets.

Destiny Tchéhouali, a co-holder of a research chair focused on French-language AI at Université du Québec à Montréal, pointed out that Quebec’s culture is currently underrepresented in the data available for AI systems. “We risk reproducing linguistic and cultural biases,” he warned, particularly concerning Indigenous communities. The proposed database aims to establish “strategic infrastructure” that will help local content be identified, catalogued, and tracked effectively within AI systems.
Navigating Copyright Challenges
As BAnQ develops this database, copyright concerns have emerged as a significant issue within the cultural sector. However, Marie Grégoire, the president and CEO of BAnQ, argues that the platform could ultimately provide better protection for creators than the current unregulated environment. “Right now, it’s a bit like the Wild West,” she remarked, adding that the database could serve as a centralised gateway for compensating creators whose works are utilised.
Despite these assurances, some artists express apprehension that contributing their works to AI training could jeopardise their livelihoods. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, noted that while artists might receive some income, they might be inadvertently feeding a system that could replace traditional contracts. “The main criticism we hear is that they are still feeding the beast,” he explained.
Future Prospects and Funding
The feasibility study envisions the platform becoming operational by 2029, although the timeline will be reassessed following the ongoing experimental phase. The initiative carries a projected budget of nearly $10.5 million over the next five years, covering both operating and capital expenses. To date, BAnQ has received $340,000 from the Quebec government for the feasibility study and an additional $750,000 to support the project’s twelve-month experimentation phase.

Why It Matters
This initiative is a significant step towards ensuring that Quebec’s rich cultural heritage and diverse languages are adequately represented in the rapidly evolving landscape of artificial intelligence. By creating a dedicated database, BAnQ is not only addressing critical gaps in AI training data but also taking proactive measures to safeguard the rights and livelihoods of local creators. As AI technologies continue to shape our world, such efforts are crucial in preventing cultural erasure and fostering a more inclusive digital future.