In a significant move to enhance the representation of Quebec’s diverse culture in artificial intelligence systems, the Bibliothèque et Archives nationales du Québec (BAnQ) has initiated the experimental phase of a new database project. This initiative aims to gather cultural and governmental content to better inform AI models about Quebec’s society, economy, and Indigenous languages. Following a feasibility study completed earlier this year, BAnQ is now poised to tackle the challenges posed by the scarcity of Quebec-specific data in AI training datasets.
Addressing Data Gaps in AI
The primary motivation for the project is growing concern that existing generative AI models often struggle to reflect Quebec’s context accurately because their training data contains little Quebec-specific material. Valérie D’Amour, who spearheaded the feasibility study, emphasised that the shape of the initiative is still open: “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
BAnQ has clarified that the forthcoming platform will not serve as a public distribution channel for creative works and that access to its data will be strictly controlled. CEO Marie Grégoire articulated the project’s vision: “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community.”
Learning from Global Initiatives
The Quebec project is not unique; similar efforts have been seen in other nations. For instance, Sweden has compiled extensive collections of Nordic-language texts to assist in developing AI models for Scandinavian languages. BAnQ plans to start with its own resources before exploring data from additional sources.

This initiative aligns with recommendations made in a 2024 report by Quebec’s innovation council, which highlighted the limited volume of Quebec-related data present in AI training datasets as a significant issue. Destiny Tchéhouali, a co-holder of a Quebec-based research chair focused on French-language artificial intelligence, noted that Quebec’s cultural output is currently “underrepresented in the corpora circulating in the AI world.” He cautioned that without proper representation, there is a danger of perpetuating linguistic and cultural biases, particularly concerning Indigenous peoples.
Protecting Creators’ Rights
As BAnQ develops this database, concerns around copyright and the implications for artists have been raised. Grégoire suggested that the platform could enhance protections for creators compared to the existing chaotic landscape. “Right now, it’s a bit like the Wild West,” she remarked. “Data is being harvested for free, and that should not be the case.”
The proposed database aims to establish a centralised system that would facilitate fair compensation for creators whose works are used in AI training. However, some artists remain apprehensive. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, said artists worry that contributing their work could jeopardise their livelihoods. “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” he cautioned.
Financial Outlook and Future Goals
The feasibility study targets an operational launch of the platform by 2029, although D’Amour indicated that this schedule may be reassessed following the experimental phase. The projected budget for the next five years is approximately CAD 10.5 million, covering both operational and capital expenses. To support the feasibility study and the subsequent experimental phase, BAnQ has received CAD 340,000 from the Quebec government, along with an additional CAD 750,000 for the project’s initial year.

Why it Matters
This initiative aims to ensure that Quebec’s culture is accurately represented in the rapidly evolving landscape of artificial intelligence. By building a database of local content, BAnQ is addressing the immediate data gap and may also set a precedent for how cultural institutions can work with technology developers to safeguard the integrity of their narratives. As AI systems increasingly influence various sectors, the implications of this project could extend beyond Quebec and inform global conversations about data representation and cultural identity in the digital age.