The Bibliothèque et Archives nationales du Québec (BAnQ) is building a database of cultural and governmental content for training artificial intelligence systems, with the goal of improving their grasp of Quebec’s society, culture, and Indigenous languages. After a successful feasibility study this year, BAnQ has entered the experimental phase of the databank, which will primarily feature content in French and Indigenous languages.
Addressing Data Gaps in AI
The need for this database stems from concerns that mainstream generative AI models often lack reliable information about Quebec. The form the databank will take remains open. “All scenarios are a little bit on the table right now,” said Valérie D’Amour, who spearheaded the feasibility study. She emphasised the importance of engaging with cultural stakeholders, data owners, and providers to explore the possibilities and validate ideas.
BAnQ says the future platform will not serve as a public distribution channel for creative works. Access to the data will be strictly controlled to protect intellectual property. Marie Grégoire, the president and CEO of BAnQ, stressed that the initiative aims to make AI systems more reflective of Quebec’s society and culture, stating, “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community.”
Learning from Global Initiatives
The move mirrors efforts elsewhere, such as in Sweden, which has compiled extensive collections of Nordic-language texts for the development of generative AI models. BAnQ intends to start with its own collections before considering data contributions from external sources. The initiative is rooted in a recommendation from Quebec’s innovation council, which highlighted the scarcity of Quebec-related data in AI training datasets.

Destiny Tchéhouali, co-holder of a research chair dedicated to French-language AI at Université du Québec à Montréal, reinforced the significance of this project. He noted that Quebec’s cultural contributions are “underrepresented in the corpora currently circulating in the AI world.” He warned that without such initiatives, there is a risk of perpetuating linguistic and cultural biases, particularly concerning Indigenous communities.
Protecting Creators’ Rights
As BAnQ advances with its plans, copyright concerns have emerged as a potential obstacle for the cultural sector. However, Grégoire is optimistic that the proposed database could enhance protections for creators. “Right now, it’s a bit like the Wild West,” she remarked, highlighting that data is often harvested without compensation. She envisions the database as a centralised gateway that will facilitate fair remuneration for creators whose works are utilised.
Despite these assurances, some artists remain apprehensive about contributing their work to AI training datasets. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, expressed concern that such contributions could jeopardise artists’ own livelihoods. “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” he cautioned.
Project Timeline and Financial Overview
The feasibility study projects that the platform could become operational by 2029, although D’Amour indicated that the timeline would be reassessed following the experimental phase. The initiative is estimated to require a budget of nearly $10.5 million over five years, covering both operational and capital costs. To support this endeavour, BAnQ has already received $340,000 from the Quebec government for the feasibility study and an additional $750,000 to fund the subsequent 12-month experimentation phase.

Why It Matters
The cultural database is intended to ensure that Quebec’s heritage is accurately represented as artificial intelligence evolves. By addressing the scarcity of Quebec-related data in AI training sets, BAnQ’s initiative aims both to protect cultural expressions and to reduce linguistic and cultural biases in AI systems, including those affecting Indigenous communities.