The Bibliothèque et Archives nationales du Québec (BAnQ) is moving ahead with a database of cultural and governmental content intended to improve how artificial intelligence systems understand Quebec’s society, culture, and Indigenous languages. Having completed a feasibility study earlier this year, the national library has begun the experimental phase of the project, which will be carried out in both French and Indigenous languages.
Addressing Data Gaps in AI
One of the main motivations behind BAnQ’s initiative is the shortage of reliable information about Quebec available to major generative AI systems. Because so little of this data is currently accessible to them, these systems often lack a nuanced understanding of the province’s social, economic, and cultural landscape. Valérie D’Amour, who led the feasibility study, noted, “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
The proposed database will not serve as a public channel for distributing creative works, and BAnQ has emphasised that access to the data will be carefully regulated. Marie Grégoire, BAnQ’s president and CEO, articulated the project’s goal of ensuring that AI systems genuinely reflect Quebec’s rich cultural tapestry. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she stated.
Learning from Global Initiatives
This initiative is not an isolated endeavour; similar projects have emerged internationally. For instance, Sweden has successfully compiled extensive collections of Nordic-language texts to bolster the development of generative AI models tailored for Scandinavian languages. BAnQ plans to start with its own archival collections before exploring data acquisition from external sources.

The impetus for this project can be traced back to a recommendation from Quebec’s innovation council, which highlighted the scarcity of Quebec-centric data in existing AI training datasets. Destiny Tchéhouali, an expert in French-language AI and digital technologies, pointed out that Quebec’s cultural representation is significantly lacking in current AI corpora. He warned of the dangers of perpetuating linguistic and cultural biases, particularly regarding Indigenous peoples, and advocated for a strategic infrastructure to define how local content is identified and catalogued within AI frameworks.
Copyright Concerns and Creative Protections
As BAnQ progresses in developing this database, concerns surrounding copyright and the protection of cultural creators have surfaced. Grégoire contended that the proposed platform could ultimately safeguard creators better than the current landscape allows. “Right now, it’s a bit like the Wild West,” she noted. “Data is being harvested for free, and that should not be the case.” She envisions the database as a centralised access point that could facilitate fair compensation for creators whose works are incorporated into AI training.
Despite the potential benefits, some artists express apprehension about contributing their creations to AI systems that may inadvertently threaten their livelihoods. Maxime Harvey, a postdoctoral researcher and member of the same research chair as Tchéhouali, reflected the prevailing sentiment among artists: “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.”
A Vision for the Future
The feasibility study envisions the platform becoming operational by 2029, although D’Amour has indicated that this timeline will be reassessed as the experimental phase unfolds. The project’s budget is estimated at approximately $10.5 million through 2030, covering both operational and capital costs. To date, BAnQ has secured $340,000 from the Quebec government for the feasibility study and an additional $750,000 to support the initial 12-month experimentation phase.

Why It Matters
BAnQ’s initiative sits at the intersection of culture and technology in Quebec. By curating a database that reflects the province’s cultural diversity, the library aims to make artificial intelligence a tool for inclusivity and representation. The project seeks to protect and promote Quebec’s cultural heritage, and it underscores the importance of local voices in the global conversation about AI development.