Quebec’s national library is set to develop a comprehensive database aimed at enriching artificial intelligence systems with insights into the province’s unique culture, society, and Indigenous languages. The Bibliothèque et Archives nationales du Québec (BAnQ) has commenced the experimental phase of this ambitious project following a feasibility study conducted earlier this year. The initiative is designed to address the shortage of Quebec-related data in the training of AI models, a gap that often leads those models to misrepresent local identity.
Addressing Data Gaps in AI
The launch of this new databank is a response to concerns that major AI platforms frequently fail to deliver reliable information about Quebec’s society and culture due to a scarcity of relevant data. Valérie D’Amour, who spearheaded the feasibility study, emphasised the importance of collaborative discussions with cultural stakeholders and data providers to refine the project’s direction. “All scenarios are a little bit on the table right now,” she noted in a recent interview, highlighting the diverse possibilities that could be explored to enhance the representation of Quebec’s distinctiveness in AI systems.
BAnQ has made it clear that the forthcoming platform will not serve as a public distribution channel for creative works. Instead, access to the data will be closely regulated, ensuring that it serves its intended purpose of improving AI training models. Marie Grégoire, president and CEO of BAnQ, reinforced the necessity of integrating Quebec-specific references into AI systems, whether they stem from academic research or the business sector. “That means having Quebec references, whether in small models or large models,” she stated.
Learning from International Examples
This initiative mirrors similar efforts seen in other parts of the world, such as Sweden, where extensive collections of Nordic-language texts have been amassed to aid the development of generative AI models tailored for Scandinavian languages. BAnQ plans to start by utilising its own collections before gradually incorporating data from external sources.

The foundation of this initiative was laid in a 2024 report from Quebec’s innovation council, which identified the limited availability of Quebec-related data as a significant obstacle in the AI landscape. Destiny Tchéhouali, co-holder of a research chair focused on French-language AI at Université du Québec à Montréal, echoed these sentiments, stating that Quebec’s cultural representation within AI training datasets is alarmingly inadequate. He warned that the lack of local data could perpetuate linguistic and cultural biases, particularly concerning Indigenous communities.
Protecting Creators’ Rights
As BAnQ forges ahead with the development of this database, concerns over copyright issues within the cultural sector have emerged. However, Grégoire argued that the proposed platform could ultimately provide greater protection for creators than the current landscape offers. “Right now, it’s a bit like the Wild West,” she remarked, referring to the unregulated nature of data harvesting in the creative arts. She envisions the database as a centralised gateway that could facilitate fair compensation for creators whose works are incorporated into AI systems.
Despite these assurances, some artists fear that contributing their content to AI training initiatives might jeopardise their livelihoods. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, noted that the prevailing concern among artists is that even if they benefit financially from their contributions, they are still “feeding the beast” that could eventually replace them.
The feasibility study anticipates that the platform could become functional by 2029, although D’Amour indicated that the timeline will be revisited after the experimental phase. The project has a projected budget of nearly $10.5 million over the next five years. So far, BAnQ has secured $340,000 from the Quebec government for the feasibility study, along with an additional $750,000 to support the experimental phase.
Why It Matters
This initiative plays a crucial role in ensuring that Quebec’s rich cultural tapestry is adequately represented in the rapidly evolving world of artificial intelligence. By creating a dedicated databank for local content, BAnQ not only seeks to enhance the accuracy of AI systems but also aims to safeguard the rights of creators. As AI continues to permeate various sectors, this project could serve as a model for other regions, demonstrating the importance of preserving local identities in the digital age. In doing so, it stands to empower both creators and communities, fostering a more inclusive and representative technological landscape.
