The Bibliothèque et Archives nationales du Québec (BAnQ) is building a database of cultural and governmental content for training artificial intelligence systems, with the aim of improving their understanding of Quebec’s society, culture, and Indigenous languages. Following a feasibility study earlier this year, BAnQ has entered the experimental phase of the databank, a significant step towards addressing the difficulty AI systems have in accurately reflecting Quebec.
Addressing Data Deficiencies in AI
A primary motivation for the project is that many leading generative AI systems struggle to represent Quebec’s social and cultural landscape because relevant data is scarce. Valérie D’Amour, who led the feasibility study, said, “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
BAnQ does not envision the databank as a public platform for creative works; instead, access to the data would remain tightly controlled. This approach is designed to protect intellectual property while improving the quality of AI outputs concerning Quebec’s identity.
The Cultural Imperative
Marie Grégoire, BAnQ’s president and CEO, emphasised the goal of ensuring that AI systems authentically mirror Quebec’s culture and society. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she stated. This initiative is not merely a technological endeavour; it is also a cultural imperative aimed at rectifying the underrepresentation of Quebec’s vibrant heritage in existing AI frameworks.

Similar efforts exist elsewhere. In Sweden, for example, extensive collections of Nordic-language texts have been curated to support the development of AI models for Scandinavian languages. BAnQ intends to start with its own collections before exploring additional sources of data.
A Strategic Move for Cultural Preservation
This initiative stems from a recommendation in a 2024 report by Quebec’s innovation council, which highlighted the “very small quantity of data on Quebec” available in current AI training datasets. Destiny Tchéhouali, a co-holder of a research chair dedicated to French-language AI and digital technologies, underscored the risks of cultural and linguistic biases inherent in existing AI systems. “Quebec culture remains underrepresented in the corpora currently circulating in the AI world,” he cautioned.
Tchéhouali described the proposed database as “strategic infrastructure” that could establish guidelines for identifying, cataloguing, and tracking local content within modern AI systems. Such a resource could serve as a vital tool in shaping how AI interacts with Quebec’s diverse cultural fabric.
Navigating Copyright Concerns
As BAnQ progresses with the development of this databank, copyright issues have surfaced as a significant concern within the cultural sector. Grégoire argued that this new platform could ultimately provide creators with greater rights and protections than the existing fragmented system. “Right now, it’s a bit like the Wild West,” she remarked. “Data is being harvested for free, and that should not be the case.”

The database is envisioned as a centralised access point that would facilitate fair compensation for creators whose works are used. By working together, cultural organisations could strengthen the sustainability of the sector and ensure that artists are adequately remunerated for their contributions.
However, some artists express apprehension that their involvement in training AI systems might jeopardise their livelihoods. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, voiced a common concern: “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.”
Project Timeline and Funding
The feasibility study outlines a timeline for the platform to become operational by 2029, though D’Amour indicated that this schedule would be reassessed following the experimental phase. The project is expected to require a budget of approximately $10.5 million over five years, covering both operating and capital expenses. The Quebec government has already allocated $340,000 for the feasibility study and an additional $750,000 to support the initiative’s 12-month experimental phase.
Why It Matters
BAnQ’s initiative is a step towards ensuring that Quebec’s cultural identity is not lost as artificial intelligence evolves. By creating a dedicated database that reflects the province’s society, culture, and Indigenous languages, BAnQ aims to improve AI’s performance while protecting the rights of local creators. As AI plays an increasingly prominent role in daily life, initiatives like this are intended to ensure that technology strengthens rather than overshadows the cultural narratives that define Quebec.