Quebec’s national library is launching an initiative to build a database of cultural and governmental information, with the aim of improving how artificial intelligence systems understand Quebec’s society, culture, and Indigenous languages. The Bibliothèque et Archives nationales du Québec (BAnQ) has entered the experimental phase of the project, following a feasibility study conducted earlier this year.
Addressing Data Gaps in AI Training
The database is meant to address a recurring weakness of major AI systems, which often struggle to produce accurate and relevant information about Quebec because local data is scarce. Valérie D’Amour, who led the feasibility study, said the project’s final shape remains open: “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
BAnQ has said that the forthcoming platform will not act as a public distribution outlet for creative works; instead, access to the data will be strictly controlled.
Ensuring Representation in AI
Marie Grégoire, the president and CEO of BAnQ, emphasised the importance of AI systems accurately reflecting Quebec’s society and culture. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she said. The initiative has international precedents, such as in Sweden, where extensive collections of Nordic-language texts have been compiled to support the development of generative AI models.

BAnQ intends to start the project with its own collections before exploring additional data sources. The initiative stems from a recommendation made by Quebec’s innovation council in a 2024 report, which highlighted how little Quebec-related data appears in existing AI training datasets.
Cultural Representation and Ethical Concerns
Destiny Tchéhouali, who co-holds a research chair focused on French-language artificial intelligence at Université du Québec à Montréal, noted that Quebec’s cultural representation in AI remains disproportionately low. He cautioned, “We run the risk of reproducing linguistic biases and cultural biases. And when we also talk about Indigenous peoples, we run an even greater risk of all these biases.” Tchéhouali believes that the proposed database could serve as “strategic infrastructure” to establish protocols for identifying, cataloguing, and tracking local content within contemporary AI systems.
As BAnQ develops the database, copyright concerns have surfaced, prompting discussion about how creators’ rights will be protected. Grégoire contended that the platform could protect artists better than the current situation does. “Right now, it’s a bit like the Wild West,” she remarked. “Data is being harvested for free, and that should not be the case.” The database could function as a central gateway, making it simpler to compensate creators whose works are used.
However, some artists remain apprehensive. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, highlighted a prevalent concern: “Even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.” This sentiment underscores the delicate balance between innovation and the protection of creative livelihoods.
Project Timeline and Funding
The feasibility study envisions the database becoming operational by 2029, although D’Amour noted that this timeline will be reassessed after the experimental phase. The study projects a five-year budget of nearly £10.5 million, covering both operating and capital costs. To date, BAnQ has received £340,000 from the Quebec government to support the feasibility study and an additional £750,000 earmarked for the project’s 12-month experimentation phase.

Why It Matters
The cultural database is a step towards ensuring that Quebec’s heritage is accurately represented in artificial intelligence. As generative AI technologies spread across more sectors, the quality of what they say about Quebec will depend on the local knowledge and cultural context they are trained on. The initiative aims both to support Quebec’s creative community and to reduce the risk of perpetuating linguistic and cultural biases, paving the way for more inclusive and representative AI systems. By acting now, Quebec could set a precedent for cultural integrity in the digital age.