Quebec’s national library, Bibliothèque et Archives nationales du Québec (BAnQ), is embarking on an ambitious initiative to build a comprehensive database of the province’s cultural and governmental content. The database is intended to improve the training of artificial intelligence systems, giving them a more nuanced understanding of Quebec’s diverse society, culture, and Indigenous languages. Following a successful feasibility study earlier this year, the library has entered the experimental phase of the project, which aims to address the shortcomings of generative AI systems that often overlook Quebec-related data.
Filling the Data Gap
The initiative comes in response to growing concerns that existing AI models lack adequate representation of Quebec’s unique cultural and social landscape. Valérie D’Amour, who oversaw the feasibility study, acknowledged the challenges ahead, stating, “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
Marie Grégoire, BAnQ’s president and chief executive officer, emphasised the importance of ensuring that AI systems reflect Quebec’s identity. “That means having Quebec references, whether in small models or large ones, whether they come from research or the business community,” she affirmed.
Learning from International Examples
The project draws inspiration from similar efforts around the globe, notably in Sweden, where substantial collections of Nordic-language materials have been curated to aid in the development of AI models tailored for Scandinavian languages. BAnQ plans to start with its existing collections before integrating data from external sources. This strategy aligns with a recommendation from Quebec’s innovation council, which highlighted the scarcity of Quebec-specific data in AI training datasets as a significant barrier.

Destiny Tchéhouali, a co-holder of a research chair focused on French-language AI and digital technologies, underscored the urgency of this initiative. He pointed out that Quebec culture is “underrepresented in the corpora currently circulating in the AI world,” warning that this could lead to linguistic and cultural biases, particularly concerning Indigenous peoples.
Navigating Copyright Challenges
As BAnQ develops this proposed database, copyright concerns have surfaced as a significant issue within the cultural sector. However, Grégoire believes that the new platform could ultimately provide creators with enhanced protection compared to the current landscape, which she described as “a bit like the Wild West.”
“Data is being harvested for free, and that should not be the case,” Grégoire asserted. The envisioned database could create a centralised gateway for compensating creators whose works are used, thereby fostering a more sustainable cultural sector. Yet some artists are apprehensive about contributing their work to AI training systems, fearing it may jeopardise their livelihoods in the long term.
Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, echoed these concerns. “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” he noted.
A Vision for the Future
The feasibility study targets 2029 for the platform to become operational, although D’Amour acknowledged that this schedule will be reviewed following the experimental phase. The study proposes a five-year budget of approximately CAD 10.5 million, covering both operating and capital expenses. The Quebec government has allocated CAD 340,000 for the feasibility study and an additional CAD 750,000 to support the project’s twelve-month experimental phase.

Why It Matters
The creation of a dedicated cultural and governmental database by BAnQ is a significant step towards ensuring that Quebec’s languages, cultures, and histories are accurately represented in AI systems. As artificial intelligence increasingly shapes our world, it is crucial that these technologies do not perpetuate existing biases or marginalise underrepresented voices. By actively curating and controlling access to this data, Quebec aims to set a precedent for ethical AI development that respects and promotes its cultural heritage, so that the province’s unique identity is preserved and celebrated in the digital age.