The Bibliothèque et Archives nationales du Québec (BAnQ) is embarking on an ambitious project to build a comprehensive database of cultural and governmental content. The initiative aims to improve the performance of artificial intelligence systems by enriching their knowledge of Quebec’s society, culture, and Indigenous languages. Following a feasibility study completed earlier this year, BAnQ has entered the experimental phase of creating the databank, which will primarily feature content in French and Indigenous languages.
Addressing Data Gaps in AI
The initiative stems from the observation that generative AI systems often fail to provide accurate information about Quebec’s unique cultural landscape. Valérie D’Amour, who led the feasibility study, acknowledged the challenges posed by the scarcity of Quebec-specific data in AI training datasets. “All scenarios are a little bit on the table right now,” she remarked, highlighting the importance of working with cultural organisations and data providers to explore the options.
BAnQ is committed to controlling access to the data and ensuring that the platform will not serve as a public distribution channel for creative works. Marie Grégoire, the president and chief executive officer of BAnQ, stated that the primary objective is for AI systems to reflect the realities of Quebec society. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she explained.
Learning from Global Examples
Similar projects have emerged globally, notably in Sweden, where vast collections of Nordic-language texts have been compiled to aid in the development of generative AI models for Scandinavian languages. BAnQ plans to draw from its own collections as a starting point before considering data contributions from other sources. This approach is informed by a recommendation from Quebec’s innovation council, which identified the need for more robust data on Quebec in AI training materials.

Destiny Tchéhouali, a co-holder of a research chair focused on French-language AI and digital technologies, emphasised the underrepresentation of Quebec culture in current AI datasets. He cautioned that this lack of representation could perpetuate linguistic and cultural biases, particularly concerning Indigenous peoples. Tchéhouali described the proposed database as “strategic infrastructure” that could set guidelines for identifying, cataloguing, and tracking local content within AI systems.
Copyright Concerns and Creator Protection
As BAnQ develops the proposed database, copyright concerns have surfaced within the cultural sector. Grégoire, however, argued that the new platform could give creators stronger protection than they have today. She described the current situation as “the Wild West,” where data is harvested without proper compensation. The database is meant to serve as a centralised gateway, making it easier to ensure creators receive fair remuneration for their contributions.
Despite the potential benefits, some artists express reservations about the implications of sharing their work for AI training. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, noted that many artists fear their contributions could ultimately threaten their livelihoods. “Even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” he cautioned.
The feasibility study projects that the platform could become operational by 2029, although D’Amour indicated that this timeline would be reassessed following the experimental phase. The study outlines a budget of nearly $10.5 million over five years, which will cover both operational and capital costs. BAnQ has already secured $340,000 from the Quebec government for the feasibility study and an additional $750,000 to support the project’s initial experimental phase.
Why It Matters
BAnQ’s initiative matters both for Quebec’s cultural landscape and for the development of AI technology. By building a database that improves AI systems’ understanding of local culture and languages, Quebec aims to address existing biases and foster a more inclusive digital environment. The project seeks to protect creators’ rights while ensuring that Quebec’s cultural heritage is accurately represented in the AI systems that increasingly influence daily life. If it succeeds, both creators and society at large stand to benefit.
