Quebec’s national library has begun an ambitious project to build a comprehensive database of cultural and governmental content. The initiative, led by Bibliothèque et Archives nationales du Québec (BAnQ), aims to improve artificial intelligence systems’ understanding of the province’s society, culture, and Indigenous languages. Following a feasibility study completed earlier this year, BAnQ has launched the experimental phase of the proposed databank, which will focus on French and Indigenous languages.
Addressing Data Gaps in AI Training
The project is a response to growing concerns that current generative AI systems often fail to accurately represent Quebec’s unique cultural and societal context. Valérie D’Amour, who oversaw the feasibility study, emphasised the importance of gathering input from cultural stakeholders and data providers. “All scenarios are a little bit on the table right now,” she remarked. “We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
BAnQ has stated that the new platform will not function as a public distribution channel for creative works and that access to the data will remain strictly regulated. Marie Grégoire, the president and CEO of BAnQ, described the initiative’s goal as ensuring that AI systems more accurately reflect the realities of Quebec society. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she said.
Learning from Global Initiatives
Similar projects have emerged globally, including in Sweden, where extensive collections of Nordic-language texts have been compiled to enhance the development of generative AI models for Scandinavian languages. BAnQ plans to begin this endeavour with its own collections before exploring potential data from external sources. The initiative aligns with recommendations from Quebec’s innovation council, which highlighted the critical need for more local data within AI training datasets.

Destiny Tchéhouali, a prominent researcher focused on French-language AI and digital technologies, pointed out that Quebec culture is significantly underrepresented in existing AI corpora. “We run the risk of reproducing linguistic biases and cultural biases,” he cautioned, adding that Indigenous cultures are particularly vulnerable to such biases. Tchéhouali advocates for the proposed database as a form of “strategic infrastructure” that would help identify, catalogue, and monitor local content within contemporary AI systems.
Balancing Protection and Accessibility
As BAnQ develops the proposed database, issues of copyright and creators’ rights are at the forefront of discussions. Grégoire argued that the new platform could offer increased protection for creators compared to the current landscape, which she described as “a bit like the Wild West.” She believes that the database could serve as a centralised gateway, facilitating fair compensation for creators whose works are utilised.
However, some artists remain apprehensive about the potential consequences of contributing to AI training datasets. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, noted that the prevailing sentiment among artists is one of concern. “Even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” he cautioned.
The feasibility study suggests that the platform could become operational by 2029, though D’Amour indicated that this timeline would be reassessed following the experimental phase. The projected budget for the initiative is approximately $10.5 million over five years, covering both operating and capital costs. The Quebec government has provided funding of $340,000 for the feasibility study, along with an additional $750,000 to support the project’s initial experimentation.
Why It Matters
BAnQ’s cultural database is a significant step towards ensuring that Quebec’s culture is accurately represented in artificial intelligence systems. By prioritising local languages and cultural contexts, the initiative seeks both to mitigate biases in AI and to empower creators in the digital age. The outcome could reshape the relationship between technology and culture in Quebec and set a precedent for how local content is integrated into global AI systems while safeguarding the rights of artists and cultural practitioners.
