Bibliothèque et Archives nationales du Québec (BAnQ) is embarking on a project to build a comprehensive database of Quebec cultural and governmental content. The initiative aims to improve how artificial intelligence systems understand Quebec society, culture, and Indigenous languages. Having recently completed a feasibility study, BAnQ is now moving into an experimental phase to create a databank focused on French and Indigenous languages, in response to the pressing need for more accurate and diverse representation of these languages in AI training data.
Addressing the AI Data Gap
As artificial intelligence becomes more influential, it has become evident that many AI systems struggle to provide reliable information about Quebec’s distinct cultural landscape. How BAnQ will address this remains open. “All scenarios are a little bit on the table right now,” said Valérie D’Amour, who led the feasibility study. “We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.” The aim is to ensure that AI models can draw on a rich dataset that reflects Quebec’s diverse society.
BAnQ stressed that the proposed platform will not be a channel for distributing creative works to the public; instead, it will maintain strict control over data access. Marie Grégoire, BAnQ’s president and CEO, summarised the goal of the initiative: to enable AI systems to better mirror Quebec’s cultural and societal fabric. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she explained.
Learning from Global Initiatives
The drive to create this database is not an isolated effort. Similar projects exist elsewhere; Sweden, for instance, has compiled extensive collections of Nordic-language texts to strengthen generative AI models for Scandinavian languages. BAnQ plans to start with its own collections before exploring data from external sources.

This initiative is rooted in a recommendation from a 2024 report by Quebec’s innovation council, which highlighted the limited availability of Quebec-specific data in AI training datasets as a significant barrier to accurately representing the province’s culture and identity.
The Cultural Imperative
Destiny Tchéhouali, who co-holds a research chair on French-language artificial intelligence at Université du Québec à Montréal, underscored the importance of the initiative. He pointed out that Quebec culture is “underrepresented in the corpora currently circulating in the AI world.” The risk that linguistic and cultural biases will spread is a genuine concern, especially for Indigenous languages and communities. Tchéhouali noted that the proposed database could serve as “strategic infrastructure” for setting guidelines on how local content is identified, catalogued, and tracked in contemporary AI systems.
As BAnQ moves forward, it must navigate the complexities of copyright, a significant concern in the cultural sector. Grégoire believes the new platform could give creators greater protection than they have in the current landscape, which she likened to “the Wild West.” She stated, “Data is being harvested for free, and that should not be the case.” By centralising data access, the platform could make it easier to compensate creators whose works are used to train AI models.
Artist Concerns and Future Outlook
Despite the potential benefits, some artists are wary of sharing their work for AI training. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, voiced concerns that artists may inadvertently undermine their own livelihoods. “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” he said.

The feasibility study anticipates that the platform could be operational by 2029, although D’Amour indicated that this timeline will be reassessed after the experimental phase. The projected budget is approximately £8.5 million over five years, covering both operating and capital costs. So far, BAnQ has received £56,000 from the Quebec government for the feasibility study and a further £144,000 for the upcoming twelve-month experimental phase.
Why it Matters
BAnQ’s initiative is more than a technical project; it is a step towards ensuring that Quebec’s cultural narrative is not lost as artificial intelligence rapidly evolves. By promoting more inclusive representation of local identities within AI systems, Quebec has the opportunity both to preserve its cultural heritage and to set a precedent for other regions facing similar issues. As AI continues to shape the world, local context in its development matters a great deal.