The Bibliothèque et Archives nationales du Québec (BAnQ) is developing a database of cultural and governmental content intended to improve artificial intelligence systems’ grasp of Quebec’s society, its diverse cultures, and Indigenous languages. Following a feasibility study completed earlier this year, BAnQ has entered the experimental phase of the proposed databank, which will primarily feature content in French and Indigenous languages.
Addressing Data Gaps in AI
The initiative responds to concerns that major generative AI models hold too little data about Quebec. Current AI systems often struggle to deliver reliable information about Quebec society. Valérie D’Amour, who led the feasibility study, emphasised the need to work with cultural stakeholders and data owners to validate the options for the project. “All scenarios are a little bit on the table right now,” she said.
BAnQ’s president and CEO, Marie Grégoire, underscored that the platform would not serve as a public distribution channel for creative works and that access to the data would remain tightly regulated. The objective is to cultivate a more accurate representation of Quebec’s culture within AI models. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” Grégoire said.
A Collective Effort
The project finds its roots in a recommendation from Quebec’s innovation council, which highlighted the scarcity of Quebec-related data in AI training datasets as a contributing factor to the challenges faced by local creators and researchers. Destiny Tchéhouali, a co-holder of a research chair focused on French-language artificial intelligence at Université du Québec à Montréal, warns that the underrepresentation of Quebec’s culture in the AI landscape risks perpetuating both linguistic and cultural biases. “When we talk about Indigenous peoples, we run an even greater risk of all these biases,” Tchéhouali noted.

He contends that the proposed database could serve as “strategic infrastructure” to establish clear guidelines for how local content is identified, catalogued, and monitored within contemporary AI systems. This structured approach could address existing disparities and promote a more inclusive representation of Quebec’s diverse cultures.
Navigating Copyright Concerns
As BAnQ forges ahead with this initiative, copyright issues have surfaced as a significant concern within the cultural sector. However, Grégoire argues that this new platform could ultimately provide better protection for creators than the current landscape. “Right now, it’s a bit like the Wild West,” she remarked, pointing out that data is often exploited without proper compensation for the creators.
The proposed database could act as a centralised mechanism, ensuring that artists are appropriately remunerated for their contributions. By collaborating, cultural organisations would be in a stronger position to secure fair compensation and sustain the vitality of the sector in the long term.
Despite these potential benefits, some artists remain apprehensive about the implications of contributing their work to AI training datasets. “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI,” cautioned Maxime Harvey, a postdoctoral researcher and member of the same research chair as Tchéhouali.
Future Outlook and Funding
The feasibility study projects that the platform could become operational by 2029. However, D’Amour noted that the timeline would be reviewed following the experimental phase. The initiative is estimated to require a budget of nearly $10.5 million over the next five years, covering both operational and capital expenses. To date, BAnQ has secured $340,000 from the Quebec government for the feasibility study, alongside an additional $750,000 to support the 12-month experimentation phase.

Why It Matters
This initiative aims to ensure that Quebec’s culture is adequately represented in artificial intelligence systems. By creating a databank that prioritises local languages and cultural nuances, BAnQ is addressing existing data gaps while also advocating for a more equitable and sustainable future for creators in the province. The outcome may set a precedent for similar projects elsewhere, fostering a deeper understanding of diverse cultures in AI systems and protecting the livelihoods of artists and creators.