The Bibliothèque et Archives nationales du Québec (BAnQ) is embarking on an ambitious project to build a comprehensive database of cultural and government content. The goal is to improve how artificial intelligence systems understand Quebec’s society, culture, and Indigenous languages. Following a feasibility study conducted earlier this year, the initiative has entered its experimental phase. It aims to address a persistent problem: Quebec is poorly represented in the data that existing AI systems are built on.
Addressing Data Gaps in AI
The initiative is a direct response to the challenges posed by mainstream generative AI systems, which frequently lack reliable information regarding Quebec’s societal, economic, and cultural landscapes. Valérie D’Amour, who spearheaded the feasibility study, remarked, “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.” The aim is clear: to enrich AI training datasets with authentic Quebec references that represent the province’s diverse voices.
BAnQ has clarified that the forthcoming platform will not serve as a public distribution channel for creative works. Instead, it will tightly control access to its data, with the aim of helping AI systems better reflect the nuances of Quebec society. Marie Grégoire, BAnQ’s president and CEO, emphasised the importance of bringing Quebec-centric content, whether from research or the business community, into the AI landscape.
Learning from Global Initiatives
This project is not an isolated endeavour. Similar initiatives have emerged elsewhere, notably in Sweden, where large collections of Nordic-language texts have been compiled to support the development of generative AI models for Scandinavian languages. BAnQ’s strategy will initially focus on its own collections before considering external data sources.

The initiative stems from a 2024 report by Quebec’s innovation council, which identified the scarcity of Quebec-related data in AI training datasets as a significant obstacle. Destiny Tchéhouali, co-holder of a research chair centred on French-language artificial intelligence at the Université du Québec à Montréal, underscored the issue, stating, “Quebec culture remains underrepresented in the corpora currently circulating in the AI world.” He further warned that this underrepresentation risks perpetuating linguistic and cultural biases, particularly regarding Indigenous peoples.
Navigating Copyright and Creator Compensation
As BAnQ forges ahead with the development of this database, concerns about copyright and creator compensation have surfaced within the cultural sector. Grégoire acknowledged these issues, arguing that the proposed platform could actually offer creators better protection than the current system. “Right now, it’s a bit like the Wild West,” she noted, pointing out that data is often harvested without compensation. The vision is to establish a more regulated environment in which creators are fairly compensated for their contributions.
Despite the potential benefits, some artists remain apprehensive about the implications of contributing their work to AI training systems. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, shared a common concern: “Even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.” This sentiment reflects a broader unease within the artistic community regarding the sustainability of their livelihoods in an increasingly automated landscape.
Project Timeline and Funding
The feasibility study envisions the platform becoming fully operational by 2029, although D’Amour indicated that this timeline would be reassessed following the experimental phase. The estimated budget for the project stands at nearly $10.5 million over five years, encompassing both operational and capital expenses. BAnQ has secured $340,000 from the Quebec government for the feasibility study and an additional $750,000 to support the 12-month experimentation phase.

Why It Matters
The creation of a dedicated cultural and government database by BAnQ is an important step towards ensuring that artificial intelligence systems accurately reflect Quebec’s society and culture. As AI plays a growing role across many sectors, the identities and narratives of communities risk being distorted or left out. This is especially true for communities that have traditionally been underrepresented. The initiative would supply AI systems with curated Quebec content under controlled access. In doing so, it aims to improve AI’s understanding of the province and to offer creators a more regulated, compensated alternative to unchecked data harvesting. It also seeks to help safeguard Quebec’s cultural heritage.