Quebec’s national library, Bibliothèque et Archives nationales du Québec (BAnQ), is embarking on an ambitious project to create a comprehensive database of cultural and governmental content. This initiative aims to enhance artificial intelligence systems’ understanding of Quebec’s unique society, culture, and Indigenous languages. Following a successful feasibility study earlier this year, BAnQ has now initiated the experimental phase of this proposed databank, which will primarily focus on content in French and Indigenous languages.
Addressing Data Gaps in AI
The initiative is driven by the recognition that many leading AI systems lack reliable information about Quebec, which hampers their ability to reflect the province’s diverse society and economy accurately. Valérie D’Amour, who spearheaded the feasibility study, emphasised the project’s exploratory nature, stating, “All scenarios are a little bit on the table right now.” The BAnQ team is eager to engage with cultural stakeholders, data providers, and owners to validate ideas and explore possibilities for this pioneering databank.
Marie Grégoire, BAnQ’s president and CEO, reinforced the vision of creating AI systems that authentically mirror Quebec’s societal fabric. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she noted. The databank is not intended to serve as a public distribution channel for creative works; rather, access will be closely regulated to ensure the integrity of the data and the protection of creators’ rights.
Learning from Global Initiatives
BAnQ’s project aligns with similar initiatives worldwide, such as those in Sweden, where substantial collections of Nordic-language texts have been compiled to support the development of generative AI models for Scandinavian languages. This trend underscores the importance of localised data in training AI systems to better respond to and represent regional cultures.

The BAnQ initiative was inspired by a 2024 report from Quebec’s innovation council, which pointed out the substantial lack of data pertaining to the province in existing AI training datasets. Destiny Tchéhouali, a co-holder of a Quebec-based research chair dedicated to French-language AI and digital technologies, highlighted the risk posed by this data scarcity. “Quebec culture remains underrepresented in the corpora currently circulating in the AI world,” he warned. He further cautioned that this could perpetuate linguistic and cultural biases, especially concerning Indigenous peoples, who face even greater risks of misrepresentation.
Cultural and Copyright Considerations
As BAnQ progresses with its databank project, copyright issues have arisen as a significant concern within the cultural sector. However, Grégoire is optimistic, suggesting that the proposed platform could offer creators more robust protections than those currently available. “Right now, it’s a bit like the Wild West,” she remarked. “Data is being harvested for free, and that should not be the case.”
The new databank could function as a centralised gateway, making it simpler to ensure that creators receive appropriate compensation for their contributions. Working collaboratively, cultural organisations could enhance their ability to safeguard creators’ interests and ensure the sustainability of the sector in the long term. Nonetheless, some artists remain apprehensive, fearing that their participation in AI training could threaten their livelihoods. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, expressed this concern, stating, “Even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.”
Looking Ahead: Project Timeline and Budget
According to the feasibility study, the platform is expected to be operational by 2029, although D’Amour acknowledges that this timeline will be reassessed after the experimental phase. The project has an estimated budget of £8.5 million over five years, covering both operational and capital expenses. BAnQ has already secured £58,000 from the Quebec government for the feasibility study, alongside a further £144,000 to fund the twelve-month experimental phase.

Why It Matters
The creation of this cultural database represents a significant step towards ensuring that Quebec’s rich heritage and diverse languages are appropriately represented within the realm of artificial intelligence. By addressing the existing data gaps, BAnQ’s initiative not only aims to empower local creators but also seeks to cultivate a more accurate and nuanced understanding of Quebec culture in the global AI landscape. As the world increasingly relies on AI systems, the implications of this project extend far beyond provincial borders, promising a more equitable and inclusive future for cultural representation in technology.