In a bold move to enrich artificial intelligence systems with a deeper understanding of Quebec’s unique society, culture, and Indigenous languages, the Bibliothèque et Archives nationales du Québec (BAnQ) is embarking on an ambitious project to create a comprehensive database. This initiative follows a detailed feasibility study and aims to address the current shortcomings in AI’s representation of Quebec-related data.
Addressing Data Gaps in AI
The feasibility study, completed earlier this year, revealed that many generative AI systems struggle to accurately reflect Quebec’s societal nuances due to a severe lack of locally relevant data. Valérie D’Amour, who spearheaded the study, acknowledged the exploratory nature of the project. “All scenarios are a little bit on the table right now,” she remarked, indicating that discussions will involve cultural stakeholders, data owners, and providers to validate potential avenues for development.
The proposed database will serve as a repository of cultural and governmental content, but BAnQ has clarified that it will not function as a public distribution platform for creative works. Access to the data will be strictly regulated to ensure its responsible use. Marie Grégoire, BAnQ’s president and CEO, emphasised the importance of integrating Quebec-specific references into AI models, whether they originate from academic research or the business sector.
Learning from Global Initiatives
This project mirrors efforts elsewhere, such as in Sweden, where a substantial collection of Nordic-language texts has been assembled to improve the training of generative AI models. BAnQ plans to draw first on its own collections before expanding to incorporate data from other sources, ensuring that local content is prioritised.

The initiative stems from a recommendation in a 2024 report from Quebec’s innovation council, which highlighted the alarming scarcity of data concerning Quebec within global AI training datasets. Destiny Tchéhouali, co-holder of a Quebec-based research chair focused on French-language AI, pointed out the ongoing underrepresentation of Quebec culture in the current AI landscape. “We run the risk of reproducing linguistic and cultural biases,” Tchéhouali warned, underscoring the significance of this database in mitigating such risks.
Creative Protections Amid Concerns
As BAnQ develops this database, concerns regarding copyright and the protection of creators’ rights have surfaced. Grégoire asserted that the new platform could offer enhanced safeguards for artists compared to existing systems, which she described as akin to the “Wild West.” She highlighted the necessity of establishing a structured approach to data usage that ensures creators are fairly compensated for their contributions.
Despite these reassurances, there remains apprehension within the artistic community. Maxime Harvey, a postdoctoral researcher and member of Tchéhouali’s research chair, noted that many artists fear their work could ultimately be used to create systems that threaten their income. “Even if artists earn from it, they are still feeding the beast that may eventually take away their contracts,” he cautioned.
Project Timeline and Financial Support
The feasibility study anticipates that the database could be operational by 2029, although D’Amour indicated that this timeline might be adjusted following the experimental phase. The initiative is estimated to cost approximately $10.5 million over five years, covering both operational and capital costs. The Quebec government has already allocated $340,000 for the feasibility study, along with a further $750,000 to support the project’s first year of experimentation.

Why It Matters
This initiative holds significant implications not just for Quebec’s cultural landscape but for the broader conversation surrounding AI and its role in society. By ensuring that AI systems are trained on diverse and representative data, Quebec aims to create a more inclusive digital environment that reflects its rich cultural tapestry. The database could serve as a model for other regions facing similar challenges, ultimately fostering a more equitable and culturally sensitive approach to technology.