Quebec’s national library, Bibliothèque et Archives nationales du Québec (BAnQ), is embarking on an ambitious project to establish a comprehensive database of cultural and governmental content aimed at enhancing artificial intelligence (AI) systems’ understanding of Quebec’s diverse society, culture, and Indigenous languages. Following a successful feasibility study earlier this year, BAnQ has commenced the experimental phase of this innovative databank, which will initially focus on French and Indigenous languages.
Addressing Data Gaps in AI
The initiative arises from a growing concern that existing generative AI systems frequently lack reliable information about Quebec, largely because relatively little data about the province is available. Valérie D’Amour, who led the feasibility study, commented, “All scenarios are a little bit on the table right now. We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers, who will be involved in the discussions.”
The project aims to create a repository of Quebec-specific references that would help AI models reflect the diversity of Quebec’s identity. Marie Grégoire, president and CEO of BAnQ, emphasised, “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community.”
Learning from Global Initiatives
Similar projects have emerged in other regions, notably in Sweden, where extensive collections of Nordic-language texts have been curated to aid in the development of generative AI models. BAnQ intends to leverage its own holdings as a foundation before expanding to include external data sources. This initiative aligns with recommendations from a 2024 report by Quebec’s innovation council, which highlighted the urgent need for more substantial datasets related to Quebec for AI training.

Destiny Tchéhouali, a co-holder of a research chair dedicated to French-language AI and digital technologies, stressed the importance of this database, stating, “Quebec culture remains underrepresented in the corpora currently circulating in the AI world. We run the risk of reproducing linguistic and cultural biases, especially concerning Indigenous peoples, where the risk escalates significantly.” He believes the proposed database will serve as “strategic infrastructure” for defining how local content is identified, catalogued, and tracked by AI systems.
Protecting Creators from Exploitation
As BAnQ advances its plans, concerns about copyright and the potential for cultural exploitation have surfaced within the creative community. Grégoire countered these worries by arguing that the new platform could offer creators more protection than the current landscape, which she likened to the “Wild West.” “Data is being harvested for free, and that should not be the case,” she remarked.
The envisioned database may act as a centralised hub, facilitating fair compensation for creators whose works are used. Grégoire expressed optimism that, by working together, cultural organisations would be better equipped to protect artists’ interests and ensure the sector’s long-term sustainability.
Nonetheless, some creators express reservations about contributing their work to AI training systems, fearing it may jeopardise their livelihoods. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research, highlighted the prevalent concern among artists: “Even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.”
Project Timeline and Budget
The feasibility study anticipates the platform becoming operational by 2029, although D’Amour noted that this timeline would be reassessed after the experimental phase concludes. The study projects a budget of nearly C$8.5 million over five years to cover operational and capital costs. BAnQ has already received C$57,000 from the Quebec government for the feasibility study and an additional C$150,000 for the project’s twelve-month experimental phase.

Why It Matters
This initiative is crucial not only for preserving and promoting Quebec’s unique cultural identity in the age of AI but also for ensuring that creators’ rights are upheld in a rapidly evolving digital landscape. By building a database that accurately reflects Quebec’s society and languages, BAnQ is taking a significant step towards mitigating the linguistic and cultural biases that can arise in AI systems, fostering a more inclusive and representative digital future.