
Grant supports Northwestern Libraries launch of generative AI-based chat search


Generative AI has been dominating headlines and popular consciousness since OpenAI publicly launched ChatGPT in late 2022, sparking rapid advances in AI capabilities and prompting questions about how to best utilize new applications across industries.  

Spurred on by this fervor, developers in the Digital Products and Data Curation (DPDC) team at the Northwestern Libraries began building a semantic search tool that leverages large language models (LLMs) and the data from the Libraries’ Digital Collections. Excited by the possibility of better surfacing works from the collections in response to research questions, the team built an early prototype and received a grant from the Institute of Museum and Library Services (IMLS) to continue the work. The tool performs a semantic search, one based on context rather than on keywords, and returns works from digitized collections along with a conversational answer that explains how the selected items meet the search criteria.

“We want the library community to be leading this work, to not be behind adapting to information-seeking behaviors,” said David Schober, Products Manager and Team Lead for DPDC. “With this work, Northwestern Libraries are hoping to create excitement around using artificial intelligence to find and organize complex information.”

Chatting with collections 

After a year’s work iterating on prototypes, DPDC is ready to integrate a generative AI search tool for Digital Collections. Initially, the search tool is only available to those with a Northwestern NetID and password, as the team hopes to receive feedback and refine the tool based on input from the Northwestern community. Northwestern users can now visit Digital Collections and ask questions related to the unique primary source collections, such as “What changes were there in African maps in the 19th century?” or “How did psychedelics influence music in the 1960s?” The new search modality can be toggled on and off with a simple checkbox in the search bar. 

For the first step of the chat implementation, the collection metadata were embedded using a semantic model and stored in a vector database. When a user asks a question, the question is also embedded, and the tool queries the vector database for works whose metadata are closest in meaning to it. This strategy is called Retrieval Augmented Generation (RAG): data under the developers’ control (in this case, the metadata from Digital Collections) defines the material the LLM can draw on when it responds. Because responses are grounded in metadata managed by experts at the Libraries, the RAG approach reduces the risk of "hallucinations," a common concern with early AI tools.
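The embed-store-query loop described above can be illustrated with a minimal sketch. It is not the Libraries' implementation: the names `embed`, `Work`, and `InMemoryVectorStore` are hypothetical, a toy hashed bag-of-words function stands in for a real semantic embedding model (so it only captures word overlap, not meaning), and a Python list ranked by cosine similarity stands in for a vector database.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 1024  # dimensionality of the toy vectors (an assumption for this sketch)


def embed(text: str) -> list[float]:
    """Toy stand-in for a semantic embedding model: hashed bag of words, L2-normalized.

    A real deployment would call an embedding model here; this version only
    reflects shared words, not shared meaning.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable hash, so vectors are reproducible across runs.
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class Work:
    work_id: str
    title: str
    description: str


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database holding (vector, work) pairs."""

    def __init__(self) -> None:
        self._rows: list[tuple[list[float], Work]] = []

    def add(self, work: Work) -> None:
        # Indexing step: embed each work's metadata once, ahead of query time.
        self._rows.append((embed(f"{work.title} {work.description}"), work))

    def query(self, question: str, k: int = 3) -> list[tuple[float, Work]]:
        # Query step: embed the question the same way, then rank works by
        # cosine similarity (a plain dot product, since vectors are normalized).
        q = embed(question)
        scored = [(sum(a * b for a, b in zip(q, vec)), work) for vec, work in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

For example, after adding a work titled "Map of Africa, 1850" and an unrelated concert poster, `store.query("map of Africa", k=1)` ranks the map first. Swapping `embed` for a real embedding model, and the list for a vector database, is what turns this keyword-like toy into semantic search.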

The tool then combines the works retrieved for the user’s query with parameters set by the DPDC developers to build a prompt instructing the LLM to answer the question. The LLM’s response appears in a conversational format on the front end and draws on the context provided by the retrieved library materials. It also suggests further keywords for continuing with a traditional keyword search on the same topic, and it points to digitized collections available for study. All of this is generated in real time.
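The prompt-building step can be sketched as follows, under stated assumptions: `RetrievedWork`, `SYSTEM_INSTRUCTIONS`, `build_prompt`, and `extract_keywords` are hypothetical names, the instruction text is invented for illustration (the Libraries' actual parameters are not described here), and the call to the LLM itself is omitted because the article does not name the model or API.

```python
from dataclasses import dataclass


@dataclass
class RetrievedWork:
    work_id: str
    title: str
    description: str


# Hypothetical developer-set parameters; not the DPDC team's actual instructions.
SYSTEM_INSTRUCTIONS = (
    "You are a research assistant for a library's digital collections. "
    "Answer ONLY from the works listed below; if they do not answer the question, say so. "
    "Cite works by their ID in square brackets. "
    "End with a line starting 'Keywords:' listing 3 to 5 terms for a follow-up keyword search."
)


def build_prompt(question: str, works: list[RetrievedWork], max_works: int = 5) -> str:
    """Assemble a grounded (RAG-style) prompt from the question and the retrieved works."""
    context_lines = [f"[{w.work_id}] {w.title}: {w.description}" for w in works[:max_works]]
    context = "\n".join(context_lines) if context_lines else "(no matching works found)"
    return f"{SYSTEM_INSTRUCTIONS}\n\nWorks:\n{context}\n\nQuestion: {question}\nAnswer:"


def extract_keywords(response: str) -> list[str]:
    """Pull the suggested follow-up keywords out of the model's reply, if present."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("keywords:"):
            terms = line.split(":", 1)[1]
            return [t.strip() for t in terms.split(",") if t.strip()]
    return []
```

Limiting the context to the retrieved works, and telling the model to say so when they do not answer the question, is what keeps the reply tied to the collection metadata; the parsed keywords could then be offered as links into the ordinary keyword search.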

“The aspect I’m most excited about is having the system be used by people who have real curiosity and questions,” said Brendan Quinn, Lead AI Developer for the Libraries. “We can take their provided feedback and build the tool to be responsive to actual use cases.” 

The team hopes that an AI response grounded in a collection’s data and integrated with rich search results will change how users find and use primary source materials. Users can also ask questions and receive answers in multiple languages and in various formats.

Leading the way 

The implementation of the search is supported by a National Leadership Grant from IMLS for more than $430,000. James Lee, Associate University Librarian for Academic Innovation and Associate Professor in the Medill School of Journalism, Media & Integrated Marketing Communications, will act as principal investigator for the grant. Lee envisions the project as a crucial part of the Libraries’ vision to transform digital research by leveraging artificial intelligence and machine learning.  

“This is a first step toward our bold vision of how the Libraries can adapt to the changing information needs of users—whether that’s finding ancient manuscripts or understanding raw data,” he said.  

The IMLS grant is an investment in pushing the envelope for discoverability of collections. “We are proud and grateful that IMLS has awarded Northwestern Libraries a significant National Leadership Grant recognizing our trailblazing work with generative AI search,” said Xuemao Wang, Dean of Libraries. “This innovative work demonstrates how libraries can not only react to but lead the creation of new technologies that address the key tasks of research and scholarship.”

Blazing trails 

The chat-based semantic search for Digital Collections is only the beginning. The two-year grant also supports the creation of toolkits that will help other libraries and cultural institutions experiment with this transformative technology and integrate it into their discovery platforms.

Upcoming work funded by the IMLS grant also includes using generative AI to augment human-mediated metadata creation. The goal is to develop a system that can supplement descriptions of massive amounts of digitized materials at the item level, which metadata librarians then build upon with the complex work of contextualizing information. Generative AI makes it feasible to provide item-level descriptions for vast collections at scale, while human librarians provide deeper analysis. In early testing, the team has successfully integrated tools that can parse handwritten text, place complex documents in context, and transcribe video.
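One way to keep machine-drafted and librarian-curated description distinct is to record the provenance of each field. The sketch below is purely hypothetical (the schema, `ItemRecord`, and the `"ai_draft"`/`"librarian"` labels are assumptions, not the grant project's design): AI drafts fill empty fields, a librarian's edit always takes precedence, and unreviewed drafts stay flagged for review.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptionField:
    """One metadata field plus its provenance (hypothetical schema)."""
    value: str
    source: str  # "ai_draft" or "librarian"
    reviewed: bool = False


@dataclass
class ItemRecord:
    item_id: str
    fields: dict[str, DescriptionField] = field(default_factory=dict)

    def add_ai_draft(self, name: str, value: str) -> None:
        # Machine-generated text may replace an earlier draft,
        # but never overwrites a librarian's value.
        existing = self.fields.get(name)
        if existing is None or existing.source == "ai_draft":
            self.fields[name] = DescriptionField(value, "ai_draft")

    def librarian_edit(self, name: str, value: str) -> None:
        # A librarian's revision replaces any draft and counts as reviewed.
        self.fields[name] = DescriptionField(value, "librarian", reviewed=True)

    def needs_review(self) -> list[str]:
        # Field names still holding unreviewed AI drafts, in sorted order.
        return sorted(name for name, f in self.fields.items() if not f.reviewed)
```

Tracking provenance per field lets AI output cover many items quickly while making it clear to researchers and staff which descriptions a librarian has already checked and contextualized.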

“We need to describe materials adequately and speedily to make them available for researchers,” said Jen Young, Metadata Coordinator for DPDC. Young is one of two metadata librarians on the grant project team.  

Both the chat-based search and the metadata augmentation applications will be replicable by other libraries, as the National Leadership Grant intends. This fits the Libraries’ long-standing commitment to collaborative open-source work across the library community.

The outcomes of this grant have the potential for positive impact both on the scholarly community of researchers and on the field of librarianship, Quinn said. But whatever new trails this team can blaze, it is ultimately only adding “another tool in the toolkit” of scholarly exploration. 

“Getting the appropriate resources in people’s hands: that’s the North Star,” he said.