Unveiling the Past: AI-Powered Historical Book Question Answering

Hari Thapliyal

Abstract

In recent years, a number of approaches, datasets, models, and even large
language models (LLMs) have been applied to question answering. The results of
these initiatives are very encouraging, but several problems remain. First,
generating questions from a given document, and generating a correct answer to a
given question from that document, are still challenging. This becomes harder
when the text is historical, has been translated from another language, and the
translation is written in a script different from that of the original text; the
difficulty arises from the spelling of nouns carried over from the original
script. Second, generating descriptive answers and evaluating their correctness
is difficult. Third, if n question-answer pairs are generated from a corpus, how
should the performance of the generating model be measured? Finally, given a
large body of text and some questions in mind, how can the answers be obtained
without supplying the context? In this work, we first explore techniques for
creating questions and answers for a book corpus using ChatGPT and other
available techniques; we call this QAGS. Second, we fine-tune T5 and FLAN-T5
models to build an answer generation system, which we call AGS. Third, we
retrieve a relevant document that can answer a given question; we call this DRS.
Finally, we build and evaluate a system that can answer a history question
without any context; we call this RAAGS. We use the Mahabharata as the corpus.
To evaluate the sub-systems, we use metrics including BLEU, ROUGE, accuracy,
recall, R@n, P@n, F1@n, and cosine similarity. Text embeddings are computed
with SentenceTransformer. Our approach does not depend on the domain, script,
or language of the text, nor on the era in which the text was written, and it
requires no manual feature engineering. We have also explored state-of-the-art
(SOTA) transformers from Hugging Face, such as T5, DistilBERT, RoBERTa, BLOOM,
BERT, and BigBird, for zero-shot learning. The cosine similarity between answer
and question, and between answer and chunk, is 0.91. In DRS, the MRR is 0.55
and the MAP is 0.25. In AGS, the cosine similarity between the reference answer
and the predicted answer is 0.827. In RAAGS, the cosine similarity between the
reference answer and the predicted answer is 0.763.

Keywords: Question Answering with NLP, Historical Books Question Answering,
Question Answering Generation, NLP Transformer Models
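The AGS and RAAGS scores above compare embeddings of a reference answer and a predicted answer using cosine similarity. A minimal sketch of that comparison follows, assuming the embeddings have already been produced (for example with a SentenceTransformer model's `encode` method; the specific model used in this work is not stated here) and assuming the single reported figure is the mean of per-pair cosine similarities, which the abstract does not specify. The helper names and toy vectors are hypothetical placeholders, not data from the Mahabharata corpus.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_answer_cosine(reference_embs: List[Sequence[float]],
                       predicted_embs: List[Sequence[float]]) -> float:
    """Hypothetical aggregate: average cosine over aligned (reference, predicted) pairs."""
    if not reference_embs or len(reference_embs) != len(predicted_embs):
        raise ValueError("need the same, non-zero number of reference and predicted answers")
    scores = [cosine_similarity(r, p) for r, p in zip(reference_embs, predicted_embs)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real sentence embeddings.
    refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    preds = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    print(round(mean_answer_cosine(refs, preds), 4))  # 0.8536
```

With real embeddings, the same per-pair score can be obtained from `sentence_transformers.util.cos_sim`; the pure-Python version is shown only so the arithmetic is visible.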
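The DRS figures (MRR and MAP) and the R@n, P@n, and F1@n metrics are standard ranking measures. The sketch below uses their common textbook definitions, assuming each question comes with a ranked list of retrieved chunk IDs and a set of chunk IDs judged relevant; how relevance was judged in this work is not described in the abstract, and the function names and chunk IDs are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Set, Tuple

Ranked = Sequence[Hashable]   # chunk IDs in retrieval order, best first
Relevant = Set[Hashable]      # chunk IDs judged relevant to the question


def precision_at_n(ranked: Ranked, relevant: Relevant, n: int) -> float:
    """P@n: share of the top-n retrieved chunks that are relevant."""
    return sum(1 for d in ranked[:n] if d in relevant) / n


def recall_at_n(ranked: Ranked, relevant: Relevant, n: int) -> float:
    """R@n: share of all relevant chunks that appear in the top n."""
    if not relevant:
        raise ValueError("recall is undefined without relevant chunks")
    return sum(1 for d in ranked[:n] if d in relevant) / len(relevant)


def f1_at_n(ranked: Ranked, relevant: Relevant, n: int) -> float:
    """F1@n: harmonic mean of P@n and R@n."""
    p, r = precision_at_n(ranked, relevant, n), recall_at_n(ranked, relevant, n)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def reciprocal_rank(ranked: Ranked, relevant: Relevant) -> float:
    """1/rank of the first relevant chunk, or 0 if none is retrieved."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Ranked, relevant: Relevant) -> float:
    """Sum of precision at each rank holding a relevant chunk, divided by the number of relevant chunks."""
    if not relevant:
        raise ValueError("AP is undefined without relevant chunks")
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_over_queries(metric: Callable[[Ranked, Relevant], float],
                      runs: List[Tuple[Ranked, Relevant]]) -> float:
    """Average a per-question metric; MRR and MAP are the means of RR and AP."""
    return sum(metric(ranked, rel) for ranked, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical retrieval runs for two questions.
    runs = [
        (["c3", "c1", "c7"], {"c1"}),
        (["c2", "c5", "c4"], {"c2", "c4"}),
    ]
    print("MRR:", mean_over_queries(reciprocal_rank, runs))               # 0.75
    print("MAP:", round(mean_over_queries(average_precision, runs), 4))   # 0.6667
    print("F1@2 (second question):", f1_at_n(*runs[1], n=2))             # 0.5
```

MRR only rewards the first relevant hit, while MAP also penalizes relevant chunks that appear low in the ranking, which is one reason the two scores can differ substantially on the same retrieval runs.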
