mahing committed · Commit 5172cc6 (verified) · 1 Parent(s): 51df79f

Update README.md

Files changed (1): README.md (+3, -1)
README.md CHANGED
@@ -18,7 +18,7 @@ LLMs can be used to build out accurate and informative first-person narratives f
  To successfully fine-tune an LLM for this task, I first picked a suitable base model that created passable narratives with few-shot prompting and had few enough parameters to avoid requiring massive amounts of compute for fine-tuning; I chose Qwen2.5-1M for this purpose. I then used Project Gutenberg and other sources to find historical documents to serve as input data for training the custom Qwen model, matching a synthetically generated narrative to each historical document as the output. This served as the training data for LoRA, which updated the parameters most relevant to my custom task and led to strong qualitative results. The historical narratives generated after fine-tuning were much stronger than the results of current LLMs and exceeded expectations, and evaluation metrics showed that fine-tuning did not worsen the model's general understanding. If used in schools, this model could create engaging, creative, and informative first-person narratives that build students' knowledge of and interest in history.
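As a rough sketch of the document-to-narrative pairing step described above, something along these lines could generate the synthetic output for each historical document; the checkpoint name, prompt wording, and generation settings here are illustrative assumptions, not details from this README:

```python
# Illustrative sketch only: generating the synthetic first-person narrative
# paired with one historical document. The checkpoint name, prompt wording,
# and generation settings are assumptions, not taken from this README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct-1M"  # assumed Qwen2.5-1M checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

document = "..."  # full text of one Gutenberg historical document

prompt = (
    "Read the following historical document and write an engaging, "
    "historically accurate first-person narrative from its era:\n\n" + document
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=512)
narrative = tokenizer.decode(output[0], skip_special_tokens=True)
# (document, narrative) then becomes one input-output training pair
```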

  **Training Data** <br />
-
+ For this task, I used the first-person sources and historical documents available from Project Gutenberg as input data, along with manual searches for certain well-known documents. Project Gutenberg's main goal is to digitize cultural and historical works, so it includes many biographies and memoirs from throughout history that are well suited to teaching an LLM to build an accurate narrative of a document's era. The output corresponding to each input is a first-person narrative based on the events in that document; for example, if the input describes a war, the output can be a soldier's first-person account of daily life during it. Most of my data wrangling consisted of synthetically generating these first-person narratives with an LLM and vetting each output, using my own knowledge and other LLMs, to judge the strength of the response. This was tedious work, and I finished with approximately 900 document-narrative pairs, which I split into 700 for the training set and 100 each for the validation and test sets using a random seed of 42.
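A minimal sketch of that split; only the 700/100/100 sizes and the seed of 42 come from the paragraph above, while the file name and record format are assumptions:

```python
# Sketch of the train/validation/test split described above. The pair counts
# (700/100/100) and seed (42) are from the README; the file name and JSON
# record format are assumptions.
import json
import random

with open("document_narrative_pairs.json") as f:  # hypothetical file name
    pairs = json.load(f)  # ~900 {"document": ..., "narrative": ...} records

random.Random(42).shuffle(pairs)  # random seed of 42, as stated above

train_set = pairs[:700]
val_set = pairs[700:800]
test_set = pairs[800:900]
```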
  **Training Method** <br />
  I chose LoRA for my task of creating first-person historical narratives of an era. Based on previous results, few-shot prompting sometimes did not capture the improvements I hoped to see in responses, while full fine-tuning would be more computationally intensive than LoRA and does not seem necessary for my task. LoRA is a good balance between the two: it changes only some parameters related to my task, using the dataset to update key parameters that help create narratives better matching an era's prose and historical accuracy. LoRA can also perform well without a massive training dataset because of its low-rank adaptations.
  For my hyperparameter combination, I chose LORA_R = 128, LORA_ALPHA = 128, and LORA_DROPOUT = 0.1. These hyperparameters had the best qualitative results of the options I tried. Despite my smaller dataset, this approach gave strong first-person narratives: they included prose from the era, were historically accurate, and even included the imagery and entertaining details I would expect from a quality response. The results from these hyperparameters exceeded my expectations.
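For reference, a minimal sketch of this configuration with the Hugging Face PEFT library; only r, lora_alpha, and lora_dropout come from the README, while the base checkpoint and the rest of the setup are assumptions:

```python
# Sketch of the LoRA setup described above using Hugging Face PEFT.
# r, lora_alpha, and lora_dropout match the README; the checkpoint name
# and task type are assumptions for illustration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct-1M"  # assumed Qwen2.5-1M checkpoint
)

lora_config = LoraConfig(
    r=128,              # LORA_R
    lora_alpha=128,     # LORA_ALPHA
    lora_dropout=0.1,   # LORA_DROPOUT
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # shows the small fraction LoRA trains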
@@ -54,6 +54,8 @@ generated_text = tokenizer.decode(output[0], skip_special_tokens = True)
  print(generated_text)
  ```

+ There are many use cases for this fine-tuned model, mainly related to education and entertainment. Its output can provide educational and fun narratives that teach students about the events of a particular era and how individuals at the time thought about what was taking place. This makes learning history more immersive, giving students a better understanding of history and encouraging them to think critically about the nuanced feelings surrounding a historical event. The model can also be used for various forms of entertainment; for example, its output can be voice-acted and turned into a podcast or a museum audio tour. It can even be used for research, distilling dense amounts of historical text into a quick summary and preserving the culture of an era through the documents of the time.
+
  **Prompt Format** <br />
  My prompt is formatted to have the model build a first-person narrative based on a certain event or era. The narrative should be engaging and accurate, and should include prose and vivid details from the era so that it is entertaining and informative to the reader. <br />
  Example Prompt: <br />
 