Commit Graph

13 Commits

Author SHA1 Message Date
Richie 30dc36588c updated BenchmarkConfig to have from_toml 2026-04-12 10:08:23 -04:00
Richie 68190901cb setup FinetuneConfig 2026-04-12 10:08:23 -04:00
Richie 275762843f deleted train.sh 2026-04-12 10:08:23 -04:00
Richie face93262f added containers dir 2026-04-12 10:08:23 -04:00
Richie ee34a0986b converted to summarization_prompts 2026-04-12 10:08:23 -04:00
Richie e8b20bc7df moved and renamed container.py to vllm_container.py 2026-04-12 10:08:23 -04:00
Richie 6c459985fa created working finetuning pipeline 2026-04-12 10:08:23 -04:00
Richie 9ffaa1b755 added summarization_prompts.py to store the prompts 2026-04-12 10:08:23 -04:00
Richie c6b4ed4814 added tools dir for one-off scripts I used 2026-04-12 10:08:23 -04:00
Richie 88ceeb55a1 added batch_bill_summarizer.py
The batch bill summarizer sends a batch API call to GPT.
2026-04-12 10:08:23 -04:00
Richie cb98090f95 added bill_token_compression.py
Tested on a sample of 100 bills matching the distribution of our data.
Compression saves ~11.5% on prompt tokens; completion/reasoning are roughly equal across the two sets.
set            prompt     completion   reasoning   total
compressed     349,460    157,110      112,128     506,570
uncompressed   394,948    154,710      110,080     549,658
delta          −45,488    +2,400       +2,048      −43,088
2026-04-12 10:08:23 -04:00
Richie 63cb48a3dd created main prompt bench 2026-04-12 10:08:23 -04:00
Richie a093c72eb9 creating prompt_bench downloader 2026-04-12 10:08:23 -04:00
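
The token figures quoted in commit cb98090f95 can be checked with a few lines of arithmetic. The sketch below is not taken from the repository: the names COUNTS, total, deltas, and savings_pct are hypothetical, and it assumes the "total" column is prompt plus completion, with reasoning tokens already counted inside completion (which is what the reported totals imply).

```python
# Token counts copied from the commit message for cb98090f95.
COUNTS = {
    "compressed": {"prompt": 349_460, "completion": 157_110, "reasoning": 112_128},
    "uncompressed": {"prompt": 394_948, "completion": 154_710, "reasoning": 110_080},
}


def total(row):
    """Total tokens for one set.

    Assumption: reasoning tokens are a subset of completion tokens,
    so the total is prompt + completion only.
    """
    return row["prompt"] + row["completion"]


def deltas(counts):
    """Compressed minus uncompressed, per column (negative means fewer tokens)."""
    comp, unc = counts["compressed"], counts["uncompressed"]
    result = {key: comp[key] - unc[key] for key in comp}
    result["total"] = total(comp) - total(unc)
    return result


def savings_pct(compressed_value, uncompressed_value):
    """Percentage of tokens saved relative to the uncompressed value."""
    return 100 * (uncompressed_value - compressed_value) / uncompressed_value


if __name__ == "__main__":
    comp, unc = COUNTS["compressed"], COUNTS["uncompressed"]
    print("deltas:", deltas(COUNTS))
    print("prompt savings %:", round(savings_pct(comp["prompt"], unc["prompt"]), 1))
    print("total savings %:", round(savings_pct(total(comp), total(unc)), 1))
```

Under that assumption the script reproduces the delta row and the roughly 11.5% prompt saving. Because completion tokens rise slightly, the saving on the total is smaller, at about 7.8%.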