Commit Graph

12 Commits

Author SHA1 Message Date
68190901cb setup FinetuneConfig 2026-04-12 10:08:23 -04:00
275762843f deleted train.sh 2026-04-12 10:08:23 -04:00
face93262f added containers dir 2026-04-12 10:08:23 -04:00
ee34a0986b conveted to summarization_prompts 2026-04-12 10:08:23 -04:00
e8b20bc7df moved renamed container.py to vllm_container.py 2026-04-12 10:08:23 -04:00
6c459985fa created working finetuing pipeline 2026-04-12 10:08:23 -04:00
9ffaa1b755 added summarization_prompts.py to sore the prompts 2026-04-12 10:08:23 -04:00
c6b4ed4814 added tools dir for on off scripts i used 2026-04-12 10:08:23 -04:00
88ceeb55a1 added batch_bill_summarizer.py
batch bill  summarizer sends a batch api call to gpt
2026-04-12 10:08:23 -04:00
cb98090f95 added bill_token_compression.py
tested on sample size of 100 bills matching the distribution of our data
Compression saves ~11.5% on prompt tokens; completion/reasoning are roughly equal across the two sets.
prompt	completion	reasoning	total
compressed	349,460	157,110	112,128	506,570
uncompressed	394,948	154,710	110,080	549,658
delta	−45,488	+2,400	+2,048	−43,088
2026-04-12 10:08:23 -04:00
63cb48a3dd created main prompt bench 2026-04-12 10:08:23 -04:00
a093c72eb9 creating prompt_bench downloader 2026-04-12 10:08:23 -04:00