Commit Graph

794 Commits

Author SHA1 Message Date
ac02d407eb adding math to bob 2026-04-11 19:40:26 -04:00
9a77eda471 added config.toml to git ignore 2026-04-10 22:05:15 -04:00
26105b7daa updated BenchmarkConfig to have from_toml 2026-04-10 21:55:18 -04:00
0d81f2d17b setup FinetuneConfig 2026-04-10 21:40:17 -04:00
1409e9c63e deleted train.sh 2026-04-10 20:58:26 -04:00
259e952afc added containers dir 2026-04-10 20:48:24 -04:00
4a10a80ba0 conveted to summarization_prompts 2026-04-10 18:57:21 -04:00
03208a1ab2 moved renamed container.py to vllm_container.py 2026-04-10 13:16:18 -04:00
721526022b created working finetuing pipeline 2026-04-10 12:56:57 -04:00
921a397b1c added data dir for traning 2026-04-10 12:51:41 -04:00
b867e809cd updated spell check 2026-04-10 12:43:24 -04:00
54eb46a63e added storage pool 2026-04-10 12:42:58 -04:00
67131e7b68 added tiktoken 2026-04-10 12:42:35 -04:00
88dae310b6 added summarization_prompts.py to sore the prompts 2026-04-10 12:40:36 -04:00
24f0e8693a added tools dir for on off scripts i used 2026-04-10 12:37:14 -04:00
ced78fe516 added batch_bill_summarizer.py
batch bill  summarizer sends a batch api call to gpt
2026-04-10 12:36:39 -04:00
d281d070a3 decreased root_pool/models snapshot life 2026-04-10 08:51:03 -04:00
251da6c14a added bill_token_compression.py
tested on sample size of 100 bills matching the distribution of our data
Compression saves ~11.5% on prompt tokens; completion/reasoning are roughly equal across the two sets.
prompt	completion	reasoning	total
compressed	349,460	157,110	112,128	506,570
uncompressed	394,948	154,710	110,080	549,658
delta	−45,488	+2,400	+2,048	−43,088
2026-04-09 18:41:13 -04:00
d17c883476 created main prompt bench 2026-04-08 09:08:25 -04:00
d358f0fbec fixed sunshine.nix 2026-04-08 00:18:34 -04:00
c150fc8612 converting bob to a server 2026-04-08 00:18:17 -04:00
9c8013d69d creating prompt_bench downloader 2026-04-07 19:15:43 -04:00
af365fce9a setup sunshine.nix 2026-04-03 17:12:24 -04:00
6430049e92 updated postgres snapshot settings 2026-03-30 14:07:08 -04:00
26e4620f8f fixed systemd sandboxing 2026-03-30 14:07:08 -04:00
93fc700fa2 removed preStart step 2026-03-30 14:07:08 -04:00
8d1c1fc628 added mountpoint= to postgres zfs create 2026-03-30 14:07:08 -04:00
dda318753b improving postgres wal 2026-03-30 14:07:08 -04:00
261ff139f7 removed ds table from richie DB 2026-03-29 15:54:54 -04:00
ba8ff35109 updated ingest_congress to use congress-legislators for legislator info 2026-03-29 15:54:54 -04:00
e368402eea adding LegislatorSocialMedia 2026-03-29 15:54:54 -04:00
dd9329d218 fixed tests 2026-03-29 15:54:54 -04:00
89f6627bed converted session.execute(select to session.scalars(select 2026-03-29 15:54:54 -04:00
c5babf8bad ran treefmt 2026-03-29 15:54:54 -04:00
dae38ffd9b added ingest_congress.py 2026-03-29 15:54:54 -04:00
ca62cc36a7 adding congress data to new DS DB 2026-03-29 15:54:54 -04:00
035410f39e adding nemotron-3-nano 2026-03-29 15:54:54 -04:00
e40ab757ca making more generic exception handling 2026-03-29 15:54:54 -04:00
345ba94a59 ran ingest_posts 2026-03-29 15:54:54 -04:00
f2084206b6 adding tables for 2023 2026-03-29 15:54:54 -04:00
50e764146a added ingest_posts.py 2026-03-29 15:54:54 -04:00
ea97b5eb19 adding 2026 partitions 2026-03-29 15:54:54 -04:00
1ef2512daa adding post table 2026-03-29 15:54:54 -04:00
f9a9e5395c added media/temp for fast dir when working with data 2026-03-29 15:54:54 -04:00
d8e166a340 adding data_science_dev 2026-03-29 15:54:54 -04:00
c266ba79f4 updated snapshot_config.toml 2026-03-29 14:12:06 -04:00
f627a5ac6e enabling kafka 2026-03-26 09:59:31 -04:00
a5e7d97213 adding full qwen3 2026-03-24 16:20:21 -04:00
1419deb3c6 setting up brain nix serve 2026-03-24 15:04:48 -04:00
1f06692696 adding zstd to firefix settings 2026-03-24 12:53:44 -04:00