<p align="center"><b>Substantial performance improvements across all models when optimized prompts are generated by PromptWizard on the GSM8k dataset</b></p>
-</div>
+</div>
+<br>
+<button class="btn" onclick="toggleContent(this,'11')">Comparison with Feedback-Based and Other Prompt Optimization Techniques<span class="icon">+</span></button>
+<div class="col_content_11">
+<p align="center">
+<table>
+<tr>
+<td>Dataset</td>
+<td colspan="4">Accuracy (higher is better)</td>
+</tr>
+<tr>
+<td></td>
+<td>DSPy</td>
+<td>PromptAgent</td>
+<td>APO</td>
+<td>PW</td>
+</tr>
+<tr>
+<td>GSM8k</td>
+<td>78.2</td>
+<td>68.84</td>
+<td>25.67</td>
+<td><b>90</b></td>
+</tr>
+<tr>
+<td>AQUARAT</td>
+<td>55.1</td>
+<td>56.67</td>
+<td>20.12</td>
+<td><b>58.2</b></td>
+</tr>
+<tr>
+<td>SVAMP</td>
+<td>77</td>
+<td>78.67</td>
+<td>75.25</td>
+<td><b>82.3</b></td>
+</tr>
+<tr>
+<td>ETHOS</td>
+<td>84.1</td>
+<td>84.25</td>
+<td>80.62</td>
+<td><b>89.4</b></td>
+</tr>
+</table>
+<br>
+<table>
+<tr>
+<td>Dataset</td>
+<td colspan="4">API calls (lower is better)</td>
+</tr>
+<tr>
+<td></td>
+<td>DSPy</td>
+<td>PromptAgent</td>
+<td>APO</td>
+<td>PW</td>
+</tr>
+<tr>
+<td>GSM8k</td>
+<td>915</td>
+<td>2115</td>
+<td>8490</td>
+<td><b>147</b></td>
+</tr>
+<tr>
+<td>AQUARAT</td>
+<td>920</td>
+<td>2200</td>
+<td>8500</td>
+<td><b>112</b></td>
+</tr>
+<tr>
+<td>SVAMP</td>
+<td>2300</td>
+<td>2111</td>
+<td>8000</td>
+<td><b>178</b></td>
+</tr>
+<tr>
+<td>ETHOS</td>
+<td>660</td>
+<td>2217</td>
+<td>8200</td>
+<td><b>80</b></td>
+</tr>
+</table>
+<br>
+<table>
+<tr>
+<td>Dataset</td>
+<td colspan="4">Average tokens per call (lower is better)</td>
+</tr>
+<tr>
+<td></td>
+<td>DSPy</td>
+<td>PromptAgent</td>
+<td>APO</td>
+<td>PW</td>
+</tr>
+<tr>
+<td>GSM8k</td>
+<td>262</td>
+<td>500</td>
+<td><b>109</b></td>
+<td>237</td>
+</tr>
+<tr>
+<td>AQUARAT</td>
+<td>326</td>
+<td>875</td>
+<td><b>125</b></td>
+<td>200</td>
+</tr>
+<tr>
+<td>SVAMP</td>
+<td>189</td>
+<td>680</td>
+<td><b>85</b></td>
+<td>127</td>
+</tr>
+<tr>
+<td>ETHOS</td>
+<td>175</td>
+<td>417</td>
+<td><b>55</b></td>
+<td>190</td>
+</tr>
+</table>
+</p>
+<br>
+<p align="center"><b>PromptWizard outperforms feedback-based methods such as APO and PromptAgent, as well as other prompt optimization techniques such as DSPy, in both accuracy and the number of API calls needed for optimization on various datasets. In terms of the
+average number of tokens per call, PromptWizard uses the second-fewest in most cases, behind only APO, which, being a technique designed only for binary classification tasks, generates smaller prompts (and hence uses fewer tokens) and is not extensible to