Commit dab63a0

Scenario Demo
1 parent 6ffa7e3 commit dab63a0

1 file changed (+149, -1 lines)

docs/index.html

Lines changed: 149 additions & 1 deletion
@@ -101,6 +101,17 @@
 background-color: #f1f1f1;
 display: none;
 }
+table {
+width: 100%;
+border-collapse: collapse;
+}
+table, th, td {
+border: 1px solid black;
+}
+th, td {
+padding: 8px;
+text-align: left;
+}
 
 .btn {
 display: flex; /* Use flexbox for layout */
@@ -584,7 +595,144 @@ <h2 class="title is-3">Results</h2>
 <img src="./images/prompting.png" >
 </p>
 <p align="center"><b>Substantial performance improvements across all models when optimized prompts are generated by PromptWizard on the GSM8k dataset</b></p>
-</div>
+</div>
+<br>
+<button class="btn" onclick="toggleContent(this,'11')">Comparison with Feedback-Based and Other Prompt Optimization Techniques<span class="icon">+</span></button>
+<div class="col_content_11">
+<p align="center">
+<table>
+<tr>
+<th>Dataset</th>
+<th colspan="4">Accuracy (higher is better)</th>
+</tr>
+<tr>
+<th></th>
+<th>DSPy</th>
+<th>PromptAgent</th>
+<th>APO</th>
+<th>PromptWizard</th>
+</tr>
+<tr>
+<td>GSM8k</td>
+<td>78.2</td>
+<td>68.84</td>
+<td>25.67</td>
+<td><b>90</b></td>
+</tr>
+<tr>
+<td>AQUARAT</td>
+<td>55.1</td>
+<td>56.67</td>
+<td>20.12</td>
+<td><b>58.2</b></td>
+</tr>
+<tr>
+<td>SVAMP</td>
+<td>77</td>
+<td>78.67</td>
+<td>75.25</td>
+<td><b>82.3</b></td>
+</tr>
+<tr>
+<td>ETHOS</td>
+<td>84.1</td>
+<td>84.25</td>
+<td>80.62</td>
+<td><b>89.4</b></td>
+</tr>
+</table>
+<br>
+<table>
+<tr>
+<th>Dataset</th>
+<th colspan="4">Number of API calls (lower is better)</th>
+</tr>
+<tr>
+<th></th>
+<th>DSPy</th>
+<th>PromptAgent</th>
+<th>APO</th>
+<th>PromptWizard</th>
+</tr>
+<tr>
+<td>GSM8k</td>
+<td>915</td>
+<td>2115</td>
+<td>8490</td>
+<td><b>147</b></td>
+</tr>
+<tr>
+<td>AQUARAT</td>
+<td>920</td>
+<td>2200</td>
+<td>8500</td>
+<td><b>112</b></td>
+</tr>
+<tr>
+<td>SVAMP</td>
+<td>2300</td>
+<td>2111</td>
+<td>8000</td>
+<td><b>178</b></td>
+</tr>
+<tr>
+<td>ETHOS</td>
+<td>660</td>
+<td>2217</td>
+<td>8200</td>
+<td><b>80</b></td>
+</tr>
+</table>
+<br>
+<table>
+<tr>
+<th>Dataset</th>
+<th colspan="4">Average tokens per call (lower is better)</th>
+</tr>
+<tr>
+<th></th>
+<th>DSPy</th>
+<th>PromptAgent</th>
+<th>APO</th>
+<th>PromptWizard</th>
+</tr>
+<tr>
+<td>GSM8k</td>
+<td>262</td>
+<td>500</td>
+<td><b>109</b></td>
+<td>237</td>
+</tr>
+<tr>
+<td>AQUARAT</td>
+<td>326</td>
+<td>875</td>
+<td><b>125</b></td>
+<td>200</td>
+</tr>
+<tr>
+<td>SVAMP</td>
+<td>189</td>
+<td>680</td>
+<td><b>85</b></td>
+<td>127</td>
+</tr>
+<tr>
+<td>ETHOS</td>
+<td>175</td>
+<td>417</td>
+<td><b>55</b></td>
+<td>190</td>
+</tr>
+</table>
+</p>
+<br>
+<p align="center"> <b>PromptWizard outperforms feedback-based methods such as APO and PromptAgent, as well as other prompt optimization techniques such as DSPy, in both accuracy and the number of API calls needed for optimization on all four datasets. In average tokens per call, PromptWizard uses the second-fewest on GSM8k, AQUARAT, and SVAMP, behind only APO (on ETHOS, DSPy also uses slightly fewer, 175 vs. 190).
+APO is a technique designed only for binary classification tasks, so it generates shorter prompts (and therefore uses fewer tokens per call), but it does not extend to
+other tasks.
+</b>
+</p>
+</div>
 </div>
 </div>
 </div>
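The caption's ranking claims can be checked directly against the three tables. The sketch below is a hypothetical helper, not part of the PromptWizard code base; the names `METHODS`, `ACCURACY`, `CALLS`, `TOKENS_PER_CALL`, `rank`, and `estimated_total_tokens` are illustrative. It hard-codes the reported numbers, ranks each method per dataset, and, assuming the average tokens per call applies to every call (the tables do not report totals), multiplies calls by tokens per call to give a rough total token budget for one optimization run.

```python
# Figures copied from the three tables in this commit.
# Column order in every list: DSPy, PromptAgent, APO, PromptWizard.
METHODS = ["DSPy", "PromptAgent", "APO", "PromptWizard"]

ACCURACY = {  # higher is better
    "GSM8k":   [78.2, 68.84, 25.67, 90.0],
    "AQUARAT": [55.1, 56.67, 20.12, 58.2],
    "SVAMP":   [77.0, 78.67, 75.25, 82.3],
    "ETHOS":   [84.1, 84.25, 80.62, 89.4],
}
CALLS = {  # number of API calls, lower is better
    "GSM8k":   [915, 2115, 8490, 147],
    "AQUARAT": [920, 2200, 8500, 112],
    "SVAMP":   [2300, 2111, 8000, 178],
    "ETHOS":   [660, 2217, 8200, 80],
}
TOKENS_PER_CALL = {  # average tokens per call, lower is better
    "GSM8k":   [262, 500, 109, 237],
    "AQUARAT": [326, 875, 125, 200],
    "SVAMP":   [189, 680, 85, 127],
    "ETHOS":   [175, 417, 55, 190],
}


def rank(values, higher_is_better):
    """Return the 1-based rank of each method (1 = best); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def estimated_total_tokens(dataset):
    """Rough total = calls x average tokens per call.

    Assumption: the reported average holds across every call, so the
    product approximates the total tokens spent during optimization.
    """
    return [c * t for c, t in zip(CALLS[dataset], TOKENS_PER_CALL[dataset])]


if __name__ == "__main__":
    pw = METHODS.index("PromptWizard")
    for ds in ACCURACY:
        acc_rank = rank(ACCURACY[ds], higher_is_better=True)[pw]
        call_rank = rank(CALLS[ds], higher_is_better=False)[pw]
        tok_rank = rank(TOKENS_PER_CALL[ds], higher_is_better=False)[pw]
        totals = dict(zip(METHODS, estimated_total_tokens(ds)))
        print(f"{ds}: accuracy rank {acc_rank}, calls rank {call_rank}, "
              f"tokens/call rank {tok_rank}, est. total tokens {totals}")
```

Under that assumption, PromptWizard's estimated total is the lowest on every dataset; on GSM8k, for example, it is 147 × 237 = 34,839 tokens versus 8,490 × 109 = 925,410 for APO, so APO's shorter prompts do not offset its much larger number of calls.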
