Commit ac1ebf2

updated several sections
1 parent 22426d7 commit ac1ebf2

1 file changed: +43 -65 lines changed

index.html

Lines changed: 43 additions & 65 deletions
@@ -168,7 +168,7 @@ <h2 class="title is-3">Introduction</h2>
 <!-- INTRODUCTION & OVERVIEW SECTION -->
 <section class="section">
 <div class="container is-max-desktop">
-<h2 class="title is-3">Introduction & Motivation</h2>
+<h2 class="title is-3 has-text-centered">Motivation</h2>
 <div class="content has-text-justified">
 <p>In recent years, there has been a significant surge in the capabilities of large language models (LLMs) in generating human-like text and performing a wide range of natural language processing tasks. State-of-the-art models like GPT-4o, OpenAI o1/o3, and Google's Gemini have achieved superior performance in knowledge QA, instruction-following, and code generation.</p>
 <p>Despite recent advances, many real-world applications require not only fluency in the content of the output but also precise control over its structure. This includes tasks where the expected output must follow specific formats such as JSON, XML, LaTeX, HTML, or code in frameworks like React or Vue. These types of structured output are essential in domains like software development, data pipelines, user interface generation, and scientific publishing, where incorrect formatting can lead to disrupted pipelines or non-functional outputs.</p>
@@ -440,85 +440,46 @@ <h2 class="title is-3" id="leaderboard">Leaderboard</h2>
 </div>
 </div>
 </div>
-<!-------------------------------------------------------------------- Image Type SECTION -------------------------------------------------------------------->
+<!-------------------------------------------------------------------- Rendered Image Types SECTION -------------------------------------------------------------------->
 <div class="columns is-centered m-6">
 <div class="column is-full has-text-centered content">
 <h2 class="title is-3">Rendered Images from different format types</h2>
 <div class="content has-text-justified">
 <p>
-We compare the performance of various models across the most frequent structured output formats.
-Across all types, GPT-4 and Claude consistently outperform other models by a significant margin.
-Open-source models demonstrate relatively strong performance in categories like JSON and YAML, which are more common formats.
-However, for more specialized formats like SVG, LaTeX equations, and React components, all models except the leading proprietary ones obtain lower scores.
-This indicates that existing models still struggle with complex structured output generation tasks.
+Below are examples of various structured output formats rendered by models in our benchmark.
+These examples showcase the diversity of data formats evaluated in StructEval,
+from markup languages like HTML and LaTeX to data-interchange formats like JSON and YAML.
+The quality and correctness of these rendered outputs contribute to the model's overall score.
 </p>
 </div>
-<div class="model-labels-container">
-<span class="model-label" style="background-color: rgba(196, 123, 160, 0.5);">Llama-3-8B</span>
-<span class="model-label" style="background-color: rgba(245, 123, 113, 0.5);">Mixtral-8x7B</span>
-<span class="model-label" style="background-color: rgba(255, 208, 80, 0.5);">Llama-3-70B</span>
-<span class="model-label" style="background-color: rgba(110, 194, 134, 0.5);">Mistral-7B</span>
-<span class="model-label" style="background-color: rgba(255, 153, 78, 0.5);">Gemini-1.0-Pro</span>
-<span class="model-label" style="background-color: rgba(42, 149, 235, 0.5);">Gemini-1.5-Pro</span>
-<span class="model-label" style="background-color: rgba(183, 156, 220, 0.5);">Claude-3-Opus</span>
-<span class="model-label" style="background-color: rgba(143, 169, 209, 0.5);">Claude-3-Sonnet</span>
-<span class="model-label" style="background-color: rgba(72, 199, 176, 0.5);">GPT-4-Turbo</span>
-<span class="model-label" style="background-color: rgba(117, 209, 215, 0.5);">GPT-4o</span>
-</div>
 <div class="content has-text-centered">
-<div class="chart-grid">
-<!-- Chart 1: JSON -->
-<div class="chart-item">
-<canvas id="chart_JSON"></canvas>
-<p class="chart-label">JSON (245)</p>
-</div>
-<!-- Chart 2: YAML -->
-<div class="chart-item">
-<canvas id="chart_YAML"></canvas>
-<p class="chart-label">YAML (198)</p>
-</div>
-<!-- Chart 3: HTML -->
-<div class="chart-item">
-<canvas id="chart_HTML"></canvas>
-<p class="chart-label">HTML (210)</p>
+<div class="columns is-multiline is-centered">
+<div class="column is-one-third">
+<img src="static/images/imgs/000107.png" alt="Structured Output Example 1" style="width: 100%; max-width: 350px;">
+<p class="caption">JSON Generation Example</p>
 </div>
-<!-- Chart 4: XML -->
-<div class="chart-item">
-<canvas id="chart_XML"></canvas>
-<p class="chart-label">XML (187)</p>
+<div class="column is-one-third">
+<img src="static/images/imgs/000829.png" alt="Structured Output Example 2" style="width: 100%; max-width: 350px;">
+<p class="caption">HTML Generation Example</p>
 </div>
-<!-- Chart 5: Markdown -->
-<div class="chart-item">
-<canvas id="chart_Markdown"></canvas>
-<p class="chart-label">Markdown (165)</p>
+<div class="column is-one-third">
+<img src="static/images/imgs/000936.png" alt="Structured Output Example 3" style="width: 100%; max-width: 350px;">
+<p class="caption">React Component Example</p>
 </div>
-<!-- Chart 6: CSV -->
-<div class="chart-item">
-<canvas id="chart_CSV"></canvas>
-<p class="chart-label">CSV (155)</p>
+<div class="column is-one-third">
+<img src="static/images/imgs/001126.png" alt="Structured Output Example 4" style="width: 100%; max-width: 350px;">
+<p class="caption">SVG Generation Example</p>
 </div>
-<!-- Chart 7: LaTeX -->
-<div class="chart-item">
-<canvas id="chart_LaTeX"></canvas>
-<p class="chart-label">LaTeX (142)</p>
+<div class="column is-one-third">
+<img src="static/images/imgs/001210.png" alt="Structured Output Example 5" style="width: 100%; max-width: 350px;">
+<p class="caption">LaTeX Equation Example</p>
 </div>
-<!-- Chart 8: SQL -->
-<div class="chart-item">
-<canvas id="chart_SQL"></canvas>
-<p class="chart-label">SQL (135)</p>
-</div>
-<!-- Chart 9: SVG -->
-<div class="chart-item">
-<canvas id="chart_SVG"></canvas>
-<p class="chart-label">SVG (127)</p>
-</div>
-<!-- Chart 10: React -->
-<div class="chart-item">
-<canvas id="chart_React"></canvas>
-<p class="chart-label">React (118)</p>
+<div class="column is-one-third">
+<img src="static/images/imgs/001312.png" alt="Structured Output Example 6" style="width: 100%; max-width: 350px;">
+<p class="caption">YAML to XML Conversion Example</p>
 </div>
 </div>
-<p class="bottom-text">Selected models' performance on different structured format types. The numbers indicate the count of examples for each format in the benchmark.</p>
+<p class="bottom-text">Examples of different structured format types generated by models in our benchmark.</p>
 </div>
 </div>
 </div>
@@ -562,6 +523,23 @@ <h2 class="title is-3">Generation VS Conversion Tasks</h2>
 </div>
 </div>
 </div>
+<!-------------------------------------------------------------------- Annotation Pipeline SECTION -------------------------------------------------------------------->
+<div class="columns is-centered m-6">
+<div class="column is-full has-text-centered content">
+<h2 class="title is-3">Annotation Pipeline</h2>
+<div class="content has-text-justified">
+<p>
+The annotation process for StructEval follows a rigorous methodology to ensure high-quality evaluation data.
+Our pipeline includes validation of syntax correctness, semantic accuracy, and visual question-answer pairs for formats
+that produce rendered outputs. Each sample undergoes multiple expert verification steps before being included in the benchmark.
+</p>
+</div>
+<div class="content has-text-centered">
+<embed src="static/images/annotation_pipeline.pdf" type="application/pdf" width="80%" height="600px" />
+<p>Figure: The annotation pipeline for the StructEval benchmark.</p>
+</div>
+</div>
+</div>
 <!-------------------------------------------------------------------- Error Analysis SECTION -------------------------------------------------------------------->
 <div class="columns is-centered m-6">
 <div class="column is-full has-text-centered content">
