[Discover] Presentation of document and row counts #196444

Open
ryankeairns opened this issue Oct 15, 2024 · 4 comments
Assignees
Labels
Feature:Discover Discover Application Feature:ES|QL ES|QL related features in Kibana impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:DataDiscovery Discover App Team (Document Explorer, Saved Search, Surrounding documents, Data, DataViews)

Comments

@ryankeairns
Contributor

ryankeairns commented Oct 15, 2024

Related: #177156 #166219 #195787

Problem

  • Users want to know the total number of 'hits' (aka documents, rows, etc.) returned by their search/query
  • Relatedly, users want to know what the histogram reflects and why/how its count may differ from the number of hits returned

User experience considerations

  • Presentation of counts is inconsistent between data view and ES|QL modes; the two should behave as similarly as possible
  • The term 'Documents' has less relevance in ES|QL mode; we might instead consider a term like 'Rows'
  • It is unclear what the various values presented in the UI mean (value in the tab above the data grid; count value when hovering over the histogram; LIMIT value in the ES|QL editor, etc.)
  • For experienced users, the value has moved around, for better or worse. It used to be in the histogram section; now it is in the tab
  • Relatedly, there can be multiple queries happening: one for the histogram, one for the data grid

Technical considerations

( @kertal 's findings below )
In the data view world, we can have the following hit counts:

  • x documents actually returned from ES
  • x documents matching the query in ES
  • x downsampled documents were used for the histogram

It’s complicated. ES|QL is simpler for now since it doesn’t seem to be aware of downsampled data, but once it is, it will face the same challenge.
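
To make the three counts above concrete, here is a minimal TypeScript sketch of how they could be read from a single search response. The `SearchResponseLike` shape, the `summarizeCounts` helper, and the aggregation name `histogram` are hypothetical names for illustration, not Discover's actual code; only `hits.total.value`, `hits.total.relation`, and the per-bucket `doc_count` mirror the Elasticsearch response format.

```typescript
// Hypothetical, simplified shape of an Elasticsearch search response.
interface SearchResponseLike {
  hits: {
    total: { value: number; relation: 'eq' | 'gte' };
    hits: unknown[];
  };
  aggregations?: {
    // "histogram" is an assumed aggregation name for this sketch.
    histogram?: { buckets: Array<{ key: number; doc_count: number }> };
  };
}

interface CountSummary {
  returned: number; // documents actually returned from ES
  matching: number; // documents matching the query in ES
  matchingIsLowerBound: boolean; // true when ES reports "gte" instead of an exact total
  histogramTotal: number | undefined; // sum of bucket counts used for the histogram
}

function summarizeCounts(res: SearchResponseLike): CountSummary {
  const buckets = res.aggregations?.histogram?.buckets;
  return {
    returned: res.hits.hits.length,
    matching: res.hits.total.value,
    matchingIsLowerBound: res.hits.total.relation === 'gte',
    // With downsampled data, bucket doc_count can exceed the number of stored documents.
    histogramTotal: buckets?.reduce((sum, b) => sum + b.doc_count, 0),
  };
}
```

Keeping the three values in one structure would make it easier to decide, per mode, which one the UI labels and where.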


Current UI

Total hits in tab

Old placement in histogram area (pre tabs)

Values in ES|QL mode

@ryankeairns ryankeairns added Feature:Discover Discover Application Feature:ES|QL ES|QL related features in Kibana Team:DataDiscovery Discover App Team (Document Explorer, Saved Search, Surrounding documents, Data, DataViews) labels Oct 15, 2024
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@kertal
Member

kertal commented Oct 16, 2024

I'm adding test data for the downsampled data use case (it only works in data view mode)

<details>
<summary>Click to expand</summary>

```
PUT stats-index
{
  "mappings": {
    "properties": {
      "agg_metric": {
        "type": "aggregate_metric_double",
        "metrics": [ "min", "max", "sum", "value_count" ],
        "default_metric": "max"
      }
    }
  }
}

PUT stats-index/_doc/1
{
  "@timestamp": "2024-10-15T08:00:00Z",
  "agg_metric": {
    "min": -302.50,
    "max": 702.30,
    "sum": 200.0,
    "value_count": 25
  }
}

PUT stats-index/_doc/2
{
  "@timestamp": "2024-10-15T09:00:00Z",
  "agg_metric": {
    "min": -93.00,
    "max": 1702.30,
    "sum": 300.00,
    "value_count": 25
  }
}

PUT my_index
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "my_histogram": {
        "type": "histogram"
      },
      "my_text": {
        "type": "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "@timestamp": "2023-11-23T17:47:23Z",
  "my_text": "histogram_1",
  "my_histogram": {
    "values": [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts": [3, 7, 23, 12, 6]
  },
  "_doc_count": 45
}

PUT my_index/_doc/2
{
  "@timestamp": "2023-11-23T17:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 62
}

PUT my_index/_doc/3
{
  "@timestamp": "2023-11-25T17:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 62
}

PUT my_index/_doc/4
{
  "@timestamp": "2023-11-25T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 12
}

PUT my_index/_doc/5
{
  "@timestamp": "2023-11-26T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 1
}

PUT my_index/_doc/6
{
  "@timestamp": "2023-11-27T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 1
}

PUT my_index/_doc/7
{
  "@timestamp": "2023-11-28T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 2323
}

PUT my_index/_doc/8
{
  "@timestamp": "2023-11-28T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 23234
}

PUT my_index/_doc/9
{
  "@timestamp": "2023-11-28T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 2
}

PUT my_index/_doc/10
{
  "@timestamp": "2023-11-28T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 2
}

PUT my_index/_doc/11
{
  "@timestamp": "2023-11-29T18:49:23Z",
  "my_text": "histogram_2",
  "my_histogram": {
    "values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts": [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 12
}

GET my_index/_search
{
  "aggs": {
    "histogram_titles": {
      "terms": { "field": "my_text" }
    }
  }
}
```

</details>
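
As a rough illustration of why this test data is useful, the TypeScript sketch below replays the `my_index` documents offline and sums their `_doc_count` values per `my_text` term. It assumes the documented Elasticsearch behavior that bucket aggregations add a document's `_doc_count` value to the bucket's `doc_count`; the `testDocs` array and `bucketDocCounts` helper are hypothetical names, and nothing here queries a cluster.

```typescript
// [my_text, _doc_count] pairs copied from the my_index test documents 1 to 11.
const testDocs: Array<[string, number]> = [
  ['histogram_1', 45],
  ['histogram_2', 62],
  ['histogram_2', 62],
  ['histogram_2', 12],
  ['histogram_2', 1],
  ['histogram_2', 1],
  ['histogram_2', 2323],
  ['histogram_2', 23234],
  ['histogram_2', 2],
  ['histogram_2', 2],
  ['histogram_2', 12],
];

// Sum _doc_count per term, mimicking what a terms aggregation would report as doc_count.
function bucketDocCounts(docs: Array<[string, number]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const [term, docCount] of docs) {
    counts.set(term, (counts.get(term) || 0) + docCount);
  }
  return counts;
}

// Stored documents (what the data grid can list) vs. aggregated count (what a chart sums up).
const storedDocuments = testDocs.length;
const aggregatedTotal = Array.from(bucketDocCounts(testDocs).values()).reduce((a, b) => a + b, 0);
console.log({ storedDocuments, aggregatedTotal });
```

The gap between the 11 stored documents and the aggregated total is the kind of mismatch between grid and histogram that this issue is about.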

@kertal kertal added the impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. label Oct 16, 2024
@florent-leborgne
Contributor

florent-leborgne commented Oct 18, 2024

👋 I gathered some thoughts about the wording, depth of information and location of information. I may have missed some occurrences or cases where the UI would look a bit different, but feel free to check it here: https://www.figma.com/design/uHUXd5MGcJWGTwpHwWRpYY/Scratchpad?node-id=370-187&t=ORIHkTsVvYhn2MW6-1

Here is a summary of my thinking, based on the background and comments I was able to read:

  • Use "Results" as a generic term instead of "Documents". Other options like records, hits, or keeping documents seem less accurate. I'm also not a fan of "rows" because, to me, it refers to the container rather than to the content.
  • Show numbers that correspond to what the user is looking at:
    • Seeing the histogram, it feels valuable to see a "Total count" of what's currently represented in the visualization. I don't think we need a name for (h)it as just Count or Total count feels simple enough. If we really need one, "results" again?
    • To me the number in the tab name doesn't make any sense. As a user, I'm opening what is basically a big table, so I'd expect to see the number of entries, but in this case... it's not obvious. For that reason, I think it'd be good to somehow attach this information to the pagination, either right above the table or below it. I always found Gmail to do it clearly, like Results 101-200 of 1888 for example. If for some reason/setting/LIMIT parameter the number of results loaded into the table differs from the total number of results, the current behaviour with the information on the last page looks like a good option to me. ES|QL mode doesn't seem to have pagination by default, so the same example with the default limit could just be Results 1-1000 of 1888 on the first (which is also the last) page.
  • Add context where there can be confusion. It was noted a few times that users can get confused as to what these numbers mean exactly. We could add tooltips or popovers to provide that information, or at least context. Two important ones, based on my suggestions above, would be one next to the total count in the histogram and one next to the 1-100 of 1888 range. In each we could explain where these numbers come from and what can influence them, and mention downsampling or aggregations.

For details and nuances, please check my Figma scratchpad linked earlier.
I'm probably missing some things and my understanding isn't super deep yet, so please keep me honest here. Those were the things that struck me the most as a newbie looking at the UI.
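
The Gmail-style range label suggested above could be sketched roughly as follows in TypeScript. The `formatResultsRange` helper and its parameters are hypothetical, and the "Results" wording is only the term proposed in this comment; this is not existing Discover code.

```typescript
// Build a label such as "Results 101-200 of 1888".
// pageIndex is zero-based; loaded is how many results were fetched into the table
// (for example capped by a LIMIT), total is how many results matched overall.
function formatResultsRange(
  pageIndex: number,
  pageSize: number,
  loaded: number,
  total: number
): string {
  if (loaded === 0) {
    return `Results 0 of ${total}`;
  }
  const start = pageIndex * pageSize + 1;
  // The last page may be shorter than pageSize, so clamp to what was loaded.
  const end = Math.min((pageIndex + 1) * pageSize, loaded);
  return `Results ${start}-${end} of ${total}`;
}

// Data view mode, second page of 100: "Results 101-200 of 1888"
console.log(formatResultsRange(1, 100, 1888, 1888));
// ES|QL mode, single page capped at 1000 rows: "Results 1-1000 of 1888"
console.log(formatResultsRange(0, 1000, 1000, 1888));
```

Passing `loaded` and `total` separately keeps the label honest when a limit truncates the table, which is the case a tooltip would then explain.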

@kertal
Member

kertal commented Oct 25, 2024

Let's start by renaming the "Documents" tab; it's a good first issue to tackle:
#197779

5 participants