-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scale.xdiscrete levels #1130
Comments
Thanks for capturing this. We actually discussed this behavior that results in the error above and we weren't sure what the right solution was. The handling of missing values is often not easy to spot in the code and I wouldn't be surprised if there were other issues with missing values. It probably isn't too hard to get the old behavior back (if we want it) but, as mentioned previously, it would probably be great if the handling of missing values was more centralized if possible. |
With a dataset containing 7 categories, if I use The NA label is added (by Gadfly) to the 3 levels I didn't specify (its not part of the raw data), which then becomes the first boxplot in the plots above. I'm also undecided whether that should be part of the default behaviour. But as mentioned, the minimum behaviour should be a plot with the 4 levels I specified. |
perhaps related, and sorry to hijack this issue if not, but the Geom.segment example is throwing the following error on master now:
|
Let's file a separate issue for the |
So what would be the best behavior? Show a "missing" category or hide it? I'm not familiar with Gadfly, but it looks weird to group all values which do not appear in the specified levels in a "missing" category, while missing values are not represented by default. |
I think that showing the "Other" boxplot (i.e. the categories I didn't specify in But for now, I would like to see the minimum behaviour (as described above) work on master. |
Another issue I encountered on Gadfly master is a 255 "group" limit, I don't recall hitting a limit with PooledDataArray. This is potentially an issue for polygon-based maps: using DataFrames, VoronoiDelaunay
srand(123)
tess = DelaunayTessellation()
width = max_coord - min_coord
a= Point.(min_coord+rand(130)*width, min_coord+rand(130)*width)
push!(tess, a)
# In the next line, change 256 to 257 and the plots fail
triangles = [geta.(tess._trigs) getb.(tess._trigs) getc.(tess._trigs)][2:256,:]
n = size(triangles, 1)
D = DataFrame(x=vec(getx.(triangles)'), y=vec(gety.(triangles)'), id=string.(vec([1:n 1:n 1:n]')))
pa = plot(D, x=:x, y=:y, Geom.polygon(fill=true, preserve_order=true), color=:id, Theme(key_position=:none))
pb = plot(D, x=:x, y=:y, Geom.polygon(preserve_order=true), group=:id)
hstack(pa,pb) |
Good catch, this is due to BTW, it would make sense to check that the same storage type is used everywhere, to avoid recompiling lots of methods down the line. For example, this line uses |
@andreasnoack it'd be nice to tag 0.7 at some point, but this issue needs resolved first. it's not clear to me what the best solution is to the NA problem above, but at least the path is clear for the UInt8 problem. since it's related to your work on IndirectArrays, would you mind taking care of at least the latter? the only suggestion i'd make here is that perhaps UInt16 would suffice. not sure of the performance implications. |
Bump. @andreasnoack, it would be great to have this fixed! If there's anything we can do to help, let us know. |
Indeed. I think I know how to fix these issues but I need the time and, unfortunately, other tasks take priority right now. |
we should think about whether we want to get these dataframes related regressions fixed before or after we merge any breaking changes. ideally it'd be nice to have a 0.7.x which completely worked. on the otherhand, no sense delaying progress just because of semantic versioning. |
Using @Mattriks' code from above, I was able to verify that master fails to plot when using 257 in the code above ERROR: LoadError: InexactError()
Stacktrace:
[1] copy!(::IndexLinear, ::Array{UInt8,1}, ::IndexLinear, ::Array{Int64,1}) at ./abstractarray.jl:656
[2] discretize_make_ia(::Array{String,1}, ::Array{String,1}) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/misc.jl:397
[3] discretize(::Array{String,1}, ::Array{String,1}, ::Void, ::Bool) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/scale.jl:269
[4] discretize(::Array{String,1}, ::Array{String,1}) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/scale.jl:258
[5] apply_scale(::Gadfly.Scale.DiscreteColorScale, ::Array{Gadfly.Aesthetics,1}, ::Gadfly.Data, ::Vararg{Gadfly.Data,N} where N) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/scale.jl:477
[6] apply_scales(::IterTools.Distinct{Base.ValueIterator{Dict{Symbol,Gadfly.ScaleElement}},Gadfly.ScaleElement}, ::Array{Gadfly.Aesthetics,1}, ::Gadfly.Data, ::Vararg{Gadfly.Data,N} where N) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/scale.jl:34
[7] apply_scales(::IterTools.Distinct{Base.ValueIterator{Dict{Symbol,Gadfly.ScaleElement}},Gadfly.ScaleElement}, ::Gadfly.Data) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/scale.jl:53
[8] render_prepare(::Gadfly.Plot) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/Gadfly.jl:677
[9] render(::Gadfly.Plot) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/Gadfly.jl:757
[10] collect(::Base.Generator{Tuple{Gadfly.Plot,Gadfly.Plot},Gadfly.##110#111}) at ./array.jl:475
[11] hstack(::Gadfly.Plot, ::Gadfly.Plot) at /Users/tamasnagy/.julia/v0.6/Gadfly/src/Gadfly.jl:923
[12] include_from_node1(::String) at ./loading.jl:576
[13] include(::String) at ./sysimg.jl:14 and that @nalimilan's solution seems to fix that error. discretize_make_ia(values::AbstractVector, levels) =
- IndirectArray(Array{UInt8}(indexin(values, levels)), levels)
+ IndirectArray(Array{UInt}(indexin(values, levels)), levels) But there's still the issue of the tiling breaking at the bottom right of the plot. Not sure what is going on there. |
@andreasnoack julia 0.7 is just around the corner and it'd be nice to have this issue resolved before dropping support for 0.6. will you have time soon? i could try to take a stab, but you are much more familiar with the changes you've made to support DataFrames. thanks. |
I'll try to take a look during July but can't promise anything. |
@andreasnoack just a friendly bump. i'd like to start working on 0.7, but don't want to drop 0.6 support before this issue is resolved. |
would be nice to resolve this regression now that gadfly dependencies work on julia 0.7. anyone want to volunteer? :) |
As I noted above, it shouldn't be hard to fix at least some/most of the issues by using |
According to the docs, the
levels
argument ofScale.x-discrete
behaves such that "Order will be respected and anything in the data that's not represented in levels will be set to NA".Gadfly 0.6.5
:so it works as the docs suggest, but with obvious issues in the top plot, and maybe users would prefer for the NA column not to appear (or to appear only if NA is added to the levels vector)?
The text was updated successfully, but these errors were encountered: