Faulty histogram computation? #1757

@daso94msg

Hello,

There are two things that I think are wrong in ydata_profiling/model/summary_algorithms.histogram_compute:

if len(bins) > hist_config.max_bins:
    bins = np.histogram_bin_edges(finite_values, bins=hist_config.max_bins)  
    weights = weights if weights and len(weights) == hist_config.max_bins else None
  1. I think the condition needs to be len(bins) > hist_config.max_bins + 1, because np.histogram_bin_edges includes the rightmost edge, so its return value always has length (number of bins) + 1.
  2. Why are the weights set to None if len(weights) != hist_config.max_bins? The shape of the weights should correspond to the values and will almost never match the number of bins. I can't think of a reason for this check at all.
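
To illustrate both points, here is a minimal sketch using plain NumPy. The data, the `max_bins` variable (standing in for `hist_config.max_bins`), and the `cap_bins` helper are assumptions made up for this example, not code from ydata-profiling; `cap_bins` only shows what the adjusted condition from point 1 might look like.

```python
import numpy as np

# Hypothetical data: 1000 values with one weight per value.
rng = np.random.default_rng(0)
finite_values = rng.normal(size=1000)
weights = rng.uniform(size=1000)
max_bins = 50  # stands in for hist_config.max_bins

# Point 1: asking for max_bins bins returns max_bins + 1 edges,
# because the rightmost edge is included.
edges = np.histogram_bin_edges(finite_values, bins=max_bins)
print(len(edges))

# Point 2: np.histogram expects weights with the same shape as the values,
# not the same length as the bins.
counts, _ = np.histogram(finite_values, bins=edges, weights=weights)
print(counts.shape)


def cap_bins(values, bins, max_bins):
    """Hypothetical helper: re-bin only if there are more than max_bins bins."""
    if len(bins) > max_bins + 1:  # bins holds edges, so n bins means n + 1 edges
        bins = np.histogram_bin_edges(values, bins=max_bins)
    return bins
```

With the original condition, an edge array describing exactly `max_bins` bins would still trigger re-binning, since it has `max_bins + 1` entries. Passing weights whose length equals the number of bins instead of the number of values makes `np.histogram` raise a `ValueError`.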

@alexbarros: I see that you made that commit initially, could you elaborate?

Regards

David

Labels: bug, code quality

Status: Selected for next release