[BUG] RandomUnderSampler throws errors if pandas DataFrame has timestamps #970

Closed
@tomateit

Describe the bug

RandomUnderSampler validates the X argument, but these checks are unnecessary because the contents of X do not affect which indices are resampled.
This becomes a problem when I pass a pandas DataFrame that contains a timestamp column.
The exception is not raised if I pass a NumPy array with timestamps.
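
To illustrate the claim that the resampled indices depend only on the labels, here is a minimal pure-Python sketch. It is not imbalanced-learn's implementation; `undersample_indices` is a hypothetical helper, and equal per-class counts are an assumption of this sketch. The feature rows, including the timestamps, are only indexed and never inspected.

```python
import random
from collections import defaultdict
from datetime import datetime


def undersample_indices(labels, seed=0):
    """Hypothetical helper: pick row indices so every class keeps as many
    rows as the smallest class. Only `labels` is read, never the features."""
    rng = random.Random(seed)

    # Group row positions by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is reduced to the size of the smallest class.
    n_keep = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


labels = [0, 0, 0, 1]
rows = [{"label": label, "td": datetime.now()} for label in labels]

kept = undersample_indices(labels, seed=2342374)
# Rows are selected by position, so the timestamp values pass through untouched.
resampled_rows = [rows[i] for i in kept]
```

Because the selection works on positions alone, the dtype of the feature columns plays no role in this sketch.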

Steps/Code to Reproduce

from datetime import datetime
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

# Four rows: an imbalanced label column plus a datetime column.
df = pd.DataFrame({"label": [0,0,0,1], "td": [datetime.now()]*4})
rus = RandomUnderSampler(random_state=2342374)
rus.fit_resample(df, df.label)

Expected Results

No error is thrown.

Actual Results

TypeError: The DType <class 'numpy.dtype[int64]'> could not be promoted by <class 'numpy.dtype[datetime64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[int64]'>, <class 'numpy.dtype[datetime64]'>)

Versions

Linux-5.15.0-60-generic-x86_64-with-glibc2.35
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
NumPy 1.24.1
SciPy 1.9.3
Scikit-Learn 1.2.1
Imbalanced-Learn 0.10.0

My current workaround

from datetime import datetime
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

df = pd.DataFrame({"label": [0,0,0,1], "td": [datetime.now()]*4})

rus = RandomUnderSampler(random_state=2342374)

# Pass a NumPy array instead of the DataFrame, then rebuild the DataFrame.
downsampled_df, _ = rus.fit_resample(df.to_numpy(), df.label)
downsampled_df = pd.DataFrame(downsampled_df, columns=df.columns)

P.S. Huge thanks for this useful library.
