ValueError on np.random.permutation when running .py file from command line but not in Jupyter notebook

Question

I am running the following code:

import pandas as pd
import numpy as np 
import tensorflow as tf

california_housing_dataframe = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv", sep=",")

california_housing_dataframe = california_housing_dataframe.reindex(np.random.permutation(california_housing_dataframe))
california_housing_dataframe["median_house_value"] /= 1000.0
print(california_housing_dataframe.describe())
print(california_housing_dataframe)

This causes a ValueError:

"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

However, the same code runs in a Jupyter notebook (just remove the print and call the dataframe directly).

I can see that the problem is caused by the "np.random.permutation" line. If I print the dataframe without that line, it prints fine. But why is there no issue when running this in a Jupyter notebook? And how do I resolve this so that I can run the .py program from the command line?

Grzegorz Skibinski · Accepted Answer · 2019-09-08T16:25:40.130

Replace:

california_housing_dataframe = california_housing_dataframe.reindex(np.random.permutation(california_housing_dataframe))

With:

california_housing_dataframe = california_housing_dataframe.reindex(np.random.permutation(california_housing_dataframe.index))

(Reindex by a permutation of the dataframe's index labels, not by a permutation of the whole dataframe.)
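A minimal sketch of the fix on a small stand-in frame, so it runs without downloading the CSV. The values below are made up for illustration, and `DataFrame.sample(frac=1)` is shown only as an alternative way to shuffle rows; it was not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for california_housing_dataframe (values are made up).
df = pd.DataFrame({
    "total_rooms": [10, 20, 30, 40],
    "median_house_value": [100000.0, 200000.0, 300000.0, 400000.0],
})

# Permute the 1-D row labels (df.index), then reorder the rows by them.
shuffled = df.reindex(np.random.permutation(df.index))

# Alternative not mentioned in the answer: sample every row without replacement.
shuffled_alt = df.sample(frac=1)

# Both results hold the same rows as df, only in a different order.
print(sorted(shuffled.index) == list(df.index))
print(sorted(shuffled_alt.index) == list(df.index))
```

Either form keeps each row intact and only changes the row order, so later steps such as dividing "median_house_value" by 1000.0 behave the same as on the unshuffled frame.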

answered Sep 08 '19 at 16:11

That seems to have fixed it! Thanks!! But what is the difference between the two? – Alhpa Delta Sep 08 '19 at 16:32

Because a pandas index is a set of row labels, normally unique (by default integers starting at 0) and one-dimensional with N entries, where N is the number of rows in your dataframe. By passing the whole dataframe you handed reindex a multidimensional array instead of a 1-D list of labels, which it cannot handle. You can in theory build a multidimensional index, but it's not that trivial (e.g. https://stackoverflow.com/questions/28962113/how-to-get-away-with-a-multidimensional-index-in-pandas). Also, if some of the labels you pass are repeated (so they are not unique), you will lose some of the rows of the dataframe. – Grzegorz Skibinski Sep 08 '19 at 16:50
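Both points in this comment can be checked on a small made-up frame: the index is one-dimensional while the frame's values are two-dimensional, and reindexing by repeated labels keeps the row count but drops the rows whose labels were left out. This is an illustrative sketch with hypothetical data, not code from the thread.

```python
import pandas as pd

# Hypothetical frame with made-up values.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# The index is 1-D (one label per row); the underlying values are 2-D.
print(df.index.ndim)       # 1
print(df.to_numpy().ndim)  # 2

# Reindexing by repeated labels: rows 0 and 1 appear twice, rows 2 and 3 are gone.
repeated = df.reindex([0, 0, 1, 1])
print(repeated["a"].tolist())  # [1, 1, 2, 2]
```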
