Tuesday, October 18, 2022

pandas' SettingWithCopyWarning: did I get it right?

I am just beginning to learn pandas and am looking to provide some automated help. From what I have read, SettingWithCopyWarning is something that confuses many people. Is the following correct?

In [2]:
import pandas as pd
df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"),
                  columns=list("xyz"))
In [3]:
df.loc["b"]["x"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [4]:
# What is SettingWithCopyWarning?
what()
Pandas occasionally emits a SettingWithCopyWarning when you use       
'chained indexing', either directly or indirectly, and you then attempt
to assign a value to the result. By 'direct chained indexing', we mean
that your code contains something like:                               

...[index_1][index_2] = ...                                           

During the first extraction using [index_1], pandas found that the
series to be created contained values of different types. It
automatically created a new series, converting all values to a common
type. The second indexing, [index_2], was then done on this copy
instead of the original dataframe. Thus, the assignment was made to the
copy and not to the original dataframe, which is the situation this
warning is meant to flag.

'Indirect chained indexing' amounts to essentially the same problem,
except that the second indexing is not done on the same line as the
one that extracted the first series.
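The explanation above can be checked on the example dataframe. The sketch below rebuilds `df`, confirms that the extracted row is converted to a single float dtype, and compares chained assignment with a single `.loc[row, column]` call. Warnings are silenced only to keep the output short; the exact warning pandas emits for chained assignment depends on the installed version.

```python
import warnings

import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"),
                  columns=list("xyz"))

# Each column keeps its own dtype: x and z hold integers, y holds floats.
print(df.dtypes)

# Row "b" mixes integers and a float, so pandas converts the values to a
# common type and returns a new float64 Series.
row_b = df.loc["b"]
print(row_b.dtype)  # float64

# Chained assignment: the second [] acts on that new Series, not on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the version-dependent warning
    df.loc["b"]["x"] = 99
after_chained = df.loc["b", "x"]
print(after_chained)  # 40: the original dataframe is unchanged

# A single .loc call with both labels assigns into df itself.
df.loc["b", "x"] = 99
print(df.loc["b", "x"])  # 99
```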
In [5]:
# Can I get more specific information about what I just did?
why()
You used direct chained indexing of a dataframe, which made a copy of
the original content of the dataframe. If you try to assign a value to
that copy, the original dataframe will not be modified. Instead of
direct chained indexing such as

df.loc["b"]["x"] ...                                                  

try:                                                                  

df.loc["b", "x"] ...                                                  
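Spotting the direct form is largely a syntactic check: the assignment target is a subscript applied to something that is itself a subscript. A minimal sketch of how an automated helper might flag it with Python's standard `ast` module follows. The function name is hypothetical, and because it only looks at syntax it would also flag harmless code such as `grid[i][j] = 0` on a plain list of lists.

```python
import ast

def find_direct_chained_assignments(source):
    """Return the line numbers of assignments shaped like obj[...][...] = value.

    Hypothetical helper for illustration. Accessors such as .loc are part
    of the inner subscript, so df.loc["b"]["x"] = 99 is reported as well.
    """
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AugAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            # A subscript applied to the result of another subscript.
            if (isinstance(target, ast.Subscript)
                    and isinstance(target.value, ast.Subscript)):
                lines.append(node.lineno)
                break
    return sorted(lines)

cell = (
    'df.loc["b"]["x"] = 99\n'    # direct chained indexing
    'df.loc["b", "x"] = 99\n'    # single .loc call: not flagged
)
print(find_direct_chained_assignments(cell))  # [1]
```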
In [6]:
# What about if I tried to use indirect chaining. 
# There are two possibilities
series = df.loc["b"]
series["x"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [7]:
where()
Warning issued on line 4 of code block [6].                                                         

       1| # What about if I tried to use indirect chaining.  
       2| # There are two possibilities
       3| series = df.loc["b"]
     > 4| series["x"] = 99
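The `where()` output above identifies the line inside the cell. One way a helper could obtain such a line number, sketched here with only the standard `warnings` module and not necessarily how `where()` works, is to compile each cell under its own pseudo-filename and record the warnings it triggers. This relies on the warning being attributed to the user's line, which is what the `stacklevel` argument of `warnings.warn` controls. The demo uses a plain `warnings.warn` call as a stand-in so that it runs without pandas; the function and cell names are hypothetical.

```python
import warnings

def where_warning(source, cell_name="<cell>"):
    """Run `source` and return (line number, message) for the first warning
    attributed to it, or None if there was none.

    Hypothetical helper for illustration only.
    """
    code = compile(source, cell_name, "exec")
    namespace = {"warnings": warnings}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeated warnings too
        exec(code, namespace)
    for w in caught:
        # Keep only warnings whose reported location lies inside this cell.
        if w.filename == cell_name:
            return w.lineno, str(w.message)
    return None

cell_6 = (
    "# What about if I tried to use indirect chaining.\n"
    "# There are two possibilities\n"
    "series = {'x': 40}\n"
    "warnings.warn('stand-in for SettingWithCopyWarning')\n"
)
print(where_warning(cell_6, "<cell-6>"))
# (4, 'stand-in for SettingWithCopyWarning')
```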
In [8]:
why()
I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:              

series = df.loc[...]                                                  

This made a copy of the data contained in the dataframe. Next, you
indexed that copy and assigned a value to it:

series["x"]                                                           

This had no effect on the original dataframe. If your goal is to      
modify the value of the original dataframe, try something like the    
following instead:                                                    

df.loc[..., "x"]                                                      
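The indirect case needs a little bookkeeping: remember which names were bound to the result of an indexing operation, then flag a later assignment through one of those names, with or without an accessor such as `.loc`. A hypothetical sketch, again using only the standard `ast` module and only looking at top-level statements, so it is a heuristic rather than a full analysis:

```python
import ast

ACCESSORS = {"loc", "iloc", "at", "iat"}

def find_indirect_chained_assignments(source):
    """Return (line, name) pairs for assignments such as name[...] = value
    or name.loc[...] = value, where `name` was earlier bound to obj[...]
    or obj.loc[...].  Hypothetical, purely syntactic heuristic."""
    extracted = set()  # names currently bound to an indexing result
    hits = []
    for stmt in ast.parse(source).body:  # top-level statements, in order
        if not isinstance(stmt, ast.Assign):
            continue
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                # Track the name only while it holds an indexing result.
                if isinstance(stmt.value, ast.Subscript):
                    extracted.add(target.id)
                else:
                    extracted.discard(target.id)
            elif isinstance(target, ast.Subscript):
                base = target.value
                # Look through an accessor: series.loc[...] -> series
                if isinstance(base, ast.Attribute) and base.attr in ACCESSORS:
                    base = base.value
                if isinstance(base, ast.Name) and base.id in extracted:
                    hits.append((stmt.lineno, base.id))
    return hits

cell_6 = 'series = df.loc["b"]\nseries["x"] = 99\n'
cell_9 = 'series_1 = df["x"]\nseries_1.loc["b"] = 99\n'
print(find_indirect_chained_assignments(cell_6))  # [(2, 'series')]
print(find_indirect_chained_assignments(cell_9))  # [(2, 'series_1')]
```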
In [9]:
# What if I do things in a different order
series_1 = df["x"]
series_1.loc["b"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [10]:
where()
Warning issued on line 3 of code block [9].                                                         

       1| # What if I do things in a different order
       2| series_1 = df["x"]
     > 3| series_1.loc["b"] = 99
In [11]:
why()
I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:              

series_1 = df[...]                                                    

This may have made a copy of the data contained in the dataframe.
Next, you indexed that series and assigned a value to it:

series_1.loc["b"]

If it was a copy, this had no effect on the original dataframe. If
your goal is to modify the value of the original dataframe, try
something like the following instead:

df.loc["b", ...]
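`.loc` takes the row label first and the column label second, so a hint of this kind has to put each known label in its own slot and use `...` for the one that is unknown. A hypothetical helper that builds the hint text (labels passed as source snippets) might look like this:

```python
def suggest_loc(frame_name, row=None, column=None):
    """Return a hint such as df.loc["b", "x"], with ... for an unknown label.

    Hypothetical helper; `row` and `column` are source snippets like '"b"'.
    """
    row_text = "..." if row is None else row
    column_text = "..." if column is None else column
    return f"{frame_name}.loc[{row_text}, {column_text}]"

# series = df.loc[...]; series["x"] = 99 -> only the column label is known
print(suggest_loc("df", column='"x"'))  # df.loc[..., "x"]
# series_1 = df[...]; series_1.loc["b"] = 99 -> only the row label is known
print(suggest_loc("df", row='"b"'))     # df.loc["b", ...]
```

Keeping the two labels as separate keyword arguments makes it hard to put a row label in the column slot by accident.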
In [12]:
# What if I had multiple dataframes?
df2 = df.copy()
series = df.loc["b"]
series["x"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [13]:
where()
Warning issued on line 4 of code block [12].                                                        

       2| df2 = df.copy()
       3| series = df.loc["b"]
     > 4| series["x"] = 99
In [14]:
why()
In your code, you have the following dataframes: {'df2', 'df'}. I do  
not know which one is causing the problem here; I will use the name   
df2 as an example.                                                    

I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:              

series = df2.loc[...]                                                 

This made a copy of the data contained in the dataframe. Next, you
indexed that copy and assigned a value to it:

series["x"]                                                           

This had no effect on the original dataframe. If your goal is to      
modify the value of the original dataframe, try something like the    
following instead:                                                    

df2.loc[..., "x"]                                                     
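The list of candidate dataframes in the message above can be gathered by scanning a namespace for `DataFrame` instances. A minimal sketch, assuming the helper is given a dict such as the notebook's `globals()`; the function name is hypothetical:

```python
import pandas as pd

def dataframe_names(namespace):
    """Return, sorted, the names in `namespace` bound to a pandas DataFrame.

    Hypothetical helper; in a notebook one could pass globals().
    """
    return sorted(name for name, obj in namespace.items()
                  if isinstance(obj, pd.DataFrame))

df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"),
                  columns=list("xyz"))
df2 = df.copy()
series = df.loc["b"]  # a Series, so it is not listed

print(dataframe_names({"df": df, "df2": df2, "series": series}))
# ['df', 'df2']
```

Sorting makes the result deterministic, whereas a set such as the one printed by `why()` has no guaranteed order.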
