Update: for the .iloc and .loc indexers, it may be clearer to read the arguments as [rows, columns], or just [rows].

Main Types

DataFrame and Series

A DataFrame is a two-dimensional table with one or more labeled columns; a Series is a single labeled column (one-dimensional).
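A quick way to see the difference: build a small DataFrame and pull one column out of it. The names and salaries below are hypothetical sample data, used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "salary": [5000, 8000, 13000],
})

col = df["salary"]          # selecting a single column gives a Series
print(type(df).__name__)    # DataFrame
print(type(col).__name__)   # Series
print(df.shape, col.shape)  # (3, 2) (3,)
```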

Magic Operations and Basic Information

The basic operations are listed below.

list(df)

Get a list of the column headers (labels). See also .add_suffix and .add_prefix.
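A minimal sketch of list(df) together with .add_prefix and .add_suffix, assuming a hypothetical two-column frame. Both methods return a new DataFrame with relabeled columns and leave the original unchanged.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Ann", "Bob"], "salary": [5000, 8000]})

cols = list(df)                    # column labels as a plain list
prefixed = df.add_prefix("emp_")   # new frame: emp_name, emp_salary
suffixed = df.add_suffix("_raw")   # new frame: name_raw, salary_raw

print(cols)             # ['name', 'salary']
print(list(prefixed))   # ['emp_name', 'emp_salary']
print(list(suffixed))   # ['name_raw', 'salary_raw']
```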

df['salary'] -> Series

Select the whole column labeled 'salary'.

df['salary'] <= 6000 -> Series of bool

The result is a boolean Series (dtype: bool) holding one True/False value per row, like this:

>> df['salary'] > 12000
0     False
1     False
2      True
3     False
4      True
...

df[df['salary'] <= 6000]

Select only the rows that satisfy the condition (boolean indexing).
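The two steps above (build a boolean mask, then index with it) can be sketched as follows, using hypothetical salaries:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid", "Dee"],
                   "salary": [5000, 8000, 6000, 13000]})

mask = df["salary"] <= 6000   # boolean Series aligned with df's index
low = df[mask]                # keeps only the rows where mask is True

print(mask.tolist())          # [True, False, True, False]
print(low["name"].tolist())   # ['Ann', 'Cid']
```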

df['salary_after_tax'] = df['salary'] * .8

Create a new column computed from an existing one (here a flat 20% tax). For per-value logic, see also .apply(func):

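def tax(s):
    # Progressive rule: 30% tax at or above 6000, 15% below
    if s >= 6000:
        return s * .7
    else:
        return s * .85

# .apply calls tax() once for each value in the column
df["salary_after_tax"] = df["salary"].apply(tax)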
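A self-contained version of both approaches, assuming a hypothetical frame with a single salary column: vectorized arithmetic applies one rate to every row at once, while .apply runs a Python function per value, which allows branching logic but is usually slower on large data.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"salary": [5000, 6000, 10000]})

# Vectorized: one flat rate for every row
df["flat_after_tax"] = df["salary"] * .8

def tax(s):
    # Same rule as in the text: 30% tax at or above 6000, 15% below
    return s * .7 if s >= 6000 else s * .85

# Element-wise: tax() is called once per value
df["salary_after_tax"] = df["salary"].apply(tax)
print(df)
```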


    

Useful Methods

DataFrame/Series: .head(n=5) / .tail(n=5) return the first / last n rows

>> print((df['salary'] > 12000).head(2))
0    False
1    False
Name: salary, dtype: bool
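A minimal sketch of the default and an explicit n, on a hypothetical Series of ten salaries:

```python
import pandas as pd

# Hypothetical data: ten salaries 5000, 6000, ..., 14000
s = pd.Series(range(5000, 15000, 1000), name="salary")

first = s.head()   # default n=5 -> first five values
last = s.tail(2)   # last two values

print(first.tolist())  # [5000, 6000, 7000, 8000, 9000]
print(last.tolist())   # [13000, 14000]
```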

Statistics

  • .median(): the middle value (the mean of the two middle values when the count is even)
  • .sum(): the total
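For example, with four hypothetical salaries the median falls between the two middle values:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"salary": [5000, 8000, 6000, 13000]})

med = df["salary"].median()  # sorted: 5000, 6000, 8000, 13000 -> (6000 + 8000) / 2
total = df["salary"].sum()

print(med)    # 7000.0
print(total)  # 32000
```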

.loc[Condition(s), Column Title(s)]

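.loc: label-based location indexer

To combine multiple conditions, wrap each one in () and join them with & (and) or | (or); to select multiple columns, pass a list of labels in [].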

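# Rows with salary below 6000, column "salary" only -> Series
df_low = df.loc[df["salary"] < 6000, "salary"]
df.loc[df["salary"] < 6000, "salary_after_tax"] = df_low * .85

# Multiple columns: pass a list of labels -> DataFrame
df_high = df.loc[df["salary"] >= 6000, ["name", "surname", "salary"]]
df.loc[df["salary"] >= 6000, "salary_after_tax"] = df_high["salary"] * .7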
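A runnable sketch of .loc with combined conditions, a column list, and masked assignment, assuming a hypothetical frame with the name, surname and salary columns used above. Because .loc aligns the right-hand Series by index, only the masked rows receive values.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "surname": ["Ng", "Li", "Ox", "Wu"],
    "salary": [5000, 8000, 6000, 13000],
})

# Two conditions, each wrapped in (), combined with &; two columns as a list
mid = df.loc[(df["salary"] >= 6000) & (df["salary"] < 10000), ["name", "salary"]]
print(mid)

# Masked assignment: creates the column, then fills each subset of rows
df.loc[df["salary"] < 6000, "salary_after_tax"] = df["salary"] * .85
df.loc[df["salary"] >= 6000, "salary_after_tax"] = df["salary"] * .7
print(df)
```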

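.iloc[Row Position(s), Column Position(s)]

Like .loc, but both arguments are integer positions or slices (0-based, end excluded) instead of labels. A boolean mask has to be a plain array such as mask.to_numpy(); pandas raises an error for a boolean Series.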
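To make positions and labels visibly different, this sketch gives a hypothetical frame the index [10, 20, 30], then addresses the same cell both ways:

```python
import pandas as pd

# Hypothetical data; index labels deliberately differ from positions
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"],
                   "salary": [5000, 8000, 13000]},
                  index=[10, 20, 30])

print(df.iloc[0, 1])          # 5000: row position 0, column position 1
print(df.loc[10, "salary"])   # 5000: same cell, addressed by labels
print(df.iloc[:2])            # rows at positions 0 and 1 (end excluded)

# Convert the boolean Series to a plain array before using it with .iloc
mask = (df["salary"] > 6000).to_numpy()
print(df.iloc[mask, 0].tolist())  # ['Bob', 'Cid']
```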

.at / .iat

.at works like .loc and .iat like .iloc, but they are faster and access exactly ONE value (a single cell).
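A minimal sketch of reading and writing a single cell, assuming a hypothetical frame with string index labels:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Ann", "Bob"], "salary": [5000, 8000]},
                  index=["a", "b"])

print(df.at["b", "salary"])   # 8000: label-based, one cell
print(df.iat[1, 1])           # 8000: position-based, same cell
df.at["a", "salary"] = 5500   # .at can also set a single value
```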

Do not use the .ix indexer. It was deprecated and has been removed since pandas 1.0; use .loc or .iloc instead.
