python - how to merge pandas dataframe with different lengths -


i have pandas dataframe following..

df_fav_dish   item_id buyer_id  dish_count dish_name   121      261           2      null   126      261           3      null   131      261           7      null   132      261           6      null   133      261           2      null   135      261           2      null   139      309           2      null   140      261           2      null   142      261           2      null   143      153           3      null   145       64           2      null   148      261           2      null   155      261           2      null   156       64           2      null   163      261           2      null 

length of above dataframe 34. , have dataframe following..

 data   item_id      item_name      121        paneer       126       chicken       131        prawns      132        mutton      133        curd      139       mocktail      140       cocktail      142       biryani        143      thai curry      145      red curry      148        fish      155      lobster       69        fish curry         67       butter       31       bread             59      egg curry 

length of above dataframe 322 .this data frame contains 300 item_id , corresponding item names want join 2 dataframes on item_id. 2 dataframes of different lengths. doing following in python.

df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id') 

but gives me many rows. want add item_name first data frame second dataframe both item_id equal each other

desired output is

item_id buyer_id  dish_count dish_name  item_name   121      261           2      null     paneer   126      261           3      null     chicken   131      261           7      null     prawns   132      261           6      null     mutton   133      261           2      null     curd   135      261           2      null        139      309           2      null     mocktail   140      261           2      null     cocktail   142      261           2      null     biryani   143      153           3      null     thai curry   145       64           2      null     red curry   148      261           2      null     fish   155      261           2      null     lobster   156       64           2      null        163      261           2      null      

your column item_id in dataframe data contains duplicity, so:

if no duplicity:

print data    item_id item_name 0      121    paneer 1      140   chicken 2      131    prawns  print df_fav_dish     item_id  buyer_id  dish_count dish_name 0      139       309           2      null 1      140       261           2      null 2      142       261           2      null 3      143       153           3      null  print df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id')    item_id  buyer_id  dish_count dish_name item_name 0      139       309           2      null       nan 1      140       261           2      null   chicken 2      142       261           2      null       nan 3      143       153           3      null       nan 

with duplicity duplicity rows joined:

print data    item_id item_name 0      140    paneer 1      140   chicken 2      140    prawns  print df_fav_dish     item_id  buyer_id  dish_count dish_name 0      139       309           2      null 1      140       261           2      null 2      142       261           2      null 3      143       153           3      null  print df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id')    item_id  buyer_id  dish_count dish_name item_name 0      139       309           2      null       nan 1      140       261           2      null    paneer 2      140       261           2      null   chicken 3      140       261           2      null    prawns 4      142       261           2      null       nan 5      143       153           3      null       nan 

so can drop_duplicates:

# drop duplicates except first occurrence print df.drop_duplicates(subset='item_id', keep='first')    item_id  buyer_id  dish_count dish_name item_name 0      139       309           2      null       nan 1      140       261           2      null    paneer 4      142       261           2      null       nan 5      143       153           3      null       nan  # drop duplicates except last occurrence print df.drop_duplicates(subset='item_id', keep='last')    item_id  buyer_id  dish_count dish_name item_name 0      139       309           2      null       nan 3      140       261           2      null    prawns 4      142       261           2      null       nan 5      143       153           3      null       nan  # drop duplicates print df.drop_duplicates(subset='item_id', keep=false)    item_id  buyer_id  dish_count dish_name item_name 0      139       309           2      null       nan 4      142       261           2      null       nan 5      143       153           3      null       nan 

Comments

Popular posts from this blog

Hatching array of circles in AutoCAD using c# -

ios - UITEXTFIELD InputView Uipicker not working in swift -