python - how to merge pandas dataframe with different lengths -
i have pandas dataframe following..
df_fav_dish item_id buyer_id dish_count dish_name 121 261 2 null 126 261 3 null 131 261 7 null 132 261 6 null 133 261 2 null 135 261 2 null 139 309 2 null 140 261 2 null 142 261 2 null 143 153 3 null 145 64 2 null 148 261 2 null 155 261 2 null 156 64 2 null 163 261 2 null
length of above dataframe 34. , have dataframe following..
data item_id item_name 121 paneer 126 chicken 131 prawns 132 mutton 133 curd 139 mocktail 140 cocktail 142 biryani 143 thai curry 145 red curry 148 fish 155 lobster 69 fish curry 67 butter 31 bread 59 egg curry
length of above dataframe 322 .this data frame contains 300 item_id , corresponding item names want join 2 dataframes on item_id. 2 dataframes of different lengths. doing following in python.
df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id')
but gives me many rows. want add item_name
first data frame second dataframe both item_id
equal each other
desired output is
item_id buyer_id dish_count dish_name item_name 121 261 2 null paneer 126 261 3 null chicken 131 261 7 null prawns 132 261 6 null mutton 133 261 2 null curd 135 261 2 null 139 309 2 null mocktail 140 261 2 null cocktail 142 261 2 null biryani 143 153 3 null thai curry 145 64 2 null red curry 148 261 2 null fish 155 261 2 null lobster 156 64 2 null 163 261 2 null
your column item_id
in dataframe data
contains duplicity, so:
if no duplicity:
print data item_id item_name 0 121 paneer 1 140 chicken 2 131 prawns print df_fav_dish item_id buyer_id dish_count dish_name 0 139 309 2 null 1 140 261 2 null 2 142 261 2 null 3 143 153 3 null print df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id') item_id buyer_id dish_count dish_name item_name 0 139 309 2 null nan 1 140 261 2 null chicken 2 142 261 2 null nan 3 143 153 3 null nan
with duplicity duplicity rows joined:
print data item_id item_name 0 140 paneer 1 140 chicken 2 140 prawns print df_fav_dish item_id buyer_id dish_count dish_name 0 139 309 2 null 1 140 261 2 null 2 142 261 2 null 3 143 153 3 null print df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id') item_id buyer_id dish_count dish_name item_name 0 139 309 2 null nan 1 140 261 2 null paneer 2 140 261 2 null chicken 3 140 261 2 null prawns 4 142 261 2 null nan 5 143 153 3 null nan
so can drop_duplicates
:
# drop duplicates except first occurrence print df.drop_duplicates(subset='item_id', keep='first') item_id buyer_id dish_count dish_name item_name 0 139 309 2 null nan 1 140 261 2 null paneer 4 142 261 2 null nan 5 143 153 3 null nan # drop duplicates except last occurrence print df.drop_duplicates(subset='item_id', keep='last') item_id buyer_id dish_count dish_name item_name 0 139 309 2 null nan 3 140 261 2 null prawns 4 142 261 2 null nan 5 143 153 3 null nan # drop duplicates print df.drop_duplicates(subset='item_id', keep=false) item_id buyer_id dish_count dish_name item_name 0 139 309 2 null nan 4 142 261 2 null nan 5 143 153 3 null nan
Comments
Post a Comment