pandas

  1. T

    Python drop duplicated pairs only

    If I have a dataframe like this: Time X Y 2023-02-01T15:03:02.565333 200 10.1 2023-02-01T15:03:02.565333 200 10.1 2023-02-01T15:03:02.565333 200 10.1 2023-02-01T15:03:02.565333 200 10.1 2023-02-01T15:03:02.634508 200 10.1 2023-02-01T15:03:02.634508 200...
  2. M

    Column wise count in Pandas DF

    I have the below DF A B C D COLUMNS 1 ABC ABC 2,3 DEF ABC DEF 2,4 LSD LSD LSD 2,4 ERR ERR 2,3 ERR ABC LSD 2,4 Expected Output A-1 B-2 C-3 D-4 F-5 COLUMNS INDEX ABC DEF ERR LSD 1 ABC ABC 2,3 2 0 0 0 DEF ABC DEF DEF 2,4 1 2 0 0 LSD LSD LSD LSD 2,3 0 0 0 2...
  3. F

    How to optimize merging only on lines matching a condition?

    I want to left merge df_1 and df_2 on column a. I can achieve it easily with: df_3 = df_1.merge(df_2, on="a", how="left") However, I know I will never find a in df_2 when df_1.b == 0. So to optimize my code, I would like to merge df_1 with df_2 only when df_1.b != 0. How can I get df_3 more...
  4. N

    Groupby number of hours in a month in pandas

    Could someone please advise how to group an hourly index to find how many hours of null values there are in a specific month? I am thinking of building a dataframe with a monthly index. Below is the dataframe, which has a timestamp as index and another...
  5. R

    Calculate difference of rows in Pandas

    I have a timeseries dataframe where there are alerts for some particular rows. The dataframe looks like- machineID time vibration alerts 1 2023-02-15 220 1 11:45 1 2023-02-15 221 0 12:00 1 2023-02-15 219...
  6. R

    Display rows if one item is present in column of multiple values of dataframe

    Suppose I have dataframe like - uni_id country_name scholarship timeperiod type course_name 1 UK 1 4 UG Btech;Bcom;BArts 1 UK 0 2 PG Mba 2 UK 1 3 UG Bcom;xyx Now how do...
  7. P

    How to check if occurrences of identical consecutive numbers is below a threshold in pandas series

    I need to check whether the number of identical consecutive values stays below a certain threshold, e.g. at most two identical numbers in a row. pd.Series(data=[-1, -1, 2, -2, 2, -2, 1, 1]) # True pd.Series(data=[-1, -1, -1, 2, 2, -2, 1, 1]) # False Further checks: Only the numbers +1 and -1...
  8. M

    Finding objects in dataset

    I am trying to understand how to find object columns in a dataset and ran into a small problem: s = (X_train.dtypes == 'object') object_cols = list(s.index) print("Categorical variables:") print(object_cols) What does s[s].index do here?
  9. D

    How to get dictionary of df indices that links the same ids on different days?

    I have the following toy dataframe: | id| date -------------- 0 | a | d1 1 | b | d1 2 | a | d2 3 | c | d2 4 | b | d3 5 | a | d3 import pandas as pd df = pd.DataFrame({'id': ['a', 'b', 'a', 'c', 'b', 'a'], 'date': ['d1', 'd1', 'd2', 'd2', 'd3', 'd3']}) I want to obtain 'linking...
  10. T

    How to avoid bfill or ffill when calculating pct_change with NaNs

    For a df like the one below, I use pct_change() to calculate the rolling percentage changes: price = [np.nan, 10, 13, np.nan, np.nan, 9] df = pd.DataFrame(price, columns=['price']) df Out[75]: price 0 NaN 1 10.0 2 13.0 3 NaN 4 NaN 5 9.0 But I get these unexpected results...
  11. M

    Improve performance of 8 million iterations over a dataframe and query it

    There is a for loop of 8 million iterations that takes 2 sample values from a column of a 1-million-record dataframe (say df_original_nodes), then queries those 2 samples in another dataframe (say df_original_rel), and if a sample does not exist, adds that sample as a new row into the...
  12. C

    Using regex in contains() to select rows from a pandas data frame having some string value (Capital or small)

    I want to extract rows from a pandas data frame based on the values of a column, using a regex in the contains() method. I am using the following line of code to extract rows from a data frame if the 'COMPTYPE' column has any string value mentioned in the contains() call: df =...
  13. S

    How to speed up custom function

    How to speed up my custom function? I have three lists of numbers: list1 list2 list3 And a Pandas Dataframe like this: id inum DESC_1 recs id1 inum1 1 recs1 id2 inum2 2 recs2 id3 inum3 3 recs3 And my custom function: def keep_inum(row): if len(row) != 0: if...
  14. M

    How to pivot dataframe into ML format

    My head is spinning trying to figure out if I have to use pivot_table, melt, or some other function. I have a DF that looks like this: month day week_day classname_en origin destination 0 1 7 2 1 2 5 1 1 2 6 2...
  15. S

    Splitting a dataframe on a specific string value in pandas? [duplicate]

    I have a dataframe which I have to split as soon as a specific string value occurs in a column. Ex. df = txn_details amt 0 opening_balance 13000 1 opening_balance 15000 2 upi2873 12879 3 upi182y31 12301 4 opening_balance 85050 5 upi79279831 8400 The desired output(3...
  16. K

    Substitute column values of a dataframe with the corresponding items in an array

    I have a column in a dataframe which contains an array of numbers from 1 to 5 and I have an array containing five words. I would like to find the simplest, most compact and most elegant way in Python to "in place" replace the numbers in the column with the corresponding words. For example...
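
For thread 3 (merging only on rows matching a condition), one possible approach is to split df_1 on the condition, merge only the rows where b != 0, and then put the untouched rows back. This is a minimal sketch, not a benchmarked answer: the sample values and the column c in df_2 are made up for illustration, and it assumes df_1 has a unique index and no column already named "index". Whether it is actually faster than a plain left merge depends on the data, so it is worth timing both.

```python
import pandas as pd

# Hypothetical sample data; only the names df_1, df_2, a and b come from the question.
df_1 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [0, 5, 0, 7]})
df_2 = pd.DataFrame({"a": [2, 4], "c": ["x", "y"]})

# Rows with b == 0 are known to have no match, so keep them out of the merge.
mask = df_1["b"] != 0

# reset_index/set_index carries the original row labels through the merge,
# which would otherwise renumber the rows from 0.
merged = (
    df_1[mask]
    .reset_index()
    .merge(df_2, on="a", how="left")
    .set_index("index")
)

# Put the skipped rows back; concat fills their df_2 columns with NaN,
# and a stable sort on the index restores the original row order.
df_3 = pd.concat([merged, df_1[~mask]]).sort_index(kind="stable")
print(df_3)
```

With unique keys in df_2, the result should hold the same rows as df_1.merge(df_2, on="a", how="left").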
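
For thread 7 (identical consecutive numbers below a threshold), a common pattern is to label each run of equal values and then look at the longest run. The sketch below covers only the run-length check visible in the preview; the "further checks" mentioned there are cut off and are not handled. The function name max_run_at_most is hypothetical.

```python
import pandas as pd


def max_run_at_most(s: pd.Series, threshold: int = 2) -> bool:
    """Return True if no value repeats more than `threshold` times in a row."""
    if s.empty:
        return True
    # A new run starts wherever a value differs from the previous one;
    # the cumulative sum turns those change points into run labels.
    run_id = (s != s.shift()).cumsum()
    # value_counts on the labels gives the length of every run.
    longest_run = run_id.value_counts().max()
    return bool(longest_run <= threshold)


print(max_run_at_most(pd.Series(data=[-1, -1, 2, -2, 2, -2, 1, 1])))   # True
print(max_run_at_most(pd.Series(data=[-1, -1, -1, 2, 2, -2, 1, 1])))   # False
```

Note that NaN never compares equal to itself, so each NaN counts as its own run of length one.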
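
For thread 8 (what s[s].index does), s is a boolean Series indexed by column name, so s.index lists every column while s[s] keeps only the entries that are True. A small sketch with a made-up stand-in for X_train (the column names and values are assumptions; the columns are created with an explicit object dtype so the comparison behaves the same regardless of how a given pandas version infers string columns):

```python
import pandas as pd

# Hypothetical stand-in for X_train; two object columns and one numeric column.
X_train = pd.DataFrame({
    "city": pd.Series(["Oslo", "Rome"], dtype=object),
    "rooms": [3, 2],
    "type": pd.Series(["h", "u"], dtype=object),
})

# Boolean Series: index = column names, value = "is this column object dtype?"
s = (X_train.dtypes == "object")

all_cols = list(s.index)        # every column name, True or False alike
object_cols = list(s[s].index)  # boolean indexing keeps only the True entries

print(all_cols)     # ['city', 'rooms', 'type']
print(object_cols)  # ['city', 'type']
```

So list(s.index), as written in the question, returns all columns; list(s[s].index) is the form that filters down to the object columns.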
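
For thread 10 (avoiding bfill or ffill in pct_change), the preview cuts off before the unexpected results, so this is a guess at the intent based on the title. Older pandas versions forward-fill NaNs inside pct_change by default; passing fill_method=None turns that off, so a change is only computed between two adjacent non-NaN prices. The sketch uses np.nan, the spelling that works across NumPy versions, and the output column name pct is made up.

```python
import numpy as np
import pandas as pd

price = [np.nan, 10, 13, np.nan, np.nan, 9]
df = pd.DataFrame(price, columns=["price"])

# fill_method=None: no padding of NaNs before computing the change,
# so any row whose current or previous price is NaN yields NaN.
df["pct"] = df["price"].pct_change(fill_method=None)
print(df)
```

Here only row 2 gets a value (13 / 10 - 1, roughly 0.3); row 5 stays NaN because the price directly before it is missing.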