New member
- Joined
- Feb 3, 2023
- Messages
- 5
- Thread Author
- #1
I am currently trying to read from a local MySQL database table on my computer. This table has about 2M rows of data. So when I call read_sql() or read_sql_query() I tried passing it chunksize = 1000, chunksize=10 and then chunksize=1, all of which in my debug session in PyCharm, it just disappears and hangs. I have let it run for some 10 minutes and then killed the execution.
I have no idea what is going on or how to troubleshoot this? Any suggestions? Here is my code:
` try: # Connect to local database database_uri = 'mysql+pymysql://root:1234@localhost:3306' localEngine = sqlalchemy.create_engine(database_uri)
Do you have any suggestions on what I can do trouble shoot this?
I have no idea what is going on or how to troubleshoot this? Any suggestions? Here is my code:
` try: # Connect to local database database_uri = 'mysql+pymysql://root:1234@localhost:3306' localEngine = sqlalchemy.create_engine(database_uri)
Code:
with localEngine.connect() as conn_local:
result = conn_local.execute(text("USE ConsumerExpenditures10;"))
result = conn_local.execute(text("""
CREATE TABLE IF NOT EXISTS LOADING_TABLE (
EXPENDITURE_ID varchar(11) PRIMARY KEY NOT NULL,
HOUSEHOLD_ID VARCHAR(10) NOT NULL,
YEAR YEAR NOT NULL,
MONTH INT(11) NOT NULL,
PRODUCT_CODE VARCHAR(155) NOT NULL,
COST DOUBLE NOT NULL,
GIFT INT NOT NULL,
IS_TRAINING INT(255) NOT NULL,
MARITAL VARCHAR(25),
SEX VARCHAR(25),
AGE INT,
WORK_STATUS VARCHAR(25),
INCOME_RANK double,
INCOME_RANK_1 double,
INCOME_RANK_2 double,
INCOME_RANK_3 double,
INCOME_RANK_4 double,
INCOME_RANK_5 double,
INCOME_RANK_MEAN double,
FEDERAL_FUNDS_TARGET_RATE double,
FEDERAL_FUNDS_UPPER_TARGET double,
FEDERAL_FUNDS_LOWER_TARGET double,
EFFECTIVE_FEDERAL_FUNDS_RATE double,
REAL_GDP double,
UNEMPLOYMENT_RATE double,
INFLATION_RATE double,
CPI double)
"""))
# TRANSFORMATION #9 - Create comprehensive table of GDP, CPI and Consumer Expenditures Data
# with each row being one Consumer Expenditure Purchase
query = text("""select e.expenditure_id, e.household_id, e.year, e.month, e.product_code,
e.cost, e.gift, e.is_training,
hm.marital, hm.sex, hm.age, hm.work_status, h.income_rank,
h.income_rank_1, h.income_rank_2, h.income_rank_3, h.income_rank_4,
h.income_rank_5, h.income_rank_mean, g.FEDERAL_FUNDS_TARGET_RATE,
g.FEDERAL_FUNDS_UPPER_TARGET, g.FEDERAL_FUNDS_LOWER_TARGET,
g.EFFECTIVE_FEDERAL_FUNDS_RATE,
g.REAL_GDP, g.UNEMPLOYMENT_RATE, g.INFLATION_RATE, c.CPI
from expenditures e
inner join household_members hm
on hm.household_id = e.HOUSEHOLD_ID
inner join households h
on h.household_id = hm.HOUSEHOLD_ID
inner join gdp g
on g.gdp_year = e.`YEAR`
inner join cpi c
on c.CPI_YEAR = g.gdp_year""")
# https://pythonspeed.com/articles/pandas-sql-chunking/
# https://stackoverflow.com/questions/69711599/pandas-read-sql-from-ms-sql-gets-stuck-for-queries-with-275-chars-in-linux
# Takes too long to execute this query: df_final_table = pd.read_sql(query, conn_local)
# So we have to do it in chunks to load into a pandas dataframe and then write that to the loading_table
for chunk_dataframe in pd.read_sql_query(query, conn_local, chunksize=10):
print(
f"Got dataframe w/{len(chunk_dataframe)} rows"
)
# write this dataframe chunk into the LOADING_TABLE
result = chunk_dataframe.to_sql(name='LOADING_TABLE', con=conn_local, if_exists='append', index=False)
conn_local.commit()`