Building a Clean Macroeconomic Panel from the IMF World Economic Outlook Database

The IMF World Economic Outlook (WEO) database is one of the most valuable datasets for macroeconomic analysis. This post walks through Python code that cleans the WEO dataset and reshapes it into a panel-ready structure suitable for econometric studies or policy analysis.

First, the WEO dataset can be accessed via the following link. Download it and import it into Python.

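import pandas as pd

# Load the April 2025 WEO file (assumed to sit in the working directory)
df = pd.read_excel('WEOApr2025all.xlsx')
# Keep only rows that carry an ISO country code
df = df[df['ISO'].notna()]
df = df.rename(columns={'ISO': 'ISO3'})
df.head()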

WEO Dictionary

This block builds a concise reference table that maps each WEO subject code to its descriptive metadata: subject descriptor, subject notes, units, and scale. Dropping duplicate rows leaves one row per subject code, provided the metadata is identical across countries. Although the export to Excel is commented out, the dictionary is useful for documentation or for merging variable definitions back in later.

# Dictionary: one row per subject code with its descriptive metadata
dict_weo = df[['WEO Subject Code', 'Subject Descriptor', 'Subject Notes', 'Units', 'Scale']].drop_duplicates()
# dict_weo.to_excel('weo_dictionary.xlsx', index=False)
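
As a small illustration of how such a dictionary can be used, the sketch below looks up the metadata for a single subject code. The helper describe_code and the toy table are assumptions made for this example; the descriptors shown are illustrative stand-ins, not values read from the WEO file.

```python
import pandas as pd

# Toy stand-in for the WEO dictionary; the metadata below is illustrative.
dict_weo = pd.DataFrame({
    "WEO Subject Code": ["NGDP_R", "PCPI", "LP"],
    "Subject Descriptor": [
        "Gross domestic product, constant prices",
        "Inflation, average consumer prices",
        "Population",
    ],
    "Units": ["National currency", "Index", "Persons"],
    "Scale": ["Billions", None, "Millions"],
})


def describe_code(dictionary: pd.DataFrame, code: str) -> dict:
    """Return the metadata row for one subject code as a plain dict."""
    match = dictionary.loc[dictionary["WEO Subject Code"] == code]
    if match.empty:
        raise KeyError(f"Unknown WEO subject code: {code}")
    return match.iloc[0].to_dict()


info = describe_code(dict_weo, "LP")
print(info["Subject Descriptor"], "|", info["Scale"])
```

Checking the scale this way before combining variables helps avoid unit mistakes, for example when dividing a series in billions by one in millions.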

Dropping Redundant Columns

This step drops the descriptive columns from the main dataset. They are informative, but they are not needed for quantitative analysis and only increase file size; the subject-level fields are already preserved in the dictionary. The data is then reshaped with melt() and pivot(): melt() stacks the year columns into rows, and pivot() spreads the subject codes into columns, giving one row per country and year.

# Drop descriptive columns that are not needed for the numeric panel
df = df.drop(columns=['Subject Descriptor', 'Subject Notes', 'Units', 'Scale',
                      'Country/Series-specific Notes', 'Estimates Start After'])

# Wide to long: one row per country, subject code, and year
panel_df = pd.melt(df, id_vars=['WEO Country Code', 'ISO3', 'Country', 'WEO Subject Code'],
                   var_name='Year', value_name='Value')
panel_df['Year'] = panel_df['Year'].astype(int)
# Non-numeric entries become NaN
panel_df['Value'] = pd.to_numeric(panel_df['Value'], errors='coerce')

# Long to panel: one row per country-year, one column per subject code
pivot_df = panel_df.pivot(index=['WEO Country Code', 'ISO3', 'Country', 'Year'],
                          columns='WEO Subject Code',
                          values='Value').reset_index()

# Identifier columns first, then the subject codes in a fixed order
pivot_df = pivot_df[['WEO Country Code', 'ISO3', 'Country', 'Year',
                     'NGDP_R', 'NGDP_RPCH', 'NGDP', 'NGDPD', 'PPPGDP', 'NGDP_D',
                     'NGDPRPC', 'NGDPRPPPPC', 'NGDPPC', 'NGDPDPC', 'PPPPC',
                     'NGAP_NPGDP', 'PPPSH', 'PPPEX', 'NID_NGDP', 'NGSD_NGDP', 'PCPI',
                     'PCPIPCH', 'PCPIE', 'PCPIEPCH', 'TM_RPCH', 'TMG_RPCH', 'TX_RPCH',
                     'TXG_RPCH', 'LUR', 'LE', 'LP', 'GGR', 'GGR_NGDP', 'GGX',
                     'GGX_NGDP', 'GGXCNL', 'GGXCNL_NGDP', 'GGSB', 'GGSB_NPGDP',
                     'GGXONLB', 'GGXONLB_NGDP', 'GGXWDN', 'GGXWDN_NGDP', 'GGXWDG',
                     'GGXWDG_NGDP', 'NGDP_FY', 'BCA', 'BCA_NGDPD']]
pivot_df = pivot_df.sort_values(by=['ISO3', 'Year']).reset_index(drop=True)
pivot_df.columns.name = None
pivot_df
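
To make the reshaping easier to follow, here is a minimal sketch of the same melt-then-pivot pattern on a toy table. The country codes AAA and BBB and all values are made up for illustration; only the column layout mirrors the WEO file.

```python
import pandas as pd

# Toy wide table in the WEO layout: one row per country and subject code,
# one column per year. All values are invented for this example.
wide = pd.DataFrame({
    "ISO3": ["AAA", "AAA", "BBB", "BBB"],
    "WEO Subject Code": ["NGDP_R", "LP", "NGDP_R", "LP"],
    2023: [100.0, 10.0, 50.0, 5.0],
    2024: [110.0, 10.5, 49.0, 5.1],
})

# Step 1: melt the year columns into rows (long format).
long = wide.melt(id_vars=["ISO3", "WEO Subject Code"],
                 var_name="Year", value_name="Value")
long["Year"] = long["Year"].astype(int)

# Step 2: pivot the subject codes into columns (one row per country-year).
panel = (long.pivot(index=["ISO3", "Year"],
                    columns="WEO Subject Code", values="Value")
             .reset_index()
             .sort_values(["ISO3", "Year"], ignore_index=True))
panel.columns.name = None

# A well-formed panel has exactly one row per (country, year) pair.
assert not panel.duplicated(["ISO3", "Year"]).any()
print(panel)
```

The uniqueness check at the end is a cheap safeguard worth running on the real data as well, since pivot() raises an error if a country, year, and subject code combination appears more than once.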

Creating New Macroeconomic Variables

The final section enriches the dataset with indicators that are widely used in macroeconomic research: nominal and real GDP growth, an economic crisis dummy (negative real GDP growth), real and PPP GDP per capita, and average and end-of-period inflation.

def growth(col):
    # Year-over-year growth, lagged within each country so that a country's
    # first year is never compared with the previous country's last year
    prev = pivot_df.groupby('ISO3')[col].shift(1)
    return pivot_df[col] / prev - 1

pivot_df['GDP_N_GR'] = growth('NGDP')
pivot_df['GDP_R_GR'] = growth('NGDP_R')

# Crisis dummy: 1 if real GDP contracted, else 0 (missing growth also maps to 0)
pivot_df['GDP_CRISIS'] = (pivot_df['GDP_R_GR'] < 0).astype(int)
# Per capita values: GDP series are scaled in billions, population (LP) in millions
pivot_df['GDP_R_PC'] = (pivot_df['NGDP_R'] * 1e9) / (pivot_df['LP'] * 1e6)
pivot_df['GDP_R_PC_GR'] = growth('GDP_R_PC')
pivot_df['GDP_PPP_PC'] = pivot_df['PPPGDP'] * 1e9 / (pivot_df['LP'] * 1e6)
pivot_df['INFLATION'] = growth('PCPI')
pivot_df['INFLATION_END'] = growth('PCPIE')

pivot_df.head()
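
One point to watch when computing growth rates in a stacked panel is that the lag should be taken within each country. The sketch below, on made-up numbers for two hypothetical countries, compares a plain shift(1) with a shift grouped by ISO3.

```python
import pandas as pd

# Toy panel sorted by country and year; values are invented.
panel = pd.DataFrame({
    "ISO3": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2023, 2024, 2023, 2024],
    "NGDP_R": [100.0, 110.0, 50.0, 49.0],
})

# Plain shift: BBB's first year is compared with AAA's last year.
naive = panel["NGDP_R"] / panel["NGDP_R"].shift(1) - 1

# Grouped shift: the lag restarts at each country's first year.
prev = panel.groupby("ISO3")["NGDP_R"].shift(1)
grouped = panel["NGDP_R"] / prev - 1

print(pd.DataFrame({"ISO3": panel["ISO3"], "Year": panel["Year"],
                    "naive": naive, "grouped": grouped}))
```

With the plain shift, BBB's 2023 row gets a spurious growth rate of about -55% (50 relative to AAA's 110), which would also be flagged as a crisis year. With the grouped shift, that row is correctly left missing.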

A fully automated pipeline like this not only improves efficiency but also reinforces research transparency and reproducibility. By embedding clear data definitions, consistent transformations, and computed indicators in code rather than manual spreadsheets, it ensures that every analytical step can be replicated, audited, and extended. This approach transforms the IMF WEO dataset from a static source into a living analytical foundation that can be continuously updated and reused for future macroeconomic and fiscal studies.

Data Policy Analyst
