Overview

This is the second blog post of my Finaical Analysis in Python series.

This blog series covers topics like the following:

how to visualize the long-term investment plan
backtest dollar-cost averaging strategy for long-term investment
backtest value averaging strategy for long-term investment
compare different investment strategies for short-term investment

The jupyter notebooks can be downloaded here.

Disclaimer

Investing money into anything is often involved with risk. Please do your own research before investing and be responsible for your own investment decisions.

I am just learning investment on my own and want to share some codes that I have written that might be useful for others.

The content here is only for informational purpose (instead of taking them as professional investment advice).

Introduction

In the last notebook, we already talked about how to visualize the return of a long term investment plan.

We made an assumption that the annual return rate is about 5%. How can we ensure that we can invest our savings properly so that we can have a 5% annual return?

In this notebook, we are going to use the historical data of SP500 to do some backtesting.

We will calculate the overall rate of return and the internal rate of return for two dollor cost averaging methods.

# Import libraries
import datetime
import warnings

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from pandas_datareader import data as pdr

plt.style.use('bmh')
warnings.filterwarnings("ignore")

A peak into the history

The first step here is to download data. I used a method introducted in a datacamp tutorial.

The total return index of SP500 between 1988 and 2020 is used here.

# Get historical data
startdate = datetime.datetime(1988, 1, 1)
enddate = datetime.datetime(2021, 1, 1)

# SP 500 Total return index
SP = pdr.get_data_yahoo('^SP500TR', start=startdate, end=enddate)

# If you want to use the price index of SP500, you can use the ticker of '^GSPC'
# SP = pdr.get_data_yahoo('^GSPC', start=startdate, end=enddate)

SP

Date	High	Low	Open	Close	Volume	Adj Close
1988-01-04	256.019989	256.019989	256.019989	256.019989	0	256.019989
1988-01-05	258.769989	258.769989	258.769989	258.769989	0	258.769989
1988-01-06	259.029999	259.029999	259.029999	259.029999	0	259.029999
1988-01-07	261.209991	261.209991	261.209991	261.209991	0	261.209991
1988-01-08	243.550003	243.550003	243.550003	243.550003	0	243.550003
...	...	...	...	...	...	...
2020-08-17	6954.669922	6937.540039	6940.890137	6943.220215	0	6943.220215
2020-08-18	6971.020020	6920.080078	6954.689941	6960.299805	0	6960.299805
2020-08-19	6981.410156	6920.370117	6967.069824	6930.799805	0	6930.799805
2020-08-20	6963.520020	6889.850098	6901.520020	6952.919922	0	6952.919922
2020-08-21	6982.970215	6941.000000	6954.390137	6977.270020	0	6977.270020

# Generate group by keys
SP['date'] = SP.index
SP['year'] = SP['date'].apply(lambda d: f'{d.year}')

# Group by year, take the adjusted close price of the first business day in each year
df = SP[['year', 'Adj Close']].sort_index().groupby('year').first()

# Generate annual return
df['return'] = df['Adj Close'].pct_change()

# Show annual return by the beginning of next year
df['return_next_y'] = df['return'].shift(-1)

df = df.drop(columns='return')
df = df.reset_index()

df

	year	Adj Close	return_next_y
0	1988	256.019989	0.115733
1	1989	285.649994	0.351864
2	1990	386.160004	-0.058836
3	1991	363.440002	0.320273
4	1992	479.839996	0.075358
5	1993	516.000000	0.099031
6	1994	567.099976	0.014848
7	1995	575.520020	0.386954
8	1996	798.219971	0.214014
9	1997	969.049988	0.346721
10	1998	1305.040039	0.278520
11	1999	1668.520020	0.199932
12	2000	2002.109985	-0.108011
13	2001	1785.859985	-0.088109
14	2002	1628.510010	-0.199778
15	2003	1303.170044	0.241626
16	2004	1618.050049	0.103155
17	2005	1784.959961	0.075072
18	2006	1918.959961	0.138075
19	2007	2183.919922	0.040977
20	2008	2273.409912	-0.340563
21	2009	1499.170044	0.245396
22	2010	1867.060059	0.145277
23	2011	2138.300049	0.025300
24	2012	2192.399902	0.171570
25	2013	2568.550049	0.279590
26	2014	3286.689941	0.146649
27	2015	3768.679932	-0.001244
28	2016	3763.989990	0.146411
29	2017	4315.080078	0.218119
30	2018	5256.279785	-0.050553
31	2019	4990.560059	0.324358
32	2020	6609.290039	NaN

# Describe the annual return
df['return_next_y'].describe()

Output:

count    32.000000
mean      0.120554
std       0.169035
min      -0.340563
25%       0.022687
50%       0.141676
75%       0.242569
max       0.386954
Name: return_next_y, dtype: float64

When checking the annual return rate, the minimum value is -34.1% and maximum value is 38.7%. So we should not be panic if we see a drop of 30-40% in the total value in a certain year. Try to buy more in the dip, and the bull market will follow in 1-3 years.

Define functions

def calc_dollar_cost_averaging_return(df_dc: pd.DataFrame, value_per_period: float = 1.0, 
                                      increase_investment_per_period: float = 0.0,
                                      display_returns: bool = True) -> pd.DataFrame:
    """
    This function calculates the overall rate of return and the internal rate of return of dollar cost averaging method. 
    :param df_dc: Original dataframe that contains the price at the beginning of each period
    :param value_per_period: Investment value per period
    :param increase_investment_per_period: Increase the investment by x each period
    :param display_returns: Whether to display return results or not
    :return: A dataframe that contains all the relevant info
    """
    # amount to invest each period
    df_dc.loc[0, 'should_invest'] = value_per_period   
    # total amount of investment after the investment this period
    df_dc.loc[0, 'total_invest'] = value_per_period   
    # the total value by the beginning of next period
    df_dc.loc[0, 'total_value_next'] = value_per_period*(1 + df_dc.loc[0, 'return_next_y'])
    df_dc.loc[0, 'overall_return_next'] = df_dc.loc[0, 'total_value_next'] / df_dc.loc[0, 'total_invest'] - 1

    for i in range(1,len(df_dc.index)):
        df_dc.loc[i, 'should_invest'] = value_per_period * (1 + increase_investment_per_period)**i
        df_dc.loc[i, 'total_invest'] = df_dc.loc[i-1, 'total_invest'] + df_dc.loc[i, 'should_invest']
        df_dc.loc[i, 'total_value_next'] = ((df_dc.loc[i-1, 'total_value_next'] + 
                                            df_dc.loc[i, 'should_invest']) *
                                            (1 + df_dc.loc[i, 'return_next_y']) 
                                           )
        df_dc.loc[i, 'overall_return_next'] = df_dc.loc[i, 'total_value_next'] / df_dc.loc[i, 'total_invest'] - 1
        
    if display_returns:
        
        # Calculate overall return
        overall_return = round(df_dc.loc[len(df_dc.index) - 2, 'overall_return_next']*100, 2)
        print(f"The overall return rate is {overall_return}%")

        # Calculate IRR
        cf = (-df_dc['should_invest']).tolist()
        cf[-1] = df_dc.loc[len(df_dc.index) - 2, 'total_value_next']
        print("The cash flow (the negative sign stands for investment):")
        display(cf)
        irr = round(100*np.irr(cf), 2)
        print(f"The internal return rate (IRR) is {irr}% per period")
        
        # Calculate CAGR
        cagr = round(100*((df_dc.loc[len(df_dc.index) - 2, 'total_value_next']/
                           df_dc.loc[len(df_dc.index) - 2, 'total_invest'])**(1/(len(df_dc.index) - 1)) - 1), 2)
        
        print(f"The compound annual growth rate (CAGR) is {cagr}% per period")
        
    return df_dc


def plot_changes(df: pd.DataFrame,
                 y1: str = 'total_invest', y2: str = 'total_value_next',
                 xlabel: str = 'period', ylabel: str = 'value',
                 title: str = 'Visualize total investment and value each year',
                 target: int = None) -> None:
    """
    This function visualizes the total investment and the total value of the investment plan across time
    :param df: A dataframe contains total investment and total value
    :param y1: Column name
    :param y2: Column name
    :param xlabel: X label
    :param ylabel: Y label
    :param title: Title of the graph
    :return: A plot
    """
    plt.figure(figsize=(15, 5))
    plt.plot(df[y1], label=y1)
    plt.plot(df[y2], label=y2)
    if target:
        plt.axhline(y=target, c='black', label='target value')
    plt.xlim([0, len(df[y1])-1])
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    plt.legend()
    plt.show()

Plan 1: Yearly dollar cost averaging buying method

Imagine we were able to invest 1200 dollar per year into a SP500 index based ETF starting from 1988.

What’s the overall return after 32 years?

What’s the average annual return?

df1 = calc_dollar_cost_averaging_return(df, value_per_period=1200)

Output:

The overall return rate is 580.19%
The cash flow (the negative sign stands for investment):

[-1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 -1200.0,
 261192.11446741945]

The internal return rate (IRR) is 9.92% per period
The compound annual growth rate (CAGR) is 6.17% per period

df1:

	year	Adj Close	return_next_y	should_invest	total_invest	total_value_next	overall_return_next
0	1988	256.019989	0.115733	1200.0	1200.0	1338.879804	0.115733
1	1989	285.649994	0.351864	1200.0	2400.0	3432.220743	0.430092
2	1990	386.160004	-0.058836	1200.0	3600.0	4359.680708	0.211022
3	1991	363.440002	0.320273	1200.0	4800.0	7340.295930	0.529228
4	1992	479.839996	0.075358	1200.0	6000.0	9183.879488	0.530647
5	1993	516.000000	0.099031	1200.0	7200.0	11412.205047	0.585028
6	1994	567.099976	0.014848	1200.0	8400.0	12799.465363	0.523746
7	1995	575.520020	0.386954	1200.0	9600.0	19416.618801	1.022564
8	1996	798.219971	0.214014	1200.0	10800.0	25028.857872	1.317487
9	1997	969.049988	0.346721	1200.0	12000.0	35322.955610	1.943580
10	1998	1305.040039	0.278520	1200.0	13200.0	46695.335609	2.537525
11	1999	1668.520020	0.199932	1200.0	14400.0	57471.129235	2.991051
12	2000	2002.109985	-0.108011	1200.0	15600.0	52333.999013	2.354744
13	2001	1785.859985	-0.088109	1200.0	16800.0	48817.182741	1.905785
14	2002	1628.510010	-0.199778	1200.0	18000.0	40024.865577	1.223604
15	2003	1303.170044	0.241626	1200.0	19200.0	51185.872534	1.665931
16	2004	1618.050049	0.103155	1200.0	20400.0	57789.735898	1.832830
17	2005	1784.959961	0.075072	1200.0	21600.0	63418.196358	1.936028
18	2006	1918.959961	0.138075	1200.0	22800.0	73540.339150	2.225453
19	2007	2183.919922	0.040977	1200.0	24000.0	77802.957039	2.241790
20	2008	2273.409912	-0.340563	1200.0	25200.0	52097.453233	1.067359
21	2009	1499.170044	0.245396	1200.0	26400.0	66376.423781	1.514258
22	2010	1867.060059	0.145277	1200.0	27600.0	77393.691544	1.804119
23	2011	2138.300049	0.025300	1200.0	28800.0	80582.143634	1.797991
24	2012	2192.399902	0.171570	1200.0	30000.0	95813.509570	2.193784
25	2013	2568.550049	0.279590	1200.0	31200.0	124137.478352	2.978765
26	2014	3286.689941	0.146649	1200.0	32400.0	143718.101728	3.435744
27	2015	3768.679932	-0.001244	1200.0	33600.0	144737.758102	3.307671
28	2016	3763.989990	0.146411	1200.0	34800.0	167304.672506	3.807606
29	2017	4315.080078	0.218119	1200.0	36000.0	205258.694569	4.701630
30	2018	5256.279785	-0.050553	1200.0	37200.0	196021.626888	4.269399
31	2019	4990.560059	0.324358	1200.0	38400.0	261192.114467	5.801878
32	2020	6609.290039	NaN	1200.0	39600.0	NaN	NaN

plot_changes(df1, title='Plan 1', xlabel='year')

png

Plan 2: Increase the investment each year by 2%

In Plan 2, let’s assume the initial_investment is 1200, and then we increase the investment each year by 2%.

# Assuming initial_investment is 1200, and then increase investment each year by 2%
df2 = calc_dollar_cost_averaging_return(df, value_per_period=1200, increase_investment_per_period=0.02)

Output:

The overall return rate is 486.51%
The cash flow (the negative sign stands for investment):
   
[-1200.0,
 -1224.0,
 -1248.48,
 -1273.4496000000001,
 -1298.918592,
 -1324.8969638400001,
 -1351.3949031168002,
 -1378.422801179136,
 -1405.991257202719,
 -1434.1110823467732,
 -1462.7933039937088,
 -1492.049170073583,
 -1521.8901534750546,
 -1552.327956544556,
 -1583.374515675447,
 -1615.042005988956,
 -1647.3428461087349,
 -1680.2897030309098,
 -1713.8954970915281,
 -1748.1734070333584,
 -1783.1368751740258,
 -1818.7996126775063,
 -1855.1756049310566,
 -1892.2791170296778,
 -1930.1246993702714,
 -1968.7271933576767,
 -2008.1017372248302,
 -2048.263771969327,
 -2089.2290474087135,
 -2131.0136283568877,
 -2173.633900924026,
 -2217.106578942506,
 311273.63335198513]

The internal return rate (IRR) is 9.91% per period
The compound annual growth rate (CAGR) is 5.68% per period

df2:

	year	Adj Close	return_next_y	should_invest	total_invest	total_value_next	overall_return_next
0	1988	256.019989	0.115733	1200.000000	1200.000000	1338.879804	0.115733
1	1989	285.649994	0.351864	1224.000000	2424.000000	3464.665484	0.429317
2	1990	386.160004	-0.058836	1248.480000	3672.480000	4435.844184	0.207861
3	1991	363.440002	0.320273	1273.449600	4945.929600	7537.826024	0.524046
4	1992	479.839996	0.075358	1298.918592	6244.848192	9502.668090	0.521681
5	1993	516.000000	0.099031	1324.896964	7569.745156	11899.829221	0.572025
6	1994	567.099976	0.014848	1351.394903	8921.140059	13447.972306	0.507428
7	1995	575.520020	0.386954	1378.422801	10299.562860	20563.532573	0.996544
8	1996	798.219971	0.214014	1405.991257	11705.554117	26671.303126	1.278517
9	1997	969.049988	0.346721	1434.111082	13139.665200	37850.153571	1.880603
10	1998	1305.040039	0.278520	1462.793304	14602.458504	50262.395731	2.442050
11	1999	1668.520020	0.199932	1492.049170	16094.507674	62101.796628	2.858571
12	2000	2002.109985	-0.108011	1521.890153	17616.397827	56751.625622	2.221523
13	2001	1785.859985	-0.088109	1552.327957	19168.725784	53166.862347	1.773625
14	2002	1628.510010	-0.199778	1583.374516	20752.100299	43812.361085	1.111225
15	2003	1303.170044	0.241626	1615.042006	22367.142305	56403.853151	1.521728
16	2004	1618.050049	0.103155	1647.342846	24014.485152	64039.465661	1.666702
17	2005	1784.959961	0.075072	1680.289703	25694.774855	70653.449907	1.749721
18	2006	1918.959961	0.138075	1713.895497	27408.670352	82359.450191	2.004869
19	2007	2183.919922	0.040977	1748.173407	29156.843759	87554.082572	2.002866
20	2008	2273.409912	-0.340563	1783.136875	30939.980634	58912.245650	0.904082
21	2009	1499.170044	0.245396	1818.799613	32758.780247	75634.187986	1.308822
22	2010	1867.060059	0.145277	1855.175605	34613.955851	88746.748765	1.563901
23	2011	2138.300049	0.025300	1892.279117	36506.234969	92932.231838	1.545654
24	2012	2192.399902	0.171570	1930.124699	38436.359668	111137.895169	1.891478
25	2013	2568.550049	0.279590	1968.727193	40405.086861	144730.058188	2.581976
26	2014	3286.689941	0.146649	2008.101737	42413.188598	168257.173136	2.967096
27	2015	3768.679932	-0.001244	2048.263772	44461.452370	170093.499961	2.825640
28	2016	3763.989990	0.146411	2089.229047	46550.681418	197392.199679	3.240372
29	2017	4315.080078	0.218119	2131.013628	48681.695046	243042.959525	3.992492
30	2018	5256.279785	-0.050553	2173.633901	50855.328947	232820.204951	3.578089
31	2019	4990.560059	0.324358	2217.106579	53072.435526	311273.633352	4.865072
32	2020	6609.290039	NaN	2261.448711	55333.884237	NaN	NaN

plot_changes(df2, title='Plan 2', xlabel='year')

png

Conclusion

The compound annual growth rate of Plan 2 is lower than that of Plan 1. This is because Plan 2 invested more money at the later stage of the investment. But, in both plans, the compound annual growth rates are more than 5%.

The internal return rate of both plans are quite similiar to each other (almost 10%).

It seems that SP500 is a good enough index that can bring us long term return. However, the historical data cannot predict the future. Please do your own research before investing.

In the next notebook, we will further discuss which strategy can further increase IRR.

Overview

Disclaimer

Introduction

A peak into the history

Define functions

Plan 1: Yearly dollar cost averaging buying method

Plan 2: Increase the investment each year by 2%

Conclusion

Share on: