Steve has recently graduated with a degree in finance, and he wants to help his parents invest in stock. The dataset he is currently working with is designed to analyze a few dozen stocks, but he wants to be able to expand his dataset to analyze more. The purpose of this project is to refactor the original code so that it can loop through the dataset only once instead of multiple times per stock, and to see if refactoring the code in this way will allow it to run more efficiently.
2017 vastly outperformed 2018 in terms of daily volume and yearly returns, the only exceptions being the two stocks ENPH and RUN.
Judging by the outcomes of these tables, if Steve were to recommend any of the stocks in this dataset to his parents ENPH and RUN would be the safest investment for them to make.
The original for
loop was structured in a way that looped over the dataset multiple times. It checks through each current ticker and loops back around once it ends and moves on to the next one:
'Loop through the tickers
For i = 0 To 11
ticker = tickers(i)
totalVolume = 0
'Loop through the rows
Worksheets("2018").Activate
For j = 2 To RowCount
'Find total volume for the current ticker
If Cells(j, 1).Value = ticker Then
totalVolume = totalVolume + Cells(j, 8).Value
End If
'Find the starting price for the current ticker
If Cells(j, 1) = ticker And Cells(j - 1, 1) <> ticker Then
startingPrice = Cells(j, 6).Value
End If
'Find the ending price for the current ticker
If Cells(j, 1) = ticker And Cells(j + 1, 1) <> ticker Then
endingPrice = Cells(j, 6).Value
End If
Next j
'Output the data for the current ticker
Worksheets("All Stocks Analysis").Activate
Cells(4 + i, 1).Value = ticker
Cells(4 + i, 2).Value = totalVolume
Cells(4 + i, 3).Value = (endingPrice / startingPrice) - 1
Next i
To change this so that the data only needs to loop one time, a tickerIndex
variable was added and used to access four other arrays in the script: tickers
, tickerVolume
, tickerStartingPrices
, and tickerEndingPrices
:
'1a) Create a ticker Index
tickerIndex = 0
'1b) Create three output arrays
Dim tickerVolumes(12) As Long
Dim tickerStartingPrices(12) As Single
Dim tickerEndingPrices(12) As Single
Using if-then
statements and the newly added arrays, a for
loop is used to check for the first row (starting price), the last row (ending price), and to increase the tickerIndex
if the next row’s ticker doesn’t match previous one:
'2a) Create a for loop to initialize the tickerVolumes to zero.
For i = 0 To 11
tickerVolumes(i) = 0
Next i
'2b) Loop over all the rows in the spreadsheet.
For i = 2 To RowCount
'3a) Increase volume for current ticker
If Cells(i, 1).Value = tickers(tickerIndex) Then
tickerVolumes(tickerIndex) = tickerVolumes(tickerIndex) + Cells(i, 8).Value
End If
'3b) Check if the current row is the first row with the selected tickerIndex.
If Cells(i, 1).Value = tickers(tickerIndex) And Cells(i - 1, 1).Value <> tickers(tickerIndex) Then
tickerStartingPrices(tickerIndex) = Cells(i, 6).Value
End If
'3c) check if the current row is the last row with the selected ticker
'If the next row’s ticker doesn’t match, increase the tickerIndex.
If Cells(i, 1).Value = tickers(tickerIndex) And Cells(i + 1, 1).Value <> tickers(tickerIndex) Then
tickerEndingPrices(tickerIndex) = Cells(i, 6).Value
'3d) Increase the tickerIndex.
tickerIndex = tickerIndex + 1
End If
Next i
Finally, a for
loop is used to loop though the arrays and display the output in the worksheet:
'4) Loop through your arrays to output the Ticker, Total Daily Volume, and Return.
For i = 0 To 11
Worksheets("All Stocks Analysis").Activate
Cells(i + 4, 1).Value = tickers(i)
Cells(i + 4, 2).Value = tickerVolumes(i)
Cells(i + 4, 3).Value = tickerEndingPrices(i) / tickerStartingPrices(i) - 1
The original script had a run time of 0.548 seconds for the year 2017, and 0.545 seconds for the year 2018.
The edited script had a run time of 0.108 seconds for the year of 2017, and 0.109 seconds for the year of 2018.
From this comparison, we can conclude that the refactored script had a meaningful improvement on run time performance.
One of the advantages that comes with refactoring code is that it provides a pre-existing framework to work off, and since it’s code that already works, we just need to change it. In this case, we can look at what the original code was is designed to do and determine how to edit it from there. This process saves us the trouble of having to start from scratch and can give some useful insight on what to do next.
However, the disadvantages of working with pre-existing code is the possibility of running into issues with debugging and errors. Since we’re not starting from the beginning, it can be difficult to understand where exactly the code is going wrong- If it’s something that that was added, or if the new things are just conflicting with old things in the code. This can be an even bigger issue when working with code that isn’t your own, making it hard to understand what the original code might be trying to do and how it can be fixed.
When refactoring the original VBA script, I found that working with pre-existing code helped in the fact that the code provided was already working, and since I know how it was written, it gave me a good idea on what needed to change to make it run more smoothly. unfortunately, I still ran into issues when I tried to edit and add things to the code. There was one specific issue that kept popping up and I don’t know how it got there or how it can be fixed. Sometimes when I run the refactored code, it gives me an entirely different run time for both years than it originally did. For example, it will say “This code ran in 8.398 seconds for the year 2017” instead of the original 0.108 seconds.