VBA is a very capable programming language. With it we can craft extraordinary projects in Excel. However, it can only do one thing at a time. We can let it breath a little and jump between events with the DoEvents command, but I'm pretty sure that VBA will always be a single threaded affair, if for no other reason than complexity. Debugging
multi-threaded programs in languages that truly suport threads, like C++, Java and C# is very difficult.
A few months ago I penned a guest article for Ajay over at Databison, demonstrating a technique I've used many times to simulate multi-threaded VBA. Today I thought I'd share the technique with my readers and include an
enhanced example workbook.
I should make it clear that this technique is not multi-threaded VBA, as such does not exist. Instead it is a method of orchestrating multiple processes from Excel, leveraging the fact that Windows is a multitasking operating system.
The scenario is scraping a website, an ideal candidate for this demonstration because it can be an asynchronous task. If you follow a traditional approach you might automate Internet Explorer from VBA. This is probably the worst choice because Internet Explorer is big, fat, and slow, but even worse, Excel's VBA processor will sit waiting synchronously for the information from Internet Explorer, virtually killing the Excel user interface while the program is running.
A better technique is to forgo automating Internet Explorer and to use the MSXML2.XMLHTTP OLE object from Microsoft. While not strictly correct, you can think of it as a web browser with no interface. In other words, it's invisible. It's stripped of everything and just quickly retrieves web page text to memory. It can also send data to a server and is optimized for XML. This technique is much faster than the first. But it still ties up Excel synchronously as it scrapes one record at a time.
My preferred approach is to create and launch a swarm of independent agents from VBA. These agents do not run in Excel. They are not VBA. They are in fact VBScript programs that run in the Windows Scripting Host. I think of them as worker bees, because they are expendable. Each has one mission: go to the website of interest; get one record; write it back to the master Excel file that created it; die. A shotgun blast would be another analogy. After the payload has been received by Excel, a new agent is created and dispatched to replace the one that just terminated. This is simpler to manage than trying to reuse an existing agent.
The swarm method works very well and the speed increase is phenomenal, but dependent on a number of factors. The website being scraped is a big factor. In this demo, we'll be scraping www.realestateabc.com, which is really slow. Several years ago before Zillow.com went to a Flash interface, this technique was unbelievably fast there. The operating system is another factor. I can run several dozen agents in a swarm on Windows XP. The same computer can run hundreds of agents in a swarm under Vista. Windows 7 is likely better than Vista, but I don't have it and so I don't know. But the more agents you can run simultaneously, the better the speed increase. I can only assume that Vista has a better OLE subsystem than XP. And of course the speed of your machine and the amount of memory it has will also affect the size of the swarm and its performance.
The
example workbook showcases all three of these techniques allowing you to test the speed of each. I've also included an interesting chart to monitor progress and speed. It's best to only run one instance of Excel during the testing. The first time you set the swarm loose, you may need to authorize each VBScript file, depending on your security settings.
Many of my examples on this site do not use VBA, or they use minimal VBA. This example is heavy on VBA, but interestingly most of the work is done by the VBScript agents when using the swarm. Of course when using the other two techniques, VBA is the workhorse.
For an added bit of enlightenment, run the swarm with the Windows Task Manager open to Processes, and the Image Name column sorted Z-A. You'll see all the agent programs as they are spawned, and then near the end of the run, you'll see them disappear.
This workbook is an example of fairly advanced Excel. The idea of Excel launching a swarm of programs that run in a different language and ultimately each write back to the Excel workbook that created them is... out there! Even the chart inclusion that monitors the progress and speed is a sophisticated specimen. I think that most users could learn a tremendous amount by fully studying this example workbook.
Please let me know what you think. If you have any questions on the methodology or the VBA or VBScript coding, please fire away.
Now imagine you had to scrape a hundred thousand records...
Here's the file:
--- UPDATE ---
There's an interesting enhancement to this sample workbook
here.