With this post I wanted to highlight two interesting techniques for web text parsing via VBA using Selenium framework and WinHttp.
Task:
Given the set of key words (comma separated list) we want to count the occurrence of the key word in the html body of the web page (giving us the popularity of the key word within the set of web pages).
Solution One:
“Selenium” as popularly known in the market, is a very robust web testing framework and used in writing automated testing for Web UI. So recently I stumbled upon a very interesting project which is a wrapper written for using selenium via VBA exposing very strong integration of selenium functionalities with VBA. The project home page is very good in describing upon the features of the same.
Solution Two:
Using the native WinHttp solution in VBA, we are downloading the HTML text of the web page via “GET” command and thus counting the key word occurrence in the same.
Comparison:
For the task in hand I found that using the WinHttp solution wins over the Selenium based methodology purely by the performance and over head the Selenium incurs for the same.
The Selenium framework has its own very important place to automate the Web UI task in hand when JavaScript is also in play with modern day web pages. Selenium plays a very important role where one can script all the user actions that have to be performed on the web, like filling forms, clicking buttons, checking options and more with the power of VBA.
Note: There is a large difference in count values between the Selenium and WinHttp based tools, for the reason that for former I am using Instr function to get the key word count, where as for the later I a using custom written function for the key word count.
Download solution
References:
Link1: http://code.google.com/p/selenium-vba/
Sunday, 20 May 2012
Friday, 4 May 2012
Excel VBA Data Warehouse Generator Tool
Recently I was observing our database development team, and I observed that much of my colleagues were struggling with lots of boiler plate code. In the process of generating Dimension and Fact tables (in snow flaked schema), I thought it would be nice to create an easy to use tool to generate database schema with the following rules in place:
1. Each dimension table would have a primary key, combined with identity auto generation.
2. A dimension table may have referential constraint upon another dimension table
3. A fact table will have multiple referential constraints from multiple dimension tables but not from any other factual table.
4. A fact table will not have a primary key nor an identity column.
This tool is primarily an assisting tool for repetitive process occurred in the data warehouse table structure generation, though for any other design requirements developer intervention is very much required to meet the special conditions.
To use the tool, there is a main page where all the relevant information needs to be set like connection string, script folder path and updated database on which the operation needs to be performed once the list is refreshed.
Control Page:
Dimension Page:
Fact Page:
Then there is a SQL Server Management Studio style dimension and fact table generation worksheets where user can set the desired columns required according to the business needs along with integrated drop down list on the Table Link column where the refrence dimension tables can be selected(The table name for dimension and fact wil be appended by keywords "Dim" and "Fact" respectivly. And also for the dimension tables the key columns as they will be auto generated via VBA and appended terms like "Id").
Download solution
1. Each dimension table would have a primary key, combined with identity auto generation.
2. A dimension table may have referential constraint upon another dimension table
3. A fact table will have multiple referential constraints from multiple dimension tables but not from any other factual table.
4. A fact table will not have a primary key nor an identity column.
This tool is primarily an assisting tool for repetitive process occurred in the data warehouse table structure generation, though for any other design requirements developer intervention is very much required to meet the special conditions.
To use the tool, there is a main page where all the relevant information needs to be set like connection string, script folder path and updated database on which the operation needs to be performed once the list is refreshed.
Control Page:
Dimension Page:
Fact Page:
Then there is a SQL Server Management Studio style dimension and fact table generation worksheets where user can set the desired columns required according to the business needs along with integrated drop down list on the Table Link column where the refrence dimension tables can be selected(The table name for dimension and fact wil be appended by keywords "Dim" and "Fact" respectivly. And also for the dimension tables the key columns as they will be auto generated via VBA and appended terms like "Id").
Download solution
File - Directory (Tree Map) in Excel
Being a software developer and in my quest of achieving an organised methodology for saving my personal files, I many times end up saving same file in multiple locations. Ahh…
So many times we can’t keep track of how our file structure if expanding underlying the cover of multiple nested folders. So, I thought to develop a tool in Excel/VBA which could visually represent me the file directory structure in the form of tree and also could provide me an easy to navigate mechanism for the nodes I wish to explore.
Some thing like this …
This tool request for the root node location to start its parsing and with the condition provided by the user to layout the sub folder (recursively) by a check box you will achieve the above result.
Download:
Download solution
So many times we can’t keep track of how our file structure if expanding underlying the cover of multiple nested folders. So, I thought to develop a tool in Excel/VBA which could visually represent me the file directory structure in the form of tree and also could provide me an easy to navigate mechanism for the nodes I wish to explore.
Some thing like this …
This tool request for the root node location to start its parsing and with the condition provided by the user to layout the sub folder (recursively) by a check box you will achieve the above result.
Public Sub FileLister(rootPath As String, exploreSubFolder As Boolean) Dim file As Scripting.file, folder As Scripting.folder, subfolder As Scripting.folder If fso Is Nothing Then Set fso = New FileSystemObject End If Set folder = fso.GetFolder(rootPath) If (folder.SubFolders.Count > 0) And exploreSubFolder Then For Each subfolder In folder.SubFolders nestLevel = nestLevel + 1 ReDim Preserve navigated(nestLevel) If navigated(nestLevel) = 0 Then For Each file In folder.Files 'Record files in folder anchorRange.Offset(printRow, nestLevel + 1).Value = file.Name anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), file.path, , , "Link" printRow = printRow + 1 Next file End If anchorRange.Offset(printRow, nestLevel + 1).Value = subfolder.Name anchorRange.Offset(printRow, nestLevel + 1).Interior.Color = 10092543 'Light Yellow anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), subfolder.path, , , "Link" printRow = printRow + 1 navigated(nestLevel) = 1 FileLister subfolder.path, exploreSubFolder nestLevel = nestLevel - 1 Next subfolder Else nestLevel = nestLevel + 1 For Each file In folder.Files 'Record files in folder anchorRange.Offset(printRow, nestLevel + 1).Value = file.Name anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), file.path, , , "Link" printRow = printRow + 1 Next file For Each subfolder In folder.SubFolders anchorRange.Offset(printRow, nestLevel + 1).Value = subfolder.Name anchorRange.Offset(printRow, nestLevel + 1).Interior.Color = 10092543 'Light Yellow anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), subfolder.path, , , "Link" printRow = printRow + 1 Next subfolder nestLevel = nestLevel - 1 If exploreSubFolder Then navigated(nestLevel) = 0 End If End If End Sub
Download:
Download solution
Subscribe to:
Posts (Atom)