Recently I was a bit lost managing my iTunes songs and playlists, so I thought Excel could lend me a hand in getting hold of my music analytics. So here I have created an Excel VBA based solution to extract the list of all the songs present on my iPod.
Additionally, I have also created a utility to build playlists for my iPod via VBA. I hope this tool helps many others enjoy their music the Excel way.
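For readers curious how it works before grabbing the download: below is a minimal sketch (my own outline, not the exact code in the workbook) that uses the iTunes COM interface from the SDK referenced below to dump the library track list onto the active sheet; the downloadable solution additionally walks the iPod source and builds the playlists.

' Minimal sketch: list iTunes tracks on the active worksheet via the
' iTunes COM interface (late bound). The column layout is illustrative only.
Sub ListiTunesTracks()
    Dim iTunesApp As Object, tracks As Object, trk As Object
    Dim r As Long

    Set iTunesApp = CreateObject("iTunes.Application")
    Set tracks = iTunesApp.LibraryPlaylist.Tracks   ' main iTunes library

    Range("A1:C1").Value = Array("Name", "Artist", "Album")
    For r = 1 To tracks.Count
        Set trk = tracks.Item(r)
        Cells(r + 1, 1).Value = trk.Name
        Cells(r + 1, 2).Value = trk.Artist
        Cells(r + 1, 3).Value = trk.Album
    Next r
End Sub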
Songs list from iPod to Excel Sheet
Creating Playlist in iPod from Excel
Result in iTunes
Download:
Download solution
References:
Link1: http://developer.apple.com/sdk/itunescomsdk.html
Monday, 17 October 2011
MySQL: User Activity Tracking (Bandwidth)
Recently I was asked whether it would be possible in MySQL to track user activity in terms of the queries run and the amount of data downloaded by each individual user from the database.
In that quest I came across a brilliant tool named “MySQL Proxy”. It adds very strong functionality to a MySQL database, including load balancing across multiple MySQL servers, query injection, and much more, and with Lua scripting it provided a very robust solution to the request. Rather than my extolling the tool's benefits further in this post, I list the references below so readers can explore them for themselves; here I would like to present my solution for recording user activity in MySQL using MySQL Proxy with the following Lua script.
Note: The script also contains a section that prevents a user from downloading more data than a specified limit.
-- measures bandwidth by user
proxy.global.bandwidth = proxy.global.bandwidth or {}

local session_user
local sqlQuery
local before_bytes_sent = 0
local after_bytes_sent = 0
local before_bytes_received = 0
local after_bytes_received = 0

-- send an error packet back to the client (used by the quota check below)
function error_result(msg)
    proxy.response = {
        type   = proxy.MYSQLD_PACKET_ERR,
        errmsg = msg
    }
    return proxy.PROXY_SEND_RESULT
end

function read_auth()
    session_user = proxy.connection.client.username
    proxy.global.bandwidth[session_user] = proxy.global.bandwidth[session_user] or 0
end

function read_query(packet)
    -- just to show how we can block a user query
    -- when the quota has been exceeded
    if proxy.global.bandwidth[session_user] > 10000 and session_user ~= 'root' then
        return error_result('you have exceeded your query quota')
    end

    sqlQuery = string.sub(packet, 2)
    proxy.global.bandwidth[session_user] = proxy.global.bandwidth[session_user] + packet:len()

    -- wrap the user's query (id 2) between two session-status snapshots (ids 1 and 3)
    proxy.queries:append(1, string.char(proxy.COM_QUERY) .. "SHOW SESSION STATUS LIKE '%Bytes%'", {resultset_is_needed = true})
    proxy.queries:append(2, packet, {resultset_is_needed = true})
    proxy.queries:append(3, string.char(proxy.COM_QUERY) .. "SHOW SESSION STATUS LIKE '%Bytes%'", {resultset_is_needed = true})
    return proxy.PROXY_SEND_QUERY
end

function read_query_result(inj)
    if (inj.id == 4) then
        -- result of our own logging INSERT: hide it from the client
        return proxy.PROXY_IGNORE_RESULT
    end

    if (inj.id == 1) or (inj.id == 3) then
        for row in inj.resultset.rows do
            if (row[1] == "Bytes_sent") and (inj.id == 1) then before_bytes_sent = row[2] end
            if (row[1] == "Bytes_sent") and (inj.id == 3) then after_bytes_sent = row[2] end
            if (row[1] == "Bytes_received") and (inj.id == 1) then before_bytes_received = row[2] end
            if (row[1] == "Bytes_received") and (inj.id == 3) then after_bytes_received = row[2] end
        end

        if (inj.id == 3) then
            print("Bytes sent before: " .. before_bytes_sent)
            print("Bytes sent after: " .. after_bytes_sent)
            print("Bytes received before: " .. before_bytes_received)
            print("Bytes received after: " .. after_bytes_received)
            print("Net Bytes sent: " .. (after_bytes_sent - before_bytes_sent))
            print("Net Bytes received: " .. (after_bytes_received - before_bytes_received))
            print("Username: " .. session_user)
            print("Query: " .. sqlQuery)
            print("DateTime: " .. os.date("%Y-%m-%d %H:%M:%S"))

            insert_log_query(session_user, os.date("%Y-%m-%d %H:%M:%S"), sqlQuery,
                             (after_bytes_sent - before_bytes_sent),
                             (after_bytes_received - before_bytes_received))
        end
        return proxy.PROXY_IGNORE_RESULT
    end
end

function insert_log_query(username, date_time, query, net_bytes_sent, net_bytes_received)
    print(username, date_time, query, net_bytes_sent, net_bytes_received)
    proxy.queries:append(4, string.char(proxy.COM_QUERY) ..
        "INSERT INTO `employees`.`user_log` (`username`, `date_time`, `query`, `bytes_sent`, `bytes_recived`) VALUES ('" ..
        username .. "','" .. date_time .. "',\"" .. query .. "\"," ..
        net_bytes_sent .. "," .. net_bytes_received .. ");",
        {resultset_is_needed = true})
    return proxy.PROXY_SEND_QUERY
end

This script uses the "SHOW SESSION STATUS" results to record the bytes sent and received. It also illustrates the power of MySQL Proxy, which lets us inject additional queries into the database and process their results for our own needs without any effect visible to the end user. (The script is loaded by starting mysql-proxy with its --proxy-lua-script option pointing at this file.)
Download:
Download solution
References:
Link2: http://forge.mysql.com/wiki/MySQL_Proxy_Cookbook
Labels:
bandwidth,
log,
lua,
MySql,
mysql proxy,
query log,
SQL,
user tracking
Sunday, 16 October 2011
MySQL Large Table: Split OR Partitioning???
For the past couple of months this question has puzzled me at work, and searching the internet I wasn’t able to find a specific answer to it. So in this post I would only like to highlight my experiences so far and my decision, with no bias towards any particular method.
Table split vs. partitioning: the decision should be based primarily on the database's usage pattern, that is, the kind of queries executed on a regular basis and the users who run them.
When to split a table into smaller tables:
• If the queried table is scanned on varying columns (i.e. the queries’ “Where” clause keeps changing to different columns within the table).
• If the queries are analytical in nature and the direct users of the database are business users.
• If the partitioning scheme would need to span more than 1024 partitions (a MySQL limitation).
The disadvantages of splitting a table into multiple tables are mainly the problems of querying across many tables (typically with dynamic SQL inside stored procedures), more complex logic, the creation of a large number of tables, and so on. For analytical purposes, however, the benefits outweigh these problems once the system is set up: put simply, each query against the split tables has fewer rows to scan physically, so results come back faster, and the UNION ALL’ed result is consistent no matter which column the query scans.
When to partition a table:
• If the queries are mostly regular in nature, or the database acts as the backend of a business system (i.e. the majority of queries use the same column in their “Where” clause).
• If the use of the database is limited to storing and retrieving records on standard parameters (i.e. non-analytical purposes).
• If the database is accessed through ORM mechanisms such as ADO.NET or Hibernate.
• Keep in mind that foreign keys are not supported on partitioned tables.
In an analytical environment, a partitioned table can sometimes be more detrimental to performance than the advantages it brings. When a query scans a column the table is not partitioned on, MySQL has to put in the effort of scanning every partition of the table for the results, and execution ends up slower than with split tables. Whatever partitioning mechanism is used, one should also pay attention to “Partition Pruning”: the where clause of a select query should tell MySQL which partitions to scan for the result (EXPLAIN PARTITIONS SELECT ... shows which partitions a query will actually touch).
Performance Results:
In this experiment the table contains 2,844,042 rows, with "from_date" being indexed:
Note: all the partitioned and split tables in the example are partitioned (or split) on the “from_date” column.
#** Simple Table **
CREATE TABLE `salaries` (
    `emp_no` INT(11) NOT NULL,
    `salary` INT(11) NOT NULL,
    `from_date` DATE NOT NULL,
    `to_date` DATE NOT NULL,
    PRIMARY KEY (`emp_no`, `from_date`),
    INDEX `emp_no` (`emp_no`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
ROW_FORMAT=DEFAULT;

#** Partition by month **
CREATE TABLE `salaries_copy` (
    `emp_no` int(11) NOT NULL,
    `salary` int(11) NOT NULL,
    `from_date` date NOT NULL,
    `to_date` date NOT NULL,
    PRIMARY KEY (`emp_no`,`from_date`),
    KEY `emp_no` (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT
/*!50100 PARTITION BY HASH (Month(from_date)) PARTITIONS 12 */;

#** Partition by Range **
CREATE TABLE `salaries_copy_1` (
    `emp_no` int(11) NOT NULL,
    `salary` int(11) NOT NULL,
    `from_date` date NOT NULL,
    `to_date` date NOT NULL,
    PRIMARY KEY (`emp_no`,`from_date`),
    KEY `emp_no` (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT
/*!50100 PARTITION BY RANGE (to_days(from_date))
(PARTITION p0 VALUES LESS THAN (to_days('1985-01-01')) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (to_days('1986-01-01')) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (to_days('1987-01-01')) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (to_days('1988-01-01')) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (to_days('1989-01-01')) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (to_days('1990-01-01')) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (to_days('1991-01-01')) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (to_days('1992-01-01')) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (to_days('1993-01-01')) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (to_days('1994-01-01')) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN (to_days('1995-01-01')) ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN (to_days('1996-01-01')) ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN (to_days('1997-01-01')) ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN (to_days('1998-01-01')) ENGINE = InnoDB,
 PARTITION p14 VALUES LESS THAN (to_days('1999-01-01')) ENGINE = InnoDB,
 PARTITION p15 VALUES LESS THAN (to_days('2000-01-01')) ENGINE = InnoDB,
 PARTITION p16 VALUES LESS THAN (to_days('2001-01-01')) ENGINE = InnoDB,
 PARTITION p17 VALUES LESS THAN (to_days('2002-01-01')) ENGINE = InnoDB,
 PARTITION p18 VALUES LESS THAN (to_days('2003-01-01')) ENGINE = InnoDB,
 PARTITION p19 VALUES LESS THAN (to_days('2004-01-01')) ENGINE = InnoDB,
 PARTITION p20 VALUES LESS THAN (to_days('2005-01-01')) ENGINE = InnoDB,
 PARTITION pmax VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

#********* Test 1: Queries scanning the partitioned column: from_date ***********
Select SQL_NO_CACHE * From salaries tbl
 where tbl.from_date >= '2000-03-15' and tbl.from_date < '2000-09-25';
#Duration for 1 query: 0.016 sec. (+ 2.137 sec. network)

Select SQL_NO_CACHE * From salaries_copy tbl
 where tbl.from_date >= '2000-03-15' and tbl.from_date < '2000-09-25';
#Duration for 1 query: 2.106 sec. (+ 5.288 sec. network)

Select SQL_NO_CACHE * From salaries_copy_1 tbl
 where tbl.from_date >= '2000-03-15' and tbl.from_date < '2000-09-25';
#Duration for 1 query: 0.063 sec. (+ 1.185 sec. network)

Select SQL_NO_CACHE * From salaries_1985
 where salaries_1985.from_date >= '2000-03-15' and salaries_1985.from_date < '2000-09-25'
UNION ALL
Select * From salaries_1986
 where salaries_1986.from_date >= '2000-03-15' and salaries_1986.from_date < '2000-09-25'
UNION ALL
…
…
Select * From salaries_2005
 where salaries_2005.from_date >= '2000-03-15' and salaries_2005.from_date < '2000-09-25';
#Duration for 1 query: 1.638 sec. (+ 0.484 sec. network)

#********* Test 2: Queries scanning the non partitioned column: to_date ***********
Select SQL_NO_CACHE * From salaries tbl
 where tbl.to_date >= '2000-03-15' and tbl.to_date < '2000-09-25';
#Duration for 1 query: 0.109 sec. (+ 2.762 sec. network)

Select SQL_NO_CACHE * From salaries_copy tbl
 where tbl.to_date >= '2000-03-15' and tbl.to_date < '2000-09-25';
#Duration for 1 query: 1.201 sec. (+ 6.521 sec. network)

Select SQL_NO_CACHE * From salaries_copy_1 tbl
 where tbl.to_date >= '2000-03-15' and tbl.to_date < '2000-09-25';
#Duration for 1 query: 7.472 sec. (+ 3.058 sec. network)

Select SQL_NO_CACHE * From salaries_1985
 where salaries_1985.to_date >= '2000-03-15' and salaries_1985.to_date < '2000-09-25'
UNION ALL
Select * From salaries_1986
 where salaries_1986.to_date >= '2000-03-15' and salaries_1986.to_date < '2000-09-25'
UNION ALL
…
…
Select * From salaries_2005
 where salaries_2005.to_date >= '2000-03-15' and salaries_2005.to_date < '2000-09-25';
#Duration for 1 query: 1.670 sec. (+ 0.483 sec. network)
Comparison Table:
Query on “from_date” | Query on “to_date” | Description |
---|---|---|
Indexed column | Non-Indexed column | - |
0.016 sec. (+ 2.137 sec. network) | 0.109 sec. (+ 2.762 sec. network) | Simple table |
2.106 sec. (+ 5.288 sec. network) | 1.201 sec. (+ 6.521 sec. network) | Partition by HASH (Month(from_date)) |
0.063 sec. (+ 1.185 sec. network) | 7.472 sec. (+ 3.058 sec. network) | Partition by RANGE (to_days(from_date)) |
1.638 sec. (+ 0.484 sec. network) | 1.670 sec. (+ 0.483 sec. network) | Table Split by year(from_date) |
I do not wish to end this trail here, though; I would like to hear readers' opinions and thoughts on the topic, to shed more light on whether table split is better than partitioning or vice versa.
Look forward to your comments …
References:
Link1: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
Labels:
innodb,
MySql,
mysql partitioning,
partition,
partition pruning,
split,
SQL,
table,
table split
Thursday, 6 October 2011
Dynamic VBA ... Why.... ???
Recently a thought came to my free mind: if we can use dynamic SQL on many kinds of SQL servers, there must be a way to execute dynamic VBA as well. But I am still rather puzzled about how and where it would be useful, since there are usually simpler alternatives to dynamically plugging VBA code in and executing it.
But anyway, here is the solution. If someone finds a sensible use for this method, please leave a comment so I know its utility.
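To give a flavour of the approach without the download: here is a minimal sketch, based on the VBE technique from the link below, of what I mean by dynamic VBA. The module handling and the procedure name are purely illustrative (the downloadable workbook may differ), and "Trust access to the VBA project object model" must be enabled in Excel's macro security settings.

' Minimal sketch of dynamic VBA: build a procedure as a string at runtime,
' inject it into a temporary module through the VBE object model, run it,
' and remove the module again. Names are illustrative.
Sub RunDynamicVBA()
    Dim vbComp As Object        ' VBIDE.VBComponent, late bound
    Dim code As String

    code = "Sub SayHelloDyn()" & vbCrLf & _
           "    MsgBox ""Hello from dynamically generated VBA!""" & vbCrLf & _
           "End Sub"

    ' 1 = vbext_ct_StdModule (a standard code module)
    Set vbComp = ThisWorkbook.VBProject.VBComponents.Add(1)
    vbComp.CodeModule.AddFromString code

    ' execute the freshly injected procedure
    Application.Run "SayHelloDyn"

    ' clean up the temporary module
    ThisWorkbook.VBProject.VBComponents.Remove vbComp
End Sub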
Download:
Download solution
References:
Link: http://www.cpearson.com/excel/vbe.aspx
Labels:
Dynamic,
Dynamic VBA,
VBA,
VBE,
VBE Editor
Tuesday, 4 October 2011
VBA tooo WCF tooo C#
Recently I came across an issue where a VBA application had to trigger a job in a remotely running C# application. Polling via a database and building a queue based mechanism was on the cards, but that approach involves too many elements and is resource intensive, so I thought there should be a better mechanism.
And WCF turned out to be the answer to my problem. I had seen some solutions online, but they were based on hosting WCF as a service; in my context I needed a WinForms/console application, with VBA triggering a job in that remotely running application.
So I ported the service based solution to my needs; the code for the VBA and C# sides is illustrated below, with sample files in the download.
(Note: if the VBA call fails, try using the IP address of the PC on which the C# application is running instead of localhost.)
Sub testWcf()
    Dim addr As String
    Dim i As Long

    ' WCF service moniker: points first at the MEX endpoint so the contract
    ' can be resolved, then at the actual net.tcp endpoint of the C# host
    addr = "service:mexAddress=""net.tcp://localhost:9001/hello/mex"","
    addr = addr + "address=""net.tcp://localhost:9001/hello"","
    addr = addr + "contract=""IHelloWorldService"", contractNamespace=""http://tempuri.org/"","
    addr = addr + "binding=""NetTcpBinding_IHelloWorldService"", bindingNamespace=""http://tempuri.org/"""

    Dim service1 As Object
    Set service1 = GetObject(addr)

    For i = 0 To 10
        Debug.Print service1.SayHello("Test Message: " & i)
    Next i
End Sub
And the associated C# application is as follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel.Channels;
using System.Text;
using System.ServiceModel;
using System.ServiceModel.Description;

namespace ConsoleApplication1
{
    [ServiceContract]
    public interface IHelloWorldService
    {
        [OperationContract]
        string SayHello(string name);
    }

    public class HelloWorldService : IHelloWorldService
    {
        public string SayHello(string name)
        {
            Console.WriteLine("The param passed is: " + name);
            return string.Format("Hello {0}", name);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // custom TCP binding used for the MEX (metadata exchange) endpoint
            BindingElement bindingElement = new TcpTransportBindingElement();
            CustomBinding binding = new CustomBinding(bindingElement);

            Uri tcpBaseAddress = new Uri("net.tcp://localhost:9001/hello");
            ServiceHost host = new ServiceHost(typeof(HelloWorldService), tcpBaseAddress);

            // make sure metadata is published so that VBA's GetObject()
            // service moniker can resolve the contract via the MEX endpoint
            ServiceMetadataBehavior metadataBehavior =
                host.Description.Behaviors.Find<ServiceMetadataBehavior>();
            if (metadataBehavior == null)
            {
                metadataBehavior = new ServiceMetadataBehavior();
                host.Description.Behaviors.Add(metadataBehavior);
            }

            host.AddServiceEndpoint(typeof(IHelloWorldService), new NetTcpBinding(), tcpBaseAddress);
            host.AddServiceEndpoint(typeof(IMetadataExchange), binding, "MEX");

            host.Open();
            Console.WriteLine("Server started @ " + tcpBaseAddress);
            Console.ReadLine();
        }
    }
}
Download solution
References:
Link: http://damianblog.com/2009/07/05/excel-wcf/
Labels:
C#,
Excel,
VBA,
WCF,
Windows Communication Foundation