
Monday, 17 October 2011

MySQL: User Activity Tracking (Bandwidth)

Recently I was asked whether MySQL can track user activity, in terms of the queries each user runs and the amount of data each user downloads from the database.

 

While looking for an answer I came across a brilliant tool named “MySQL Proxy”. It adds powerful functionality on top of a MySQL database, including load balancing across multiple MySQL servers, query injection, and much more, and with Lua scripting it gave me a robust solution to the request. Rather than list the tool's other benefits here, I provide references below for readers to explore them, and present my solution for recording user activity in MySQL using MySQL Proxy with the following Lua script.

Note: This script also includes a section showing how a user can be blocked once a specified limit has been exceeded.
-- measures bandwidth by user

proxy.global.bandwidth = proxy.global.bandwidth or {}

local session_user 
local sqlQuery
local before_bytes_sent = 0
local after_bytes_sent  = 0
local before_bytes_recevied = 0
local after_bytes_recevied = 0

function read_auth()
    session_user = proxy.connection.client.username
    proxy.global.bandwidth[session_user] = 
        proxy.global.bandwidth[session_user] or 0
end

function read_query (packet)
    -- only COM_QUERY packets carry SQL text; let everything else pass through
    if packet:byte() ~= proxy.COM_QUERY then
        return
    end

    -- just to show how we can block a user query
    -- when the quota has been exceeded
    if proxy.global.bandwidth[session_user] > 10000
       and session_user ~= 'root'
    then
        -- answer with an error packet instead of forwarding the query
        proxy.response = {
            type   = proxy.MYSQLD_PACKET_ERR,
            errmsg = 'you have exceeded your query quota'
        }
        return proxy.PROXY_SEND_RESULT
    end
    sqlQuery = string.sub(packet, 2)
    proxy.global.bandwidth[session_user] =
        proxy.global.bandwidth[session_user] + packet:len()

    -- wrap the client's query (id 2) between two status snapshots (ids 1 and 3)
    proxy.queries:append(1, string.char(proxy.COM_QUERY) .. "SHOW SESSION STATUS LIKE '%Bytes%'", {resultset_is_needed = true} )
    proxy.queries:append(2, packet,{resultset_is_needed = true})
    proxy.queries:append(3, string.char(proxy.COM_QUERY) .. "SHOW SESSION STATUS LIKE '%Bytes%'", {resultset_is_needed = true} )

    return proxy.PROXY_SEND_QUERY
end

function read_query_result(inj)
    -- result of the logging INSERT: hide it from the client
    if (inj.id == 4) then
        return proxy.PROXY_IGNORE_RESULT
    end

    -- results of the two injected SHOW SESSION STATUS queries
    if (inj.id == 1) or (inj.id == 3) then
        for row in inj.resultset.rows do
            -- status values arrive as strings, so convert them to numbers
            if (row[1] == "Bytes_sent") and (inj.id == 1) then
                before_bytes_sent = tonumber(row[2])
            end
            if (row[1] == "Bytes_sent") and (inj.id == 3) then
                after_bytes_sent = tonumber(row[2])
            end
            if (row[1] == "Bytes_received") and (inj.id == 1) then
                before_bytes_recevied = tonumber(row[2])
            end
            if (row[1] == "Bytes_received") and (inj.id == 3) then
                after_bytes_recevied = tonumber(row[2])
            end
        end

        -- both snapshots are available once the second one has arrived
        if (inj.id == 3) then
            print("Bytes sent before: " .. before_bytes_sent)
            print("Bytes sent after: " .. after_bytes_sent)
            print("Bytes received before: " .. before_bytes_recevied)
            print("Bytes received after: " .. after_bytes_recevied)
            print("Net Bytes sent: " .. (after_bytes_sent - before_bytes_sent))
            print("Net Bytes received: " .. (after_bytes_recevied - before_bytes_recevied))
            print("Username: " .. session_user)
            print("Query: " .. sqlQuery)
            print("DateTime: " .. os.date("%Y-%m-%d %H:%M:%S"))
            insert_log_query(session_user, os.date("%Y-%m-%d %H:%M:%S"), sqlQuery, (after_bytes_sent - before_bytes_sent), (after_bytes_recevied - before_bytes_recevied))
        end

        return proxy.PROXY_IGNORE_RESULT
    end
    -- inj.id == 2 is the client's own query; its result goes back unchanged
end

function insert_log_query(username, date_time, query, net_bytes_sent, net_bytes_recived)
      print(username, date_time, query, net_bytes_sent, net_bytes_recived)
      -- escape backslashes and single quotes so the logged query cannot break the INSERT
      local escaped_query = query:gsub("\\", "\\\\"):gsub("'", "\\'")
      proxy.queries:append(4, string.char(proxy.COM_QUERY) .. "INSERT INTO `employees`.`user_log` (`username`, `date_time`, `query`, `bytes_sent`, `bytes_recived`) VALUES ('" ..
      username .. "','" .. date_time .. "','" .. escaped_query .. "'," ..  net_bytes_sent .. "," .. net_bytes_recived .. ");", {resultset_is_needed = true})
      return proxy.PROXY_SEND_QUERY 
end

This script uses the "SHOW SESSION STATUS" results to record the bytes sent and received. It also illustrates the power of MySQL Proxy, which enables us to inject additional queries into the database and process their results to our needs without any effect visible to the end user.
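
The before/after arithmetic the script performs can be sketched outside the proxy. The following is a minimal Python illustration, not part of the original solution: the function names `status_delta` and `record_usage` are hypothetical, and the quota policy shown (sent plus received bytes counted against the 10000 limit) is an assumption made for the example.

```python
def status_delta(before_rows, after_rows):
    """Return net (bytes_sent, bytes_received) between two status snapshots.

    Each snapshot is a list of (variable_name, value) pairs with the values
    as strings, which is how SHOW SESSION STATUS rows arrive.
    """
    before = {name: int(value) for name, value in before_rows}
    after = {name: int(value) for name, value in after_rows}
    return (
        after["Bytes_sent"] - before["Bytes_sent"],
        after["Bytes_received"] - before["Bytes_received"],
    )


def record_usage(usage, user, delta, quota=10000, exempt=("root",)):
    """Add one query's traffic to the user's running total.

    Returns True while the user is within quota. Hypothetical policy:
    sent and received bytes both count against a single limit.
    """
    usage[user] = usage.get(user, 0) + sum(delta)
    return user in exempt or usage[user] <= quota


if __name__ == "__main__":
    before = [("Bytes_received", "120"), ("Bytes_sent", "900")]
    after = [("Bytes_received", "185"), ("Bytes_sent", "4300")]
    usage = {}
    delta = status_delta(before, after)
    print(delta, record_usage(usage, "alice", delta))
```

Taking the difference of two snapshots is what isolates the traffic of the one query sitting between them, since the session counters only ever grow.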

Download:

Download Solution


References: 
Link2: http://forge.mysql.com/wiki/MySQL_Proxy_Cookbook

Sunday, 16 October 2011

MySQL Large Table: Split OR Partitioning???

For the past couple of months this question has puzzled me at work. Searching the internet, I wasn’t able to find a specific answer to it. So in this post I would only like to share my experience so far with my decision, with no bias toward either method.

Table split vs. partitioning: the decision should be based primarily on how the database is used, that is, on the type of queries executed on it on a regular basis and on who its users are.


When to split a table into smaller tables:
• If the table is scanned on varying columns (i.e. the “Where” clause of the queries keeps changing to different columns within the table).
• If the queries are analytical in nature and the direct users of the database are business users.
• If the partitioning mechanism would have to span more than 1024 partitions (MySQL limitation).

The disadvantages of splitting the table into multiple tables are the problems of querying across many tables (with dynamic SQL inside stored procedures), more complex logic, the creation of a large number of tables, and so on. But for analytical purposes the benefits outweigh these problems once the system is set up. Put simply, each query against a split table has fewer rows to scan physically, so the “union all”-ed result comes back faster, and the timing stays consistent whichever column the query scans.
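
The dynamic SQL needed for split tables can be sketched as a small generator. This is a minimal Python illustration under stated assumptions: it assumes yearly tables named salaries_1985 to salaries_2005, as in the test queries further down, and `build_union_query` is a hypothetical helper, not part of the original setup. The dates are interpolated as plain text for readability; real code should pass them as query parameters.

```python
def build_union_query(column, start, end, first_year=1985, last_year=2005):
    """Build one UNION ALL statement over the yearly split tables."""
    # Whitelist the column name, since it is pasted into the SQL text.
    if column not in ("from_date", "to_date"):
        raise ValueError("unexpected column: " + column)
    parts = []
    for year in range(first_year, last_year + 1):
        table = "salaries_%d" % year
        parts.append(
            "SELECT * FROM {t} WHERE {t}.{c} >= '{s}' AND {t}.{c} < '{e}'".format(
                t=table, c=column, s=start, e=end
            )
        )
    return "\n UNION ALL\n".join(parts) + ";"


if __name__ == "__main__":
    print(build_union_query("to_date", "2000-03-15", "2000-09-25"))
```

When the filter is on the column used for the split, the year range could be narrowed to the tables that can actually hold matching rows, which is the hand-written equivalent of partition pruning.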

When to partition a table:
• If the queries are mostly regular in nature or the database acts as a backend to a business system (i.e. the “Where” clause of most queries uses the same column to scan the table).
• If the use of the database is limited to storing records and retrieving them on standard parameters (i.e. non-analytical purposes).
• Where the database is used through ORM mechanisms like ADO.NET/Hibernate.
• Where foreign keys are not needed, since they are not supported on partitioned tables.

In an analytical environment the disadvantages of a partitioned table can sometimes outweigh its advantages in terms of performance. When a query scans a column other than the one the table is partitioned on, MySQL has to scan every partition of the table for results, and the query runs slower than on split tables. Whichever partitioning mechanism is used, one should also pay attention to “Partition Pruning”: the where clause of a select query is what tells MySQL which partitions to scan for the result.
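
The effect of pruning can be modelled in a few lines. The following Python sketch is a simplified model, not MySQL's actual optimizer logic: it assumes yearly RANGE partitions p0 to p20 plus pmax on from_date, like the range-partitioned table in the experiment below, and the helper names are hypothetical.

```python
from datetime import date, timedelta

# Exclusive upper bounds of the yearly partitions p0..p20; later rows go to pmax.
BOUNDS = [("p%d" % i, date(1985 + i, 1, 1)) for i in range(21)]
ALL_PARTITIONS = [name for name, _ in BOUNDS] + ["pmax"]


def partition_for(day):
    """Name of the partition that stores a row with this from_date."""
    for name, upper in BOUNDS:
        if day < upper:
            return name
    return "pmax"


def partitions_to_scan(column, start, end):
    """Partitions a query with start <= column < end has to read.

    Pruning only helps when the filter is on the partitioning column
    (from_date); a filter on any other column touches every partition.
    """
    if column != "from_date":
        return list(ALL_PARTITIONS)
    if start >= end:
        return []
    first = ALL_PARTITIONS.index(partition_for(start))
    # end is exclusive, so the last day that can match is the day before it
    last = ALL_PARTITIONS.index(partition_for(end - timedelta(days=1)))
    return ALL_PARTITIONS[first:last + 1]


if __name__ == "__main__":
    print(partitions_to_scan("from_date", date(2000, 3, 15), date(2000, 9, 25)))
    print(len(partitions_to_scan("to_date", date(2000, 3, 15), date(2000, 9, 25))))
```

On a real server, the partitions a statement will touch can be checked with EXPLAIN PARTITIONS in MySQL 5.1.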

Performance Results:

In the experiment the table contains 2,844,042 rows, with "from_date" being indexed:
Note: all the partitioned tables in the example are partitioned on the “from_date” column.


#**Simple Table **
CREATE TABLE `salaries` (
 `emp_no` INT(11) NOT NULL,
 `salary` INT(11) NOT NULL,
 `from_date` DATE NOT NULL,
 `to_date` DATE NOT NULL,
 PRIMARY KEY (`emp_no`, `from_date`),
 INDEX `emp_no` (`emp_no`)
 )
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
ROW_FORMAT=DEFAULT;

#**Partition by month**
CREATE TABLE `salaries_copy` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT
/*!50100 PARTITION BY HASH (Month(from_date))
PARTITIONS 12 */;

#**Partition by Range**
CREATE TABLE `salaries_copy_1` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT
/*!50100 PARTITION BY RANGE (to_days(from_date))
(PARTITION p0 VALUES LESS THAN (to_days('1985-01-01')) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (to_days('1986-01-01')) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (to_days('1987-01-01')) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (to_days('1988-01-01')) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (to_days('1989-01-01')) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (to_days('1990-01-01')) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (to_days('1991-01-01')) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (to_days('1992-01-01')) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (to_days('1993-01-01')) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (to_days('1994-01-01')) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN (to_days('1995-01-01')) ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN (to_days('1996-01-01')) ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN (to_days('1997-01-01')) ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN (to_days('1998-01-01')) ENGINE = InnoDB,
 PARTITION p14 VALUES LESS THAN (to_days('1999-01-01')) ENGINE = InnoDB,
 PARTITION p15 VALUES LESS THAN (to_days('2000-01-01')) ENGINE = InnoDB,
 PARTITION p16 VALUES LESS THAN (to_days('2001-01-01')) ENGINE = InnoDB,
 PARTITION p17 VALUES LESS THAN (to_days('2002-01-01')) ENGINE = InnoDB,
 PARTITION p18 VALUES LESS THAN (to_days('2003-01-01')) ENGINE = InnoDB,
 PARTITION p19 VALUES LESS THAN (to_days('2004-01-01')) ENGINE = InnoDB,
 PARTITION p20 VALUES LESS THAN (to_days('2005-01-01')) ENGINE = InnoDB,
 PARTITION pmax VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

#********* Test 1: Queries scanning the partitioned column: from_date ***********
Select SQL_NO_CACHE * From salaries tbl
where tbl.from_date >= '2000-03-15' and tbl.from_date < '2000-09-25';
#Duration for 1 query: 0.016 sec. (+ 2.137 sec. network)

Select SQL_NO_CACHE * From salaries_copy tbl
where tbl.from_date >= '2000-03-15' and tbl.from_date < '2000-09-25';
#Duration for 1 query: 2.106 sec. (+ 5.288 sec. network)

Select SQL_NO_CACHE * From salaries_copy_1 tbl
where tbl.from_date >= '2000-03-15' and tbl.from_date < '2000-09-25';
#Duration for 1 query: 0.063 sec. (+ 1.185 sec. network)

Select SQL_NO_CACHE * From salaries_1985
where salaries_1985.from_date >= '2000-03-15' and salaries_1985.from_date < '2000-09-25'
 UNION ALL 
Select * From salaries_1986
where salaries_1986.from_date >= '2000-03-15' and salaries_1986.from_date < '2000-09-25'
 UNION ALL …
…
Select * From salaries_2005
where salaries_2005.from_date >= '2000-03-15' and salaries_2005.from_date < '2000-09-25';
#Duration for 1 query: 1.638 sec. (+ 0.484 sec. network)

#********* Test 2: Queries scanning the non partitioned column: to_date ***********
Select SQL_NO_CACHE * From salaries tbl
where tbl.to_date >= '2000-03-15' and tbl.to_date < '2000-09-25';
#Duration for 1 query: 0.109 sec. (+ 2.762 sec. network)

Select SQL_NO_CACHE * From salaries_copy tbl
where tbl.to_date >= '2000-03-15' and tbl.to_date < '2000-09-25';
#Duration for 1 query: 1.201 sec. (+ 6.521 sec. network)

Select SQL_NO_CACHE * From salaries_copy_1 tbl
where tbl.to_date >= '2000-03-15' and tbl.to_date < '2000-09-25';
#Duration for 1 query: 7.472 sec. (+ 3.058 sec. network)

Select SQL_NO_CACHE * From salaries_1985
where salaries_1985.to_date >= '2000-03-15' and salaries_1985.to_date < '2000-09-25'
 UNION ALL 
Select * From salaries_1986
where salaries_1986.to_date >= '2000-03-15' and salaries_1986.to_date < '2000-09-25'
 UNION ALL …
…
Select * From salaries_2005
where salaries_2005.to_date >= '2000-03-15' and salaries_2005.to_date < '2000-09-25';
#Duration for 1 query: 1.670 sec. (+ 0.483 sec. network)

Comparison Table:

Description                             | Query on “from_date” (indexed column) | Query on “to_date” (non-indexed column)
Simple table                            | 0.016 sec. (+ 2.137 sec. network)     | 0.109 sec. (+ 2.762 sec. network)
Partition by HASH (Month(from_date))    | 2.106 sec. (+ 5.288 sec. network)     | 1.201 sec. (+ 6.521 sec. network)
Partition by RANGE (to_days(from_date)) | 0.063 sec. (+ 1.185 sec. network)     | 7.472 sec. (+ 3.058 sec. network)
Table Split by year(from_date)          | 1.638 sec. (+ 0.484 sec. network)     | 1.670 sec. (+ 0.483 sec. network)
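
The ranking depends on whether the network time is counted, which a little arithmetic over the figures makes visible. The Python sketch below only re-uses the numbers from the comparison table; the names `TIMINGS`, `total_seconds` and `fastest` are hypothetical, and since each figure comes from a single run the result should be read as indicative at best.

```python
# (query seconds, network seconds) copied from the comparison table
TIMINGS = {
    "Simple table": {"from_date": (0.016, 2.137), "to_date": (0.109, 2.762)},
    "Partition by HASH": {"from_date": (2.106, 5.288), "to_date": (1.201, 6.521)},
    "Partition by RANGE": {"from_date": (0.063, 1.185), "to_date": (7.472, 3.058)},
    "Table split by year": {"from_date": (1.638, 0.484), "to_date": (1.670, 0.483)},
}


def total_seconds(layout, column, include_network=True):
    """Query time for one layout, optionally with the network time added."""
    query, network = TIMINGS[layout][column]
    return round(query + (network if include_network else 0.0), 3)


def fastest(column, include_network=True):
    """Layout with the lowest time for queries filtering on the given column."""
    return min(TIMINGS, key=lambda name: total_seconds(name, column, include_network))


if __name__ == "__main__":
    for column in ("from_date", "to_date"):
        print(column, fastest(column, include_network=False), fastest(column))
```

On query time alone the simple table is ahead for both columns; once network time is added, range partitioning leads for from_date and the table split leads for to_date.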


I do not wish to end the trail here, though. I would like to hear readers' opinions and thoughts on this topic, to shed more light on whether a table split is better than partitioning or vice versa.

Look forward to your comments …

References: 
Link1: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

Saturday, 5 March 2011

SSH Tunneling (Port Forward via VBA) W/O PUTTY

Recently I faced a situation where I had to connect to a MySQL server via VBA to extract data for reporting.
(Sounds simple... Think again...)

Standing in between was a technique commonly used in banks and secure institutions to establish secure communication with a remote location, called "Tunneling". To tunnel to the remote location and establish the MySQL database connection, a technique called PORT Forwarding is used, in which a local port of YOUR PC is MAPPED to a port at the remote location via the TUNNEL, using tools like PUTTY.

Confused...

Let me make it simple:

1. I need to connect to a MySQL DB at a remote location via VBA.
2. The remote location details for MySQL server are as follows:

Remote Server IP : xxx.xxx.xxx.xxx
Remote MYSQL Port: 3306 (Standard MySQL Port)

3. I open PUTTY and connect to the remote server with password verification, forwarding my local port so that it maps to the remote port

Local Mapped Port : 8585 (Can be any number)

This local port now represents the MySQL service on the remote machine via the secure tunnelled channel, and it is the port through which we obtain the data.

My trouble was that the existing practice, invoking PUTTY via the VBA Shell command to establish the channel and killing the process to close it, was unreliable: there is no way to confirm that the PORT forward has been established other than through On Error clauses around the connection open method in VBA.
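
What "confirming the port forward" amounts to can be sketched as a plain TCP probe of the local mapped port. This is a minimal Python illustration of the idea, not the approach taken in this post; `is_port_open` is a hypothetical helper, and 8585 is the example local port from above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(is_port_open("127.0.0.1", 8585))
```

Such a probe only shows that something is listening on the local end of the tunnel; it does not prove that the MySQL service behind it is reachable.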

To overcome this issue I exposed the methods of a C# library (Tamir Gal SSH.NET) to VBA, and also provided a fix so the library can be used in corporate environments, where users are rarely granted the admin privileges needed to deploy such a solution. The solution is deployed and used in the following manner:

1. Save the complete unzipped folder to your C:\ drive.

2. Run the .reg file included in the package to create the registry entries for the dll. In secure corporate environments, where access to the local machine branch of the registry ('HKLM') is prohibited, this .reg file registers the classes in the 'HKCU' branch, thus allowing easy deployment.

3. Now the library can be referenced from the VBA editor via Tools -> References



4. It can then be used before opening the connection to establish the PORT forward, with a Boolean indicating the success/failure of the operation.

Sub TunnelExample()
    Dim SSh As SSHTunnelVBA.VBATunnel
    Dim bln As Boolean

    Set SSh = New SSHTunnelVBA.VBATunnel

    bln = SSh.TunnelConnect("UserName", "Password", "Source/RemoteIP Address", _
        "LocalPort No to Forward", "Local Port IP", "Remote Port to map")

    If bln Then

        'Port forwarded successfully.

    Else

        'Port forwarding failed.

    End If

    bln = SSh.TunnelDisConnect

End Sub

5. Finally, the tunnel can be disconnected with TunnelDisConnect, the last call in the example above, which again returns a Boolean.

I hope you find this solution useful. It does its job very well and takes away the complexity and unreliability of PUTTY shell commands in VBA, making our VBA code more robust, clean and secure.

Download Solution


References:
Tunneling : http://en.wikipedia.org/wiki/Tunneling_protocol
Port Forwarding : http://en.wikipedia.org/wiki/Port_forwarding
Putty : http://www.chiark.greenend.org.uk/~sgtatham/putty/
Tamir GAL SSH.NET : http://www.tamirgal.com/blog/page/SharpSSH.aspx