vim regex

Ever wonder just how useful Vim really is? The substitute command alone makes it very useful: it supports a rich regex syntax and makes editing fast and easy, assuming you know what you are doing. Vim is one of those tools that takes more thinking than doing at the beginning. For example, it took me 30 minutes to work out how to surround each of lines 24 to 35 with =>" and ". The example below gives a better picture of my problem.

A line containing foo bar should become =>"foo bar", for an arbitrary number of lines.

There are several ways of doing this. One way is to use Visual block mode (Ctrl-V) to select lines 24 to 35 and press I to insert =>" at the beginning of the lines. Then select the lines again, press $ so the block reaches the end of every line, and press A to append the closing ". But I didn't find this a satisfactory solution to my problem. What if 50 lines needed to be wrapped instead of 12? (Visual block selection can be a pain sometimes.)

So we can use the substitute command:

:24,35s/^.*$/=>"&"/

to solve our problem.

What the command says is: on lines 24 to 35, replace everything from the beginning of the line to the end of the line (call this X) with X itself, preceded by =>" and followed by ". The & symbol in the replacement stands for the text that was just matched (in this case X).

Here is another useful substitution command:
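If it helps to see the same idea outside Vim, here is a minimal Python sketch of that substitution. The function name wrap_lines is made up for illustration, and \g<0> plays the role of Vim's &.

```python
import re

def wrap_lines(lines, first, last):
    # Rough equivalent of :first,last s/^.*$/=>"&"/ (1-based, inclusive range).
    wrapped = []
    for number, line in enumerate(lines, start=1):
        if first <= number <= last:
            # \g<0> is the whole match, like & in Vim's replacement string.
            line = re.sub(r'^.*$', r'=>"\g<0>"', line, count=1)
        wrapped.append(line)
    return wrapped

for line in wrap_lines(["keep me", "foo bar", "baz"], 2, 3):
    print(line)
```

Lines outside the range are passed through untouched, which is what the 24,35 prefix does in the Vim command.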

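:23s/\([a-z_]\+\)\s*/=>"\1",\r/g

Each word is captured with \( \) and reused as \1, so the whitespace between words is dropped instead of ending up inside the quotes as it would with &. The \r in the replacement starts a new line. The above command takes a line (i.e. line 23) of multiple words, for example:

23: foo bar lulz cats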

to

=>"foo",

=>"bar",

=>"lulz",

=>"cats",
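A rough Python equivalent of this word-by-word substitution, in case you want to do the same thing from a script. The function name words_to_entries is hypothetical; each word is captured as a group so the whitespace between words stays out of the quotes.

```python
import re

def words_to_entries(line):
    # Capture each lowercase/underscore word and put it on its own line.
    # \1 is the captured word; the trailing whitespace is matched but dropped.
    return re.sub(r'([a-z_]+)\s*', r'=>"\1",\n', line)

print(words_to_entries("foo bar lulz cats"), end="")
```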

This is especially useful if you want to construct a large PHP array of multiple keys and values. What I do is first type out the words that will be the values, then run the above command, then type in the corresponding key for each value, e.g.:

array(

1=>"foo",

2=>"bar",

3=>"lulz",

4=>"cats"

);
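If the keys are just consecutive numbers, as in this example, the last manual step can be scripted too. This is only a sketch: php_array is a made-up helper, and it assumes the values contain no double quotes or backslashes that would need escaping in PHP.

```python
def php_array(values, start=1):
    # Number the values from `start` and format each one as key=>"value".
    entries = [f'{key}=>"{value}"' for key, value in enumerate(values, start=start)]
    # Join with commas so the last entry has no trailing comma.
    return "array(\n" + ",\n".join(entries) + "\n);"

print(php_array(["foo", "bar", "lulz", "cats"]))
```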

I hope the above helps you construct large PHP arrays as it did for me.


HBase

Over the weekend I was fooling around with HBase. HBase is a distributed database: you set up one master node (the HDFS name node plays the same role for the underlying file system) and multiple region servers to get high availability and performance. The downside is that HBase is not a relational database (that means no foreign keys and none of the other nice features that come with databases like MySQL). HBase has its own way of storing data that is not too different from Google's Bigtable. To understand how data is stored in HBase, you can refer to http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture. (Although this is somewhat out of date, you can still get the general picture.)

HBase requires HDFS to be set up; the instructions for the whole process can be found at http://wiki.apache.org/hadoop/QuickStart.

HBase comes with several interfaces: shell, REST, Thrift and Java. Below I describe my experience with some of them and my opinion of the others.

Shell

HBase provides a JRuby IRB-based shell that feels very similar to a standard SQL shell. Starting the shell is simple: go to the bin directory of your HBase installation and run ./hbase shell. The shell is really well designed and lets you do table and row operations with minimal effort. To create a table in the shell, you only have to do:

create 'table1','f1','f2','f3'

The above command will create a table with the name ‘table1’ and with the column families ‘f1’, ‘f2’, and ‘f3’.

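To insert data into HBase:

put 'table1','row1','f1:col9','value'

The above command will create a row with the key ‘row1’ in the table ‘table1’. ‘f1:col9’ is the specific column (column family ‘f1’, qualifier ‘col9’) in which the data “value” is put. An optional fifth argument sets the cell's timestamp; it has to be an integer rather than a quoted string, and HBase uses the current time if you leave it out. As you can see, this is a lot easier than the standard SQL statements to create tables and insert data.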
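To make the create and put commands a bit more concrete, here is a toy in-memory model of the layout they imply: table, then row key, then 'family:qualifier' column, then timestamp, then value. This is only my mental picture written as a Python sketch, not how HBase is implemented; the ToyTable class and its methods are invented for illustration.

```python
import time

class ToyTable:
    def __init__(self, name, *families):
        # Column families are fixed when the table is created.
        self.name = name
        self.families = set(families)
        self.rows = {}  # row key -> column -> timestamp -> value

    def put(self, row, column, value, timestamp=None):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if timestamp is None:
            # Assumption for this sketch: default to "now" in milliseconds.
            timestamp = int(time.time() * 1000)
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        # Return the value with the newest timestamp.
        versions = self.rows[row][column]
        return versions[max(versions)]

table = ToyTable("table1", "f1", "f2", "f3")
table.put("row1", "f1:col9", "value", 1)
table.put("row1", "f1:col9", "newer value", 2)
print(table.get("row1", "f1:col9"))
```

Note that qualifiers such as col9 are created on the fly by put, while a put to a family that was not named at create time is rejected.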

Pro: Easy to use

Cons: None

For a more detailed guide on the HBase shell visit http://wiki.apache.org/hadoop/Hbase/Shell.

REST

The REST interface that HBase provides is quite nice in theory, though it is still at an alpha stage of development. The HBase people call it the “Stargate” interface. Stargate runs on the Jetty server by default, but a .war is provided that lets you switch from Jetty to Apache Tomcat. There is a lot of discussion online about Tomcat vs Jetty. The main lesson I took from it is that Jetty is more lightweight than Tomcat, while Tomcat is said to have performance advantages (you would need a profiler such as JProfiler to confirm this). A lot of people prefer REST interfaces because they are so easy to access and use; in fact, many of the HBase adapter libraries use REST bindings to communicate with HBase.

If you want the REST docs specific to your version, go to http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/stargate/package-summary.html#package_description

and replace the 0.20.6 with the version of your HBase (to find the version, simply go to the shell and type ‘version’).

Stargate can accept and return three types of data (depending on the header): XML, JSON, and protobufs (binary). It is worth mentioning that XML is the best supported of the three. Another thing about the REST interface is that row keys, column names, and values are encoded in base64, so for every get and put you have to know what to encode and decode.

There are tons of posts on the HBase mailing list that talk about the performance (or lack of it) of the REST interface. Some say that it is suitable for services with many clients posting and reading data at a low rate.

In general, I am not too happy with the REST interface that HBase provides.
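The base64 step is the part that tripped me up most, so here is a small Python sketch of just the encoding and decoding. The helper names and the dictionary field names are made up for illustration and are not Stargate's actual schema; check the docs for your version for the real request format.

```python
import base64

def b64(text):
    # Encode a UTF-8 string as base64 text.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def unb64(encoded):
    # Decode base64 text back into a UTF-8 string.
    return base64.b64decode(encoded).decode("utf-8")

def encode_cell(row, column, value):
    # Hypothetical field names; only the base64 handling is the point here.
    return {"row": b64(row), "column": b64(column), "value": b64(value)}

cell = encode_cell("row1", "f1:col9", "value")
print(cell)
print({field: unb64(encoded) for field, encoded in cell.items()})
```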

Pro: Ease of access

Cons: Everything else

Thrift

I admit that I haven’t used Thrift with HBase, though some say its performance is terrific. I just didn’t like the fact that I’d have to install 64 MB of C++ libraries just to communicate with HBase.

Pros: Performance

Cons: Client needs to install thrift

Java

If you are doing development in Java, then kudos to you; this is probably the interface that you want to use. It has the best performance of all the interfaces and is the best supported.

Pros: Speed, community support

Cons: Java only

Thanks for reading my long ramblings. The material in these posts is only my opinion, experience and knowledge; please correct me if I am wrong.

