Sunday, January 8, 2012

Matching a string from a variable in regular expressions

I had to hand in an assignment in Perl in which I had to match a few thousand genes against a (very, very) long list of gene names.

Perl being the language it is, I did the job with regexes (regular expressions).
Perl may have many faults, but it's a good language for gene matching because its regular expression support is really great.

So I ran the program, and it started going over the file looking for matches, but it crashed halfway through because the gene name I was matching contained an unbalanced "(". I made a specific fix for that, only to find that it crashed again because a name contained a "*".
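To see why the program died, here is a minimal sketch that interpolates a raw name into a pattern inside eval, so the error is caught instead of killing the script. The gene names and the sample line are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical gene names containing regex metacharacters.
my @names = ('ABC(1', '*XYZ');
my $line  = 'ABC(1 is upregulated';

for my $name (@names) {
    # The name is interpolated as regex syntax, so "(" opens a group
    # that is never closed, and a leading "*" has nothing to repeat.
    my $ok = eval { $line =~ m/$name/; 1 };
    print $ok ? "compiled: $name\n" : "failed: $name: $@";
}
```

Both patterns fail when Perl compiles them at run time, with "Unmatched ( in regex" and "Quantifier follows nothing in regex" respectively.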

These and other characters have a special meaning in a regex, so I thought I'd write a function to "fix" the string for use in a regular expression (I planned on adding a \ before every special character). However, I thought I'd first check whether there was a built-in function, and that's how I found quotemeta, which does exactly that.
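For comparison, the hand-written helper I had in mind could look roughly like this sketch. The name escape_for_regex is hypothetical, and the helper simply puts a backslash before every character that is not a letter, digit or underscore. For plain ASCII names it gives the same result as quotemeta:

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl built-in): backslash every
# character outside [A-Za-z0-9_], which is what quotemeta does for ASCII.
sub escape_for_regex {
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9_])/\\$1/g;
    return $str;
}

my $name = 'IL-2(a)*';
print escape_for_regex($name), "\n";    # prints IL\-2\(a\)\*
print quotemeta($name), "\n";           # prints IL\-2\(a\)\*
```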

$regex = quotemeta( $regex );
if( $string =~ m/$regex/ ) { ... }

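quotemeta takes a string and returns a copy in which every ASCII non-word character (anything outside [A-Za-z_0-9]) is preceded by a backslash, so the result can be interpolated into a regular expression and will match the original text literally.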
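Putting it together, here is a minimal sketch of the matching step. It assumes the gene names and the long list are already loaded into arrays; the data below is invented:

```perl
use strict;
use warnings;

# Invented sample data standing in for the real gene files.
my @genes = ('ABC(1', 'DEF*2', 'GHI3');
my @list  = ('ABC(1 kinase', 'unrelated entry', 'DEF*2 receptor');

my %found;
for my $gene (@genes) {
    my $pattern = quotemeta($gene);    # "(" and "*" now match literally
    # grep in scalar context returns the number of matching entries
    $found{$gene} = grep { /$pattern/ } @list;
}
print "$_: $found{$_}\n" for sort keys %found;
```

This prints a count of 1 for ABC(1 and DEF*2 and 0 for GHI3, with no crash.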

If you don't want to create another variable, you can use the \Q...\E escape instead, which does the same thing inline: everything between \Q and \E is passed through quotemeta.

if( $string =~ m/\Q$regex\E/ ) { ... }
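Because \E ends the quoting, a literal name and real regex syntax can share one pattern. This sketch (the name and lines are invented) anchors the quoted name with ^ and $ so that only a whole-line match counts, and checks that \Q produces the same text as quotemeta:

```perl
use strict;
use warnings;

my $name = 'DEF*2';    # invented gene name

# \Q...\E quotes only the interpolated name; ^ and $ remain anchors.
for my $line ('DEF*2', 'DEF*2 receptor', 'DEFF2') {
    my $hit = $line =~ m/^\Q$name\E$/ ? 'match' : 'no match';
    print "$line: $hit\n";
}

# "\Q$name\E" in a double-quoted string yields the same text as quotemeta.
print "same\n" if "\Q$name\E" eq quotemeta($name);
```

Only the first line matches. Without \Q, the pattern ^DEF*2$ would silently match "DEFF2", because the "*" would act as a quantifier on the F instead of causing a crash.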
