Saturday, May 9, 2015

regex - camel case that doesn't contain consecutive uppercase letters

Consider the following description of identifiers: "Identifiers are alphanumeric, but must start with a lowercase letter and may not contain consecutive uppercase letters." Write a DFA that accepts these identifiers.

This is the regex I plan to translate into the DFA, but I don't think it's correct:

[a-z].( ([a-z0-9] | [a-z0-9][A-Z])* | ([a-z0-9] | [A-Z][a-z0-9])* )
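A sketch of one way to check the rule, assuming "alphanumeric" means ASCII letters and digits only. It pairs a lookaround-free regular expression, which maps directly onto a DFA, with a hand-written transition table; the state names are illustrative. (In the attempt above, the `.` after `[a-z]` is presumably meant as concatenation; in most regex engines it would match any single character.)

```python
import re

# Lowercase start, then any mix of [a-z0-9] and "uppercase followed by [a-z0-9]",
# optionally ending in one uppercase letter. No lookaround, so it maps to a DFA.
IDENT = re.compile(r"[a-z](?:[a-z0-9]|[A-Z][a-z0-9])*[A-Z]?")

def char_class(ch):
    if "a" <= ch <= "z":
        return "lower"
    if "0" <= ch <= "9":
        return "digit"
    if "A" <= ch <= "Z":
        return "upper"
    return "other"

# START: nothing read yet; LOW: last char lowercase or digit (accepting);
# UP: last char uppercase (accepting). Missing entries go to a dead state.
TRANSITIONS = {
    ("START", "lower"): "LOW",
    ("LOW", "lower"): "LOW",
    ("LOW", "digit"): "LOW",
    ("LOW", "upper"): "UP",
    ("UP", "lower"): "LOW",
    ("UP", "digit"): "LOW",
}

def dfa_accepts(s):
    state = "START"
    for ch in s:
        state = TRANSITIONS.get((state, char_class(ch)), "DEAD")
        if state == "DEAD":
            return False
    return state in ("LOW", "UP")
```

The DFA has no transition from UP on an uppercase letter, which is exactly the "no consecutive uppercase letters" rule.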

Social Security regular expression validator in ASP.NET

I want to use a regular expression to validate an SSN with dashes. This is the format I would like to see: 000-00-0000. For some reason, this expression does not work for me:

  <asp:RegularExpressionValidator runat="server" Display="Dynamic" ValidationExpression="^(\\d{3}-\\d{2}-\\d{4}|\\d{9})$" ControlToValidate="txtSSN" ForeColor="Red" />
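The doubled backslashes are the most likely culprit: ASP.NET markup attributes are not C# string literals, so `\\d` probably reaches the regex engine as a literal backslash followed by `d`. A single backslash in the attribute (`^(\d{3}-\d{2}-\d{4}|\d{9})$`) should be enough. The Python sketch below only illustrates the difference between the two patterns:

```python
import re

# Intended pattern: \d is a digit class.
SSN = re.compile(r"\d{3}-\d{2}-\d{4}|\d{9}")

# What the engine sees when the backslashes are doubled: "\\d" is a literal
# backslash followed by "d", so ordinary digit strings never match.
DOUBLED = re.compile(r"\\d{3}-\\d{2}-\\d{4}|\\d{9}")

def is_ssn(text):
    # fullmatch plays the role of the ^...$ anchors in the validator.
    return SSN.fullmatch(text) is not None

for sample in ["000-00-0000", "123456789", "12-345-6789"]:
    print(sample, is_ssn(sample), DOUBLED.fullmatch(sample) is not None)
```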

How to properly use Pattern in Java with a String of numbers

I'm trying to use Pattern in Java and I'm having a problem.

I have a String[] of numbers (read from a file), and I want to check whether each of them occurs in another string of numbers (also read from a file).

E.g. I have the number 1 and I want to search in a String like this: 5 10 1 8 6 3 -1 10 8 10 10 4 10 -1 10 10 10 10 9 10 -1 10 10 10 10 10 10 -1 10 10 10 10 10 10 -1 -2.

Obviously the -1 and 10 are not what I'm looking for. Is there a way to solve this?

I can't use Pattern.compile() with an integer.
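Regex APIs take strings, so the number has to be converted first (in Java, `String.valueOf(n)`, ideally wrapped in `Pattern.quote`). To skip the hits inside -1 and 10, the token can be surrounded by lookarounds that reject a neighboring digit or minus sign; `java.util.regex` accepts the same `(?<!...)` and `(?!...)` syntax. A Python sketch of the pattern, with an illustrative function name:

```python
import re

def find_number(target, text):
    """Start offsets where `target` occurs as a whole number, not inside -1 or 10."""
    # Regex APIs work on text, so turn the integer into a string first.
    token = re.escape(str(target))
    # (?<![\d-]) : not preceded by a digit or a minus sign
    # (?!\d)     : not followed by another digit
    pattern = re.compile(r"(?<![\d-])" + token + r"(?!\d)")
    return [m.start() for m in pattern.finditer(text)]

line = "5 10 1 8 6 3 -1 10 8 10 10 4 10 -1 10 10 10 10 9 10 -1"
print(find_number(1, line))
```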

Regular expression to split by forward slash

I have a parse tree that contains some information. To extract what I need, I use code that splits the string on forward slashes (/), but that approach is not reliable. Here are the details:

I used this code in an earlier project and it worked perfectly, but the parse trees in my new dataset are more complicated, and the code sometimes makes wrong decisions.

The parse tree is something like this:

(TOP~did~1~1 (S~did~2~2 (NPB~I~1~1 I/PRP ) (VP~did~3~1 did/VBD not/RB (VP~read~2~1 read/VB (NPB~article~2~2 the/DT article/NN ./PUNC. ) ) ) ) ) 

As you see, the leaves of the tree are the words right before the forward slashes. To get these words, I have used this code before:

parse_tree.split("/");

But now, in my new data, I see instances like these:

1) (TOP Source/NN http://ift.tt/1cuAduD ./. )

where there are multiple slashes due to website addresses (In this case, only the last slash is the separator of the word).

2) (NPB~sister~2~2 Your/PRP$ sister/NN //PUNC: )

Where the slash is a word itself.

Could you please help me to replace my current simple regular expression with an expression which can manage these cases?

To summarize: I need a regular expression that splits on forward slashes but handles two exceptions: 1) if there is a website address, it must split on the last slash; 2) if there are two consecutive slashes, it must split on the second slash (the first slash must NOT be treated as a separator; it is a WORD).
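Instead of splitting, it may be easier to match each leaf directly: a whitespace-free token, a slash, and a tag that contains no slash. Because the first group is greedy, it runs up to the last slash in the token, which covers both exceptions. A Python sketch; the URL in the usage check is a hypothetical stand-in, since the address in the question was shortened:

```python
import re

# (\S+) is greedy, so it runs up to the LAST slash in a token; the tag after
# it may not contain a slash. "//PUNC:" therefore splits into "/" and "PUNC:",
# and a URL keeps its internal slashes in the word part.
LEAF = re.compile(r"(\S+)/([^\s/]+)")

def leaves(tree):
    """Return (word, tag) pairs for every leaf in a bracketed parse tree."""
    return LEAF.findall(tree)

tree = "(NPB~sister~2~2 Your/PRP$ sister/NN //PUNC: )"
print(leaves(tree))
```

Tokens without a slash, such as `(TOP~did~1~1`, never match, so the bracket labels are skipped automatically.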

JavaScript: strip the LaTeX form of English letter variants

How can I replace LaTeX accent sequences of the form {\'<1 alpha>} with the corresponding plain English letter?

For example

L{\'o}pez

Should change to

Lopez

It should not affect any characters outside the {\'<1 alpha>} pattern. It should also replace every occurrence, not just the first, since several characters might need to be pruned.
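A sketch of the pattern: escape the braces and the backslash, capture the single letter, and put it back. In JavaScript the equivalent would be `text.replace(/\{\\'([A-Za-z])\}/g, "$1")`, where the `g` flag is what makes it replace every occurrence. Shown here in Python, assuming only ASCII letters appear inside the braces:

```python
import re

# {\'X} where X is a single ASCII letter; the letter is captured and kept.
ACUTE = re.compile(r"\{\\'([A-Za-z])\}")

def strip_acute(text):
    # re.sub replaces every occurrence, not just the first.
    return ACUTE.sub(r"\1", text)

print(strip_acute("L{\\'o}pez"))
```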

Transform characters: A to B, B to C, ..., Z to A

I want a short and easy way to replace A with B, B with C, ..., Z with A in PHP.

I already have tried this:

$pwd = "Abc";
for($char = ord('A'); $char <= ord('Z'); $char++) {
  $newc = $char+1;
  if($newc > 90)
    $newc = 65;
  $pwd = str_replace(chr($char), chr($newc), $pwd);
  $pwd = str_replace(chr($char+32), chr($newc+32), $pwd);
}
echo $pwd;

But when I use it I only get "Aaa"... :(

I can't find anything on the internet. Could you maybe help me?

Thanks in advance, Luuc
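The "Aaa" comes from applying the 26 replacements one after another: after A becomes B, the B pass turns it into C, and so on, until the final Z to A pass (z to a for lowercase) wraps everything around. Mapping all letters in one simultaneous pass avoids the cascade; in PHP, `strtr()` with two equal-length character lists does that. A Python sketch showing both behaviors with `str.maketrans`/`str.translate`:

```python
import string

LOWER, UPPER = string.ascii_lowercase, string.ascii_uppercase

def cascade_shift(text):
    """Mimics the PHP loop: one replacement per letter, applied in sequence."""
    for i in range(26):
        nxt = (i + 1) % 26
        text = text.replace(UPPER[i], UPPER[nxt]).replace(LOWER[i], LOWER[nxt])
    return text

# One simultaneous mapping: each character is looked up once, so shifts never stack.
SHIFT = str.maketrans(LOWER + UPPER, LOWER[1:] + LOWER[0] + UPPER[1:] + UPPER[0])

def shift_letters(text):
    return text.translate(SHIFT)

print(cascade_shift("Abc"), shift_letters("Abc"))
```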

regular expression: may or may not contain a string

I want to match a floating-point number that might be in the form 0.1234567 or 1.23e-5. Here is my Python code:

import re
def main():
    m2 = re.findall(r'\d{1,4}:[-+]?\d+\.\d+(e-\d+)?', '1:0.00003 3:0.123456 8:-0.12345')
    for svs_elem in m2:
         print svs_elem

main()

It prints blank lines. Based on my tests, the problem is in the (e-\d+)? part. Thank you!
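When a pattern contains a capturing group, `re.findall` returns the group's contents instead of the whole match, so each result here is just the `(e-\d+)` part, which is empty for numbers without an exponent. Making the group non-capturing with `(?:...)` returns the full match. A sketch in Python 3 syntax:

```python
import re

# (?:...) groups without capturing, so findall returns the full match.
PAIR = re.compile(r"\d{1,4}:[-+]?\d+\.\d+(?:e-\d+)?")

def main():
    for svs_elem in PAIR.findall("1:0.00003 3:0.123456 8:-0.12345 9:1.23e-5"):
        print(svs_elem)

main()
```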

How to do URL matching regex for routing framework?

I already have a routing method that matches this pattern:

/hello/:name

which sets name as a dynamic path segment. I want to know how to make the same regex also handle:

/hello/{name}

How can I also add an optional trailing slash, like this?

/hello/:name(/)

or

/hello/{name}(/)

This is the regex I use for /hello/:name

@^/hello/([a-zA-Z0-9\-\_]+)$@D

The regex is auto-generated by this PHP class method:

private function getRegex($pattern){
        $patternAsRegex = "@^" . preg_replace('/\\\:[a-zA-Z0-9\_\-]+/', '([a-zA-Z0-9\-\_]+)', preg_quote($pattern)) . "$@D";
        return $patternAsRegex;
    }

If the route is /hello/:name(/), I want the trailing slash to be optional in the match; otherwise it should match as it does now.
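A Python sketch of the same generator idea, assuming placeholders look like `:name` or `{name}` and that a literal `(/)` at the end of a route means "optional trailing slash". It splits the route into literal text and placeholders before escaping, so the placeholder syntax never has to be matched in escaped form (in PHP, `preg_split` with `PREG_SPLIT_DELIM_CAPTURE` could do the same split). `\Z` plays the role of `$` with the `D` modifier: end of string only.

```python
import re

SEGMENT = r"([a-zA-Z0-9_-]+)"
PLACEHOLDER = re.compile(r"(:[a-zA-Z0-9_-]+|\{[a-zA-Z0-9_-]+\})")

def route_to_regex(route):
    optional_slash = route.endswith("(/)")
    if optional_slash:
        route = route[:-3]
    parts = []
    # With one capturing group, re.split alternates literal, placeholder, literal...
    for i, piece in enumerate(PLACEHOLDER.split(route)):
        if i % 2:
            parts.append(SEGMENT)           # :name or {name}
        elif piece:
            parts.append(re.escape(piece))  # literal text such as "/hello/"
    tail = "/?" if optional_slash else ""
    return re.compile("^" + "".join(parts) + tail + r"\Z")

print(route_to_regex("/hello/{name}(/)").pattern)
```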

Lua: String.match/string.gsub - Casing for true/false

I've been trying to figure this out for a while, but I fear I'm not seeing the entire solution quickly, and now I'm needing a fresh set of eyes to accomplish what I need.

I have a very particular script for the MUD I play to help me differentiate between MOBs and players when in a room. The script itself works, but now I want to add a new element that will check if my group mates are in the same room. This is what I have so far:

function strends(s)
  if s:match("%u%w+ is here%.") or s:match("%u%w+ is fighting .-%.") or s:match("%u%w+ is sleeping here%.") then
    return true
  else
    return false
  end
end

That's working great - it checks if an upper case name is in the room and returns information as requested.

I have a table of my group mates, though I may find it easier to do it as a string and do string.find. The problem I'm running into is casing it for each of the scenarios:

  1. Return true if there are players outside my group in the room.
  2. Return false if the only players in the room are in my group.
  3. Return false if there is no one in the room aside from myself.

In scenario one, it MUST return true even if there are people from my group as well as people outside it. My Lua knowledge isn't extensive enough to work out the problem. The patterns aren't anchored to the start of the string because the line may have any number of characters before the name. How should I approach this, or what should I be doing to accomplish my goal?
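One way to cover all three scenarios is to stop asking "is anyone here?" and instead collect every matching name, returning true as soon as one is not a group mate. In Lua that would be `string.gmatch` over each line plus a table keyed by name (`group[name] == true`). A Python sketch of the logic, with the patterns translated approximately and the group treated as a set of names:

```python
import re

# Rough Python equivalents of the three Lua patterns in strends():
#   "%u%w+ is here%."   "%u%w+ is fighting .-%."   "%u%w+ is sleeping here%."
PRESENCE = re.compile(r"([A-Z][A-Za-z0-9]+) is (?:here|fighting .*?|sleeping here)\.")

def outsider_present(room_lines, group):
    """True if some capitalized name in the room is not in `group`."""
    for line in room_lines:
        for match in PRESENCE.finditer(line):
            if match.group(1) not in group:
                return True   # scenario 1: at least one outsider, mates or not
    return False              # scenarios 2 and 3: only group mates, or nobody
```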

join rows in CSV with different sized sections python

I have a CSV file structured like this:

|     publish_date     |sentence_number|character_count|    sentence       |
----------------------------------------------------------------------------
|          1           |               |               |                   |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |      -1       |       0       | Sentence 1 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       0       |      14       | Sentence 2 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       1       |      28       | "Sentence 3 here. |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       2       |      42       | Sentence 4 here." |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       3       |      56       | Sentence 5 here.  |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------
|          2           |               |               |                   |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |      -1       |       0       | Sentence 1 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       0       |      14       | Sentence 2 here.  |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------

What I'd like to do is combine each block of sentences into a paragraph and output each paragraph individually:

['Sentence 1 here.', 'Sentence 2 here.', '"Sentence 3 here.', 'Sentence 4 here."', 'Sentence 5 here.']

Some sentences are quotes which continue into a new sentence, whilst others are entirely embedded within a sentence.

So far I've got this:

import csv

def read_file():

    file = open('test.csv', "rU")
    reader = csv.reader(file)
    included_cols = [3]

    for row in reader:
        content = list(row[i] for i in included_cols)

        print content    
    return content

read_file()

But this just prints each sentence as a separate one-element list:

['Sentence 1 here.']
['Sentence 2 here.']

Any suggestions appreciated.
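A sketch that keeps one list of sentences per block instead of printing row by row. It assumes the layout shown in the table: a row whose first column is a number opens a block, a row whose first column is `end` closes it, the sentence is in the fourth column, and sentences are joined with single spaces. `csv.reader` accepts any iterable of lines, so an open file object can be passed in directly.

```python
import csv

def read_paragraphs(lines):
    """Return one joined paragraph per numbered block of rows."""
    paragraphs, current = [], None
    for row in csv.reader(lines):
        if not row:
            continue
        first = row[0].strip()
        if first.isdigit():            # block header such as "1" or "2"
            current = []
        elif first == "end":
            if current is not None:    # ignore a stray trailing "end"
                paragraphs.append(" ".join(current))
                current = None
        elif current is not None and len(row) > 3:
            current.append(row[3].strip())
    return paragraphs
```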

Why is this Python regular expression matching only one character, not whole words?

I am trying to extract the title of this text, which is all in uppercase. I want to skip a long dashed sequence and codes like NOM-059-SEMARNAT 2010, and there may be others to exclude. So I wrote a regex for re.findall in Python (Python 2.7.7, in Spyder, on Windows 8.1):

(?!(?:[- ]{2,}|NOM\-059\-SEMARNAT))([A-Z0-9ÁÉÍÓÚÑ:;¿\?\(\)\-\+\. ,]{10,})

Here is a sample of the summaries document I am scanning with this pattern:

--------------------------------------------- Congreso Mexicano
RELACIÓN ENTRE EL TAMAÑO DEL FOROFITO Y LA RIQUEZA DE EPÍFITAS EN LOS PANTANOS DE CENTLA, TABASCO Dwers Aasrd Jxcxéas Lóasd1*, Rasdé de Jawdúz Rasdw Vasde1 Instituto de Ciencias Biologicas, Universidad de Ciencias y Artes de Chiapas awdsd.w@hlksajk.com Las plantas epífitas son poco comunes en manglares, no epífitas y las características de los forofitos de Rhizophora mangle, especie amenazada de acuerdo a la NOM-059-SEMARNAT 2010; en áreas conservadas de la reserva Pantanos de Centla, al noroeste de Tabasco. Se evaluó la relación entre La riqueza de epífitas estuvo significativamente relacionada con la cobertura de raíz y DAP de los forofitos. Las zonas I y III de los forofitos fueron las más similares y compartieron 47% del total de las especies. La zona I, que son las Palabras clave: Epífitas vasculares, distribución vertical, composición, Rhizophora mangle, raíces aéreas. ID: 96 lunes, 20 de abril de 2015, 3:30:00 PM, Sala: 8 Eje temático: Ecología de Comunidades
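One plausible factor under Python 2.7: if the pattern and the text are byte strings, each accented capital such as Á is two UTF-8 bytes, and the character class then matches those bytes one at a time; unicode strings (`u'...'`, or plain `str` in Python 3) avoid that. Separately, the negative lookahead is only checked where a match starts, so the engine can simply start one character later, at the O of NOM. The Python 3 sketch below requires a run to start with a capital that is not inside a word or code, and to stop before a lowercase letter; the exclusion list holds only the one code from the question.

```python
import re

UPPER = "A-ZÁÉÍÓÚÑ"
TITLE = re.compile(
    r"(?<![\w-])"                        # do not start in the middle of a word or code
    r"(?!NOM-059-SEMARNAT)"              # excluded codes (extend as needed)
    rf"[{UPPER}]"                        # must start with a capital letter
    rf"[{UPPER}0-9:;¿?()+.\-, ]{{9,}}"   # then 9+ more title characters
    r"(?![a-záéíóúñ])"                   # stop before a lowercase letter
)

def find_titles(text):
    # strip() drops the space left before the next mixed-case word
    return [m.group().strip() for m in TITLE.finditer(text)]
```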


re.sub() negative look behind + negative look ahead

Remove every occurrence of ' from a string, except when it comes before an s ('s) or after an s (s'). The exception to that: if the apostrophes enclose the whole word, they should be removed.

Example:

Andrea's --> Andrea's (stays as is)
Kids' --> Kids' (stays as is)
'Kids' --> Kids
Ki'd's' --> Kids

What I came up with so far:

\'(?!s ) 

This handles the first example (that apostrophe is left alone), but I have a problem with the rest.
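The examples don't quite follow from the written rule (under it, the final s' in Ki'd's' would be kept), so this sketch uses one interpretation that reproduces all four: inside each word, keep an apostrophe only if it is the word's only apostrophe and forms a possessive ending ('s or s'); otherwise remove every apostrophe in that word. Lookarounds alone can't count apostrophes per word, so it uses `re.sub` with a replacement function instead of the lookaround approach in the title:

```python
import re

WORD = re.compile(r"[\w']+")
POSSESSIVE = re.compile(r"\w+'s|\w*s'")

def clean_apostrophes(text):
    def fix(match):
        word = match.group()
        # Assumption: a lone apostrophe forming 's or s' is a possessive; keep it.
        if word.count("'") == 1 and POSSESSIVE.fullmatch(word):
            return word
        return word.replace("'", "")
    return WORD.sub(fix, text)
```

Note that under this reading contractions such as don't also lose their apostrophe.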

How to grep a line an unknown distance above a pattern

Sorry for editing the question again; I realized I hadn't asked it clearly before.
I asked a question yesterday, but today I found another problem.
Here is my file:

Time 00:00:01
kkk
lll
ccc
aaa: 88
...
Time 00:00:03
jjj
kkk
lll
ccc
aaa: 89
ooo
bbb
aaa
kkk
lll
ccc
aaa: 90
...
Time 00:00:04
kkk
lll
...

Here is the output I want:

Time 00:00:01
kkk
lll
ccc
aaa: 88
Time 00:00:03
kkk
lll
ccc
aaa: 89
Time 00:00:03
kkk
lll
ccc
aaa: 90

Last time I was looking for one line and the line above it. This time I am looking for a multi-line pattern:

kkk
lll
ccc
aaa: /any thing here/

plus the closest line above it of the form

Time /any thing here/

From the question I asked yesterday, I tried

awk '/Time/{a=$0}/kkk\nlll\nccc\naaa/{print a"\n"$0}' file

and

perl -ane '$t=$_ if /Time/; print $t,$_ if /kkk\nlll\nccc\naaa/' test2

and

pcregrep.exe -M 'kkk.*(\n|.)lll.*(\n|.)ccc.*(\n|.)*aaa' test2

but they either don't work or don't produce the output I want.

I found a thread that talks about using a state machine, but that seems complex since I have several lines to match.

Is there a simple way to solve this?
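awk and `perl -n` read one line at a time, so a pattern containing `\n` can never match inside a single record, which is likely why those attempts fail. The state machine is simpler than it sounds: remember the latest Time line, keep a sliding window of the last four lines, and emit when the window matches. A Python sketch, assuming the block always consists of exactly those four consecutive lines; calling `extract_blocks(open("file"))` would process the real file.

```python
import re
from collections import deque

# The four consecutive lines that make up one block (assumed fixed order).
BLOCK = [re.compile(p) for p in (r"kkk$", r"lll$", r"ccc$", r"aaa: ")]

def extract_blocks(lines):
    """Return each matching 4-line block preceded by the latest Time line."""
    last_time = None
    window = deque(maxlen=len(BLOCK))   # the last four lines seen
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("Time "):
            last_time = line
        window.append(line)
        if (last_time is not None and len(window) == len(BLOCK)
                and all(p.match(text) for p, text in zip(BLOCK, window))):
            out.append(last_time)
            out.extend(window)
    return out
```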

Regular expression to extract word

Given a string like this: "today is a nice day".

Is it possible to extract only the word "day" from this string?

Using [^day] works letter by letter, so it's not what I want.
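`[^day]` is a negated character class: it matches one character that is not d, a or y. To match the word itself, use the literal text between word boundaries `\b`, which most flavors (JavaScript, Java, PCRE, Python) support the same way. A Python sketch with an illustrative helper name:

```python
import re

def whole_words(word, text):
    # \b is a word boundary, so the "day" inside "today" does not count.
    return re.findall(r"\b" + re.escape(word) + r"\b", text)

print(whole_words("day", "today is a nice day"))
```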

How to str_replace Google News RSS for Facebook Share?

Hi, I'm using SimpleXML to display a news.google.com feed.

The displayed entries link to the original article in this way:

http://ift.tt/1dTSQsr

I need the entries to link to this instead: http://ift.tt/1KujlPl

The reason is that Facebook Sharer cannot interpret the following link:

http://ift.tt/1dTSQst

Facebook Sharer needs it to look like this:

http://ift.tt/1dTSQsv

Is there a way that I can use regex (str_replace or preg_match) to remove the Google redirect URL so that social sharing sites can recognize the link?

The Google redirect URL is dynamic and so it will be slightly different each time and so I will need something that can replace each variant.

My working, functional code:

$feed = file_get_contents("http://ift.tt/1dTSQIN");
$xml = new SimpleXmlElement($feed);
foreach ($xml->channel->item as $entry){
  $date = $entry->pubDate; 
  $date = strftime("%m/%d/%y %I:%M:%S%P", strtotime($date));
  $desc = $entry->description;
  $desc = str_replace("and more&nbsp;&raquo;", "","$desc");
  $desc = str_replace("font-size:85%", "font-size:100%","$desc");
  ?>
  <div class="item"></div>
  <?php echo $desc; ?>
  <div class="date">
  <?php echo $date; ?></div>
  <?php } ?>
 $desc = $entry->description;
 $date = $entry->pubDate; 
 $date = strftime("%A, %m/%d/%Y, %H:%M:%S", strtotime($date));
 $desc = str_replace("and more »","x","and more »");
  echo $date; 
  echo $desc;
  }

I'm using $desc to display the link instead of $link, but the article URL with the Google redirect is also in $link, if you would rather apply str_replace or preg_match to $link instead of $desc.

Link to working Google News feed below: http://ift.tt/1dTSQIN

If you know how to fix this you're a hero. Thank you Overflowers
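The real links in the question were shortened, so this assumes the Google redirect URL carries the article address in a `url` query parameter (worth confirming against the feed). Parsing the query string tends to be more robust than a regex here; in PHP, `parse_url($link, PHP_URL_QUERY)` followed by `parse_str()` does the same job. A Python sketch, with example.com-style URLs as stand-ins:

```python
from urllib.parse import parse_qs, urlparse

def unwrap_redirect(link, param="url"):
    """Return the destination stored in `param`, or the link unchanged."""
    query = parse_qs(urlparse(link).query)   # values come back percent-decoded
    values = query.get(param)
    return values[0] if values else link
```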

Complete htaccess solution for modern URL rewriting (language in path, allow parameters, convert old links, etc.)

I'm currently optimizing an old website. For the sake of SEO and a modern user experience, I'd like to convert all links using .htaccess. I thought it would be an easy task, since many people have already done this, but I've only found very short examples that cover a small part of the whole task. Trying to combine some of them and extend them myself, I got really frustrated. I hope you can help me out...

Given (current links are in the following form):

/index.php?page=contact
/index.php?page=contact&lang=en
/index.php?lang=de&page=contact&sth=else&val=1#section
/?page=contact

Desired new URL form:

/en/contact
/de/contact?sth=else&val=1


What I came up with so far:

RewriteEngine On
RewriteBase /

# Remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=301]

# Redirect only if no language provided
# Test for known languages and fall back to English
RewriteCond %{HTTP:Accept-Language} ^de [NC]
RewriteRule ^$ /de/ [L,R=301]
RewriteCond %{HTTP:Accept-Language} ^fr [NC]
RewriteRule ^$ /fr/ [L,R=301]
RewriteRule ^$ /en/ [L,R=301]

# If provided language is unknown redirect to English; 
# The following doesn't work. May it's better to do this in index.php?
#RewriteCond %{REQUEST_URI} !^/(en|de|fr)/?
#RewriteRule .* /en/ [R,NC,QSA]

# Load default start page if no other assigned
RewriteRule ^([a-z]{2})$ $1/home [R,L]

# Language and page assigned
RewriteRule ^([a-z]{2})/([a-z0-9_-]+)$ index.php?lang=$1&page=$2 [L,NC]


This seems to work quite well already. However, as soon as I've tried to add the following requirements as well, I only broke it.

1. Calling a page in the old format needs to be transformed to the new format so old/external links aren't broken. Whatever I've tried, I only created endless loops.

/index.php?page=contact -> /en/contact
/index.php?page=contact&lang=de#section -> /de/contact#section
/?lang=fr -> /fr/home

2. If there's no language or an unknown language, it needs to fall back to the preferred browser language (if supported, like de or fr) or English.

/xx/contact -> /en/contact
/contact?val=1#section -> /en/contact?val=1#section

3. I'm not sure what to do with image/css/JS paths. Most of them are currently implemented in HTML as src="images/i.jpg". Is it better to convert them all to src="/images/i.jpg" or to add another htaccess rule so that a path like /en/contact/images/i.jpg gets the image correctly?

I hope you can help me out on this. Also I believe this is a common case so the result might be a complete solution for other people as well. Thank you.
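Not an htaccess answer, but a Python sketch of the old-format mapping from point 1 plus the language fallback from point 2, which may help pin the rules down before encoding them in RewriteCond/RewriteRule, or if you handle it in index.php as the config comment suggests. The supported-language set and the `preferred` argument (standing in for the Accept-Language check) are assumptions. Browsers never send the `#section` fragment to the server and normally keep it across a redirect, so it appears here only for completeness.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

SUPPORTED = {"en", "de", "fr"}  # assumption: the site's languages

def legacy_to_new(url, preferred="en"):
    """Map /index.php?page=...&lang=... style URLs to /lang/page?rest."""
    parts = urlsplit(url)
    lang, page, rest = None, None, []
    for key, value in parse_qsl(parts.query):
        if key == "lang":
            lang = value
        elif key == "page":
            page = value
        else:
            rest.append((key, value))   # extra parameters are kept
    if lang not in SUPPORTED:           # missing or unknown language
        lang = preferred if preferred in SUPPORTED else "en"
    target = "/%s/%s" % (lang, page or "home")
    if rest:
        target += "?" + urlencode(rest)
    if parts.fragment:
        target += "#" + parts.fragment
    return target
```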

Regular expression for getting first and last name ignoring middle names

I'm looking for a regular expression that gives me the first and last name from a string containing a complete name.

I searched but I didn't find one that fit my needs. For instance:

  • Abc Def Ghi Jkl ---> Abc Jkl
  • Aéc Def Gài Mkl ---> Aéc Mkl
  • Aéc-Def Gài Mkl ---> Aéc-Def Mkl
  • Aéc Def Gài-Mkl ---> Aéc Gài-Mkl
  • Afd ---> Afd

How can I build a regex to return me what is on the right side when the string is what is on the left?
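A sketch assuming names are separated by whitespace, so hyphenated parts such as Aéc-Def count as a single name: capture the first token, then optionally skip everything up to the last whitespace and capture the final token.

```python
import re

# First token, then (optionally) anything up to the final whitespace-separated token.
NAME = re.compile(r"(\S+)(?:.*\s(\S+))?")

def first_last(full_name):
    m = NAME.fullmatch(full_name.strip())
    if m is None:          # empty input
        return ""
    return " ".join(g for g in m.groups() if g)
```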

RegEx for preg_match

RegEx for http://example.com/N/ where N is a number (0-9+)

The case is as follows:

if ( !isset($_SERVER['HTTP_REFERER']) || $_SERVER['HTTP_REFERER'] != 'http://example.com/N/' )
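The dots need escaping and the number becomes `\d+`, anchored at both ends. In PHP that would be something like `preg_match('#^http://example\.com/\d+/$#D', $_SERVER['HTTP_REFERER'])` in place of the `!=` comparison. A Python sketch of the same pattern, with an illustrative function name:

```python
import re

# Literal dots escaped; \d+ is one or more digits; fullmatch anchors both ends.
REFERER = re.compile(r"http://example\.com/\d+/")

def is_allowed_referer(referer):
    return referer is not None and REFERER.fullmatch(referer) is not None
```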

Is it possible to combine REPLACE with LIKE to replace multiple values in oracle database column

This is similar to an existing question about REPLACE, but instead of replacing a single value, I want to replace multiple values that match a pattern.

--create table
create table my_table (column1 varchar2(10));

--load table
insert into my_table values ('Test1');
insert into my_table values ('Test2');
insert into my_table values ('Test3');
insert into my_table values ('Test4');
insert into my_table values ('Test5');
insert into my_table values ('Lesson');



--this query replaces 'Test1' with blank
select replace(column1, 'Test1', ' ') from my_table;

--now i want to replace all matching values with blank but i get an error
select replace(column1, like '%Test%', ' ') from my_table; --this throws below error.


--ORA-00936: missing expression
--00936. 00000 -  "missing expression"
--*Cause:    
--*Action:
--Error at Line: 19 Column: 25

Running Oracle Database 11g Enterprise Edition Release 11.2.0.1.0
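REPLACE only accepts literal search strings, and LIKE is a condition rather than an expression, which is presumably why the parser reports a missing expression. Two options available in 11g: `select regexp_replace(column1, 'Test[0-9]*', ' ') from my_table;`, or `case when column1 like '%Test%' then ' ' else column1 end` to blank the whole value. The Python sketch below only illustrates what the pattern `Test[0-9]*` matches on the sample rows:

```python
import re

rows = ["Test1", "Test2", "Test3", "Test4", "Test5", "Lesson"]
# Same pattern as the REGEXP_REPLACE call: "Test" followed by any digits.
blanked = [re.sub(r"Test[0-9]*", " ", value) for value in rows]
print(blanked)
```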

file_get_contents() and Cyrillic

I am trying to get the source of a website using file_get_contents() function, then find something there using regex, and display it on the screen.

The problem is that the data I want to extract is Cyrillic, and when I look at the output, it is just garbled characters:

[6]=> array(1) { [0]=> string(83) "ĐĄĐ ĐĐĐĐРТĐШĐĐĐĄĐĐĐ ĐĐĐ ĐĐŁĐĐĐ 28 " } }

I also tried converting the source to UTF-8 (all my files are UTF-8), but I have a lot of these websites, and each of them could use a different encoding.

$source = @file_get_contents($url, false, $context);
$source = iconv(mb_detect_encoding($source), 'UTF-8', $source);

This is what I tried, but it doesn't work.

Setting the source encoding manually in iconv just changes which characters appear; they are still not Cyrillic.

How can I solve this?
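A guess from the sample output: strings like ĐĄ are what UTF-8 Cyrillic looks like when its bytes are read as ISO-8859-2 (С is the bytes D0 A1, which are Đ and Ą in that encoding), so the page may already be UTF-8 and the guessed encoding is wrong rather than missing. mb_detect_encoding only checks the encodings in its detection order, which typically does not include Windows-1251. A sturdier order is: use the charset the server declares (Content-Type header or `<meta charset>`), then fall back. A Python sketch; the fallback list (UTF-8, then Windows-1251) is an assumption:

```python
import re

# charset=... as found in a <meta> tag (a header value can be passed in directly)
META_CHARSET = re.compile(rb"""charset=["']?([\w-]+)""", re.I)

def decode_page(raw, header_charset=None):
    """Decode page bytes using the declared charset, then fall back."""
    declared = header_charset
    if declared is None:
        m = META_CHARSET.search(raw[:4096])   # meta tags sit near the top
        if m:
            declared = m.group(1).decode("ascii")
    for encoding in (declared, "utf-8", "windows-1251"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue   # unknown codec name or wrong guess: try the next one
    return raw.decode("utf-8", errors="replace")
```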

Complex requirements for string split around select commas

TL;DR

I need some help making a regex that will match any commas in a string that are side by side with unlimited white space around them and between them. The commas and their surrounding white space cannot be within matching single quotes or double quotes. I then need to capture the non-whitespace values from around those commas and count how many of those commas there are.

The values captured from around the commas will become their own values in the final array, while the commas that were counted will become nil values that are added to the final array.

Explanation of the problem:

This is a pretty complex problem, so any help is greatly appreciated. I'm adding functionality to a library I've been using for a while now. I have this string that contains an array:

"['d,og,f:asdf,:hello,",,\",,alsee',,,'ho,la', "-123,4,5.3", true,   :good, false,,, "gr\'\'\'true,\',\'ee\"n", ":::testme", true]"

I would like to split this string only around select commas so that I have an array containing the following values

'd,og,f:asdf,:hello,",,\",,alsee'
nil
nil
'ho,la'
"-123,4,5.3"
true
:good
false
nil
nil
"gr\'\'\'true,\',\'ee\"n"
":::testme"
true

The nil values come from the adjacent commas that are not contained in any string. I wrote the following regex to split the string above (I already removed the opening and closing brackets):

/(?<=(?:['\"]|false|true|^|,)),(?=(?:\s*(?:(?::[\w]+)|(?:(?::?(?:\"[\s\S]*\")|(?:'[\s\S]*'))|(?:false|true)))\s*(?:,|$)))/

This splits the string so I get these values:

(0) "'d,og,f:asdf,:hello,",,\",,alsee',,"
(1) "'ho,la'"
(2) " "-123,4,5.3""
(3) " true"
(4) "   :good, false,,"
(5) " "gr\'\'\'true,\',\'ee\"n""
(6) " ":::testme""
(7) " true"

All the values are strings as can be seen by their surrounding double quotes. They will not all end up that way though. A true or false will be converted to a boolean. The values surrounded by internal quotes will end up as strings. Then a value preceded with a : will end up as a symbol.

There are problems with the values at index 0 and 4. Index 0 should be this:

(0.0) "'d,og,f:asdf,:hello,",,\",,alsee'"
(0.1) nil
(0.2) nil

As you can see, the two commas at the end are gone. They have become the two nil values you see above. Then the string starts at the first single quote and ends at the last single quote, signifying that this value in the array is a string.

Then index 4 (" :good, false,,") should be this:

(4.0) "   :good"
(4.1) " false"
(4.2) nil
(4.3) nil

The two commas at the end have become nil. Then " false" is its own value, which will later be converted to a boolean, while " :good" is also its own value and will later be converted to a symbol.

To fix the problem with index 4 I have all the values run through a second regex. Here it is:

/^(\s*:(?:(?:[\w]+|\"[\s\S]+\"|'[\s\S]+')\s*)),([\s\S]*)$/

Instead of splitting this one I get the capture groups. It ends up returning this array for the value at index 4:

(4.0) "   :good"
(4.1) " false,,"

That's what I wanted except for one problem. The value at index 4.1 (" false,,") has the two trailing commas which should be nil values in the array.
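Rather than stacking regexes, a small scanner (a different technique from the regex approach) may be easier to get right: walk the string once, track whether you are inside single or double quotes while skipping backslash-escaped characters, split on commas outside quotes, and turn empty pieces into nil. A Python sketch, with None standing in for nil; it assumes backslash escapes apply inside both quote styles, as in the sample, and leaves the true/false and :symbol conversion out.

```python
def split_top_level(text):
    """Split on commas outside quotes; empty pieces become None."""
    pieces, current, quote, i = [], [], None, 0
    while i < len(text):
        ch = text[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                current.append(text[i + 1])   # keep the escaped char verbatim
                i += 1
            elif ch == quote:
                quote = None                  # closing quote
        elif ch in "'\"":
            quote = ch                        # opening quote
            current.append(ch)
        elif ch == ",":
            pieces.append("".join(current).strip() or None)
            current = []
        else:
            current.append(ch)
        i += 1
    pieces.append("".join(current).strip() or None)
    return pieces
```

Porting this loop to Ruby should be straightforward, and each adjacent pair of top-level commas then yields a nil directly.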

Comparing two tab delimited files and print matched rows

I have two big tab-delimited files, file1 and file2. I want to compare them and print the matching rows as shown below.

file_1
ENSDARG00000000760
ENSDARG00000001015
ENSDARG00000001549
ENSDARG00000002445
ENSDARG00000003102
ENSDARG00000004594
ENSDARG00000004851

file_2
ENSDARG00000000151 ENSDART00000000160 2292 chovy.60083
ENSDARG00000000151 ENSDART00000151127 1470 chovy.60083
ENSDARG00000000175 ENSDART00000146636 1832 chovy.300567
ENSDARG00000000966 ENSDART00000001092 6325 chovy.254634
ENSDARG00000000966 ENSDART00000140618 6295 chovy.254634
ENSDARG00000001015 ENSDART00000001148 1791 chovy.388956
ENSDARG00000001015 ENSDART00000104891 1835 chovy.388956
ENSDARG00000001015 ENSDART00000141913 994 chovy.283553

my desired output:

ENSDARG00000001015 ENSDART00000001148 1791 chovy.388956
ENSDARG00000001015 ENSDART00000104891 1835 chovy.388956
ENSDARG00000001015 ENSDART00000141913 994 chovy.283553

my code:

grep -wFf file1.txt file2.txt > output.txt

It doesn't seem to work.

Thank you for all your help!
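One common cause, though it can't be confirmed from here: if file1.txt has Windows (CRLF) line endings, every pattern ends in an invisible `\r` and `grep -wFf` finds nothing; stripping them first (for example with `tr -d '\r'`) usually fixes it. The same join is also easy without grep; a Python sketch that compares the first column of the second file against the IDs in the first:

```python
def matching_rows(id_lines, table_lines):
    """Rows of the second file whose first column is one of the IDs in the first."""
    # strip() also removes a trailing \r from Windows line endings
    wanted = {line.strip() for line in id_lines if line.strip()}
    result = []
    for row in table_lines:
        fields = row.split()
        if fields and fields[0] in wanted:
            result.append(row.rstrip("\r\n"))
    return result
```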

grep/sed between two tags across multiple lines

I have many files from which I need to extract information.

Example of my files:

first file content:

"test This info i need grep</singleline>"

and

second file content (with two lines):

"test This info=
 i need grep too</singleline>"

As a result, I need to extract this text: "This info i need grep" from the first file and "This info= i need grep too" from the second file.

For the first file I use:

grep -o 'test .*</singleline>' * | sed -e 's/test \(.*\)<\/singleline>/\1/'

and it successfully gets "This info i need grep", but the same command can't get the information from the second file. Please help me rewrite the command or suggest another one. Many thanks!
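grep and sed work line by line by default, so the second file's text is split across two records. Reading each file as a whole and letting `.` match newlines avoids that. A Python sketch, assuming the text always starts with "test " and ends at the first `</singleline>`, and that a line break should become a single space as in the expected output:

```python
import re

# re.S lets . match newlines; the lazy .*? stops at the first closing tag.
BETWEEN = re.compile(r"test (.*?)</singleline>", re.S)

def between_tags(text):
    m = BETWEEN.search(text)
    if m is None:
        return None
    # Collapse each line break (and the spaces around it) into one space.
    return re.sub(r"\s*\n\s*", " ", m.group(1))
```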

Counting conditional and comma operators

I need to count the number of occurrences of the conditional (ternary) operator and the comma operator in a .c file. The problem is that a conditional expression can take any form, from the simplest one to a really long compound one. The same applies to the comma operator.

I really don't know how to do that. I thought about using regex matching, but it's probably not the best option. Do you have a better idea?
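Regex alone can't tell a comma operator from the commas that separate function arguments, declarators or initializers, so an exact count really needs a C parser (for example, Clang's). Conditionals are easier: each conditional expression contains exactly one `?`, so after removing comments and string/character literals you can count the remaining `?` characters. A rough Python sketch under those assumptions; it ignores macros, trigraphs and anything else the preprocessor would change.

```python
import re

# Comments and literals, which may contain '?' characters that are not operators.
NOISE = re.compile(
    r"//[^\n]*"                  # line comments
    r"|/\*.*?\*/"                # block comments
    r'|"(?:\\.|[^"\\\n])*"'      # string literals
    r"|'(?:\\.|[^'\\\n])*'",     # character literals
    re.S,
)

def count_conditionals(c_source):
    code = NOISE.sub(" ", c_source)
    return code.count("?")
```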

regular expression ruby phone number

I'm trying to figure out how to write my own regex.

I made a list of valid and invalid phone numbers, and I'm trying to make sure the valid ones are matched, but I can't figure out how to finish it.

Allowed list

0665363636
06 65 36 36 36
06-65-36-36-36
+33 6 65 36 36 36

Not allowed

06 65 36 36
2336653636
+3366536361
0065363636

I messed around with it a bit and I currently have this:

[0+][63][6 \-3][56\ ][\d{1}][\d \-]\d{2}[\d{1} \-]\d\d? ?\-?\d?\d? ?\d?\d?$

This blocks numbers 2 and 4 of the not-allowed list, but I can't figure out how to block the others.

Should I enforce a minimum number of digits? If so, how would I do this?
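Spelling out the structure fixes the digit count without a separate minimum: a prefix of `0` or `+33` (optionally followed by a space), the digit 6, then exactly four two-digit groups, each optionally preceded by one space or hyphen. The sketch assumes only 06 numbers are wanted, as in the lists; widening `6` to `[67]` would admit 07 numbers. In Ruby, anchor it with `\A` and `\z`, e.g. `number.match?(/\A(?:0|\+33 ?)6(?:[ -]?\d{2}){4}\z/)`. Shown in Python:

```python
import re

# 0 or +33 (optional space), then 6, then four pairs of digits, each
# optionally preceded by one space or hyphen.
PHONE = re.compile(r"(?:0|\+33 ?)6(?:[ -]?\d{2}){4}")

def is_allowed(number):
    return PHONE.fullmatch(number) is not None
```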

Search pages within particular domain that match regular expression?

I have a domain, e.g. www.example.com, and I would like to get a list of its pages that match a regular expression, e.g. http://(?:www\.)?example.com/[a-z\-]+-[0-9]+\?..

regex replace c++ boost

I am trying to apply a replacement when a pattern matches my string: if my string is ABCDH5, it should become ABCDH5m.

My code is:

std::string mappedSymbol = boost::regex_replace(tag,from,ruleTo,boost::regex_constants::match_continuous);

here,
tag value is ABCDH5
from value is (ABCD[A-Z][0-9]);
ruleTo value is $1m

But mappedSymbol ends up as ABCDH5mm.
I am not sure why it adds an extra m to the string.

What does the regex "/\\*{2,}/" mean? [duplicate]

I'm fairly new to regex, and specifically, I don't understand why there are 2 backslashes. I know the second one escapes the "*" character, but what does the first backslash do?

I'm passing this regex to the PHP function preg_match(), and I'm trying to find strings that contain 2 or more consecutive "*" characters.
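The first backslash belongs to the string literal, not to the regex: inside a PHP string, `\\` stands for one backslash, so after PHP processes the literal, preg_match() receives `/\*{2,}/`, where `\*` is a literal asterisk and `{2,}` means two or more. (In PHP, `'\*'` would also survive as backslash-asterisk, but doubling the backslash is the safer habit.) The same two-step escaping happens in Python:

```python
import re

# In the source code "\\" is one backslash, so the regex engine receives
# \*{2,}: a literal asterisk, repeated two or more times.
PATTERN = "\\*{2,}"

def has_two_stars(text):
    return re.search(PATTERN, text) is not None
```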

Rewrite request to query string

I have these lines in my .htaccess, and it works as I want it to.

RewriteEngine on
RewriteRule ^api$ page.php?id=api

If you go to http://example.com/api, it would rewrite the URL as http://ift.tt/1JWeFVx, but I need this to apply to any alphanumeric input, for instance:

No request:
    http://example.com -> http://ift.tt/1oq2mTb
Anything else:
    http://ift.tt/1GVjyYG -> http://ift.tt/1JWeFoi

Thanks in advance :)

Getting the source of websites using file_get_contents

I have a list of a couple of thousand websites. I have to iterate over them, and in each iteration call file_get_contents on the given URL, search the source for some information using regex, and write it to another file.

The thing is, it's very, very slow. I split the whole process into batches of about 50 URLs, one batch each time I refresh the page. But:

  • I'd have to keep refreshing the page until I get through a couple of thousand
  • even with only 50 URLs, I hit the 30-second maximum execution time

Is there a way to speed this up?
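Most of the time is probably spent waiting on the network, one site after another, so fetching several sites concurrently usually helps far more than batching by page refresh; in PHP the `curl_multi_*` functions are one way to do that. Running the job from the command line also sidesteps the 30-second web limit, since the PHP CLI has no execution time limit by default. A Python sketch of the concurrent idea with a thread pool; the worker count and timeout are arbitrary assumptions, and the fetcher can be swapped out for testing:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url, timeout=10):
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

def fetch_all(urls, fetcher=fetch, workers=16):
    """Fetch many URLs in parallel; failures are returned as exceptions."""
    def safe(url):
        try:
            return url, fetcher(url)
        except Exception as exc:   # keep going if one site fails
            return url, exc
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(safe, urls))
```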

Picking up field value using Python regex

This is an example of two lines in a file that I am trying to pick up information from.

...
{ "SubtitleSettings_REPOSITORY", FieldType_STRING, (int32_t)REPOSITORY},
{ "PREFERRED_SUBTITLE_LANGUAGE", FieldType_STRING,SUBTITLE_LANGUAGE},
...

What I want is to find the 3rd field of this odd data structure, given a string that matches the 1st field, i.e.

SubtitleSettings_REPOSITORY => REPOSITORY
PREFERRED_SUBTITLE_LANGUAGE => SUBTITLE_LANGUAGE

The regex in my Python code only handles the second line and can't cope with the first. How can I improve it?

import re
...
#field is given a value in previous code, can be "SubtitleSettings_REPOSITORY", or "PREFERRED_SUBTITLE_LANGUAGE"
match = re.search(field+'"[, \t]+(\w+)[, \t]+(\w+)', src_file.read(), re.M|re.I)
return_value = match.group(2)
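The first line fails because of the `(int32_t)` cast between the second and third fields. One fix is to allow an optional parenthesized cast before the captured value, and to keep the quotes around the field name so that, say, REPOSITORY cannot match inside a longer name. A sketch with a hypothetical helper name:

```python
import re

def third_field(field, source):
    pattern = (
        '"' + re.escape(field) + '"'     # the quoted first field
        + r"\s*,\s*\w+\s*,\s*"           # comma, second field, comma
        + r"(?:\([^)]*\)\s*)?"           # optional cast such as (int32_t)
        + r"(\w+)"                       # the third field
    )
    match = re.search(pattern, source)
    return match.group(1) if match else None
```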

Implementing the equivalent of c++ cin in javascript?

I am trying to design a small JavaScript interpreter that interprets print and read statements (the equivalents of cout and cin in C++).

I have implemented the print statement, but I am having difficulty implementing the read statement. The problem is that even when there is a read statement, the interpreter finishes all the print statements first and only then gives me the chance to enter input.

So I think I should handle two events: the test button click and the Enter key press.

I am trying to manage read statement like this:

 var read = lines[i].match(rxRead);            
        if(read !==null){//means there is a read statement
        s += "$('#output').focus();";              
        s+= "myRead=function(ab){
            var keycode = (ab.keyCode ? ab.keyCode : ab.which);
            if(keycode == '13'){
            read_lines = $('#output').val().split('\\n');
            alert(read_lines[read_lines.length-1]);
           }};";

       s+= "$('#output').keypress(function (){ 
            myRead(event);});";

       continue;} 

But this does not solve the problem. Can you give me any hints or ideas on how to do this? Where should I start? I have been stuck on this for hours and can't figure out what logic to follow.

Thanks in advance.
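The underlying issue is that the generated code runs straight through, and a keypress handler only fires later, after everything else has printed. One way around it is a resumable interpreter: it pauses when it reaches a read statement and continues when the Enter handler supplies the line. JavaScript generator functions (`function*`) support the same pattern. The sketch below is in Python, and its `print "text"` / `read name` syntax is an assumption, not your interpreter's grammar:

```python
import re

# Hypothetical statement syntax for the sketch:  print "text"   and   read name
PRINT = re.compile(r'print\s+"(.*)"$')
READ = re.compile(r"read\s+(\w+)$")

def run(program_lines):
    """Interpreter as a generator: yields ("out", text) or ("in", var_name).

    Execution pauses at every yield; a read resumes only when the caller
    sends the input line in with gen.send(value).
    """
    env = {}
    for line in program_lines:
        line = line.strip()
        m = PRINT.match(line)
        if m:
            yield ("out", m.group(1))
            continue
        m = READ.match(line)
        if m:
            env[m.group(1)] = yield ("in", m.group(1))
    return env

def drive(program_lines, typed_lines):
    """Stand-in for the UI: each queued line plays the role of one Enter press."""
    typed = iter(typed_lines)
    output = []
    gen = run(program_lines)
    try:
        request = next(gen)
        while True:
            kind, payload = request
            if kind == "out":
                output.append(payload)
                request = next(gen)
            else:
                request = gen.send(next(typed))
    except StopIteration as finished:
        return output, finished.value
```

In the browser, the Enter handler would call the equivalent of `gen.send(line)` instead of reading from a queue.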

how to use filename-regex option in int-ftp:outbound-gateway?

First, thanks for your attention.
I defined an FTP outbound adapter with the ls command in recursive mode in my Spring Integration project. I want to filter and get files in specific subdirectories; the server directory structure is:
root
----------a\
---------------in\
---------------------a.op
----------b\
---------------in\
---------------------b.op
I want to get the a.op and b.op files. I set the filename-regex option to ([a-z]|[in]|.*\.op), but it doesn't work correctly: only the first-level directory is filtered. My adapter code is:

 <int-ftp:outbound-gateway id="gatewayLS"
                              session-factory="ftpSessionFactory"
                              request-channel="inbound"
                              command="ls"
                              filename-regex="([a-z]|[in]|.*\.op)"
                              command-options="-R"
                              expression="payload"
                              reply-channel="toSplitter"/>

How can I solve this? Thanks.
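Part of the problem may be the pattern itself: `[in]` is a character class that matches one character, `i` or `n`, not the word `in`. The Python sketch below compares the original pattern with one that uses the literal `in`, assuming the filter tests each file or directory name as a whole (a full match, as Java's `Matcher.matches()` does). It does not model how Spring applies the filter during a recursive `ls`; that part is left as an open assumption.

```python
import re

ORIGINAL = re.compile(r"([a-z]|[in]|.*\.op)")
# Same alternatives, but with the literal word "in" instead of the class [in].
FIXED = re.compile(r"([a-z]|in|.*\.op)")

NAMES = ["a", "b", "in", "a.op", "b.op", "notes.txt"]


def accepted(pattern, names):
    """Return the names the pattern matches in full (like Java's matches())."""
    return [name for name in names if pattern.fullmatch(name)]


print(accepted(ORIGINAL, NAMES))  # "in" is rejected
print(accepted(FIXED, NAMES))     # "in" is accepted
```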

Matching across a line vs matching words regex

Why is it that when I match across newlines, I can no longer seem to identify individual words? For example:

content = "COAL_STORIES
AUSTRALIA - blah blah blah
BOTSWANA – blah blah blah 

URANIUM_STORIES 
AUSTRALIA – blah
INDIA - blah

COPPER_STORIES
AUSTRALIA - blah blah blah
AUSTRALIA - blah blah blah
CHINA - blah blah blah

ALUMINIUM_STORIES"




sections = content.scan(/\w.*_.*\b/)

gives an array:

[
    [0] "COAL_STORIES",
    [1] "URANIUM_STORIES",
    [2] "COPPER_STORIES",
    [3] "ALUMINIUM_STORIES"
]

But if I try that using the 'm' flag everything gets matched:

sections = content.scan(/\w.*_.*\b/m) gives an array:

[
    [0] "COAL_STORIES\nAUSTRALIA - blah blah blah\nBOTSWANA – blah blah blah \n\nURANIUM_STORIES \nAUSTRALIA – blah\nINDIA - blah\n\nCOPPER_STORIES\nAUSTRALIA - blah blah blah\nAUSTRALIA - blah blah blah\nCHINA - blah blah blah\n\nALUMINIUM_STORIES"
]

As far as I can tell I'm still looking for the same word boundaries?
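In Ruby, the `m` flag does not change word boundaries; it makes `.` match newlines as well (Python calls this `re.DOTALL`, while Python's `re.MULTILINE` is something different that only affects `^` and `$`). Once newlines are allowed, the greedy `.*` runs to the last `_` in the whole string and then on to the final word boundary, so a single match swallows everything. The sketch reproduces this in Python on a shortened copy of the content.

```python
import re

content = ("COAL_STORIES\nAUSTRALIA - blah blah blah\n\n"
           "URANIUM_STORIES \nINDIA - blah\n\n"
           "ALUMINIUM_STORIES")

# Default: "." stops at "\n", so each ".*" is confined to a single line.
per_line = re.findall(r"\w.*_.*\b", content)

# DOTALL (Ruby's /m): ".*" crosses newlines and greedily reaches the last "_"
# in the string, so one match covers the whole content.
whole = re.findall(r"\w.*_.*\b", content, re.DOTALL)

print(per_line)
print(len(whole))
```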

Use regular expression to extract attribute value for custom tag

Thanks for taking a look at this. I'm using PHP. I have a string like so:

[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don't so much dance as rhythmically convulse.[/QUOTE]

And I want to pull out the values in the quotes and create an associative array like so:

["name" => "Max-Fischer", "post" => "486662533", "member" => "123"]

Then, I would like to remove the opening and closing [QUOTE] tags and replace them with custom HTML like so:

<blockquote><a href="URL_I_WILL_GENERATE_FROM_THE_ARRAY_VALUES">Max-Fischer</a> wrote: I don't so much dance as rhythmically convulse.</blockquote>

So the main problem is writing the preg_match() or preg_replace() call that handles, first, grabbing the values into an array and, second, removing the tags and replacing them with my custom content. I can figure out how to use the array to create the custom HTML; I just can't figure out how to use regular expressions well enough to achieve it.

I tried a match like this to get the attribute values:

/(\S+)=[\"\']?((?:.(?![\"\']?\s+(?:\S+)=|[>\"\']))+.)[\"\']?/

But this only returns:

[QUOTE

And that's not even addressing how to put the values (if I can get them) into an array.

Thanks in advance for your time.

Cheers.
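One way to handle both steps at once is to match the whole `[QUOTE="..."]...[/QUOTE]` block and build the replacement in a callback; in PHP that role is played by `preg_replace_callback()`. The sketch below shows the approach with Python's `re.sub` and a replacement function. `make_url` is a hypothetical placeholder for however the link is generated, and the attribute pattern assumes values never contain commas.

```python
import re

# The whole tag: group 1 = attribute text, group 2 = quoted body.
QUOTE_RX = re.compile(r'\[QUOTE="([^"]*)"\](.*?)\[/QUOTE\]', re.DOTALL)
# "key: value" pairs separated by commas (assumes values contain no commas).
ATTR_RX = re.compile(r"(\w+):\s*([^,]+)")


def parse_attrs(attr_text):
    """'name: Max-Fischer, post: 1' -> {'name': 'Max-Fischer', 'post': '1'}"""
    return {key: value.strip() for key, value in ATTR_RX.findall(attr_text)}


def quote_to_html(text, make_url):
    def repl(match):
        attrs = parse_attrs(match.group(1))
        return '<blockquote><a href="{}">{}</a> wrote: {}</blockquote>'.format(
            make_url(attrs), attrs.get("name", ""), match.group(2))
    return QUOTE_RX.sub(repl, text)


sample = ('[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]'
          "I don't so much dance as rhythmically convulse.[/QUOTE]")
# Hypothetical URL scheme, for illustration only.
print(quote_to_html(sample, lambda a: "/posts/" + a["post"]))
```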

Error running a new application in Android Studio 1.1.0

I use Android Studio with API 15. I imported a project from Eclipse, and running it produced errors, so I deleted that project and created a new application. But when I run the new one, I get this error:

Information:Gradle tasks [:app:generateDebugSources, :app:generateDebugAndroidTestSources] :app:preBuild UP-TO-DATE :app:preDebugBuild UP-TO-DATE :app:checkDebugManifest :app:preReleaseBuild UP-TO-DATE :app:prepareComAndroidSupportAppcompatV72211Library UP-TO-DATE :app:prepareComAndroidSupportSupportV42211Library UP-TO-DATE :app:prepareDebugDependencies :app:compileDebugAidl UP-TO-DATE :app:compileDebugRenderscript :app:generateDebugBuildConfig UP-TO-DATE :app:generateDebugAssets UP-TO-DATE :app:mergeDebugAssets UP-TO-DATE :app:generateDebugResValues UP-TO-DATE :app:generateDebugResources :app:mergeDebugResources UP-TO-DATE :app:processDebugManifest :app:processDebugResources C:\Users\nahla\AndroidStudioProjects\MyApplication2\app\src\main\res\values\values.xml Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Display3'. C:\Users\nahla\AndroidStudioProjects\MyApplication2\app\build\intermediates\exploded-aar\com.android.support\appcompat-v7\22.1.1\res\values-v21\values.xml Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Body1'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Body2'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Button'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Caption'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Display1'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Display2'. 
Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Display3'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Display4'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Headline'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Inverse'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Large'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Large.Inverse'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.PopupMenu.Large'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.PopupMenu.Small'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Medium'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Medium.Inverse'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Menu'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.SearchResult.Subtitle'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.SearchResult.Title'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Small'. 
Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Small.Inverse'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Subhead'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Title'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionBar.Menu'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionBar.Subtitle'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionBar.Subtitle.Inverse'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionBar.Title'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionBar.Title.Inverse'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionMode.Subtitle'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionMode.Title'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.PopupMenu.Large'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.PopupMenu.Small'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Button'. 
Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.TextView.SpinnerItem'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionBar.Subtitle'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:TextAppearance.Material.Widget.ActionBar.Title'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ActionBar.TabText'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ActionBar.TabView'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ActionButton'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ActionButton.CloseMode'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ActionButton.Overflow'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.AutoCompleteTextView'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Button'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Button.Borderless'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Button.Borderless.Colored'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Button.Small'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ButtonBar'. 
Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.CompoundButton.CheckBox'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.CompoundButton.RadioButton'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.DropDownItem.Spinner'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Light.ActionBar.TabText'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Light.ActionBar.TabText'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Light.ActionBar.TabView'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Light.PopupMenu'. Error:(211, 21) No resource found that matches the given name: attr 'android:overlapAnchor'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ListPopupWindow'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ListView'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ListView.DropDown'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.PopupMenu'. Error:(211, 21) No resource found that matches the given name: attr 'android:overlapAnchor'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ProgressBar'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.ProgressBar.Horizontal'. 
Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.RatingBar'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Spinner'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Spinner'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Spinner.Underlined'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.TextView.SpinnerItem'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Widget.Material.Toolbar.Button.Navigation'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Theme.Material'. Error:(1) Error retrieving parent for item: No resource found that matches the given name 'android:Theme.Material.Light'. Error:(119, 21) No resource found that matches the given name: attr 'android:colorAccent'. Error:(123, 21) No resource found that matches the given name: attr 'android:colorButtonNormal'. Error:(121, 21) No resource found that matches the given name: attr 'android:colorControlActivated'. Error:(122, 21) No resource found that matches the given name: attr 'android:colorControlHighlight'. Error:(120, 21) No resource found that matches the given name: attr 'android:colorControlNormal'. Error:(117, 21) No resource found that matches the given name: attr 'android:colorPrimary'. Error:(118, 21) No resource found that matches the given name: attr 'android:colorPrimaryDark'. Error:(119, 21) No resource found that matches the given name: attr 'android:colorAccent'. Error:(123, 21) No resource found that matches the given name: attr 'android:colorButtonNormal'. Error:(121, 21) No resource found that matches the given name: attr 'android:colorControlActivated'. 
Error:(122, 21) No resource found that matches the given name: attr 'android:colorControlHighlight'. Error:(120, 21) No resource found that matches the given name: attr 'android:colorControlNormal'. Error:(117, 21) No resource found that matches the given name: attr 'android:colorPrimary'. Error:(118, 21) No resource found that matches the given name: attr 'android:colorPrimaryDark'. Error:(119, 21) No resource found that matches the given name: attr 'android:colorAccent'. Error:(123, 21) No resource found that matches the given name: attr 'android:colorButtonNormal'. Error:(121, 21) No resource found that matches the given name: attr 'android:colorControlActivated'. Error:(122, 21) No resource found that matches the given name: attr 'android:colorControlHighlight'. Error:(120, 21) No resource found that matches the given name: attr 'android:colorControlNormal'. Error:(117, 21) No resource found that matches the given name: attr 'android:colorPrimary'. Error:(118, 21) No resource found that matches the given name: attr 'android:colorPrimaryDark'. Error:(126, 21) No resource found that matches the given name: attr 'android:windowElevation'. Error:(119, 21) No resource found that matches the given name: attr 'android:colorAccent'. Error:(123, 21) No resource found that matches the given name: attr 'android:colorButtonNormal'. Error:(121, 21) No resource found that matches the given name: attr 'android:colorControlActivated'. Error:(122, 21) No resource found that matches the given name: attr 'android:colorControlHighlight'. Error:(120, 21) No resource found that matches the given name: attr 'android:colorControlNormal'. Error:(117, 21) No resource found that matches the given name: attr 'android:colorPrimary'. Error:(118, 21) No resource found that matches the given name: attr 'android:colorPrimaryDark'. Error:(126, 21) No resource found that matches the given name: attr 'android:windowElevation'. Error:Execution failed for task ':app:processDebugResources'.

com.android.ide.common.process.ProcessException: org.gradle.process.internal.ExecException: Process 'command 'C:\Users\nahla\AppData\Local\Android\sdk\build-tools\21.1.2\aapt.exe'' finished with non-zero exit value 1 Information:BUILD FAILED Information:Total time: 16.92 secs Information:103 errors Information:0 warnings Information:See complete output in console

VBA Get Unique values from RegEx.Execute

How can I filter a RegEx.Execute() to only contain the unique matches?

Currently I have this:

Set allMatches = RE.Execute(text)

And I know I can loop through the elements with:

For i = 0 To allMatches.Count - 1
    ' allMatches(i).Value is the text of the i-th match
Next
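A common approach is to remember each match value in a keyed collection and skip values that were already seen; in VBA that is usually a `Scripting.Dictionary` checked with `.Exists` inside the loop. The same idea as a Python sketch, keeping the matches in first-seen order:

```python
import re


def unique_matches(pattern, text):
    """Return match values in first-seen order, without duplicates."""
    seen = {}
    for m in re.finditer(pattern, text):
        # dict keys are unique and keep insertion order (Python 3.7+)
        seen.setdefault(m.group(0), None)
    return list(seen)


print(unique_matches(r"\d+", "1 22 1 333 22"))
```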

Remove characters from string based on relative position to specific character, in R

How can I remove all characters before "p", and the "p" itself, from every string in column v1 of the data frame below?

df1 <- data.frame(v1 = c("m0p1", "m5p30", "m11p20", "m59p60")) 

And how can I remove all characters after "p", and the "p" itself? Thank you.
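Both cases are a single substitution with an empty replacement: remove everything up to and including the `p`, or remove the `p` and everything after it. The sketch uses Python's `re.sub`; the same two patterns should carry over to R's `sub()` applied to `df1$v1`. Note that the greedy `.*` would remove up to the last `p` if a string contained more than one.

```python
import re

v1 = ["m0p1", "m5p30", "m11p20", "m59p60"]

# Drop everything before "p" and the "p" itself.
after_p = [re.sub(r"^.*p", "", s) for s in v1]
# Drop "p" and everything after it.
before_p = [re.sub(r"p.*$", "", s) for s in v1]

print(after_p, before_p)
```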

Stop nginx from serving a given directory or regex

We run both Apache and nginx on our server; nginx serves the static files.

Our site has a lot of images whose URLs carry a parameter in the query string, e.g.

<img src="/images/main/example.jpg?src=http://ift.tt/1u6vnWS" />

To handle these, we use this directive in our .htaccess file:

RewriteCond %{REQUEST_URI} images\/main\/.*
RewriteCond %{QUERY_STRING} from=(.*)$
RewriteRule (.*) %1

But the nginx log file is full of errors saying that files under /images/main are missing: 300 MB of logs in a single day.

Can we use an nginx directive to stop nginx from handling files that match a regex?

how to extract a specific part from pdf using php

I want to extract a specific part of the text from a PDF using PHP.

For example, I want to get the full text of the abstract section of a journal article so I can insert it into my database.

Does anyone know a tool or how to do it?
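Getting the raw text out of the PDF needs a PDF text-extraction library, which is not shown here. Once the text is available as a plain string, the abstract can often be cut out with a regex between the "Abstract" heading and the next heading. The Python sketch assumes the heading is literally "Abstract" at the start of a line and that the section ends at a "Keywords" or "Introduction" heading; real journal layouts vary, so the pattern is only a starting point.

```python
import re

# Assumed layout: "Abstract" heading, then text up to a Keywords/Introduction line.
ABSTRACT_RX = re.compile(
    r"^\s*Abstract\s*[:.]?\s*(.*?)"
    r"(?=^\s*(?:Keywords|Key words|\d*\.?\s*Introduction)\b)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_abstract(text):
    """Return the abstract text, or None if the assumed headings are absent."""
    m = ABSTRACT_RX.search(text)
    return m.group(1).strip() if m else None


sample = ("A Study of X\nAbstract\nWe study X.\nIt works.\n"
          "Keywords: a, b\n1. Introduction\nBody text.")
print(extract_abstract(sample))
```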

Complicated Regex with blank lines

I am trying to apply a regex in PowerShell without any luck. I've built the regex and tested that it works on RegExr, but when I run a match with it in PowerShell, it returns no results.

The regex looks for every occurrence of a pattern like the one below (including the TWO blank lines between the price and the address):

$9,999,999

26 Fake Street, Fake Island, ABC 9999

My regex: \$[\d]{1},[\d]{3},[\d]{3}\n\n\n\d{1}.*?, ([A-Z])\w+ [[\d]{4}

My PowerShell code is as follows:

$Webcontent = Get-Content 'C:\Utilities\Content.txt' -Raw
[regex]::Match($WebContent,'\$[\d]{1},[\d]{3},[\d]{3}\n\n\n\d{1}.*?, ([A-Z])\w+ [[\d]{4}').Groups.Value | Out-File C:\utilities\NewContent.txt

Is this query possible, and can it return ALL occurrences it finds?
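A likely culprit, assuming the file was saved with Windows line endings: `Get-Content -Raw` keeps each line break as `\r\n`, so `\n\n\n` never matches, whereas text pasted into an online tester typically ends up with plain `\n`. The Python sketch widens each `\n` to `\r?\n` and uses `finditer` to return every occurrence; in PowerShell, `[regex]::Matches()` is the counterpart that returns all matches instead of the first. The sketch also simplifies `[\d]{1}` to `\d` and replaces `[[\d]{4}`, whose class additionally accepts a literal `[`, with `\d{4}`.

```python
import re

# The question's pattern, with "\n" widened to "\r?\n" so CRLF files match too.
PATTERN = re.compile(r"\$\d,\d{3},\d{3}(?:\r?\n){3}\d.*?, ([A-Z])\w+ \d{4}")


def find_all(text):
    """Return every full match in the text, not just the first one."""
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = ("$9,999,999\r\n\r\n\r\n26 Fake Street, Fake Island, ABC 9999\r\n"
          "$1,234,567\n\n\n7 Other Road, Town, XYZ 1234")
for match in find_all(sample):
    print(repr(match))
```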

Match an email address if it contains a dot

I want to match any email address that contains at least one . (dot) before the @ sign. The emails have already been validated, so the regex only needs to look for the dot.

I have tried

Regex emailMatcher = new Regex(@"^[a-zA-Z\.']{1,}\.[a-zA-Z\.']{1,}@example\.com$");

But I know that emails can contain more characters than just a-zA-Z\.' so this won't cover all cases.

Any ideas on how to do it?

Thanks

EDIT: I'm not trying to validate emails; they are already validated. I just need to select the emails that contain a . before the @ sign.

Examples that would pass:

first.last@example.com
first.middle.last@example.com

Examples that should pass, but wouldn't pass using my current regex:

first.last(comment)@example.com
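Since the addresses are already validated, the characters before the @ don't need to be spelled out: it is enough to require a `.` somewhere before the last `@`. A Python sketch of that check; the pattern uses only basic syntax, so it should work unchanged with .NET's `Regex`.

```python
import re

# A "." must occur somewhere before the LAST "@" in the address.
DOT_BEFORE_AT = re.compile(r"^.*\..*@[^@]+$")


def has_dot_before_at(email):
    return DOT_BEFORE_AT.match(email) is not None


for address in ["first.last@example.com", "first.last(comment)@example.com",
                "first@example.com"]:
    print(address, has_dot_before_at(address))
```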

RainMeter multiline - RegExp

For a couple of days I've been trying to read an entire file with Rainmeter.

I'm working with WebParser and RegExp.

My result is something like this:

Line1
Line2
Line3

If I use a RegExp like (?m).* I only get the first line; if I use something like (?m).*\n.*, I get two lines...

Fine, but sometimes I have just 1 line and sometimes I have 5. If I write (?m).*\n.*\n.*\n.*\n.* and there are, for example, only 3 lines, Rainmeter doesn't get my lines at all.

Does anyone have a solution?
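Two common ways to handle a variable number of lines, assuming Rainmeter's WebParser accepts standard PCRE syntax: make every line after the first optional with `(?:\n(.*))?`, so shorter input still matches and the missing groups are simply left unset, or use `(?s)` so that `.` also matches newlines and one group captures everything. Both are shown in Python below (in Python, unset groups come back as `None`).

```python
import re

# Up to five lines; every line after the first is optional.
UP_TO_FIVE = re.compile(r"(.*)(?:\n(.*))?(?:\n(.*))?(?:\n(.*))?(?:\n(.*))?")
# (?s) lets "." match newlines too, so one group takes the whole text.
EVERYTHING = re.compile(r"(?s)(.*)")

text = "Line1\nLine2\nLine3"
print(UP_TO_FIVE.match(text).groups())
print(EVERYTHING.match(text).group(1))
```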

Print subset of file from line number to first match of regex using sed

I would like to print a file's content starting at a particular line number up to the first occurrence of a pattern, and then immediately stop searching and printing. I tried this:

sed -n '2,/{p; :loop n; p; /pattern/q; b loop}'

but without success. How can this be achieved? Thanks for your help.
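The intended logic is: skip to the start line, print each line, and quit right after printing the first line that matches. A Python sketch of that logic follows (here the start line itself is also tested against the pattern). In GNU sed, something along the lines of `sed -n '2,${p;/pattern/q}' file` should express the same idea, though that has not been checked against other sed implementations.

```python
import itertools
import re


def lines_until(lines, start, pattern):
    """Yield lines from 1-based line `start` through the first line that
    matches `pattern`, then stop (like sed's q)."""
    rx = re.compile(pattern)
    for line in itertools.islice(lines, start - 1, None):
        yield line
        if rx.search(line):
            return


sample = ["one", "two", "three pattern", "four", "five pattern"]
print(list(lines_until(sample, 2, "pattern")))
```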

Python regex split but put end part of regex match back into string?

I'd like to find a regex that can break up paragraphs (long strings, no newline characters to worry about) into sentences, with the simple rule that any of {., ?, !} followed by whitespace and then a capital letter marks the end of a sentence (I realize this is not a good rule for real life).

I've got something partly working, but it doesn't quite do the job:

line = 'a b c FFF! D a b a a FFF. gegtat FFF. A'
matchObj = re.split(r'(.*?\sFFF[\.|\?|\!])\s[A-Z]', line)
print (matchObj)

prints

['', 'a b c FFF!', '', ' a b a a FFF. gegtat FFF.', '']

whereas I'd like to get:

['a b c FFF!', 'D a b a a FFF. gegtat FFF.']

So two questions.

  • Why are there empty members ('') in the results?

  • I understand why the D gets cut out from the split result - it's part of the first search. How can I structure my search differently so that the capital letter coming after the punctuation is put back so it can be included with the next sentence? In this case, how can I get D to turn up in the second element of the split result?

I know I could accomplish this with some sort of for-loop just peeling off the first result, adding back the capital letter and then doing it all over again, but this seems not-so-Pythonic. If regex is not the way to go here, is there something that still avoids the for loop?

Thanks for any suggestions.
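On the first question: `re.split` returns the text between matches and, because the pattern has a capturing group, the captured text as well; here the text between matches is empty, hence the `''` entries. For the second, lookarounds let the split consume only the whitespace: `(?<=[.?!])` requires the punctuation before it and `(?=[A-Z])` requires the capital letter after it, without including either in the match, so `D` stays with the next sentence. (Inside a character class `|` is a literal character, so `[.?!]` is enough.) A sketch:

```python
import re

line = 'a b c FFF! D a b a a FFF. gegtat FFF. A'

# Split on whitespace preceded by . ? or ! and followed by a capital letter.
# Only the whitespace is consumed; the lookarounds keep the punctuation and
# the capital letter in the neighbouring sentences.
sentences = re.split(r'(?<=[.?!])\s+(?=[A-Z])', line)
print(sentences)
```

Under the stated rule, the trailing `A` starts a new sentence, so it comes back as a third element.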

Regular Expression to select text within bracket (and the bracket)

I'm new to regular expressions. I've tried several websites to help me build regexes, but I mostly fail even at the simplest scenarios.

This time, I just want to build a regex to help me search and replace text in Notepad++. I want to erase every run of characters that starts with [ and ends with ], replacing it with an empty string. So basically I want to find all the characters within brackets and erase them all.

So if I have this string:

[2015-05-08] ERROR [US Channel Store]: An exception occurred at index.php.

I want it to become:

ERROR : An exception occurred at index.php.

by searching the document with the regex and replacing the matches with an empty string.

My best attempt at making the regex is this: \[.*\]. (The website said that the full regex will actually be /\[.*\]/g)

But in the preview, I find that the regex actually selects the whole line.

Can someone point out where my mistake(s) are? And can you give me the correct regex? A short explanation of what is wrong with my way of thinking about regexes would be much appreciated and would help me learn. Thanks!
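The `.*` in `\[.*\]` is greedy: it runs as far as it can and then backs off only to the last `]` on the line, so the match spans from the first `[` to the last `]`. Restricting the inside to non-`]` characters (`\[[^\]]*\]`), or making the quantifier lazy (`\[.*?\]`), stops at the first `]` instead. Both are standard syntax that Notepad++'s regular-expression mode should accept; the Python sketch compares the greedy and restricted versions, plus a variant that also eats one following space.

```python
import re

log = "[2015-05-08] ERROR [US Channel Store]: An exception occurred at index.php."

greedy = re.sub(r"\[.*\]", "", log)           # one match: first "[" to last "]"
bracketed = re.sub(r"\[[^\]]*\]", "", log)    # each [...] group removed separately
tidy = re.sub(r"\[[^\]]*\] ?", "", log)       # also drops one space after "]"

print(repr(greedy))
print(repr(bracketed))
print(repr(tidy))
```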

Python re.sub() Look ahead then look behind

I want to remove all occurrences of ' except when it indicates possession.

Example:

>>> 'this' needs to g'o'
this needs to go
>>> even though this' is not right it can stay
even though this' is not right it can stay
>>> Some peoples' 'kids'
Some peoples' kids

Exceptions:

>>> Andrea 's b'a'l'l' 'g'o't' 'dropped (I have already inserted a space between the name and 's)
Andrea 's ball got dropped

Regular expressions get text from html string

I want to get the content between the td tags, i.e. the value of getsometext(), from an HTML string:

String test = "<table class='test'><tr><th>Text:</th><td>" + 
                                    getsometext().replace("\n", "<br />") + 
                                    "</td></tr><tr><th>Type:</th><td>" +
                                    getsometype();

I used a regex with replaceAll:

test.replaceAll("\\<.*?>", "");

but I get this:

"Text:new testType:One"

Can anyone suggest how to get only the text inside the Text cell, i.e. the following, with a regex?

"new Test"

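Stripping every tag flattens all the cells together, while capturing just the cell that follows the `Text:` header keeps only that value. A Python sketch follows; in Java the same pattern would be used with `Pattern.compile(...)`, `matcher.find()` and `matcher.group(1)`. It assumes the markup looks exactly like the string built in the question (`<th>Text:</th><td>` immediately followed by the value and `</td>`).

```python
import re

html = ("<table class='test'><tr><th>Text:</th><td>new test</td></tr>"
        "<tr><th>Type:</th><td>One")

# Lazy (.*?) stops at the first </td> after the Text: header;
# DOTALL lets the value span lines.
m = re.search(r"<th>Text:</th><td>(.*?)</td>", html, re.DOTALL)
text_value = m.group(1) if m else None
print(text_value)
```

Because getsometext() turns newlines into `<br />`, the captured value may still contain those tags.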
preg_replace plus append at start and end inside src to replace cid:

I have an HTML string. For the purposes of this question, let's say the string is:

<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dflkjdslkjdsfldskfjdlfkjdlfksdjfflkdsjfdlkdfdjflkdfjdlkjfkdlfjdljfldjfldjflkdjjfkd<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">hkjhkhkhkhkhkjhjkhhkjhkjhkjhkjhjkhkjhkjhkhkjhkjhjkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjh<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dsjhfdsjfdjflsjflkjdflkjffldskjfdljdlfkjflkdjflkdjfdslkjfkds

Now let's look at the string I need to work on. This is what Gmail saves as the image name inside src="":

cid:image001.jpg@01D05CBF.CF7A44B0

The class I use downloads and saves the attachment as follows:

$cid = 'cid:image001.jpg@01D05CBF.CF7A44B0'; 
$mail_id . '_' . $cid . '_' . $image_id;

So the actual image name is something like this: 308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg

Now my aim is to replace all of these occurrences:

cid:image001.jpg@01D05CBF.CF7A44B0

with

attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg

Essentially: strip out the cid: prefix, prepend $mail_id and _ to the string, and append _image001.jpg to the end.

Keep in mind that I'll possibly have a bunch of these embedded cid: sources in the HTML string.

I'm not very good with regex, so I'm doing this in baby steps: first I'm trying to figure out how to replace cid:image001.jpg@01D05CBF.CF7A44B0 with attachments/308907_image001.jpg@01D05CBF.CF7A44B0, and then I'll try to figure out how to append _image001.jpg to the end.

I managed to build a regex that highlights the whole image tag, and when I run it on http://www.regexr.com/ it does highlight the cid: value in element [1].

I was thinking of something like the following, but it just returns an empty string. The logic seems to work in the regex tool, so I can't figure out why it's not working; maybe it's because the regex has three groups and I need to access element [1] to get the cid: value, I'm not sure:

$string = preg_replace('/(<img\b\s+.*?src=\")(.*?cid:.*?)(\">)/g', 'attachments/'.$mail_id.'_', $html);

But the problem here is that I only need to replace cid: with attachments/308907_; I don't want to replace the image001.jpg@01D05CBF.CF7A44B0 part.

I am also not sure of the best way to append the _image001.jpg suffix at the end. If it were just one replacement, I could do something like this:

$current_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0';
$new_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg';

str_replace($current_image_name, $new_image_name,$html);

But because there could be lots of these in an email, I don't think that approach will work. It might also perform poorly, since some emails can be large, so I'm wondering whether the suffix can be added in the same preg_replace call instead of with separate replace calls.

I'm happy to figure out the actual code myself if someone just points me in the right direction and gives me some hints on the best way to achieve this.
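Two hints. PHP's PCRE functions have no `g` modifier (preg_replace() already replaces every match), and an unknown modifier makes `preg_replace()` emit a warning and return `NULL`, which would explain the empty result. And since each replacement depends on the matched cid, a callback (`preg_replace_callback()` in PHP) can build the whole new path, prefix and suffix included, in a single pass over the HTML. A Python sketch of the callback approach; it assumes `$image_id` is the part of the cid before the `@` (image001.jpg here), which is a guess based on the example names.

```python
import re

# Captures the cid value inside src="cid:...".
CID_SRC_RX = re.compile(r'src="cid:([^"]+)"')


def rewrite_cid_sources(html, mail_id):
    """Turn every src="cid:X" into src="attachments/<mail_id>_X_<image_id>"."""
    def repl(match):
        cid = match.group(1)               # e.g. image001.jpg@01D05CBF.CF7A44B0
        image_id = cid.split("@", 1)[0]    # assumption: "image001.jpg"
        return 'src="attachments/{}_{}_{}"'.format(mail_id, cid, image_id)
    return CID_SRC_RX.sub(repl, html)


html = ('<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" '
        'height="200">text<img src="cid:image001.jpg@01D05CBF.CF7A44B0">')
print(rewrite_cid_sources(html, 308907))
```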

Friday, 8 May 2015

Using a replace function in R, when should perl be TRUE?

In R you can use a replacement function such as gsub to find and replace with regular expressions. I've seen some people pass perl = TRUE as an additional argument, but I wonder when this is necessary. Which regex flavor does R use by default? What more can the Perl variant do? When should one use the Perl variant, and when not?