Wednesday, November 28, 2018

Get Line Number in file with Python

lookup = 'text to find'

with open(filename) as myFile:
    for num, line in enumerate(myFile, 1):
        if lookup in line:
            print('found at line:', num)
 
Or:
 
search_phrase = "the dog barked"
# "with" closes the file even if an error occurs while reading
with open('some_file.txt', 'r') as f:
    line_num = 0
    for line in f:
        line_num += 1
        if search_phrase in line:
            print(line_num)
 
Or:
def line_num_for_phrase_in_file(phrase='the dog barked', filename='file.txt'):
    with open(filename,'r') as f:
        for (i, line) in enumerate(f, 1):  # count from 1, like the other snippets
            if phrase in line:
                return i
    return -1

Or:
lookup = "The_String_You're_Searching"
with open("file.txt") as f:
    for num, line in enumerate(f, 1):
        if lookup in line:
            print(num)
 
Or:
with open(path, 'r') as f_rd:
    file_lines = f_rd.readlines()

matches = [line for line in file_lines if "chars of Interest" in line]
# 0-based index of the first matching line, or -1 if nothing matched
index = file_lines.index(matches[0]) if matches else -1

Undocumented Features and Limitations of the Windows FINDSTR Command

Source: https://stackoverflow.com
The Windows FINDSTR command is horribly documented. There is very basic command line help available through FINDSTR /?, or HELP FINDSTR, but it is woefully inadequate. There is a wee bit more documentation online at https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/findstr.
There are many FINDSTR features and limitations that are not even hinted at in the documentation. Nor could they be anticipated without prior knowledge and/or careful experimentation.
So the question is - What are the undocumented FINDSTR features and limitations?
The purpose of this question is to provide a one stop repository of the many undocumented features so that:
A) Developers can take full advantage of the features that are there.
B) Developers don't waste their time wondering why something doesn't work when it seems like it should.
Please make sure you know the existing documentation before responding. If the information is covered by the HELP, then it does not belong here.
Neither is this a place to show interesting uses of FINDSTR. If a logical person could anticipate the behavior of a particular usage of FINDSTR based on the documentation, then it does not belong here.
Along the same lines, if a logical person could anticipate the behavior of a particular usage based on information contained in any existing answers, then again, it does not belong here.
 
Preface
Much of the information in this answer has been gathered based on experiments run on a Vista machine. Unless explicitly stated otherwise, I have not confirmed whether the information applies to other Windows versions.
FINDSTR output
The documentation never bothers to explain the output of FINDSTR. It alludes to the fact that matching lines are printed, but nothing more.
The format of matching line output is as follows:
fileName:lineNumber:lineOffset:text
where
fileName: = The name of the file containing the matching line. The file name is not printed if the request was explicitly for a single file, or if searching piped input or redirected input. When printed, the fileName will always include any path information provided. Additional path information will be added if the /S option is used. The printed path is always relative to the provided path, or relative to the current directory if none provided.
Note - The filename prefix can be avoided when searching multiple files by using the non-standard (and poorly documented) wildcards < and >. The exact rules for how these wildcards work can be found here. Finally, you can look at this example of how the non-standard wildcards work with FINDSTR.
lineNumber: = The line number of the matching line represented as a decimal value with 1 representing the 1st line of the input. Only printed if /N option is specified.
lineOffset: = The decimal byte offset of the start of the matching line, with 0 representing the 1st character of the 1st line. Only printed if /O option is specified. This is not the offset of the match within the line. It is the number of bytes from the beginning of the file to the beginning of the line.
text = The binary representation of the matching line, including any <CR> and/or <LF>. Nothing is left out of the binary output, such that this example that matches all lines will produce an exact binary copy of the original file.
FINDSTR "^" FILE >FILE_COPY
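The output layout described above can be taken apart in Python. This is only a sketch: it assumes the file name prefix is present and that both /N and /O were given, and the function name parse_findstr_line is made up for illustration. An optional drive letter such as C: is allowed for, since it also contains a colon.

```python
import re

# Assumed layout: fileName:lineNumber:lineOffset:text (both /N and /O specified).
# The optional "X:" at the start allows for a Windows drive letter in the path.
_FINDSTR_LINE = re.compile(
    r"^(?P<file>(?:[A-Za-z]:)?[^:]*):(?P<num>\d+):(?P<offset>\d+):(?P<text>.*)$",
    re.DOTALL,
)

def parse_findstr_line(line):
    """Return (file, line number, byte offset, text), or None if the layout differs."""
    m = _FINDSTR_LINE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    return m.group("file"), int(m.group("num")), int(m.group("offset")), m.group("text")
```

If the matching text itself starts with digits and colons and /N or /O is missing, this sketch cannot tell the difference, so treat the result as a guess.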
Most control characters and many extended ASCII characters display as dots on XP
FINDSTR on XP displays most non-printable control characters from matching lines as dots (periods) on the screen. The following control characters are exceptions; they display as themselves: 0x09 Tab, 0x0A LineFeed, 0x0B Vertical Tab, 0x0C Form Feed, 0x0D Carriage Return.
XP FINDSTR also converts a number of extended ASCII characters to dots as well. The extended ASCII characters that display as dots on XP are the same as those that are transformed when supplied on the command line. See the "Character limits for command line parameters - Extended ASCII transformation" section, later in this post
Control characters and extended ASCII are not converted to dots on XP if the output is piped, redirected to a file, or within a FOR IN() clause.
Vista and Windows 7 always display all characters as themselves, never as dots.
Return Codes (ERRORLEVEL)
  • 0 (success)
    • Match was found in at least one line of at least one file.
  • 1 (failure)
    • No match was found in any line of any file.
    • Invalid color specified by /A:xx option
  • 2 (error)
    • Incompatible options /L and /R both specified
    • Missing argument after /A:, /F:, /C:, /D:, or /G:
    • File specified by /F:file or /G:file not found
  • 255 (error)
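When FINDSTR is called from a script, the return codes listed above can be turned into readable messages. A minimal sketch follows; the function name and the wording of the messages are my own, and only the code values come from the list.

```python
def describe_findstr_exit(code):
    """Map a FINDSTR ERRORLEVEL to the meaning given in the list above."""
    meanings = {
        0: "match found in at least one line",
        1: "no match found (or invalid /A color)",
        2: "error: incompatible options, missing argument, or /F or /G file not found",
        255: "error",
    }
    # Any other value is not covered by the list, so report it as unknown.
    return meanings.get(code, "unknown")
```

On Windows the code would typically come from something like subprocess.run(["findstr", ...]).returncode.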
Source of data to search (Updated based on tests with Windows 7)
Findstr can search data from only one of the following sources:
  • filenames specified as arguments and/or using the /F:file option.
  • stdin via redirection findstr "searchString" <file
  • data stream from a pipe type file | findstr "searchString"
Arguments/options take precedence over redirection, which takes precedence over piped data.
File name arguments and /F:file may be combined. Multiple file name arguments may be used. If multiple /F:file options are specified, then only the last one is used. Wild cards are allowed in filename arguments, but not within the file pointed to by /F:file.
Source of search strings (Updated based on tests with Windows 7)
The /G:file and /C:string options may be combined. Multiple /C:string options may be specified. If multiple /G:file options are specified, then only the last one is used. If either /G:file or /C:string is used, then all non-option arguments are assumed to be files to search. If neither /G:file nor /C:string is used, then the first non-option argument is treated as a space delimited list of search terms.
File names must not be quoted within the file when using the /F:FILE option.
File names may contain spaces and other special characters. Most commands require that such file names are quoted. But the FINDSTR /F:files.txt option requires that filenames within files.txt must NOT be quoted. The file will not be found if the name is quoted.
BUG - Short 8.3 filenames can break the /D and /S options
As with all Windows commands, FINDSTR will attempt to match both the long name and the short 8.3 name when looking for files to search. Assume the current folder contains the following non-empty files:
b1.txt
b.txt2
c.txt
The following command will successfully find all 3 files:
findstr /m "^" *.txt
b.txt2 matches because the corresponding short name B9F64~1.TXT matches. This is consistent with the behavior of all other Windows commands.
But a bug with the /D and /S options causes the following commands to only find b1.txt
findstr /m /d:. "^" *.txt
findstr /m /s "^" *.txt
The bug prevents b.txt2 from being found, as well as all file names that sort after b.txt2 within the same directory. Additional files that sort before, like a.txt, are found. Additional files that sort later, like d.txt, are missed once the bug has been triggered.
Each directory searched is treated independently. For example, the /S option would successfully begin searching in a child folder after failing to find files in the parent, but once the bug causes a short file name to be missed in the child, then all subsequent files in that child folder would also be missed.
The commands work bug free if the same file names are created on a machine that has NTFS 8.3 name generation disabled. Of course b.txt2 would not be found, but c.txt would be found properly.
Not all short names trigger the bug. All instances of bugged behavior I have seen involve an extension that is longer than 3 characters with a short 8.3 name that begins the same as a normal name that does not require an 8.3 name.
The bug has been confirmed on XP, Vista, and Windows 7.
Non-Printable characters and the /P option
The /P option causes FINDSTR to skip any file that contains any of the following decimal byte codes:
0-7, 14-25, 27-31.
Put another way, the /P option will only skip files that contain non-printable control characters. Control characters are codes less than or equal to 31 (0x1F). FINDSTR treats the following control characters as printable:
 8  0x08  backspace
 9  0x09  horizontal tab
10  0x0A  line feed
11  0x0B  vertical tab
12  0x0C  form feed
13  0x0D  carriage return
26  0x1A  substitute (end of text)
All other control characters are treated as non-printable, the presence of which causes the /P option to skip the file.
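To predict whether /P would skip a given file, the byte ranges above can be checked directly. This is a sketch based only on the list in this section; the names are hypothetical.

```python
# Byte codes that cause /P to skip a file, per the list above: 0-7, 14-25, 27-31.
NON_PRINTABLE = frozenset(range(0, 8)) | frozenset(range(14, 26)) | frozenset(range(27, 32))

def skipped_by_p(data):
    """Return True if the bytes contain any code that /P treats as non-printable."""
    return any(b in NON_PRINTABLE for b in data)
```

For a real file, read it in binary mode first, for example open(name, "rb").read().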
Piped and Redirected input may have <CR><LF> appended
If the input is piped in and the last character of the stream is not <LF>, then FINDSTR will automatically append <CR><LF> to the input. This has been confirmed on XP, Vista and Windows 7. (I used to think that the Windows pipe was responsible for modifying the input, but I have since discovered that FINDSTR is actually doing the modification.)
The same is true for redirected input on Vista. If the last character of a file used as redirected input is not <LF>, then FINDSTR will automatically append <CR><LF> to the input. However, XP and Windows 7 do not alter redirected input.
FINDSTR hangs on XP and Windows 7 if redirected input does not end with <LF>
This is a nasty "feature" on XP and Windows 7. If the last character of a file used as redirected input does not end with <LF>, then FINDSTR will hang indefinitely once it reaches the end of the redirected file.
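One way to avoid the hang is to check the file before redirecting it into FINDSTR. The sketch below uses made-up helper names and assumes that an empty file needs no fix, which I have not verified.

```python
def needs_trailing_lf(data):
    """True if non-empty data does not end with <LF> (0x0A)."""
    # Assumption: empty input is left alone, since it has no last character.
    return len(data) > 0 and not data.endswith(b"\n")

def with_trailing_crlf(data):
    """Return data with <CR><LF> appended when the final <LF> is missing."""
    return data + b"\r\n" if needs_trailing_lf(data) else data
```

The same check also flags input whose last line is a single unterminated character.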
Last line of Piped data may be ignored if it consists of a single character
If the input is piped in and the last line consists of a single character that is not followed by <LF>, then FINDSTR completely ignores the last line.
Example - The first command with a single character and no <LF> fails to match, but the second command with 2 characters works fine, as does the third command that has one character with terminating newline.
> set /p "=x" <nul | findstr "^"

> set /p "=xx" <nul | findstr "^"
xx

> echo x| findstr "^"
x
Reported by DosTips user Sponge Belly in the thread "new findstr bug". Confirmed on XP, Windows 7 and Windows 8. Haven't heard about Vista yet. (I no longer have Vista to test).
Option syntax
Options can be prefixed with either / or -. Options may be concatenated after a single / or -. However, the concatenated option list may contain at most one multi-character option such as OFF or F:, and the multi-character option must be the last option in the list.
The following are all equivalent ways of expressing a case insensitive regex search for any line that contains both "hello" and "goodbye" in any order
  • /i /r /c:"hello.*goodbye" /c:"goodbye.*hello"
  • -i -r -c:"hello.*goodbye" /c:"goodbye.*hello"
  • /irc:"hello.*goodbye" /c:"goodbye.*hello"
Search String length limits
On Vista the maximum allowed length for a single search string is 511 bytes. If any search string exceeds 511 then the result is a FINDSTR: Search string too long. error with ERRORLEVEL 2.
When doing a regular expression search, the maximum search string length is 254. A regular expression with length between 255 and 511 will result in a FINDSTR: Out of memory error with ERRORLEVEL 2. A regular expression length >511 results in the FINDSTR: Search string too long. error.
On Windows XP the maximum search string length is apparently shorter. See the question "Findstr error: "Search string too long": How to extract and match substring in "for" loop?". The XP limit is 127 bytes for both literal and regex searches.
Line Length limits
Files specified as a command line argument or via the /F:FILE option have no known line length limit. Searches were successfully run against a 128MB file that did not contain a single <LF>.
Piped data and Redirected input is limited to 8191 bytes per line. This limit is a "feature" of FINDSTR. It is not inherent to pipes or redirection. FINDSTR using redirected stdin or piped input will never match any line that is >=8k bytes. Lines >= 8k generate an error message to stderr, but ERRORLEVEL is still 0 if the search string is found in at least one line of at least one file.
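Before piping data into FINDSTR it may be worth finding the lines that are too long to match. The sketch below uses the 8191 byte limit quoted above. Whether the line terminator counts toward the limit is not stated, so excluding it here is an assumption, and the names are hypothetical.

```python
MAX_PIPED_LINE = 8191  # per-line limit quoted in the text above

def overlong_lines(data, limit=MAX_PIPED_LINE):
    """Return 1-based numbers of lines longer than limit bytes."""
    result = []
    for num, line in enumerate(data.split(b"\n"), 1):
        # Assumption: a trailing <CR> is not counted toward the length.
        if len(line.rstrip(b"\r")) > limit:
            result.append(num)
    return result
```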
Default type of search: Literal vs Regular Expression
/C:"string" - The default is /L literal. Explicitly combining the /L option with /C:"string" certainly works but is redundant.
"string argument" - The default depends on the content of the very first search string. (Remember that <space> is used to delimit search strings.) If the first search string is a valid regular expression that contains at least one un-escaped meta-character, then all search strings are treated as regular expressions. Otherwise all search strings are treated as literals. For example, "51.4 200" will be treated as two regular expressions because the first string contains an un-escaped dot, whereas "200 51.4" will be treated as two literals because the first string does not contain any meta-characters.
/G:file - The default depends on the content of the first non-empty line in the file. If the first search string is a valid regular expression that contains at least one un-escaped meta-character, then all search strings are treated as regular expressions. Otherwise all search strings are treated as literals.
Recommendation - Always explicitly specify /L literal option or /R regular expression option when using "string argument" or /G:file.
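The rule for "string argument" can be roughly approximated in Python. This is a loose sketch and not how FINDSTR itself decides: it only looks for an un-escaped . * ^ $ [ or ] in the first space-delimited term, ignores \< and \>, and does not check that the term is a valid regular expression. The function name is made up.

```python
META = set(".*^$[]")  # assumed subset of FINDSTR regex meta-characters

def first_term_looks_like_regex(search_arg):
    """Guess whether FINDSTR would default to regex mode for this argument."""
    terms = search_arg.split(" ")
    first = next((t for t in terms if t), "")  # first non-empty search term
    i = 0
    while i < len(first):
        ch = first[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch in META:
            return True
        i += 1
    return False
```

With the examples from the text, "51.4 200" gives True and "200 51.4" gives False, which is exactly why specifying /L or /R explicitly is safer.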
BUG - Specifying multiple literal search strings can give unreliable results
The following simple FINDSTR example fails to find a match, even though it should.
echo ffffaaa|findstr /l "ffffaaa faffaffddd"
This bug has been confirmed on Windows Server 2003, Windows XP, Vista, and Windows 7.
Based on experiments, FINDSTR may fail if all of the following conditions are met:
  • The search is using multiple literal search strings
  • The search strings are of different lengths
  • A short search string has some amount of overlap with a longer search string
  • The search is case sensitive (no /I option)
In every failure I have seen, it is always one of the shorter search strings that fails.
For more info see the question "Why doesn't this FINDSTR example with multiple literal search strings find a match?"
Quotes and backslashes within command line arguments - Note:
The information within this highlighted section is not 100% accurate. After I wrote this section, user MC ND pointed me to a reference that documents how the Microsoft C/C++ library parses parameters. It is horrifically complicated, but it appears to accurately predict the backslash and quote rules for FINDSTR command line arguments. I recommend you use the highlighted information below as a guide, but if you want more accurate info, refer to the link.

Escaping Quote within command line search strings
Quotes within command line search strings must be escaped with backslash like \". This is true for both literal and regex search strings. This information has been confirmed on XP, Vista, and Windows 7.
Note: The quote may also need to be escaped for the CMD.EXE parser, but this has nothing to do with FINDSTR. For example, to search for a single quote you could use:
FINDSTR \^" file && echo found || echo not found
Escaping Backslash within command line literal search strings
Backslash in a literal search string can normally be represented as \ or as \\. They are typically equivalent. (There may be unusual cases in Vista where the backslash must always be escaped, but I no longer have a Vista machine to test).
But there are some special cases:
When searching for consecutive backslashes, all but the last must be escaped. The last backslash may optionally be escaped.
  • \\ can be coded as \\\ or \\\\
  • \\\ can be coded as \\\\\ or \\\\\\
Searching for one or more backslashes before a quote is bizarre. Logic would suggest that the quote must be escaped, and each of the leading backslashes would need to be escaped, but this does not work! Instead, each of the leading backslashes must be double escaped, and the quote is escaped normally:
  • \" must be coded as \\\\\"
  • \\" must be coded as \\\\\\\\\"
As previously noted, one or more escaped quotes may also require escaping with ^ for the CMD parser
The info in this section has been confirmed on XP and Windows 7.
Escaping Backslash within command line regex search strings
  • Vista only: Backslash in a regex must be either double escaped like \\\\, or else single escaped within a character class set like [\\]
  • XP and Windows 7: Backslash in a regex can always be represented as [\\]. It can normally be represented as \\. But this never works if the backslash precedes an escaped quote.
    One or more backslashes before an escaped quote must either be double escaped, or else coded as [\\]
    • \" may be coded as \\\\\" or [\\]\"
    • \\" may be coded as \\\\\\\\\" or [\\][\\]\" or \\[\\]\"
Escaping Quote and Backslash within /G:FILE literal search strings
Standalone quotes and backslashes within a literal search string file specified by /G:file need not be escaped, but they can be.
" and \" are equivalent.
\ and \\ are equivalent.
If the intent is to find \\, then at least the leading backslash must be escaped. Both \\\ and \\\\ work.
If the intent is to find \", then at least the leading backslash must be escaped. Both \\" and \\\" work.
Escaping Quote and Backslash within /G:FILE regex search strings
This is the one case where the escape sequences work as expected based on the documentation. Quote is not a regex metacharacter, so it need not be escaped (but can be). Backslash is a regex metacharacter, so it must be escaped.
Character limits for command line parameters - Extended ASCII transformation
The null character (0x00) cannot appear in any string on the command line. Any other single byte character can appear in the string (0x01 - 0xFF). However, FINDSTR converts many extended ASCII characters it finds within command line parameters into other characters. This has a major impact in two ways:
1) Many extended ASCII characters will not match themselves if used as a search string on the command line. This limitation is the same for literal and regex searches. If a search string must contain extended ASCII, then the /G:FILE option should be used instead.
2) FINDSTR may fail to find a file if the name contains extended ASCII characters and the file name is specified on the command line. If a file to be searched contains extended ASCII in the name, then the /F:FILE option should be used instead.
Here is a complete list of extended ASCII character transformations that FINDSTR performs on command line strings. Each character is represented as the decimal byte code value. The first code represents the character as supplied on the command line, and the second code represents the character it is transformed into. Note - this list was compiled on a U.S machine. I do not know what impact other languages may have on this list.
158 treated as 080     199 treated as 221     226 treated as 071
169 treated as 170     200 treated as 043     227 treated as 112
176 treated as 221     201 treated as 043     228 treated as 083
177 treated as 221     202 treated as 045     229 treated as 115
178 treated as 221     203 treated as 045     231 treated as 116
179 treated as 221     204 treated as 221     232 treated as 070
180 treated as 221     205 treated as 045     233 treated as 084
181 treated as 221     206 treated as 043     234 treated as 079
182 treated as 221     207 treated as 045     235 treated as 100
183 treated as 043     208 treated as 045     236 treated as 056
184 treated as 043     209 treated as 045     237 treated as 102
185 treated as 221     210 treated as 045     238 treated as 101
186 treated as 221     211 treated as 043     239 treated as 110
187 treated as 043     212 treated as 043     240 treated as 061
188 treated as 043     213 treated as 043     242 treated as 061
189 treated as 043     214 treated as 043     243 treated as 061
190 treated as 043     215 treated as 043     244 treated as 040
191 treated as 043     216 treated as 043     245 treated as 041
192 treated as 043     217 treated as 043     247 treated as 126
193 treated as 045     218 treated as 043     249 treated as 250
194 treated as 045     219 treated as 221     251 treated as 118
195 treated as 043     220 treated as 095     252 treated as 110
196 treated as 045     222 treated as 221     254 treated as 221
197 treated as 043     223 treated as 095
198 treated as 221     224 treated as 097
Any character >0 not in the list above is treated as itself, including <CR> and <LF>. The easiest way to include odd characters like <CR> and <LF> is to get them into an environment variable and use delayed expansion within the command line argument.
Character limits for strings found in files specified by /G:FILE and /F:FILE options
The nul (0x00) character can appear in the file, but it functions like the C string terminator. Any characters after a nul character are treated as a different string as if they were on another line.
The <CR> and <LF> characters are treated as line terminators that terminate a string, and are not included in the string.
All other single byte characters are included perfectly within a string.
Searching Unicode files
FINDSTR cannot properly search most Unicode (UTF-16, UTF-16LE, UTF-16BE, UTF-32) because it cannot search for nul bytes and Unicode typically contains many nul bytes.
However, the TYPE command converts UTF-16LE with BOM to a single byte character set, so a command like the following will work with UTF-16LE with BOM.
type unicode.txt|findstr "search"
Note that Unicode code points that are not supported by your active code page will be converted to ? characters.
It is possible to search UTF-8 as long as your search string contains only ASCII. However, the console output of any multi-byte UTF-8 characters will not be correct. But if you redirect the output to a file, then the result will be correctly encoded UTF-8. Note that if the UTF-8 file contains a BOM, then the BOM will be considered as part of the first line, which could throw off a search that matches the beginning of a line.
It is possible to search multi-byte UTF-8 characters if you put your search string in a UTF-8 encoded search file (without BOM), and use the /G option.
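Since the right approach depends on the encoding, it can help to check a file for a byte order mark first. A minimal sketch using the BOM constants from Python's codecs module follows; the function name is hypothetical, and a file without a BOM simply returns None.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM begins with
# the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A result of "utf-16-le" suggests the TYPE pipe shown above, while "utf-8" is a reminder that the BOM will be part of the first line.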
End Of Line
FINDSTR breaks lines immediately after every <LF>. The presence or absence of <CR> has no impact on line breaks.
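The same line-breaking rule can be reproduced in Python. Note that bytes.splitlines() would also break on a lone <CR>, so this sketch uses a regular expression instead. The function name is made up.

```python
import re

def split_after_lf(data):
    """Split bytes into lines that end immediately after each <LF>."""
    # First alternative: a line with its terminating <LF>.
    # Second alternative: a final line that has no <LF>.
    return re.findall(rb"[^\n]*\n|[^\n]+", data)
```

A lone <CR> stays inside its line, which matches the statement that <CR> has no impact on line breaks.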
Searching across line breaks
As expected, the . regex metacharacter will not match <CR> or <LF>. But it is possible to search across a line break using a command line search string. Both the <CR> and <LF> characters must be matched explicitly. If a multi-line match is found, only the 1st line of the match is printed. FINDSTR then doubles back to the 2nd line in the source and begins the search all over again - sort of a "look ahead" type feature.
Assume TEST.TXT has these contents (could be Unix or Windows style)
A
A
A
B
A
A
Then this script
@echo off
setlocal
::Define LF variable containing a linefeed (0x0A)
set LF=^


::Above 2 blank lines are critical - do not remove

::Define CR variable containing a carriage return (0x0D)
for /f %%a in ('copy /Z "%~dpf0" nul') do set "CR=%%a"

setlocal enableDelayedExpansion
::regex "!CR!*!LF!" will match both Unix and Windows style End-Of-Line
findstr /n /r /c:"A!CR!*!LF!A" TEST.TXT
gives these results
1:A
2:A
5:A
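The result above can be reproduced with a small Python emulation. This is only a model of the observed behavior, not FINDSTR's actual implementation: a line is reported when a match starts somewhere inside it, and the search then restarts at the next line. The function name is hypothetical.

```python
import re

def lines_starting_match(data, pattern):
    """Return 1-based numbers of lines in which a (possibly multi-line) match starts."""
    regex = re.compile(pattern)
    hits = []
    start = 0
    line_num = 0
    while start < len(data):
        line_num += 1
        lf = data.find(b"\n", start)
        end = len(data) if lf == -1 else lf + 1  # lines break right after <LF>
        m = regex.search(data, start)
        # Report the line only if the leftmost match begins within it.
        if m is not None and m.start() < end:
            hits.append(line_num)
        start = end  # "double back" to the next line and search again
    return hits
```

Called with the contents of TEST.TXT as bytes and the pattern rb"A\r*\nA", this returns [1, 2, 5] for both Unix and Windows line endings.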
Searching across line breaks using the /G:FILE option is imprecise because the only way to match <CR> or <LF> is via a regex character class range expression that sandwiches the EOL characters.
  • [<TAB>-<0x0B>] matches <LF>, but it also matches <TAB> and <0x0B>
  • [<0x0C>-!] matches <CR>, but it also matches <0x0C> and !
    Note - the above are symbolic representations of the regex byte stream since I can't graphically represent the characters.

    Limited Regular Expressions (regex) Support
    FINDSTR support for regular expressions is extremely limited. If it is not in the HELP documentation, it is not supported.
    Beyond that, the regex expressions that are supported are implemented in a completely non-standard manner, such that results can be different than would be expected coming from something like grep or perl.
    Regex Line Position anchors ^ and $
    ^ matches beginning of input stream as well as any position immediately following a <LF>. Since FINDSTR also breaks lines after <LF>, a simple regex of "^" will always match all lines within a file, even a binary file.
    $ matches any position immediately preceding a <CR>. This means that a regex search string containing $ will never match any lines within a Unix style text file, nor will it match the last line of a Windows text file if it is missing the EOL marker of <CR><LF>.
    Note - As previously discussed, piped and redirected input to FINDSTR may have <CR><LF> appended that is not in the source. Obviously this can impact a regex search that uses $.
    Any search string with characters before ^ or after $ will always fail to find a match.
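The unusual meaning of $ can be modelled in Python with a lookahead for <CR>. This sketch handles only a literal term followed by $, and the function name is made up.

```python
import re

def matches_end_anchor(data, term):
    """Emulate a FINDSTR regex of the form  term$  where term is literal bytes."""
    # FINDSTR's $ matches only at a position immediately before a <CR>.
    return re.search(re.escape(term) + rb"(?=\r)", data) is not None
```

With this model, b"foo\r\n" matches but b"foo\n" does not, and neither does a final "foo" that lacks the <CR><LF> marker.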
    Positional Options /B /E /X
    The positional options work the same as ^ and $, except they also work for literal search strings.
    /B functions the same as ^ at the start of a regex search string.
    /E functions the same as $ at the end of a regex search string.
    /X functions the same as having both ^ at the beginning and $ at the end of a regex search string.
    Regex word boundary
    \< must be the very first term in the regex. The regex will not match anything if any other characters precede it. \< corresponds to either the very beginning of the input, the beginning of a line (the position immediately following a <LF>), or the position immediately following any "non-word" character. The next character need not be a "word" character.
    \> must be the very last term in the regex. The regex will not match anything if any other characters follow it. \> corresponds to either the end of input, the position immediately prior to a <CR>, or the position immediately preceding any "non-word" character. The preceding character need not be a "word" character.
    Here is a complete list of "non-word" characters, represented as the decimal byte code. Note - this list was compiled on a U.S machine. I do not know what impact other languages may have on this list.
    001   028   063   179   204   230
    002   029   064   180   205   231
    003   030   091   181   206   232
    004   031   092   182   207   233
    005   032   093   183   208   234
    006   033   094   184   209   235
    007   034   096   185   210   236
    008   035   123   186   211   237
    009   036   124   187   212   238
    011   037   125   188   213   239
    012   038   126   189   214   240
    014   039   127   190   215   241
    015   040   155   191   216   242
    016   041   156   192   217   243
    017   042   157   193   218   244
    018   043   158   194   219   245
    019   044   168   195   220   246
    020   045   169   196   221   247
    021   046   170   197   222   248
    022   047   173   198   223   249
    023   058   174   199   224   250
    024   059   175   200   226   251
    025   060   176   201   227   254
    026   061   177   202   228   255
    027   062   178   203   229
    
    Regex character class ranges [x-y]
    Character class ranges do not work as expected. See this question: Why does findstr not handle case properly (in some circumstances)?, along with this answer: https://stackoverflow.com/a/8767815/1012053.
    The problem is FINDSTR does not collate the characters by their byte code value (commonly thought of as the ASCII code, but ASCII is only defined from 0x00 - 0x7F). Most regex implementations would treat [A-Z] as all upper case English letters. But FINDSTR uses a collation sequence that roughly corresponds to how SORT works. So [A-Z] includes the complete English alphabet, both upper and lower case (except for "a"), as well as non-English alpha characters with diacriticals.
    Regex character class term limit and BUG
    Not only is FINDSTR limited to a maximum of 15 character class terms within a regex, it fails to properly handle an attempt to exceed the limit. Using 16 or more character class terms results in an interactive Windows pop up stating "Find String (QGREP) Utility has encountered a problem and needs to close. We are sorry for the inconvenience." The message text varies slightly depending on the Windows version. Here is one example of a FINDSTR that will fail:
    echo 01234567890123456|findstr [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
    
    This bug was reported by DosTips user Judago here. It has been confirmed on XP, Vista, and Windows 7.
    Regex searches fail (and may hang indefinitely) if they include byte code 0xFF (decimal 255)
    Any regex search that includes byte code 0xFF (decimal 255) will fail. It fails if byte code 0xFF is included directly, or if it is implicitly included within a character class range. Remember that FINDSTR character class ranges do not collate characters based on the byte code value. Character <0xFF> appears relatively early in the collation sequence between the <space> and <tab> characters. So any character class range that includes both <space> and <tab> will fail.
    The exact behavior changes slightly depending on the Windows version. Windows 7 hangs indefinitely if 0xFF is included. XP doesn't hang, but it always fails to find a match, and occasionally prints the following error message - "The process tried to write to a nonexistent pipe."
    I no longer have access to a Vista machine, so I haven't been able to test on Vista.
    Regex bug: . and [^anySet] can match End-Of-File
    The regex . meta-character should only match any character other than <CR> or <LF>. There is a bug that allows it to match the End-Of-File if the last line in the file is not terminated by <CR> or <LF>. However, the . will not match an empty file.
    For example, a file named "test.txt" containing a single line of x, without terminating <CR> or <LF>, will match the following:
    findstr /r x......... test.txt
    
    This bug has been confirmed on XP and Win7.
    The same seems to be true for negative character sets. Something like [^abc] will match End-Of-File. Positive character sets like [abc] seem to work fine. I have only tested this on Win7.

Another grep.py

Grep.py - Source: https://github.com/rohitkrai03

import sys
import re
import os
import utils

def main():
    files = utils.troll_directories(os.path.normpath(sys.argv[1]))
    patterns = utils.convert_patterns(sys.argv[2:])
    utils.apply_patterns(files, patterns)


if __name__ == '__main__':
    main()

Utils.py:

import re
import os

def convert_patterns(patterns):
    results = []

    # for each pattern
    for pattern in patterns:
        # make a regular expression with it
        expr = re.compile(pattern)
        results.append(expr)
    # return the results
    return results

def troll_directories(start):
    # troll for all the directories like in find
    results = []
    # Traverse the directory for all the files.
    for root, dirs, files in os.walk(start):
        for fname in files:
            # put the full path into the results
            results.append(os.path.join(root, fname))
    return results

def apply_patterns(files, patterns):
    # for each file in files
    for fname in files:
        # open the file, read the lines, and close it again
        with open(fname) as f:
            lines = f.readlines()
        for num, line in enumerate(lines):
            # for each pattern
            for pattern in patterns:
                # if pattern found in the current line (not the whole list)
                if pattern.search(line):
                    # print file, line number, line
                    print("{}:{}: {}".format(fname, num + 1, line.rstrip("\n")))

Python Find

A small Python utility that searches all the filenames under a given directory for a regular expression and prints each match with its full path.

How to Use

Just download the package or clone the repo from GitHub.
Run find.py with the source directory and the regular expression to search for as command-line arguments.

Example: py find.py '.' '.*.py'
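One caveat about the example pattern: in a regular expression an unescaped dot matches any character, so '.*.py' also matches names that merely contain "py" after some other character. A minimal sketch of the difference follows; the stricter pattern r'\.py$' is a suggested alternative, not part of the original project:

```python
import re

loose = re.compile('.*.py')    # pattern from the usage example
strict = re.compile(r'\.py$')  # assumed alternative: literal dot, anchored at the end


def matches(expr, fname):
    """Return True if the compiled expression is found anywhere in fname."""
    return expr.search(fname) is not None


# Compare both patterns on a real Python file and on a look-alike name.
for name in ('find.py', 'happy.txt'):
    print(name, matches(loose, name), matches(strict, name))
```

With the loose pattern both names are reported; the stricter one accepts only the name that really ends in ".py".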


Find.py
#!/usr/bin/env python3
import sys
import re
import os
# Get the start directory.
start = os.path.normpath(sys.argv[1])
# Get the pattern from the command line arguments.
pattern = sys.argv[2]
# Convert it to a regular expression.
expr = re.compile(pattern)
# Traverse the directory for all the files.
for root, dirs, files in os.walk(start):
    for fname in files:
        # If a file name matches the pattern then print its full path.
        if expr.search(fname):
            print(os.path.join(root, fname))



Another interesting project:

https://pypi.org/project/grin

Python 101: Redirecting stdout

Source: https://www.blog.pythonlibrary.org
Redirecting stdout is something most developers will need to do at some point or other. It can be useful to redirect stdout to a file or to a file-like object. I have also redirected stdout to a text control in some of my desktop GUI projects. In this article we will look at the following:
  • Redirecting stdout to a file (simple)
  • The Shell redirection method
  • Redirecting stdout using a custom context manager
  • Python 3’s contextlib.redirect_stdout()
  • Redirect stdout to a wxPython text control


Redirecting stdout

The easiest way to redirect stdout in Python is to just assign it an open file object. Let’s take a look at a simple example:
import sys
 
def redirect_to_file(text):
    original = sys.stdout
    sys.stdout = open('/path/to/redirect.txt', 'w')
    print('This is your redirected text:')
    print(text)
    sys.stdout = original
 
    print('This string goes to stdout, NOT the file!')
 
if __name__ == '__main__':
    redirect_to_file('Python rocks!')
Here we just import Python’s sys module and create a function that we can pass strings that we want to have redirected to a file. We save off a reference to sys.stdout so we can restore it at the end of the function. This can be useful if you intend to use stdout for other things. Before you run this code, be sure to update the path to something that will work on your system. When you run it, you should see the following in your file:

This is your redirected text:
Python rocks!

That last print statement will go to stdout, not the file.
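Note that the function above never closes the file it opens, and an exception raised between the two assignments would leave stdout redirected. A minimal variant that addresses both points is sketched below; the name redirect_to_file_safely and the path parameter are assumptions introduced here in place of the hard-coded path:

```python
import os
import sys
import tempfile


def redirect_to_file_safely(text, path):
    """Print text into the file at path, then restore stdout."""
    original = sys.stdout
    # The with block closes the file even if a print call raises.
    with open(path, 'w') as out:
        sys.stdout = out
        try:
            print('This is your redirected text:')
            print(text)
        finally:
            # Always put the real stdout back.
            sys.stdout = original


if __name__ == '__main__':
    target = os.path.join(tempfile.mkdtemp(), 'redirect.txt')
    redirect_to_file_safely('Python rocks!', target)
    print('This string goes to stdout, NOT the file!')
```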

Shell Redirection

Shell redirection is also pretty common, especially in Linux, although Windows also works the same way in most cases. Let’s create a silly example of a noisy function that we will call noisy.py:
# noisy.py
def noisy(text):
    print('The noisy function prints a lot')
    print('Here is the string you passed in:')
    print('*' * 40)
    print(text)
    print('*' * 40)
    print('Thank you for calling me!')
 
if __name__ == '__main__':
    noisy('This is a test of Python!')
You will notice that we didn’t import the sys module this time around. The reason is that we don’t need it since we will be using shell redirection. To do shell redirection, open a terminal (or command prompt) and navigate to the folder where you saved the code above. Then execute the following command:

python noisy.py > redirected.txt

The greater than character (i.e. >) tells your operating system to redirect stdout to the filename you specified. At this point you should have a file named “redirected.txt” in the same folder as your Python script. If you open it up, the file should have the following contents:

The noisy function prints a lot
Here is the string you passed in:
****************************************
This is a test of Python!
****************************************
Thank you for calling me!

Now wasn’t that pretty cool?
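The same kind of redirection can be set up from Python itself when launching a child process. A minimal sketch using the standard subprocess module follows; the helper name run_redirected and the inline child command are made up for illustration:

```python
import os
import subprocess
import sys
import tempfile


def run_redirected(args, out_path):
    """Run a command with its stdout sent to out_path, like `cmd > out_path`."""
    with open(out_path, 'w') as out:
        # The child process writes straight to the open file.
        return subprocess.run(args, stdout=out, check=True).returncode


if __name__ == '__main__':
    target = os.path.join(tempfile.mkdtemp(), 'redirected.txt')
    run_redirected([sys.executable, '-c', "print('hello from the child')"], target)
    with open(target) as handle:
        print(handle.read(), end='')
```

Passing the path to noisy.py instead of the inline command should produce the same file as the shell example, assuming the script is in the current folder.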

Redirect stdout with a context manager

Another fun way to redirect stdout is by using a context manager. Let’s create a custom context manager that accepts a file object to redirect stdout to:
import sys
from contextlib import contextmanager
 
 
@contextmanager
def custom_redirection(fileobj):
    old = sys.stdout
    sys.stdout = fileobj
    try:
        yield fileobj
    finally:
        sys.stdout = old
 
if __name__ == '__main__':
    with open('/path/to/custom_redir.txt', 'w') as out:
        with custom_redirection(out):
            print('This text is redirected to file')
            print('So is this string')
        print('This text is printed to stdout')
When you run this code, it will write out two lines of text to your file and one to stdout. As usual, we reset stdout at the end of the function.
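Because the context manager only swaps sys.stdout, it accepts any file-like object, not just real files. A small sketch that captures printed text in an in-memory io.StringIO buffer follows; the capture helper is a hypothetical addition, and custom_redirection is repeated so the snippet runs on its own:

```python
import io
import sys
from contextlib import contextmanager


@contextmanager
def custom_redirection(fileobj):
    old = sys.stdout
    sys.stdout = fileobj
    try:
        yield fileobj
    finally:
        sys.stdout = old


def capture(func, *args):
    """Call func(*args) and return everything it printed as a string."""
    buffer = io.StringIO()
    with custom_redirection(buffer):
        func(*args)
    return buffer.getvalue()


if __name__ == '__main__':
    text = capture(print, 'This text is captured in memory')
    print('Captured:', repr(text))
```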

Using contextlib.redirect_stdout

Python 3.4 added the redirect_stdout function to its contextlib module. Let’s try using that to create a context manager to redirect stdout:
import sys
from contextlib import redirect_stdout
 
def redirected(text, path):
    with open(path, 'w') as out:
        with redirect_stdout(out):
            print('Here is the string you passed in:')
            print('*' * 40)
            print(text)
            print('*' * 40)
 
if __name__ == '__main__':
    path = '/path/to/red.txt'
    text = 'My test to redirect'
    redirected(text, path)
This code is a little simpler because the built-in function does all the yielding and resetting of stdout automatically for you. Otherwise, it works in pretty much the same way as our custom context manager.

Redirecting stdout in wxPython


import sys
import wx
 
class MyForm(wx.Frame):
 
    def __init__(self):
        wx.Frame.__init__(self, None,
                          title="wxPython Redirect Tutorial")
 
        # Add a panel so it looks correct on all platforms
        panel = wx.Panel(self, wx.ID_ANY)
        style = wx.TE_MULTILINE|wx.TE_READONLY|wx.HSCROLL
        log = wx.TextCtrl(panel, wx.ID_ANY, size=(300,100),
                          style=style)
        btn = wx.Button(panel, wx.ID_ANY, 'Push me!')
        self.Bind(wx.EVT_BUTTON, self.onButton, btn)
 
        # Add widgets to a sizer
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(log, 1, wx.ALL|wx.EXPAND, 5)
        sizer.Add(btn, 0, wx.ALL|wx.CENTER, 5)
        panel.SetSizer(sizer)
 
        # redirect text here
        sys.stdout = log
 
    def onButton(self, event):
        print("You pressed the button!")
 
# Run the program
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyForm().Show()
    app.MainLoop()

This code just creates a simple frame with a panel that contains a multi-line text control and a button. Whenever you press the button, it will print out some text to stdout, which we have redirected to the text control.
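The assignment sys.stdout = log works because print only needs an object with a write() method, which the text control evidently provides. A minimal sketch of that idea without wxPython follows; CallbackWriter is a hypothetical adapter, and the list stands in for whatever widget would display the text:

```python
import sys


class CallbackWriter:
    """File-like object that forwards written text to a callback."""

    def __init__(self, callback):
        self.callback = callback

    def write(self, text):
        self.callback(text)
        return len(text)

    def flush(self):
        # Nothing is buffered here, so there is nothing to flush.
        pass


if __name__ == '__main__':
    received = []
    original = sys.stdout
    sys.stdout = CallbackWriter(received.append)
    try:
        print('You pressed the button!')
    finally:
        sys.stdout = original
    print('Widget would have received:', repr(''.join(received)))
```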

Personally I thought it was cool that Python 3 now has a context manager built-in just for this purpose. Speaking of which, Python 3 also has a function for redirecting stderr. All of these examples can be modified slightly to support redirecting stderr or both stdout and stderr. The very last thing we touched on was redirecting stdout to a text control in wxPython. This can be really useful for debugging or for grabbing the output from a subprocess, although in the latter case you will need to print out the output to have it redirected correctly.
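The stderr counterpart mentioned above is contextlib.redirect_stderr, available since Python 3.5. A minimal sketch that captures both streams at once follows; capture_both and the sample noisy function are illustrative names, not part of the original article:

```python
import io
import sys
from contextlib import redirect_stderr, redirect_stdout


def capture_both(func, *args):
    """Call func(*args) and return (stdout_text, stderr_text)."""
    out, err = io.StringIO(), io.StringIO()
    with redirect_stdout(out), redirect_stderr(err):
        func(*args)
    return out.getvalue(), err.getvalue()


def noisy(text):
    print(text)
    # sys.stderr is looked up at call time, so the redirection applies.
    print('warning: ' + text, file=sys.stderr)


if __name__ == '__main__':
    stdout_text, stderr_text = capture_both(noisy, 'My test to redirect')
    print('stdout:', repr(stdout_text))
    print('stderr:', repr(stderr_text))
```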


Grep.py

Script: grep.py

Search regular expression in buffers or log files. (for WeeChat ≥ 1.5)
Author: m4v — Version: 0.8.1 — License: GPL3.
Added: 2009-08-18, updated: 2018-04-10.
Other scripts: https://weechat.org/scripts/
Source: https://weechat.org/scripts/source/grep.py.html/

# -*- coding: utf-8 -*-
###
# Copyright (c) 2009-2011 by Elián Hanisch <lambdae2@gmail.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
###

###
# Search in Weechat buffers and logs (for Weechat 0.3.*)
#
#   Inspired by xt's grep.py
#   Originally I just wanted to add some fixes in grep.py, but then
#   I got carried away and rewrote everything, so new script.
#
#   Commands:
#   * /grep
#     Search in logs or buffers, see /help grep
#   * /logs:
#     Lists logs in ~/.weechat/logs, see /help logs
#
#   Settings:
#   * plugins.var.python.grep.clear_buffer:
#     Clear the results buffer before each search. Valid values: on, off
#
#   * plugins.var.python.grep.go_to_buffer:
#     Automatically go to grep buffer when search is over. Valid values: on, off
#
#   * plugins.var.python.grep.log_filter:
#     Comma separated list of patterns that grep will use to exclude logs, e.g.
#     if you use '*server/*' any log in the 'server' folder will be excluded
#     when using the command '/grep log'
#
#   * plugins.var.python.grep.show_summary:
#     Shows summary for each log. Valid values: on, off
#
#   * plugins.var.python.grep.max_lines:
#     Grep will only print the last matched lines that don't surpass the value defined here.
#
#   * plugins.var.python.grep.size_limit:
#     Size limit in KiB, used to decide whether grepping should run in background or not. If
#     the logs to grep have a total size bigger than this value then grep runs as a new process.
#     It can be used to force or disable the background process: using '0' forces grep to always
#     run in background, while using '' (empty string) disables it.
#
#   * plugins.var.python.grep.timeout_secs:
#     Timeout (in seconds) for background grepping.
#
#   * plugins.var.python.grep.default_tail_head:
#     Config option that defines the default number of lines returned when using the --head or
#     --tail options. Can be overridden in the command with the --number option.
#
#
#   TODO:
#   * try to figure out why hook_process chokes in long outputs (using a tempfile as a
#   workaround now)
#   * possibly add option for defining time intervals
#
#
#   History:
#
#   2018-04-10, Sébastien Helleu <flashcode@flashtux.org>
#   version 0.8.1: fix infolist_time for WeeChat >= 2.2 (WeeChat returns a long
#                  integer instead of a string)
#
#   2017-09-20, mickael9
#   version 0.8:
#   * use weechat 1.5+ api for background processing (old method was unsafe and buggy)
#   * add timeout_secs setting (was previously hardcoded to 5 mins)
#
#   2017-07-23, Sébastien Helleu <flashcode@flashtux.org>
#   version 0.7.8: fix modulo by zero when nick is empty string
#
#   2016-06-23, mickael9
#   version 0.7.7: fix get_home function
#
#   2015-11-26
#   version 0.7.6: fix a typo
#
#   2015-01-31, Nicd-
#   version 0.7.5:
#   '~' is now expanded to the home directory in the log file path so
#   paths like '~/logs/' should work.
#
#   2015-01-14, nils_2
#   version 0.7.4: make q work to quit grep buffer (requested by: gb)
#
#   2014-03-29, Felix Eckhofer <felix@tribut.de>
#   version 0.7.3: fix typo
#
#   2011-01-09
#   version 0.7.2: bug fixes
#
#   2010-11-15
#   version 0.7.1:
#   * use TempFile so temporal files are guaranteed to be deleted.
#   * enable Archlinux workaround.
#
#   2010-10-26
#   version 0.7:
#   * added templates.
#   * using --only-match shows only unique strings.
#   * fixed bug that inverted -B -A switches when used with -t
#
#   2010-10-14
#   version 0.6.8: by xt <xt@bash.no>
#   * suppress highlights when printing in grep buffer
#
#   2010-10-06
#   version 0.6.7: by xt <xt@bash.no> 
#   * better temporary file:
#    use tempfile.mkstemp. to create a temp file in log dir, 
#    makes it safer with regards to write permission and multi user
#
#   2010-04-08
#   version 0.6.6: bug fixes
#   * use WEECHAT_LIST_POS_END in log file completion, makes completion faster
#   * disable bytecode if using python 2.6
#   * use single quotes in command string
#   * fix bug that could change buffer's title when using /grep stop
#
#   2010-01-24
#   version 0.6.5: disable bytecode is a 2.6 feature, instead, resort to delete the bytecode manually
#
#   2010-01-19
#   version 0.6.4: bug fix
#   version 0.6.3: added options --invert --only-match (replaces --exact, which is still available
#   but removed from help)
#   * use new 'irc_nick_color' info
#   * don't generate bytecode when spawning a new process
#   * show active options in buffer title
#
#   2010-01-17
#   version 0.6.2: removed 2.6-ish code
#   version 0.6.1: fixed bug when grepping in grep's buffer
#
#   2010-01-14
#   version 0.6.0: implemented grep in background
#   * improved context lines presentation.
#   * grepping for big (or many) log files runs in a weechat_process.
#   * added /grep stop.
#   * added 'size_limit' option
#   * fixed a infolist leak when grepping buffers
#   * added 'default_tail_head' option
#   * results are sort by line count
#   * don't die if log is corrupted (has NULL chars in it)
#   * changed presentation of /logs
#   * log path completion doesn't suck anymore
#   * removed all tabs, because I learned how to configure Vim so that spaces aren't annoying
#   anymore. This was the script's original policy.
#
#   2010-01-05
#   version 0.5.5: rename script to 'grep.py' (FlashCode <flashcode@flashtux.org>).
#
#   2010-01-04
#   version 0.5.4.1: fix index error when using --after/before-context options.
#
#   2010-01-03
#   version 0.5.4: new features
#   * added --after-context and --before-context options.
#   * added --context as a shortcut for using both -A -B options.
#
#   2009-11-06
#   version 0.5.3: improvements for long grep output
#   * grep buffer input accepts the same flags as /grep for repeat a search with different
#     options.
#   * tweaks in grep's output.
#   * max_lines option added for limit grep's output.
#   * code in update_buffer() optimized.
#   * time stats in buffer title.
#   * added go_to_buffer config option.
#   * added --buffer for search only in buffers.
#   * refactoring.
#
#   2009-10-12, omero
#   version 0.5.2: made it python-2.4.x compliant
#
#   2009-08-17
#   version 0.5.1: some refactoring, show_summary option added.
#
#   2009-08-13
#   version 0.5: rewritten from xt's grep.py
#   * fixed searching in non weechat logs, for cases like, if you're
#     switching from irssi and rename and copy your irssi logs to %h/logs
#   * fixed "timestamp rainbow" when you /grep in grep's buffer
#   * allow to search in other buffers other than current or in logs
#     of currently closed buffers with cmd 'buffer'
#   * allow to search in any log file in %h/logs with cmd 'log'
#   * added --count for return the number of matched lines
#   * added --matchcase for case sensible search
#   * added --hilight for color matches
#   * added --head and --tail options, and --number
#   * added command /logs for list files in %h/logs
#   * added config option for clear the buffer before a search
#   * added config option for filter logs we don't want to grep
#   * added the possibility to repeat last search with another regexp by writing
#     it in grep's buffer
#   * changed spaces for tabs in the code, which is my preference
#
###

from os import path
import sys, getopt, time, os, re

try:
    import cPickle as pickle
except ImportError:
    import pickle

try:
    import weechat
    from weechat import WEECHAT_RC_OK, prnt, prnt_date_tags
    import_ok = True
except ImportError:
    import_ok = False

SCRIPT_NAME    = "grep"
SCRIPT_AUTHOR  = "Elián Hanisch <lambdae2@gmail.com>"
SCRIPT_VERSION = "0.8.1"
SCRIPT_LICENSE = "GPL3"
SCRIPT_DESC    = "Search in buffers and logs"
SCRIPT_COMMAND = "grep"

### Default Settings ###
settings = {
    'clear_buffer'      : 'off',
    'log_filter'        : '',
    'go_to_buffer'      : 'on',
    'max_lines'         : '4000',
    'show_summary'      : 'on',
    'size_limit'        : '2048',
    'default_tail_head' : '10',
    'timeout_secs'      : '300',
}

### Class definitions ###
class linesDict(dict):
    """
    Class for handling matched lines in more than one buffer.
    linesDict[buffer_name] = matched_lines_list
    """
    def __setitem__(self, key, value):
        assert isinstance(value, list)
        if key not in self:
            dict.__setitem__(self, key, value)
        else:
            dict.__getitem__(self, key).extend(value)

    def get_matches_count(self):
        """Return the sum of total matches stored."""
        if dict.__len__(self):
            return sum(map(lambda L: L.matches_count, self.itervalues()))
        else:
            return 0

    def __len__(self):
        """Return the sum of total lines stored."""
        if dict.__len__(self):
            return sum(map(len, self.itervalues()))
        else:
            return 0

    def __str__(self):
        """Returns buffer count or buffer name if there's just one stored."""
        n = len(self.keys())
        if n == 1:
            return self.keys()[0]
        elif n > 1:
            return '%s logs' %n
        else:
            return ''

    def items(self):
        """Returns a list of items sorted by line count."""
        items = dict.items(self)
        items.sort(key=lambda i: len(i[1]))
        return items

    def items_count(self):
        """Returns a list of items sorted by match count."""
        items = dict.items(self)
        items.sort(key=lambda i: i[1].matches_count)
        return items

    def strip_separator(self):
        for L in self.itervalues():
            L.strip_separator()

    def get_last_lines(self, n):
        total_lines = len(self)
        #debug('total: %s n: %s' %(total_lines, n))
        if n >= total_lines:
            # nothing to do
            return
        for k, v in reversed(self.items()):
            l = len(v)
            if n > 0:
                if l > n:
                    del v[:l-n]
                    v.stripped_lines = l-n
                n -= l
            else:
                del v[:]
                v.stripped_lines = l

class linesList(list):
    """Class for list of matches, since sometimes I need to add lines that aren't matches, I need an
    independent counter."""
    _sep = '...'
    def __init__(self, *args):
        list.__init__(self, *args)
        self.matches_count = 0
        self.stripped_lines = 0

    def append(self, item):
        """Append lines, can be a string or a list with strings."""
        if isinstance(item, str):
            list.append(self, item)
        else:
            self.extend(item)

    def append_separator(self):
        """adds a separator into the list, makes sure it doesn't add two together."""
        s = self._sep
        if (self and self[-1] != s) or not self:
            self.append(s)

    def onlyUniq(self):
        s = set(self)
        del self[:]
        self.extend(s)

    def count_match(self, item=None):
        if item is None or isinstance(item, str):
            self.matches_count += 1
        else:
            self.matches_count += len(item)

    def strip_separator(self):
        """removes separators if they are first or/and last in the list."""
        if self:
            s = self._sep
            if self[0] == s:
                del self[0]
            if self[-1] == s:
                del self[-1]

### Misc functions ###
now = time.time
def get_size(f):
    try:
        return os.stat(f).st_size
    except OSError:
        return 0

sizeDict = {0:'b', 1:'KiB', 2:'MiB', 3:'GiB', 4:'TiB'}
def human_readable_size(size):
    power = 0
    while size > 1024:
        power += 1
        size /= 1024.0
    return '%.2f %s' %(size, sizeDict.get(power, ''))

def color_nick(nick):
    """Returns coloured nick, with coloured mode if any."""
    if not nick: return ''
    wcolor = weechat.color
    config_string = lambda s : weechat.config_string(weechat.config_get(s))
    config_int = lambda s : weechat.config_integer(weechat.config_get(s))
    # prefix and suffix
    prefix = config_string('irc.look.nick_prefix')
    suffix = config_string('irc.look.nick_suffix')
    prefix_c = suffix_c = wcolor(config_string('weechat.color.chat_delimiters'))
    if nick[0] == prefix:
        nick = nick[1:]
    else:
        prefix = prefix_c = ''
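    if nick[-1] == suffix:
        nick = nick[:-1]
        # suffix_c already holds the delimiter color and is joined before suffix below
        # (the original line here referenced an undefined name, color_delimiter)
    else:
        suffix = suffix_c = ''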
    # nick mode
    modes = '@!+%'
    if nick[0] in modes:
        mode, nick = nick[0], nick[1:]
        mode_color = wcolor(config_string('weechat.color.nicklist_prefix%d' \
            %(modes.find(mode) + 1)))
    else:
        mode = mode_color = ''
    # nick color
    nick_color = ''
    if nick:
        nick_color = weechat.info_get('irc_nick_color', nick)
        if not nick_color:
            # probably we're in WeeChat 0.3.0
            #debug('no irc_nick_color')
            color_nicks_number = config_int('weechat.look.color_nicks_number')
            idx = (sum(map(ord, nick))%color_nicks_number) + 1
            nick_color = wcolor(config_string('weechat.color.chat_nick_color%02d' %idx))
    return ''.join((prefix_c, prefix, mode_color, mode, nick_color, nick, suffix_c, suffix))

### Config and value validation ###
boolDict = {'on':True, 'off':False}
def get_config_boolean(config):
    value = weechat.config_get_plugin(config)
    try:
        return boolDict[value]
    except KeyError:
        default = settings[config]
        error("Error while fetching config '%s'. Using default value '%s'." %(config, default))
        error("'%s' is invalid, allowed: 'on', 'off'" %value)
        return boolDict[default]

def get_config_int(config, allow_empty_string=False):
    value = weechat.config_get_plugin(config)
    try:
        return int(value)
    except ValueError:
        if value == '' and allow_empty_string:
            return value
        default = settings[config]
        error("Error while fetching config '%s'. Using default value '%s'." %(config, default))
        error("'%s' is not a number." %value)
        return int(default)

def get_config_log_filter():
    filter = weechat.config_get_plugin('log_filter')
    if filter:
        return filter.split(',')
    else:
        return []

def get_home():
    home = weechat.config_string(weechat.config_get('logger.file.path'))
    home = home.replace('%h', weechat.info_get('weechat_dir', ''))
    home = path.abspath(path.expanduser(home))
    return home

def strip_home(s, dir=''):
    """Strips home dir from the beginning of the log path, this makes them shorter."""
    if not dir:
        global home_dir
        dir = home_dir
    l = len(dir)
    if s[:l] == dir:
        return s[l:]
    return s

### Messages ###
script_nick = SCRIPT_NAME
def error(s, buffer=''):
    """Error msg"""
    prnt(buffer, '%s%s %s' %(weechat.prefix('error'), script_nick, s))
    if weechat.config_get_plugin('debug'):
        import traceback
        if traceback.sys.exc_type:
            trace = traceback.format_exc()
            prnt('', trace)

def say(s, buffer=''):
    """normal msg"""
    prnt_date_tags(buffer, 0, 'no_highlight', '%s\t%s' %(script_nick, s))



### Log files and buffers ###
cache_dir = {} # note: don't remove, needed for completion if the script was loaded recently
def dir_list(dir, filter_list=(), filter_excludes=True, include_dir=False):
    """Returns a list of files in 'dir' and its subdirs."""
    global cache_dir
    from os import walk
    from fnmatch import fnmatch
    #debug('dir_list: listing in %s' %dir)
    key = (dir, include_dir)
    try:
        return cache_dir[key]
    except KeyError:
        pass
    
    filter_list = filter_list or get_config_log_filter()
    dir_len = len(dir)
    if filter_list:
        def filter(file):
            file = file[dir_len:] # pattern shouldn't match home dir
            for pattern in filter_list:
                if fnmatch(file, pattern):
                    return filter_excludes
            return not filter_excludes
    else:
        filter = lambda f : not filter_excludes

    file_list = []
    extend = file_list.extend
    join = path.join
    def walk_path():
        for basedir, subdirs, files in walk(dir):
            #if include_dir:
            #    subdirs = map(lambda s : join(s, ''), subdirs)
            #    files.extend(subdirs)
            files_path = map(lambda f : join(basedir, f), files)
            files_path = [ file for file in files_path if not filter(file) ]
            extend(files_path)

    walk_path()
    cache_dir[key] = file_list
    #debug('dir_list: got %s' %str(file_list))
    return file_list

def get_file_by_pattern(pattern, all=False):
    """Returns the first log whose path matches 'pattern',
    if all is True returns all logs that matches."""
    if not pattern: return []
    #debug('get_file_by_filename: searching for %s.' %pattern)
    # do envvar expansion and check file
    file = path.expanduser(pattern)
    file = path.expandvars(file)
    if path.isfile(file):
        return [file]
    # lets see if there's a matching log
    global home_dir
    file = path.join(home_dir, pattern)
    if path.isfile(file):
        return [file]
    else:
        from fnmatch import fnmatch
        file = []
        file_list = dir_list(home_dir)
        n = len(home_dir)
        for log in file_list:
            basename = log[n:]
            if fnmatch(basename, pattern):
                file.append(log)
        #debug('get_file_by_filename: got %s.' %file)
        if not all and file:
            file.sort()
            return [ file[-1] ]
        return file

def get_file_by_buffer(buffer):
    """Given buffer pointer, finds log's path or returns None."""
    #debug('get_file_by_buffer: searching for %s' %buffer)
    infolist = weechat.infolist_get('logger_buffer', '', '')
    if not infolist: return
    try:
        while weechat.infolist_next(infolist):
            pointer = weechat.infolist_pointer(infolist, 'buffer')
            if pointer == buffer:
                file = weechat.infolist_string(infolist, 'log_filename')
                if weechat.infolist_integer(infolist, 'log_enabled'):
                    #debug('get_file_by_buffer: got %s' %file)
                    return file
                #else:
                #    debug('get_file_by_buffer: got %s but log not enabled' %file)
    finally:
        #debug('infolist gets freed')
        weechat.infolist_free(infolist)

def get_file_by_name(buffer_name):
    """Given a buffer name, returns its log path or None. buffer_name should be in 'server.#channel'
    or '#channel' format."""
    #debug('get_file_by_name: searching for %s' %buffer_name)
    # common mask options
    config_masks = ('logger.mask.irc', 'logger.file.mask')
    # since there's no buffer pointer, we try to replace some local vars in mask, like $channel and
    # $server, then replace the local vars left with '*', and use it as a mask for get the path with
    # get_file_by_pattern
    for config in config_masks:
        mask = weechat.config_string(weechat.config_get(config))
        #debug('get_file_by_name: mask: %s' %mask)
        if '$name' in mask:
            mask = mask.replace('$name', buffer_name)
        elif '$channel' in mask or '$server' in mask:
            if '.' in buffer_name and \
                    '#' not in buffer_name[:buffer_name.find('.')]: # the dot isn't part of the channel name
                #    ^ I'm assuming channel starts with #, i'm lazy.
                server, channel = buffer_name.split('.', 1)
            else:
                server, channel = '*', buffer_name
            if '$channel' in mask:
                mask = mask.replace('$channel', channel)
            if '$server' in mask:
                mask = mask.replace('$server', server)
        # change the unreplaced vars by '*'
        from string import letters
        if '%' in mask:
            # vars for time formatting
            mask = mask.replace('%', '$')
        if '$' in mask:
            masks = mask.split('$')
            masks = map(lambda s: s.lstrip(letters), masks)
            mask = '*'.join(masks)
            if mask[0] != '*':
                mask = '*' + mask
        #debug('get_file_by_name: using mask %s' %mask)
        file = get_file_by_pattern(mask)
        #debug('get_file_by_name: got file %s' %file)
        if file:
            return file
    return None

def get_buffer_by_name(buffer_name):
    """Given a buffer name returns its buffer pointer or None."""
    #debug('get_buffer_by_name: searching for %s' %buffer_name)
    pointer = weechat.buffer_search('', buffer_name)
    if not pointer:
        try:
            infolist = weechat.infolist_get('buffer', '', '')
            while weechat.infolist_next(infolist):
                short_name = weechat.infolist_string(infolist, 'short_name')
                name = weechat.infolist_string(infolist, 'name')
                if buffer_name in (short_name, name):
                    #debug('get_buffer_by_name: found %s' %name)
                    pointer = weechat.buffer_search('', name)
                    return pointer
        finally:
            weechat.infolist_free(infolist)
    #debug('get_buffer_by_name: got %s' %pointer)
    return pointer

def get_all_buffers():
    """Returns list with pointers of all open buffers."""
    buffers = []
    infolist = weechat.infolist_get('buffer', '', '')
    while weechat.infolist_next(infolist):
        buffers.append(weechat.infolist_pointer(infolist, 'pointer'))
    weechat.infolist_free(infolist)
    grep_buffer = weechat.buffer_search('python', SCRIPT_NAME)
    if grep_buffer and grep_buffer in buffers:
        # remove it from list
        del buffers[buffers.index(grep_buffer)]
    return buffers

### Grep ###
def make_regexp(pattern, matchcase=False):
    """Returns a compiled regexp."""
    if pattern in ('.', '.*', '.?', '.+'):
        # because I don't need to use a regexp if we're going to match all lines
        return None
    # matching takes a lot more time if pattern starts or ends with .* and it isn't needed.
    if pattern[:2] == '.*':
        pattern = pattern[2:]
    if pattern[-2:] == '.*':
        pattern = pattern[:-2]
    try:
        if not matchcase:
            regexp = re.compile(pattern, re.IGNORECASE)
        else:
            regexp = re.compile(pattern)
    except Exception, e:
        raise Exception, 'Bad pattern, %s' %e
    return regexp

def check_string(s, regexp, hilight='', exact=False):
    """Checks 's' with a regexp and returns it if is a match."""
    if not regexp:
        return s

    elif exact:
        matchlist = regexp.findall(s)
        if matchlist:
            if isinstance(matchlist[0], tuple):
                # join tuples (when there's more than one match group in regexp)
                return [ ' '.join(t) for t in matchlist ]
            return matchlist

    elif hilight:
        matchlist = regexp.findall(s)
        if matchlist:
            if isinstance(matchlist[0], tuple):
                # flatten matchlist
                matchlist = [ item for L in matchlist for item in L if item ]
            matchlist = list(set(matchlist)) # remove duplicates if any
            # apply hilight
            color_hilight, color_reset = hilight.split(',', 1)
            for m in matchlist:
                s = s.replace(m, '%s%s%s' % (color_hilight, m, color_reset))
            return s

    # no need for findall() here
    elif regexp.search(s):
        return s
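
# Illustrative sketch, not part of the original script: re.findall() returns plain
# strings for patterns with at most one group and tuples when there are several
# groups, which is why check_string() tests isinstance(matchlist[0], tuple).
# The helper name and the sample text are assumptions made for this example.
def _demo_findall_shapes(text='nick!user@host'):
    import re
    no_groups = re.findall(r'\w+', text)            # list of strings
    two_groups = re.findall(r'(\w+)@(\w+)', text)   # list of tuples, one per match
    joined = [' '.join(t) for t in two_groups]      # same joining as the 'exact' branch
    return no_groups, two_groups, joined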

def grep_file(file, head, tail, after_context, before_context, count, regexp, hilight, exact, invert):
    """Return a list of lines that match 'regexp' in 'file', if no regexp returns all lines."""
    if count:
        tail = head = after_context = before_context = False
        hilight = ''
    elif exact:
        before_context = after_context = False
        hilight = ''
    elif invert:
        hilight = ''
    #debug(' '.join(map(str, (file, head, tail, after_context, before_context))))

    lines = linesList()
    # define these locally as it makes the loop run slightly faster
    append = lines.append
    count_match = lines.count_match
    separator = lines.append_separator
    if invert:
        def check(s):
            if check_string(s, regexp, hilight, exact):
                return None
            else:
                return s
    else:
        check = lambda s: check_string(s, regexp, hilight, exact)
    
    try:
        file_object = open(file, 'r')
    except IOError:
        # file doesn't exist or can't be read
        return lines
    if tail or before_context:
        # these options need random access to the lines, so the whole file is read into
        # memory; that is slower and memory-hungry for big logs, so we do it *only* here.
        file_lines = file_object.readlines()

        if tail:
            # instead of searching the whole file and picking the last few lines afterwards,
            # we reverse the log, search until the count is reached and reverse the result
            # again, which is a lot faster
            file_lines.reverse()
            # don't invert context switches
            before_context, after_context = after_context, before_context

        if before_context:
            before_context_range = range(before_context, 0, -1)

        limit = tail or head

        line_idx = 0
        while line_idx < len(file_lines):
            line = file_lines[line_idx]
            line = check(line)
            if line:
                if before_context:
                    separator()
                    trimmed = False
                    for id in before_context_range:
                        try:
                            context_line = file_lines[line_idx - id]
                            if check(context_line):
                                # match in before context, that means we appended these same lines in a
                                # previous match, so we delete them merging both paragraphs
                                if not trimmed:
                                    del lines[id - before_context - 1:]
                                    trimmed = True
                            else:
                                append(context_line)
                        except IndexError:
                            pass
                append(line)
                count_match(line)
                if after_context:
                    id, offset = 0, 0
                    while id < after_context + offset:
                        id += 1
                        try:
                            context_line = file_lines[line_idx + id]
                            _context_line = check(context_line)
                            if _context_line:
                                offset = id
                                context_line = _context_line # so match is hilighted with --hilight
                                count_match()
                            append(context_line)
                        except IndexError:
                            pass
                    separator()
                    line_idx += id
                if limit and lines.matches_count >= limit:
                    break
            line_idx += 1

        if tail:
            lines.reverse()
    else:
        # do a normal grep
        limit = head

        for line in file_object:
            line = check(line)
            if line:
                count or append(line)
                count_match(line)
                if after_context:
                    id, offset = 0, 0
                    while id < after_context + offset:
                        id += 1
                        try:
                            context_line = next(file_object)
                            _context_line = check(context_line)
                            if _context_line:
                                offset = id
                                context_line = _context_line
                                count_match()
                            count or append(context_line)
                        except StopIteration:
                            pass
                    separator()
                if limit and lines.matches_count >= limit:
                    break

    file_object.close()
    return lines

def grep_buffer(buffer, head, tail, after_context, before_context, count, regexp, hilight, exact,
        invert):
    """Return a list of lines that match 'regexp' in 'buffer', if no regexp returns all lines."""
    lines = linesList()
    if count:
        tail = head = after_context = before_context = False
        hilight = ''
    elif exact:
        before_context = after_context = False
    #debug(' '.join(map(str, (tail, head, after_context, before_context, count, exact, hilight))))

    # Using /grep in grep's buffer can lead to some funny effects
    # We should take measures if that's the case
    def make_get_line_funcion():
        """Returns a function that gets lines from the infolist, depending on whether the
        buffer is grep's own or not."""
        string_remove_color = weechat.string_remove_color
        infolist_string = weechat.infolist_string
        grep_buffer = weechat.buffer_search('python', SCRIPT_NAME)
        if grep_buffer and buffer == grep_buffer:
            def function(infolist):
                prefix = infolist_string(infolist, 'prefix')
                message = infolist_string(infolist, 'message')
                if prefix: # only our messages have prefix, ignore it
                    return None
                return message
        else:
            infolist_time = weechat.infolist_time
            def function(infolist):
                prefix = string_remove_color(infolist_string(infolist, 'prefix'), '')
                message = string_remove_color(infolist_string(infolist, 'message'), '')
                date = infolist_time(infolist, 'date')
                # since WeeChat 2.2, infolist_time returns a long integer
                # instead of a string
                if not isinstance(date, str):
                    date = time.strftime('%F %T', time.localtime(int(date)))
                return '%s\t%s\t%s' %(date, prefix, message)
        return function
    get_line = make_get_line_funcion()

    infolist = weechat.infolist_get('buffer_lines', buffer, '')
    if tail:
        # like with grep_file() if we need the last few matching lines, we move the cursor to
        # the end and search backwards
        infolist_next = weechat.infolist_prev
        infolist_prev = weechat.infolist_next
    else:
        infolist_next = weechat.infolist_next
        infolist_prev = weechat.infolist_prev
    limit = head or tail

    # define these locally as it makes the loop run slightly faster
    append = lines.append
    count_match = lines.count_match
    separator = lines.append_separator
    if invert:
        def check(s):
            if check_string(s, regexp, hilight, exact):
                return None
            else:
                return s
    else:
        check = lambda s: check_string(s, regexp, hilight, exact)

    if before_context:
        before_context_range = range(before_context, 0, -1)

    while infolist_next(infolist):
        line = get_line(infolist)
        if line is None: continue
        line = check(line)
        if line:
            if before_context:
                separator()
                trimmed = False
                for id in before_context_range:
                    if not infolist_prev(infolist):
                        trimmed = True
                for id in before_context_range:
                    context_line = get_line(infolist)
                    if check(context_line):
                        if not trimmed:
                            del lines[id - before_context - 1:]
                            trimmed = True
                    else:
                        append(context_line)
                    infolist_next(infolist)
            count or append(line)
            count_match(line)
            if after_context:
                id, offset = 0, 0
                while id < after_context + offset:
                    id += 1
                    if infolist_next(infolist):
                        context_line = get_line(infolist)
                        _context_line = check(context_line)
                        if _context_line:
                            context_line = _context_line
                            offset = id
                            count_match()
                        append(context_line)
                    else:
                        # in the main loop infolist_next would start over and cause an
                        # infinite loop; replacing it with a stub avoids that
                        infolist_next = lambda x: 0
                separator()
            if limit and lines.matches_count >= limit:
                break
    weechat.infolist_free(infolist)

    if tail:
        lines.reverse()
    return lines

### this is our main grep function
hook_file_grep = None
def show_matching_lines():
    """
    Greps buffers in search_in_buffers or files in search_in_files and updates grep buffer with the
    result.
    """
    global pattern, matchcase, number, count, exact, hilight, invert
    global tail, head, after_context, before_context
    global search_in_files, search_in_buffers, matched_lines, home_dir
    global time_start
    matched_lines = linesDict()
    #debug('buffers:%s \nlogs:%s' %(search_in_buffers, search_in_files))
    time_start = now()

    # buffers
    if search_in_buffers:
        regexp = make_regexp(pattern, matchcase)
        for buffer in search_in_buffers:
            buffer_name = weechat.buffer_get_string(buffer, 'name')
            matched_lines[buffer_name] = grep_buffer(buffer, head, tail, after_context,
                    before_context, count, regexp, hilight, exact, invert)

    # logs
    if search_in_files:
        size_limit = get_config_int('size_limit', allow_empty_string=True)
        background = False
        if size_limit or size_limit == 0:
            size = sum(map(get_size, search_in_files))
            if size > size_limit * 1024:
                background = True
        elif size_limit == '':
            background = False

        regexp = make_regexp(pattern, matchcase)

        global grep_options, log_pairs
        grep_options = (head, tail, after_context, before_context,
                        count, regexp, hilight, exact, invert)

        log_pairs = [(strip_home(log), log) for log in search_in_files]

        if not background:
            # run grep normally
            for log_name, log in log_pairs:
                matched_lines[log_name] = grep_file(log, *grep_options)
            buffer_update()
        else:
            global hook_file_grep, grep_stdout, grep_stderr, pattern_tmpl
            grep_stdout = grep_stderr = ''
            hook_file_grep = weechat.hook_process(
                'func:grep_process',
                get_config_int('timeout_secs') * 1000,
                'grep_process_cb',
                ''
            )
            if hook_file_grep:
                buffer_create("Searching for '%s' in %s worth of data..." % (
                    pattern_tmpl,
                    human_readable_size(size)
                ))
    else:
        buffer_update()


def grep_process(*args):
    result = {}
    try:
        global grep_options, log_pairs
        for log_name, log in log_pairs:
            result[log_name] = grep_file(log, *grep_options)
    except Exception as e:
        result = e

    return pickle.dumps(result)

grep_stdout = grep_stderr = ''

def grep_process_cb(data, command, return_code, out, err):
    global grep_stdout, grep_stderr, matched_lines, hook_file_grep

    grep_stdout += out
    grep_stderr += err

    def set_buffer_error(message):
        error(message)
        grep_buffer = buffer_create()
        title = weechat.buffer_get_string(grep_buffer, 'title')
        title = title + ' %serror' % color_title
        weechat.buffer_set(grep_buffer, 'title', title)

    if return_code == weechat.WEECHAT_HOOK_PROCESS_ERROR:
        set_buffer_error("Background grep timed out")
        hook_file_grep = None
        return WEECHAT_RC_OK

    elif return_code >= 0:
        hook_file_grep = None
        if grep_stderr:
            set_buffer_error(grep_stderr)
            return WEECHAT_RC_OK

        try:
            data = pickle.loads(grep_stdout)
            if isinstance(data, Exception):
                raise data
            matched_lines.update(data)
        except Exception as e:
            set_buffer_error(repr(e))
            return WEECHAT_RC_OK
        else:
            buffer_update()

    return WEECHAT_RC_OK

def get_grep_file_status():
    global search_in_files, matched_lines, time_start
    elapsed = now() - time_start
    if len(search_in_files) == 1:
        log = '%s (%s)' %(strip_home(search_in_files[0]),
                human_readable_size(get_size(search_in_files[0])))
    else:
        size = sum(map(get_size, search_in_files))
        log = '%s log files (%s)' %(len(search_in_files), human_readable_size(size))
    return 'Searching in %s, running for %.4f seconds. Interrupt it with "/grep stop" or "stop"' \
        ' in grep buffer.' %(log, elapsed)

### Grep buffer ###
def buffer_update():
    """Updates our buffer with new lines."""
    global pattern_tmpl, matched_lines, pattern, count, hilight, invert, exact
    time_grep = now()

    buffer = buffer_create()
    if get_config_boolean('clear_buffer'):
        weechat.buffer_clear(buffer)
    matched_lines.strip_separator() # remove first and last separators of each list
    len_total_lines = len(matched_lines)
    max_lines = get_config_int('max_lines')
    if not count and len_total_lines > max_lines:
        weechat.buffer_clear(buffer)

    def _make_summary(log, lines, note):
        return '%s matches "%s%s%s"%s in %s%s%s%s' \
                %(lines.matches_count, color_summary, pattern_tmpl, color_info,
                  invert and ' (inverted)' or '',
                  color_summary, log, color_reset, note)

    if count:
        make_summary = lambda log, lines : _make_summary(log, lines, ' (not shown)')
    else:
        def make_summary(log, lines):
            if lines.stripped_lines:
                if lines:
                    note = ' (last %s lines shown)' %len(lines)
                else:
                    note = ' (not shown)'
            else:
                note = ''
            return _make_summary(log, lines, note)

    global weechat_format
    if hilight:
        # we don't want colors if there's match highlighting
        format_line = lambda s : '%s %s %s' %split_line(s)
    else:
        def format_line(s):
            global nick_dict, weechat_format
            date, nick, msg = split_line(s)
            if weechat_format:
                try:
                    nick = nick_dict[nick]
                except KeyError:
                    # cache nick
                    nick_c = color_nick(nick)
                    nick_dict[nick] = nick_c
                    nick = nick_c
                return '%s%s %s%s %s' %(color_date, date, nick, color_reset, msg)
            else:
                #no formatting
                return msg

    prnt(buffer, '\n')
    print_line('Search for "%s%s%s"%s in %s%s%s.' %(color_summary, pattern_tmpl, color_info,
        invert and ' (inverted)' or '', color_summary, matched_lines, color_reset),
            buffer)
    # print last <max_lines> lines
    if matched_lines.get_matches_count():
        if count:
            # with count we sort by number of matches instead of number of lines.
            matched_lines_items = matched_lines.items_count()
        else:
            matched_lines_items = matched_lines.items()

        matched_lines.get_last_lines(max_lines)
        for log, lines in matched_lines_items:
            if lines.matches_count:
                # matched lines
                if not count:
                    # print lines
                    weechat_format = True
                    if exact:
                        lines.onlyUniq()
                    for line in lines:
                        #debug(repr(line))
                        if line == linesList._sep:
                            # separator
                            prnt(buffer, context_sep)
                        else:
                            if '\x00' in line:
                                # log was corrupted
                                error("Found garbage in log '%s', maybe it's corrupted" %log)
                                line = line.replace('\x00', '')
                            prnt_date_tags(buffer, 0, 'no_highlight', format_line(line))

                # summary
                if count or get_config_boolean('show_summary'):
                    summary = make_summary(log, lines)
                    print_line(summary, buffer)

            # separator
            if not count and lines:
                prnt(buffer, '\n')
    else:
        print_line('No matches found.', buffer)

    # set title
    global time_start
    time_end = now()
    # total time
    time_total = time_end - time_start
    # percent of the total time used for grepping
    time_grep_pct = (time_grep - time_start)/time_total*100
    #debug('time: %.4f seconds (%.2f%%)' %(time_total, time_grep_pct))
    if not count and len_total_lines > max_lines:
        note = ' (last %s lines shown)' %len(matched_lines)
    else:
        note = ''
    title = "'q': close buffer | Search in %s%s%s %s matches%s | pattern \"%s%s%s\"%s %s | %.4f seconds (%.2f%%)" \
            %(color_title, matched_lines, color_reset, matched_lines.get_matches_count(), note,
              color_title, pattern_tmpl, color_reset, invert and ' (inverted)' or '', format_options(),
              time_total, time_grep_pct)
    weechat.buffer_set(buffer, 'title', title)

    if get_config_boolean('go_to_buffer'):
        weechat.buffer_set(buffer, 'display', '1')

    # free matched_lines so it can be removed from memory
    del matched_lines
    
def split_line(s):
    """Splits log's line 's' in 3 parts, date, nick and msg."""
    global weechat_format
    if weechat_format and s.count('\t') >= 2:
        date, nick, msg = s.split('\t', 2) # date, nick, message
    else:
        # looks like log isn't in weechat's format
        weechat_format = False # incoming lines won't be formatted
        date, nick, msg = '', '', s
    # remove tabs
    if '\t' in msg:
        msg = msg.replace('\t', '    ')
    return date, nick, msg

def print_line(s, buffer=None, display=False):
    """Prints 's' in script's buffer as 'script_nick'. For displaying search summaries."""
    if buffer is None:
        buffer = buffer_create()
    say('%s%s' %(color_info, s), buffer)
    if display and get_config_boolean('go_to_buffer'):
        weechat.buffer_set(buffer, 'display', '1')

def format_options():
    global matchcase, number, count, exact, hilight, invert
    global tail, head, after_context, before_context
    options = []
    append = options.append
    insert = options.insert
    chars = 'cHmov'
    for i, flag in enumerate((count, hilight, matchcase, exact, invert)):
        if flag:
            append(chars[i])

    if head or tail:
        n = get_config_int('default_tail_head')
        if head:
            append('h')
            if head != n:
                insert(-1, ' -')
                append('n')
                append(head)
        elif tail:
            append('t')
            if tail != n:
                insert(-1, ' -')
                append('n')
                append(tail)

    if before_context and after_context and (before_context == after_context):
        append(' -C')
        append(before_context)
    else:
        if before_context:
            append(' -B')
            append(before_context)
        if after_context:
            append(' -A')
            append(after_context)

    s = ''.join(map(str, options)).strip()
    if s and s[0] != '-':
        s = '-' + s
    return s

def buffer_create(title=None):
    """Returns our buffer pointer, creates and cleans the buffer if needed."""
    buffer = weechat.buffer_search('python', SCRIPT_NAME)
    if not buffer:
        buffer = weechat.buffer_new(SCRIPT_NAME, 'buffer_input', '', '', '')
        weechat.buffer_set(buffer, 'time_for_each_line', '0')
        weechat.buffer_set(buffer, 'nicklist', '0')
        weechat.buffer_set(buffer, 'title', title or 'grep output buffer')
        weechat.buffer_set(buffer, 'localvar_set_no_log', '1')
    elif title:
        weechat.buffer_set(buffer, 'title', title)
    return buffer

def buffer_input(data, buffer, input_data):
    """Repeats last search with 'input_data' as regexp."""
    try:
        cmd_grep_stop(buffer, input_data)
    except:
        return WEECHAT_RC_OK
    if input_data in ('q', 'Q'):
        weechat.buffer_close(buffer)
        return weechat.WEECHAT_RC_OK

    global search_in_buffers, search_in_files
    global pattern
    try:
        if pattern and (search_in_files or search_in_buffers):
            # check if the buffer pointers are still valid
            for pointer in search_in_buffers:
                infolist = weechat.infolist_get('buffer', pointer, '')
                if not infolist:
                    del search_in_buffers[search_in_buffers.index(pointer)]
                weechat.infolist_free(infolist)
            try:
                cmd_grep_parsing(input_data)
            except Exception as e:
                error('Argument error, %s' %e, buffer=buffer)
                return WEECHAT_RC_OK
            try:
                show_matching_lines()
            except Exception as e:
                error(e)
    except NameError:
        error("There isn't any previous search to repeat.", buffer=buffer)
    return WEECHAT_RC_OK

### Commands ###
def cmd_init():
    """Resets global vars."""
    global home_dir, cache_dir, nick_dict
    global pattern_tmpl, pattern, matchcase, number, count, exact, hilight, invert
    global tail, head, after_context, before_context
    hilight = ''
    head = tail = after_context = before_context = invert = False
    matchcase = count = exact = False
    pattern_tmpl = pattern = number = None
    home_dir = get_home()
    cache_dir = {} # to avoid walking the dir tree more than once per command
    nick_dict = {} # nick cache, so nick colors aren't recalculated every time

def cmd_grep_parsing(args):
    """Parses args for /grep and grep input buffer."""
    global pattern_tmpl, pattern, matchcase, number, count, exact, hilight, invert
    global tail, head, after_context, before_context
    global log_name, buffer_name, only_buffers, all
    opts, args = getopt.gnu_getopt(args.split(), 'cmHeahtivn:bA:B:C:o', ['count', 'matchcase', 'hilight',
        'exact', 'all', 'head', 'tail', 'number=', 'buffer', 'after-context=', 'before-context=',
        'context=', 'invert', 'only-match'])
    #debug(opts, 'opts: '); debug(args, 'args: ')
    if len(args) >= 2:
        if args[0] == 'log':
            del args[0]
            log_name = args.pop(0)
        elif args[0] == 'buffer':
            del args[0]
            buffer_name = args.pop(0)

    def tmplReplacer(match):
        """This function will replace templates with regexps"""
        s = match.groups()[0]
        tmpl_key, _, tmpl_args = s.partition(' ')
        try:
            template = templates[tmpl_key]
            if callable(template):
                r = template(tmpl_args)
                if not r:
                    error("Template %s returned empty string "\
                          "(WeeChat doesn't have enough data)." %tmpl_key)
                return r
            else:
                return template
        except Exception:
            # unknown template (or the template failed): fall back to the raw placeholder text
            return s

    args = ' '.join(args) # join the pattern back together to keep spaces
    if args:
        pattern_tmpl = args  
        pattern = _tmplRe.sub(tmplReplacer, args)
        debug('Using regexp: %s', pattern)
    if not pattern:
        raise Exception('No pattern to grep the logs.')

    def positive_number(opt, val):
        try:
            number = int(val)
            if number < 0:
                raise ValueError
            return number
        except ValueError:
            if len(opt) == 1:
                opt = '-' + opt
            else:
                opt = '--' + opt
            raise Exception("argument for %s must be a positive integer." % opt)

    for opt, val in opts:
        opt = opt.strip('-')
        if opt in ('c', 'count'):
            count = not count
        elif opt in ('m', 'matchcase'):
            matchcase = not matchcase
        elif opt in ('H', 'hilight'):
            # hilight must be always a string!
            if hilight:
                hilight = ''
            else:
                hilight = '%s,%s' %(color_hilight, color_reset)
            # we pass the colors in the variable itself because check_string() must not use
            # weechat's module when applying the colors (this is for grep in a hooked process)
        elif opt in ('e', 'exact', 'o', 'only-match'):
            exact = not exact
            invert = False
        elif opt in ('a', 'all'):
            all = not all
        elif opt in ('h', 'head'):
            head = not head
            tail = False
        elif opt in ('t', 'tail'):
            tail = not tail
            head = False
        elif opt in ('b', 'buffer'):
            only_buffers = True
        elif opt in ('n', 'number'):
            number = positive_number(opt, val)
        elif opt in ('C', 'context'):
            n = positive_number(opt, val)
            after_context = n
            before_context = n
        elif opt in ('A', 'after-context'):
            after_context = positive_number(opt, val)
        elif opt in ('B', 'before-context'):
            before_context = positive_number(opt, val)
        elif opt in ('i', 'v', 'invert'):
            invert = not invert
            exact = False
    # number check
    if number is not None:
        if number == 0:
            head = tail = False
            number = None
        elif head:
            head = number
        elif tail:
            tail = number
    else:
        n = get_config_int('default_tail_head')
        if head:
            head = n
        elif tail:
            tail = n

def cmd_grep_stop(buffer, args):
    global hook_file_grep, pattern, matched_lines
    if hook_file_grep:
        if args == 'stop':
            weechat.unhook(hook_file_grep)
            hook_file_grep = None

            s = 'Search for \'%s\' stopped.' % pattern
            say(s, buffer)
            grep_buffer = weechat.buffer_search('python', SCRIPT_NAME)
            if grep_buffer:
                weechat.buffer_set(grep_buffer, 'title', s)
            matched_lines = {}
        else:
            say(get_grep_file_status(), buffer)
        raise Exception

def cmd_grep(data, buffer, args):
    """Search in buffers and logs."""
    global pattern, matchcase, head, tail, number, count, exact, hilight
    try:
        cmd_grep_stop(buffer, args)
    except:
        return WEECHAT_RC_OK

    if not args:
        weechat.command('', '/help %s' %SCRIPT_COMMAND)
        return WEECHAT_RC_OK

    cmd_init()
    global log_name, buffer_name, only_buffers, all
    log_name = buffer_name = ''
    only_buffers = all = False

    # parse
    try:
        cmd_grep_parsing(args)
    except Exception as e:
        error('Argument error, %s' %e)
        return WEECHAT_RC_OK

    # find logs
    log_file = search_buffer = None
    if log_name:
        log_file = get_file_by_pattern(log_name, all)
        if not log_file:
            error("Couldn't find any log for %s. Try /logs" %log_name)
            return WEECHAT_RC_OK
    elif all:
        search_buffer = get_all_buffers()
    elif buffer_name:
        search_buffer = get_buffer_by_name(buffer_name)
        if not search_buffer:
            # there's no buffer, try in the logs
            log_file = get_file_by_name(buffer_name)
            if not log_file:
                error("Logs or buffer for '%s' not found." %buffer_name)
                return WEECHAT_RC_OK
        else:
            search_buffer = [search_buffer]
    else:
        search_buffer = [buffer]

    # make the log list
    global search_in_files, search_in_buffers
    search_in_files = []
    search_in_buffers = []
    if log_file:
        search_in_files = log_file
    elif not only_buffers:
        #debug(search_buffer)
        for pointer in search_buffer:
            log = get_file_by_buffer(pointer)
            #debug('buffer %s log %s' %(pointer, log))
            if log:
                search_in_files.append(log)
            else:
                search_in_buffers.append(pointer)
    else:
        search_in_buffers = search_buffer

    # grepping
    try:
        show_matching_lines()
    except Exception as e:
        error(e)
    return WEECHAT_RC_OK

def cmd_logs(data, buffer, args):
    """List files in Weechat's log dir."""
    cmd_init()
    global home_dir
    sort_by_size = False
    filter = []

    try:
        opts, args = getopt.gnu_getopt(args.split(), 's', ['size'])
        if args:
            filter = args
        for opt, var in opts:
            opt = opt.strip('-')
            if opt in ('size', 's'):
                sort_by_size = True
    except Exception as e:
        error('Argument error, %s' %e)
        return WEECHAT_RC_OK

    # if there's a filter, filter_excludes should be False
    file_list = dir_list(home_dir, filter, filter_excludes=not filter)
    if sort_by_size:
        file_list.sort(key=get_size)
    else:
        file_list.sort()

    file_sizes = map(lambda x: human_readable_size(get_size(x)), file_list)
    # calculate column length
    if file_list:
        L = file_list[:]
        L.sort(key=len)
        biggest = L[-1]
        column_len = len(biggest) + 3
    else:
        column_len = ''

    buffer = buffer_create()
    if get_config_boolean('clear_buffer'):
        weechat.buffer_clear(buffer)
    file_list = zip(file_list, file_sizes)
    msg = 'Found %s logs.' %len(file_list)

    print_line(msg, buffer, display=True)
    for file, size in file_list:
        separator = column_len and '.'*(column_len - len(file))
        prnt(buffer, '%s %s %s' %(strip_home(file), separator, size))
    if file_list:
        print_line(msg, buffer)
    return WEECHAT_RC_OK


### Completion ###
def completion_log_files(data, completion_item, buffer, completion):
    #debug('completion: %s' %', '.join((data, completion_item, buffer, completion)))
    global home_dir
    l = len(home_dir)
    completion_list_add = weechat.hook_completion_list_add
    WEECHAT_LIST_POS_END = weechat.WEECHAT_LIST_POS_END
    for log in dir_list(home_dir):
        completion_list_add(completion, log[l:], 0, WEECHAT_LIST_POS_END)
    return WEECHAT_RC_OK

def completion_grep_args(data, completion_item, buffer, completion):
    for arg in ('count', 'all', 'matchcase', 'hilight', 'exact', 'head', 'tail', 'number', 'buffer',
            'after-context', 'before-context', 'context', 'invert', 'only-match'):
        weechat.hook_completion_list_add(completion, '--' + arg, 0, weechat.WEECHAT_LIST_POS_SORT)
    for tmpl in templates:
        weechat.hook_completion_list_add(completion, '%{' + tmpl, 0, weechat.WEECHAT_LIST_POS_SORT)
    return WEECHAT_RC_OK


### Templates ###
# template placeholder
_tmplRe = re.compile(r'%\{(\w+.*?)(?:\}|$)')
# will match 999.999.999.999 but I don't care
ipAddress = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
domain = r'[\w-]{2,}(?:\.[\w-]{2,})*\.[a-z]{2,}'
url = r'\w+://(?:%s|%s)(?::\d+)?(?:/[^\])>\s]*)?' % (domain, ipAddress)

def make_url_regexp(args):
    #debug('make url: %s', args)
    if args:
        words = r'(?:%s)' %'|'.join(map(re.escape, args.split()))
        return r'(?:\w+://|www\.)[^\s]*%s[^\s]*(?:/[^\])>\s]*)?' %words
    else:
        return url

def make_simple_regexp(pattern):
    s = ''
    for c in pattern:
        if c == '*':
            s += '.*'
        elif c == '?':
            s += '.'
        else:
            s += re.escape(c)
    return s

templates = {
            'ip': ipAddress,
           'url': make_url_regexp,
        'escape': lambda s: re.escape(s),
        'simple': make_simple_regexp,
        'domain': domain,
        }

### Main ###
def delete_bytecode():
    global script_path
    bytecode = path.join(script_path, SCRIPT_NAME + '.pyc')
    if path.isfile(bytecode):
        os.remove(bytecode)
    return WEECHAT_RC_OK

if __name__ == '__main__' and import_ok and \
        weechat.register(SCRIPT_NAME, SCRIPT_AUTHOR, SCRIPT_VERSION, SCRIPT_LICENSE, \
        SCRIPT_DESC, 'delete_bytecode', ''):
    home_dir = get_home()

    # make the script's own directory importable, so the script can import itself
    import sys
    global script_path
    script_path = path.dirname(__file__)
    sys.path.append(script_path)
    delete_bytecode()

    # check python version
    global bytecode
    if sys.version_info > (2, 6):
        bytecode = 'B'
    else:
        bytecode = ''


    weechat.hook_command(SCRIPT_COMMAND, cmd_grep.__doc__,
            "[log <file> | buffer <name> | stop] [-a|--all] [-b|--buffer] [-c|--count] [-m|--matchcase] "
            "[-H|--hilight] [-o|--only-match] [-i|-v|--invert] [(-h|--head)|(-t|--tail) [-n|--number <n>]] "
            "[-A|--after-context <n>] [-B|--before-context <n>] [-C|--context <n> ] <expression>",
# help
"""
     log <file>: Search in one log that matches <file> in the logger path.
                 Use '*' and '?' as wildcards.
  buffer <name>: Search in buffer <name>, if there's no buffer with <name> it will
                 try to search for a log file.
           stop: Stops a currently running search.
       -a --all: Search in all open buffers.
                 If used with 'log <file>' search in all logs that match <file>.
    -b --buffer: Search only in buffers, not in file logs.
     -c --count: Just count the number of matched lines instead of showing them.
 -m --matchcase: Don't do case insensitive search.
   -H --hilight: Colour exact matches in output buffer.
-o --only-match: Print only the matching part of the line (unique matches).
 -v -i --invert: Print lines that don't match the regular expression.
      -t --tail: Print the last 10 matching lines.
      -h --head: Print the first 10 matching lines.
-n --number <n>: Overrides default number of lines for --tail or --head.
-A --after-context <n>: Shows <n> lines of trailing context after matching lines.
-B --before-context <n>: Shows <n> lines of leading context before matching lines.
-C --context <n>: Same as using both --after-context and --before-context simultaneously.
  <expression>: Expression to search.

Grep buffer:
  Input line accepts most arguments of /grep, it'll repeat last search using the new
  arguments provided. You can't search in different logs from the buffer's input.
  Boolean arguments like --count, --tail, --head, --hilight, ... are toggleable.

Python regular expression syntax:
  See http://docs.python.org/lib/re-syntax.html

Grep Templates:
     %{url [text]}: Matches anything like a URL, or a URL with text.
             %{ip}: Matches anything that looks like an IP.
         %{domain}: Matches anything like a domain.
    %{escape text}: Escapes text in pattern.
 %{simple pattern}: Converts a pattern with '*' and '?' wildcards into a regexp.

Examples:
  Search for urls with the word 'weechat' said by 'nick'
    /grep nick\\t.*%{url weechat}
  Search for '*.*' string
    /grep %{escape *.*}
""",
            # completion template
            "buffer %(buffers_names) %(grep_arguments)|%*"
            "||log %(grep_log_files) %(grep_arguments)|%*"
            "||stop"
            "||%(grep_arguments)|%*",
            'cmd_grep' ,'')
    weechat.hook_command('logs', cmd_logs.__doc__, "[-s|--size] [<filter>]",
            "-s --size: Sort logs by size.\n"
            " <filter>: Only show logs that match <filter>. Use '*' and '?' as wildcards.", '--size', 'cmd_logs', '')

    weechat.hook_completion('grep_log_files', "list of log files",
            'completion_log_files', '')
    weechat.hook_completion('grep_arguments', "list of arguments",
            'completion_grep_args', '')

    # settings
    for opt, val in settings.iteritems():
        if not weechat.config_is_set_plugin(opt):
            weechat.config_set_plugin(opt, val)

    # colors
    color_date        = weechat.color('brown')
    color_info        = weechat.color('cyan')
    color_hilight     = weechat.color('lightred')
    color_reset       = weechat.color('reset')
    color_title       = weechat.color('yellow')
    color_summary     = weechat.color('lightcyan')
    color_delimiter   = weechat.color('chat_delimiters')
    color_script_nick = weechat.color('chat_nick')
    
    # pretty [grep]
    script_nick = '%s[%s%s%s]%s' %(color_delimiter, color_script_nick, SCRIPT_NAME, color_delimiter,
            color_reset)
    script_nick_nocolor = '[%s]' %SCRIPT_NAME
    # paragraph separator when using context options
    context_sep = '%s\t%s--' %(script_nick, color_info)

    # -------------------------------------------------------------------------
    # Debug

    if weechat.config_get_plugin('debug'):
        try:
            # custom debug module I use, allows me to inspect script's objects.
            import pybuffer
            debug = pybuffer.debugBuffer(globals(), '%s_debug' % SCRIPT_NAME)
        except Exception:
            def debug(s, *args):
                if not isinstance(s, basestring):
                    s = str(s)
                if args:
                    s = s %args
                prnt('', '%s\t%s' %(script_nick, s))
    else:
        def debug(*args):
            pass

# vim:set shiftwidth=4 tabstop=4 softtabstop=4 expandtab textwidth=100:
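
The grep templates described in the help text above can be tried outside WeeChat. The sketch below copies make_simple_regexp, _tmplRe and ipAddress from the script. The expand_templates helper is hypothetical: it is only one plausible way to substitute %{name args} placeholders, and it is not taken from the script itself.

```python
import re

# Copied from the script: turn '*' and '?' wildcards into a regexp,
# escaping every other character.
def make_simple_regexp(pattern):
    s = ''
    for c in pattern:
        if c == '*':
            s += '.*'
        elif c == '?':
            s += '.'
        else:
            s += re.escape(c)
    return s

# Copied from the script: placeholder regexp and the IP pattern.
_tmplRe = re.compile(r'%\{(\w+.*?)(?:\}|$)')
ipAddress = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Reduced template table, only for this example.
templates = {'ip': ipAddress, 'simple': make_simple_regexp}

def expand_templates(pattern):
    """Hypothetical helper: replace each %{name args} placeholder in pattern."""
    def replace(match):
        name, _, args = match.group(1).partition(' ')
        tmpl = templates.get(name)
        if tmpl is None:
            return match.group(0)   # unknown template: leave it untouched
        if callable(tmpl):
            return tmpl(args)       # function templates receive the argument text
        return tmpl                 # string templates are inserted as-is
    return _tmplRe.sub(replace, pattern)

if __name__ == '__main__':
    print(make_simple_regexp('a?c'))
    print(expand_templates('%{simple irc.*.log}'))
    print(expand_templates('from %{ip}'))
```

Templates that are plain strings (ip, domain) are inserted verbatim, while callables (url, escape, simple) receive whatever text follows the template name.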