Help:RegexpParserFunctions

From Wiki - Hipatia
The help pages are provided by Hipatia volunteers to help wiki contributors.

Functions

This module defines the functions len, pos, rpos, sub, replace and explode. In addition, the RegexpParserFunctions extension adds the function regex.

All of these functions operate in O(n) time complexity, making them safe against DoS attacks.

Note:

  1. Some parameters of these functions are limited through global settings to prevent abuse. See section Limits below.
  2. For functions that are case sensitive, you may use the magic word {{lc:your_string_here}} as a workaround in some cases.

#regex:

Once installed, editors of your wiki can evaluate regular expressions in one of two ways: simple match, and replacement.

For example, suppose you want to grab the last portion of a page title that uses '/' as the subpage delimiter. For that, you could use:

{{#regex:{{PAGENAME}}|%^.*/(.*)$%|$1}}
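For readers more familiar with general-purpose regex engines, the same extraction can be sketched in Python. This is only an illustrative model of the pattern above, not the extension's implementation; the helper name last_subpage is hypothetical, and the no-match behavior is an assumption because it is not described here.

```python
import re

def last_subpage(title: str) -> str:
    """Return the text after the last '/' in title (hypothetical helper)."""
    # Same pattern as the wiki example: the greedy '.*' consumes everything
    # up to the last '/', so the group captures only the final portion.
    match = re.match(r"^.*/(.*)$", title)
    # Assumption: when nothing matches, return the title unchanged.
    return match.group(1) if match else title

print(last_subpage("Help/Functions/Regex"))  # Regex
```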

#replaceset:

{{#replaceset: text to replace | regex pattern or string to be replaced = replacement | ... }}

Regex patterns are wrapped in delimiters: /pattern/ (as in the example below), !pattern!, #pattern#, (pattern), [pattern] or {pattern}. The closing delimiter may be followed by any of the flags "imsxADU" (see PHP's documentation on PCRE modifiers for what each one does). A pattern that uses none of these delimiters is treated as a plaintext replacement, such as "|A=B|" (replace all occurrences of "A" with "B").

Example:

{{#replaceset:Text to replace|/(\w+)/i="\1"|to=2}}

This would produce "Text" "2" "replace".
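The result of this example can be reproduced with a small Python model. It assumes that the rules are applied one after another, left to right, which is inferred from the output shown and not stated explicitly; the function replace_set and its rule tuples are hypothetical.

```python
import re

def replace_set(text, rules):
    """Apply (pattern, replacement, flags) rules to text, in order.

    flags is None for a plaintext rule, otherwise a set of re flags.
    """
    for pattern, replacement, flags in rules:
        if flags is None:
            # Plaintext rule: replace every literal occurrence
            text = text.replace(pattern, replacement)
        else:
            text = re.sub(pattern, replacement, text, flags=flags)
    return text

rules = [
    (r"(\w+)", r'"\1"', re.IGNORECASE),  # models /(\w+)/i="\1"
    ("to", "2", None),                   # models to=2
]
print(replace_set("Text to replace", rules))  # "Text" "2" "replace"
```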

#len:

The #len function returns the length of the given string. The syntax is:

{{#len:string}}

The return value is always the number of characters in the string. If no string is specified, the return value is zero.

Notes:

  • Trailing spaces are not counted. Example: {{#len:Icecream }} returns 8.
  • This function is safe with utf-8 multibyte characters. Example: {{#len:Žmržlina}} returns 8.
  • Tags such as <nowiki> and other tag extensions will always have a length of zero, since their content is hidden from the parser. Example: {{#len:<nowiki>This is a </nowiki>test}} returns 4.
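The three notes above can be modeled in a few lines of Python. This is a sketch under stated assumptions: wiki_len is a hypothetical name, only <nowiki> is handled as a stand-in for all tag extensions, and Python's len counts Unicode code points, which matches the multibyte example.

```python
import re

def wiki_len(value: str = "") -> int:
    """Model of #len as described in the notes (hypothetical helper)."""
    # Tag extension content is hidden from the parser, so it adds no length
    visible = re.sub(r"<nowiki>.*?</nowiki>", "", value, flags=re.DOTALL)
    # Trailing spaces are not counted
    return len(visible.rstrip(" "))

print(wiki_len("Icecream "))                         # 8
print(wiki_len("Žmržlina"))                          # 8
print(wiki_len("<nowiki>This is a </nowiki>test"))   # 4
```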

#pos:

The #pos function returns the position of a given search term within the string. The syntax is:

{{#pos:string|search term|offset}}

The offset parameter, if specified, gives the position at which the search should begin.

If the search term is found, the return value is a zero-based integer of the first position within the string. If the search term is not found, the function returns an empty string.

Notes:

  • This function is case sensitive.
  • The maximum allowed length of the search term is limited through the $wgStringFunctionsLimitSearch global setting.
  • This function is safe with utf-8 multibyte characters. Example: {{#pos:Žmržlina|lina}} returns 4.
  • Unlike in #len, where they count as zero, <nowiki> and other tag extensions are treated as having a length of 1 for the purposes of character position. Example: {{#pos:<nowiki>This is a </nowiki>test|test}} returns 1.
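The return convention of #pos (a zero-based index, or an empty string when nothing is found) can be sketched in Python as follows. The name wiki_pos is hypothetical, and neither the search-term limit nor the handling of tag extensions is modeled.

```python
def wiki_pos(value: str, term: str, offset: int = 0):
    """Model of #pos: zero-based index of term, or '' if not found."""
    # str.find is case sensitive and returns -1 when the term is absent
    index = value.find(term, offset)
    return "" if index == -1 else index

print(wiki_pos("Žmržlina", "lina"))   # 4
print(wiki_pos("Icecream", "c", 2))   # 3
```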

#rpos:

The #rpos function returns the last position of a given search term within the string. The syntax is:

{{#rpos:string|search term}}

If the search term is found, the return value is a zero-based integer of its last position within the string. If the search term is not found, the function returns -1.

Tip: When using this to search for the last delimiter, add 1 to the result to get the position just after the last delimiter. This also works when the delimiter is not found, because "-1 + 1" is zero, which is the beginning of the given value.

Notes:

  • This function is case sensitive.
  • The maximum allowed length of the search term is limited through the $wgStringFunctionsLimitSearch global setting.
  • This function is safe with utf-8 multibyte characters. Example: {{#rpos:Žmržlina|lina}} returns 4.
  • Unlike in #len, where they count as zero, <nowiki> and other tag extensions are treated as having a length of 1 for the purposes of character position. Example: {{#rpos:<nowiki>This is a </nowiki>test|test}} returns 1.
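The tip about adding 1 to the result is easiest to see in code. The sketch below models #rpos with Python's str.rfind, which also returns -1 when the term is absent; the names wiki_rpos and after_last are hypothetical, and after_last assumes a single-character delimiter.

```python
def wiki_rpos(value: str, term: str) -> int:
    """Model of #rpos: last zero-based index of term, or -1 if not found."""
    return value.rfind(term)

def after_last(value: str, delimiter: str) -> str:
    """Text after the last single-character delimiter."""
    # When the delimiter is missing, -1 + 1 == 0, so the whole value is returned
    return value[wiki_rpos(value, delimiter) + 1:]

print(after_last("String/Functions/Code", "/"))  # Code
print(after_last("NoDelimiter", "/"))            # NoDelimiter
```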

#sub:

The #sub function returns a substring from the given string. The syntax is:

{{#sub:string|start|length}}

The start parameter, if positive (or zero), specifies a zero-based index of the first character to be returned. Examples: {{#sub:Icecream|3}} returns cream; {{#sub:Icecream|0|3}} returns Ice.

If the start parameter is negative, it specifies how many characters from the end should be returned. Example: {{#sub:Icecream|-3}} returns eam.

The length parameter, if present and positive, specifies the maximum length of the returned string. Example: {{#sub:Icecream|3|3}} returns cre.

If the length parameter is negative, it specifies how many characters will be omitted from the end of the string. Example: {{#sub:Icecream|3|-3}} returns cr.

Notes:

  • If the length parameter is zero, it is not used for truncation at all.
    • Example: {{#sub:Icecream|3|0}} returns cream, {{#sub:Icecream|0|3}} returns Ice
  • If start denotes a position beyond the point where a negative length parameter truncates the string, an empty string is returned.
    • Example: {{#sub:Icecream|3|-6}} returns an empty string.
  • This function is safe with utf-8 multibyte characters. Example: {{#sub:Žmržlina|3}} returns žlina.
  • Unlike in #len, where they count as zero, <nowiki> and other tag extensions are treated as having a length of 1 for the purposes of character position. Example: {{#sub:<nowiki>This is a </nowiki>test|1}} returns test.
  • If your string contains a colon ("like:this"), removing just the text that precedes the colon leaves the colon at the start of the output, so the remaining text is rendered indented on a new line, e.g. {{#sub:me:test|2|0}}.
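The rules for start and length above can be collected into one Python sketch. It follows only the behavior described on this page (positive, negative and zero values for each parameter); the name wiki_sub is hypothetical and tag extensions are not modeled.

```python
def wiki_sub(value: str, start: int = 0, length: int = 0) -> str:
    """Model of #sub as described above (hypothetical helper)."""
    n = len(value)
    # Negative start counts from the end; clamp both cases into [0, n]
    begin = max(n + start, 0) if start < 0 else min(start, n)
    if length > 0:
        end = min(begin + length, n)   # maximum length of the result
    elif length < 0:
        end = n + length               # omit characters from the end
    else:
        end = n                        # zero: no truncation at all
    # Start beyond the truncation point yields an empty string
    return value[begin:end] if end > begin else ""

print(wiki_sub("Icecream", 3))       # cream
print(wiki_sub("Icecream", -3))      # eam
print(wiki_sub("Icecream", 3, 3))    # cre
print(wiki_sub("Icecream", 3, -3))   # cr
```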

#replace:

The #replace function returns the given string with all occurrences of a search term replaced with a replacement term. The syntax is:

{{#replace:string|search term|replacement term}}

If the search term is unspecified or empty, a single space will be searched for.

If the replacement term is unspecified or empty, all occurrences of the search term will be removed from the string.

Notes:

  • This function is case sensitive.
  • The maximum allowed length of the search term is limited through the $wgStringFunctionsLimitSearch global setting.
  • The maximum allowed length of the replacement term is limited through the $wgStringFunctionsLimitReplace global setting.
  • Even if the replacement term is a space, an empty string is used. This is a side-effect of the MediaWiki parser. To use a space as the replacement term, put it in nowiki tags.
    • Example: {{#replace:My_little_home_page|_|<nowiki> </nowiki>}} returns My little home page.
    • Note that this is the only acceptable use of nowiki in the replacement term, as otherwise nowiki could be used to bypass $wgStringFunctionsLimitReplace, injecting an arbitrarily large number of characters into the output. For this reason, all occurrences of <nowiki> or any other tag extension within the replacement term are replaced with spaces.
  • This function is safe with utf-8 multibyte characters. Example: {{#replace:Žmržlina|ž|z}} returns Žmrzlina.
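The defaults described above (an empty search term means a single space, an empty replacement removes the matches) can be sketched in Python. The name wiki_replace is hypothetical; the length limits and the parser's trimming of a bare space are not modeled, so a space can be passed directly here.

```python
def wiki_replace(value: str, search: str = "", replacement: str = "") -> str:
    """Model of #replace as described above (hypothetical helper)."""
    # An unspecified or empty search term means "search for a single space"
    if search == "":
        search = " "
    # str.replace is case sensitive and replaces every occurrence
    return value.replace(search, replacement)

print(wiki_replace("My_little_home_page", "_", " "))  # My little home page
print(wiki_replace("Žmržlina", "ž", "z"))             # Žmrzlina
```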

Case insensitive replace

Currently, the syntax provides no switch for toggling case sensitivity. As a workaround, you can use formatting magic words (e.g. {{lc:your_string_here}}). For example, to remove the word "Category:" from a string regardless of its case, you may type:

{{#replace:{{lc:{{{1}}}}}|category:|}}

The disadvantage is that the output becomes all lowercase. To keep the original casing after replacement, you have to use multiple nesting levels (i.e. multiple replace calls) to achieve the same thing.
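The side effect of the lowercase workaround is easy to demonstrate with a small Python model. The name remove_term_lc is hypothetical, and str.lower is used here as an approximation of the {{lc:}} magic word.

```python
def remove_term_lc(value: str, term: str = "category:") -> str:
    """Model of {{#replace:{{lc:value}}|category:|}} (hypothetical helper)."""
    # Lowercasing first makes the match case insensitive,
    # but the rest of the string loses its casing too.
    return value.lower().replace(term, "")

print(remove_term_lc("CATEGORY:Free Software"))  # free software
```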

#explode:

The #explode function splits the given string into pieces and then returns one of the pieces. The syntax is:

{{#explode:string|delimiter|position}}

The delimiter parameter specifies a string to be used to divide the string into pieces. This delimiter string is then not part of any piece, and when two delimiter strings are next to each other, they create an empty piece between them. If this parameter is not specified, a single space is used.

The position parameter specifies which piece is to be returned. Pieces are counted from 0. If this parameter is not specified, the first piece is used (piece with number 0). When a negative value is used as position, the pieces are counted from the end. In this case, piece number -1 means the last piece. Examples:

  • {{#explode:And if you tolerate this| |2}} returns you.
  • {{#explode:String/Functions/Code|/|-1}} returns Code.
  • {{#explode:Split%By%Percentage%Signs|%|2}} returns Percentage.

The return value is the position-th piece. If there are fewer pieces than the position specifies, an empty string is returned.

Notes:

  • This function is case sensitive.
  • The maximum allowed length of the delimiter is limited through the $wgStringFunctionsLimitSearch global setting.
  • This function is safe with utf-8 multibyte characters. Example: {{#explode:Žmržlina|ž|1}} returns lina.
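The splitting and indexing rules for #explode map closely onto Python's str.split and list indexing, as the sketch below shows. The name wiki_explode is hypothetical and the delimiter length limit is not modeled.

```python
def wiki_explode(value: str, delimiter: str = " ", position: int = 0) -> str:
    """Model of #explode as described above (hypothetical helper)."""
    # An unspecified delimiter falls back to a single space
    if delimiter == "":
        delimiter = " "
    # Adjacent delimiters produce empty pieces, as described above
    pieces = value.split(delimiter)
    try:
        # Negative positions count from the end (-1 is the last piece)
        return pieces[position]
    except IndexError:
        # Fewer pieces than the position specifies
        return ""

print(wiki_explode("And if you tolerate this", " ", 2))  # you
print(wiki_explode("String/Functions/Code", "/", -1))    # Code
```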