Path: blob/master/src/infrastructure/markup/PhutilRemarkupBlockStorage.php
<?php

/**
 * Remarkup prevents several classes of text-processing problems by replacing
 * tokens in the text as they are marked up. For example, if you write something
 * like this:
 *
 *   //D12//
 *
 * It is processed in several stages. First the "D12" matches and is replaced
 * with a token, in the form of "<0x01><ID number><literal "Z">". The first
 * byte, "<0x01>", is a single byte with value 1 that marks a token. If this is
 * token ID "444", the text may now look like this:
 *
 *   //<0x01>444Z//
 *
 * Now the italics match and are replaced, using the next token ID:
 *
 *   <0x01>445Z
 *
 * When processing completes, all the tokens are replaced with their final
 * equivalents. For example, token 444 is evaluated to:
 *
 *   <a href="http://...">...</a>
 *
 * Then token 445 is evaluated:
 *
 *   <em><0x01>444Z</em>
 *
 * ...and all tokens it contains are replaced:
 *
 *   <em><a href="http://...">...</a></em>
 *
 * If we didn't do this, the italics rule could match the "//" in "http://",
 * or any number of other processing mistakes could occur, some of which create
 * security risks.
 *
 * This class generates keys, and stores the map of keys to replacement text.
 */
final class PhutilRemarkupBlockStorage extends Phobject {

  const MAGIC_BYTE = "\1";

  private $map = array();
  private $index = 0;

  public function store($text) {
    $key = self::MAGIC_BYTE.(++$this->index).'Z';
    $this->map[$key] = $text;
    return $key;
  }

  public function restore($corpus, $text_mode = false) {
    $map = $this->map;

    if (!$text_mode) {
      foreach ($map as $key => $content) {
        $map[$key] = phutil_escape_html($content);
      }
      $corpus = phutil_escape_html($corpus);
    }

    // NOTE: Tokens may contain other tokens: for example, a table may have
    // links inside it. So we can't do a single simple find/replace, because
    // we need to find and replace child tokens inside the content of parent
    // tokens.

    // However, we know that rules which have child tokens must always store
    // all their child tokens first, before they store their parent token: you
    // have to pass the "store(text)" API a block of text with tokens already
    // in it, so you must have created child tokens already.

    // Thus, all child tokens will appear in the list before parent tokens, so
    // if we start at the beginning of the list and replace all the tokens we
    // find in each piece of content, we'll end up expanding all subtokens
    // correctly.

    $map[] = $corpus;
    $seen = array();
    foreach ($map as $key => $content) {
      $seen[$key] = true;

      // If the content contains no token magic, we don't need to replace
      // anything.
      if (strpos($content, self::MAGIC_BYTE) === false) {
        continue;
      }

      $matches = null;
      preg_match_all(
        '/'.self::MAGIC_BYTE.'\d+Z/',
        $content,
        $matches,
        PREG_OFFSET_CAPTURE);

      $matches = $matches[0];

      // See PHI1114. We're replacing all the matches in one pass because this
      // is significantly faster than doing "substr_replace()" in a loop if the
      // corpus is large and we have a large number of matches.

      // Build a list of string pieces in "$parts" by interleaving the
      // plain strings between each token and the replacement token text, then
      // implode the whole thing when we're done.

      $parts = array();
      $pos = 0;
      foreach ($matches as $next) {
        $subkey = $next[0];

        // If we've matched a token pattern but the corresponding token has
        // not already been processed, throw an exception. This should not be
        // possible: either the token does not exist at all, or it appears
        // later in the list than the key being processed.
        if (!isset($seen[$subkey])) {
          if (!isset($map[$subkey])) {
            throw new Exception(
              pht(
                'Matched token key "%s" while processing remarkup block, but '.
                'this token does not exist in the token map.',
                $subkey));
          } else {
            throw new Exception(
              pht(
                'Matched token key "%s" while processing remarkup block, but '.
                'this token appears later in the list than the key being '.
                'processed ("%s").',
                $subkey,
                $key));
          }
        }

        $subpos = $next[1];

        // If there were any non-token bytes since the last token, add them.
        if ($subpos > $pos) {
          $parts[] = substr($content, $pos, $subpos - $pos);
        }

        // Add the token replacement text.
        $parts[] = $map[$subkey];

        // Move the non-token cursor forward over the token.
        $pos = $subpos + strlen($subkey);
      }

      // Add any leftover non-token bytes after the last token.
      $parts[] = substr($content, $pos);

      $content = implode('', $parts);

      $map[$key] = $content;
    }
    $corpus = last($map);

    if (!$text_mode) {
      $corpus = phutil_safe_html($corpus);
    }

    return $corpus;
  }

  public function overwrite($key, $new_text) {
    $this->map[$key] = $new_text;
    return $this;
  }

  public function getMap() {
    return $this->map;
  }

  public function setMap(array $map) {
    $this->map = $map;
    return $this;
  }

}
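The store-then-restore flow described in the class docblock can be tried without libphutil using a small standalone sketch. Everything below is an illustration under stated assumptions, not the library's implementation: `SketchTokenStore` is a hypothetical stand-in that mirrors only the text-mode path (no `phutil_escape_html()` or `phutil_safe_html()`), it uses `preg_replace_callback()` rather than the single-pass splice above, it omits the "token appears later in the list" check, and the `example.com` URL is a placeholder.

```php
<?php

// Hypothetical, simplified stand-in for PhutilRemarkupBlockStorage.
// Assumption: text mode only, so no HTML escaping is performed.
final class SketchTokenStore {

  const MAGIC_BYTE = "\1";

  private $map = array();
  private $index = 0;

  // Store replacement text and return an opaque token key for it.
  public function store($text) {
    $key = self::MAGIC_BYTE.(++$this->index).'Z';
    $this->map[$key] = $text;
    return $key;
  }

  // Expand every token in $corpus, including tokens nested in other tokens.
  public function restore($corpus) {
    $map = $this->map;
    $pattern = '/'.self::MAGIC_BYTE.'\d+Z/';

    $expand = function ($content) use (&$map, $pattern) {
      return preg_replace_callback(
        $pattern,
        function ($match) use (&$map) {
          if (!isset($map[$match[0]])) {
            throw new Exception('Matched a token key that is not in the map.');
          }
          return $map[$match[0]];
        },
        $content);
    };

    // Child tokens are always stored before their parents, so walking the
    // map in insertion order means every child is already fully expanded
    // by the time its parent is processed.
    foreach ($map as $key => $content) {
      $map[$key] = $expand($content);
    }

    return $expand($corpus);
  }

}

// Usage: a link token nested inside an italics token, as in the docblock.
$storage = new SketchTokenStore();
$link = $storage->store('<a href="http://example.com/D12">D12</a>');
$italic = $storage->store('<em>'.$link.'</em>');

echo $storage->restore('See '.$italic.' for details.'), "\n";
```

Because the link is stored before the italics rule runs, the "//" inside the URL is never visible to later rules; they only ever see the opaque token key.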