GitHub Repository: galaxyproject/training-material
Path: blob/main/faqs/galaxy/analysis_regular_expressions.md
---
title: Regular Expressions 101
area: tools
box_type: tip
layout: faq
contributors: [shiltemann]
---

Regular expressions are a standardized way of describing patterns in textual data. They can be extremely useful for tasks such as finding and replacing data. They can be a bit tricky to master, but learning even just a few of the basics can help you get the most out of Galaxy.

Finding

Below are just a few examples of basic expressions:

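| Regular expression | Matches |
|--------------------|---------|
| `abc` | an occurrence of `abc` within your data |
| `(abc\|def)` | `abc` or `def` |
| `[abc]` | a single character which is either `a`, `b`, or `c` |
| `[^abc]` | a single character that is NOT `a`, `b`, or `c` |
| `[a-z]` | any lowercase letter |
| `[a-zA-Z]` | any letter (upper or lower case) |
| `[0-9]` | any digit 0-9 |
| `\d` | any digit (same as `[0-9]`) |
| `\D` | any non-digit character |
| `\w` | any alphanumeric character or underscore |
| `\W` | any character that is not alphanumeric or an underscore |
| `\s` | any whitespace |
| `\S` | any non-whitespace character |
| `.` | any character |
| `\.` | literal `.` (period) |
| `{x,y}` | between x and y repetitions of the preceding element |
| `^` | the beginning of the line |
| `$` | the end of the line |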

Note: characters such as `*`, `?`, `.`, and `+` have a special meaning in a regular expression. If you want to match those characters literally, escape them with a backslash. For example, `\?` matches the question mark character exactly.
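
To make a few of these patterns concrete, here is a minimal sketch using Python's built-in `re` module, which supports the basic syntax shown above. The sample sentence is made up for illustration, and Galaxy tools may use a slightly different regex flavour.

```python
import re

# Made-up sample text, for illustration only.
text = "chr7 has 3 genes. Is chrX next?"

# \d matches a single digit (same as [0-9]).
digits = re.findall(r"\d", text)        # ['7', '3']

# [A-Z] matches a single uppercase letter.
uppercase = re.findall(r"[A-Z]", text)  # ['I', 'X']

# An unescaped . matches any character, so it matches every character here.
any_char_count = len(re.findall(r".", text))

# Escaped with a backslash, . and ? match only the literal characters.
periods = re.findall(r"\.", text)       # ['.']
questions = re.findall(r"\?", text)     # ['?']

print(digits, uppercase, any_char_count, periods, questions)
```

The raw-string prefix `r"..."` keeps Python from interpreting the backslashes itself, so the pattern reaches the regex engine unchanged.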

Examples

| Regular expression | Matches |
|--------------------|---------|
| `\d{4}` | 4 digits (e.g. a year) |
| `chr\d{1,2}` | `chr` followed by 1 or 2 digits |
| `.*abc$` | anything with `abc` at the end of the line |
| `^$` | empty line |
| `^>.*` | line starting with `>` (e.g. FASTA header) |
| `^[^>].*` | line not starting with `>` (e.g. FASTA sequence) |
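
The example patterns above can be checked against a small input. The following sketch uses Python's `re` module and a made-up FASTA-like list of lines. The variable names and sample data are assumptions for illustration, not part of Galaxy.

```python
import re

# A tiny made-up FASTA-like input, including one empty line.
lines = [">seq1 sample", "ACGT", "", ">seq2", "GGCC"]

# ^>.* selects header lines; ^[^>].* selects non-empty lines not starting with >.
headers = [line for line in lines if re.search(r"^>.*", line)]
sequences = [line for line in lines if re.search(r"^[^>].*", line)]

# ^$ matches only a line with nothing between its start and end.
empty = [line for line in lines if re.search(r"^$", line)]

# \d{4} finds a run of four digits; chr\d{1,2} needs at least one digit after chr.
year = re.search(r"\d{4}", "sampled in 1984").group()
chroms = re.findall(r"chr\d{1,2}", "chr1 chr14 chrX")

print(headers)    # ['>seq1 sample', '>seq2']
print(sequences)  # ['ACGT', 'GGCC']
print(empty)      # ['']
print(year)       # 1984
print(chroms)     # ['chr1', 'chr14']
```

Note that `^[^>].*` does not match the empty line, because `[^>]` requires one character to be present.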

Replacing

Sometimes you need to capture the exact value you matched in order to use it in your replacement. We do this using capture groups `(...)`, which we can refer to using `\1`, `\2`, etc. for the first and second captured values. If you want to refer to the whole match, use `&`.

| Regular expression | Input | Captures |
|--------------------|-------|----------|
| `chr(\d{1,2})` | `chr14` | `\1 = 14` |
| `(\d{2}) July (\d{4})` | `24 July 1984` | `\1 = 24`, `\2 = 1984` |
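
The captures in this table can be reproduced with a short sketch in Python's `re` module. In Python, captured values are read from the match object with `group(n)`, and `group(0)` returns the whole match, which is the role `&` plays in the syntax described above.

```python
import re

# One capture group: the digits after "chr".
match = re.search(r"chr(\d{1,2})", "chr14")
chrom_number = match.group(1)   # '14'

# Two capture groups: the day and the year.
date = re.search(r"(\d{2}) July (\d{4})", "24 July 1984")
day = date.group(1)             # '24'
year = date.group(2)            # '1984'

# group(0) is the entire matched text.
whole = date.group(0)           # '24 July 1984'

print(chrom_number, day, year, whole)
```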

An expression like `s/find/replacement/g` is a replacement expression: it substitutes (`s`) any occurrence of `find` with `replacement`. The trailing `g` makes it global, which means it doesn't stop after the first match.

Example: s/chr(\d{1,2})/CHR\1/g will replace chr14 with CHR14 etc.
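
As a rough equivalent outside of the `s/../../g` notation, the same replacement can be sketched with Python's `re.sub`, where the find and replacement expressions are passed separately. `re.sub` replaces every match by default, which corresponds to the `g` flag. The input string is made up for illustration.

```python
import re

# Made-up input: two numbered chromosomes and one that should not match.
text = "chr14 and chr2, but not chrX"

# Find chr followed by 1-2 digits; reuse the captured digits as \1.
replaced_all = re.sub(r"chr(\d{1,2})", r"CHR\1", text)

# count=1 stops after the first match, like leaving out the g flag.
replaced_first = re.sub(r"chr(\d{1,2})", r"CHR\1", text, count=1)

print(replaced_all)    # CHR14 and CHR2, but not chrX
print(replaced_first)  # CHR14 and chr2, but not chrX
```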

You can also use replacement modifiers, such as `\L` to convert to lower case or `\U` to convert to upper case. Example: `s/.*/\U&/g` will convert the whole text to upper case.
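
Support for these modifiers depends on the regex flavour. Python's `re` module, for instance, does not understand `\U`, `\L`, or `&` in a replacement string. The sketch below shows one way to get the same effect there: a replacement function for case conversion, and `\g<0>` for the whole match. The sample strings are made up for illustration.

```python
import re

# Made-up two-line input.
text = "acgt\nttga"

# Equivalent of s/.*/\U&/g: upper-case every match via a replacement function.
upper = re.sub(r".*", lambda m: m.group(0).upper(), text)

# Equivalent of \L applied to runs of uppercase letters.
lower = re.sub(r"[A-Z]+", lambda m: m.group(0).lower(), "ACgt")

# \g<0> refers to the whole match, like & in the syntax above.
bracketed = re.sub(r"chr\d+", r"[\g<0>]", "chr14")

print(repr(upper))  # 'ACGT\nTTGA'
print(lower)        # acgt
print(bracketed)    # [chr14]
```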

Note: In Galaxy, you are often asked to provide the find and replacement expressions separately, so you don't have to use the s/../../g structure.

There is a lot more you can do with regular expressions, and different tools and programming languages use slightly different flavours, but these basics will already allow you to do many of the tasks you might need in your analysis.

Tip: RegexOne is a nice interactive tutorial to learn the basics of regular expressions.

Tip: Regex101.com is a great resource for interactively testing and constructing your regular expressions. It even provides an explanation of a regular expression if you provide one.

Tip: Cyrilex is a visual regular expression tester.