How to use Golang Regex to match xml tags using attributes without lookahead or lookbehind

Herkus · November 22, 2024, 11:12am

I am trying to match XML tags using Go regex. I want to capture the value if the criteria below are met:

The outer tag is elementProp with name="client_secret".
The inner tag has the attribute name="Argument.value".
The value inside the inner tag is a word or number with a minimum of 8 characters (the inner tag can be any tag).

My regex is:

<elementProp[^>]*\bname="(?:client_secret)(?:[ \t\w.-]{0,20})"[^>]*>(?:.|\n)*?<(\w+)[^>]*\bname="Argument\.value"[^>]*>([a-zA-Z0-9]{8,})<\/[^>]*>(?:.|\n)*?<\s*\/elementProp[^>]*>

When I use the test data below, it selects everything from the beginning up to the first match, but I want to select only the matching elementProp tag.

<elementProp name="client_secret" elementType="Argument">
            <stringProp name="Argument.name">client_secret</stringProp>
            <stringProp name="Argument.value">${__P(client_secret_aws_testing)}</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
          <elementProp name="grant_type" elementType="Argument">
            <stringProp name="Argument.name">grant_type</stringProp>
            <stringProp name="Argument.value">client_credentials</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
          <elementProp name="client_secret" elementType="Argument">
            <stringProp name="Argument.name">client_secret</stringProp>
            <stringProp name="Argument.value">1234567788777123123</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
          <elementProp name="callbackURL" elementType="Argument">
            <stringProp name="Argument.name">callbackURL</stringProp>
            <stringProp name="Argument.value">test</stringProp>
            <stringProp name="Argument.desc">test</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>

Nikodem · November 22, 2024, 11:48am

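To match only the <elementProp> tags meeting your criteria using Go’s regex, the issue lies in the filler between tags. (?:.|\n)*? is already non-greedy, but it can match any character, including a closing </elementProp> and the next opening tag. The engine starts at the first client_secret element, fails on its value, and then lets the filler run through the following elements until the value check succeeds. Go’s RE2 engine has no lookahead to forbid that, so the filler itself has to be unable to cross a closing tag. One way is to consume whole child elements, one per repetition.

Here’s an updated regex pattern:

<elementProp[^>]*\bname="client_secret"[^>]*>(?:\s*<\w+[^>]*>[^<]*</\w+>)*?\s*<\w+[^>]*\bname="Argument\.value"[^>]*>([a-zA-Z0-9]{8,})</\w+>(?:\s*<\w+[^>]*>[^<]*</\w+>)*?\s*</elementProp\s*>

Explanation

<elementProp[^>]*\bname="client_secret"[^>]*>: Matches the opening <elementProp> tag with the name="client_secret" attribute.
(?:\s*<\w+[^>]*>[^<]*</\w+>)*?: Matches zero or more complete child elements (opening tag, text, closing tag) non-greedily. Because <\w+ cannot match </elementProp>, this part cannot run past the end of the current element.
\s*<\w+[^>]*\bname="Argument\.value"[^>]*>: Matches an inner tag with the attribute name="Argument.value".
([a-zA-Z0-9]{8,}): Captures the value inside the inner tag if it consists of at least 8 letters or digits.
</\w+>: Matches the closing tag of the inner tag.
(?:\s*<\w+[^>]*>[^<]*</\w+>)*?: Skips any remaining child elements, such as Argument.metadata.
\s*</elementProp\s*>: Matches the closing </elementProp> tag.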
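
To see the difference between a lazy any-character filler and a child-by-child filler, here is a minimal sketch. The sample is a shortened version of the question’s data, and the names lazyAny, childWise, and openTags are made up for illustration. It counts how many opening <elementProp tags end up inside the first match of each pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sample is a shortened version of the test data from the question.
const sample = `<elementProp name="client_secret" elementType="Argument">
  <stringProp name="Argument.value">${__P(client_secret_aws_testing)}</stringProp>
</elementProp>
<elementProp name="grant_type" elementType="Argument">
  <stringProp name="Argument.value">client_credentials</stringProp>
</elementProp>
<elementProp name="client_secret" elementType="Argument">
  <stringProp name="Argument.name">client_secret</stringProp>
  <stringProp name="Argument.value">1234567788777123123</stringProp>
  <stringProp name="Argument.metadata">=</stringProp>
</elementProp>`

var (
	// lazyAny uses a lazy "any character" filler, as in the question.
	// The filler may cross closing </elementProp> tags.
	lazyAny = regexp.MustCompile(`<elementProp[^>]*\bname="client_secret"[^>]*>` +
		`(?:.|\n)*?<\w+[^>]*\bname="Argument\.value"[^>]*>([a-zA-Z0-9]{8,})</\w+>` +
		`(?:.|\n)*?</elementProp>`)

	// childWise consumes one complete child element per repetition,
	// so it cannot step over a closing </elementProp> tag.
	childWise = regexp.MustCompile(`<elementProp[^>]*\bname="client_secret"[^>]*>` +
		`(?:\s*<\w+[^>]*>[^<]*</\w+>)*?` +
		`\s*<\w+[^>]*\bname="Argument\.value"[^>]*>([a-zA-Z0-9]{8,})</\w+>` +
		`(?:\s*<\w+[^>]*>[^<]*</\w+>)*?\s*</elementProp\s*>`)
)

// openTags counts the opening elementProp tags contained in a match.
func openTags(match string) int {
	return strings.Count(match, "<elementProp")
}

func main() {
	fmt.Println("lazyAny spans blocks:  ", openTags(lazyAny.FindString(sample)))
	fmt.Println("childWise spans blocks:", openTags(childWise.FindString(sample)))
}
```

With this sample, the first pattern’s match covers all three blocks, while the second covers only the block whose value passes the check.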

Usage in Go

Here’s how you can use this regex pattern in Go:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	data := `<elementProp name="client_secret" elementType="Argument">
            <stringProp name="Argument.name">client_secret</stringProp>
            <stringProp name="Argument.value">${__P(client_secret_aws_testing)}</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
          <elementProp name="grant_type" elementType="Argument">
            <stringProp name="Argument.name">grant_type</stringProp>
            <stringProp name="Argument.value">client_credentials</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
          <elementProp name="client_secret" elementType="Argument">
            <stringProp name="Argument.name">client_secret</stringProp>
            <stringProp name="Argument.value">1234567788777123123</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
          <elementProp name="callbackURL" elementType="Argument">
            <stringProp name="Argument.name">callbackURL</stringProp>
            <stringProp name="Argument.value">test</stringProp>
            <stringProp name="Argument.desc">test</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>`

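	// Each (?:\s*<\w+[^>]*>[^<]*</\w+>)*? consumes whole child elements,
	// so the match cannot run past the closing </elementProp> tag.
	pattern := `<elementProp[^>]*\bname="client_secret"[^>]*>(?:\s*<\w+[^>]*>[^<]*</\w+>)*?\s*<\w+[^>]*\bname="Argument\.value"[^>]*>([a-zA-Z0-9]{8,})</\w+>(?:\s*<\w+[^>]*>[^<]*</\w+>)*?\s*</elementProp\s*>`
	re := regexp.MustCompile(pattern)

	matches := re.FindAllStringSubmatch(data, -1)

	for _, match := range matches {
		fmt.Printf("Match found: %s\n", match[0])    // Full match
		fmt.Printf("Captured value: %s\n", match[1]) // Captured value
	}
}

Expected Output

For the provided data, this script will output the third <elementProp> block in full, followed by the captured value:

Match found: <elementProp name="client_secret" elementType="Argument">
            <stringProp name="Argument.name">client_secret</stringProp>
            <stringProp name="Argument.value">1234567788777123123</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
Captured value: 1234567788777123123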
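
If only the captured values are needed, one possible variation is to name the capture group and look up its index instead of hard-coding match[1]. The sketch below assumes Go 1.15 or later for SubexpIndex. The helper extractSecrets, the group name secret, and the sample values are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// child matches one simple child element: <tag ...>text</tag>.
const child = `\s*<\w+[^>]*>[^<]*</\w+>`

// secretRe walks the children one element at a time and names the value group.
var secretRe = regexp.MustCompile(
	`<elementProp[^>]*\bname="client_secret"[^>]*>` +
		`(?:` + child + `)*?` +
		`\s*<\w+[^>]*\bname="Argument\.value"[^>]*>(?P<secret>[a-zA-Z0-9]{8,})</\w+>` +
		`(?:` + child + `)*?` +
		`\s*</elementProp\s*>`)

// extractSecrets (hypothetical helper) returns the captured value of every match.
func extractSecrets(doc string) []string {
	idx := secretRe.SubexpIndex("secret")
	var out []string
	for _, m := range secretRe.FindAllStringSubmatch(doc, -1) {
		out = append(out, m[idx])
	}
	return out
}

func main() {
	// Made-up sample: the second value is too short and should be skipped.
	doc := `<elementProp name="client_secret" elementType="Argument">
  <stringProp name="Argument.value">abcDEF123456</stringProp>
</elementProp>
<elementProp name="client_secret" elementType="Argument">
  <stringProp name="Argument.value">short</stringProp>
</elementProp>
<elementProp name="client_secret" elementType="Argument">
  <stringProp name="Argument.name">client_secret</stringProp>
  <stringProp name="Argument.value">1234567788777123123</stringProp>
</elementProp>`

	fmt.Println(extractSecrets(doc)) // prints [abcDEF123456 1234567788777123123]
}
```

Looking the group up by name keeps the calling code unchanged if further capture groups are added to the pattern later.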

Notes

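Regex matching of XML is only reliable when the structure is predictable. It works here only if each child of <elementProp> is a simple element containing text, as in the sample data. Nested children, comments, or self-closing tags need a different pattern or an XML parser.
Without lookahead, the way to stop a match from running into unrelated sections of your XML is to make sure no filler part of the pattern can consume a closing </elementProp> tag.
Modify or optimize it for your specific requirements or for larger XML datasets.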
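
If regex is not a hard requirement, the same check can be cross-checked with the standard encoding/xml package. This is a rough sketch under assumptions: the helper names findSecrets and attrValue are made up, the sample is a shortened version of the question’s data wrapped in a hypothetical <root> element, and nested <elementProp> elements are not handled.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// valueRe mirrors the value rule from the question: 8 or more letters or digits.
var valueRe = regexp.MustCompile(`^[a-zA-Z0-9]{8,}$`)

// sample wraps a shortened copy of the test data in a hypothetical <root>.
const sample = `<root>
<elementProp name="client_secret" elementType="Argument">
  <stringProp name="Argument.name">client_secret</stringProp>
  <stringProp name="Argument.value">${__P(client_secret_aws_testing)}</stringProp>
</elementProp>
<elementProp name="grant_type" elementType="Argument">
  <stringProp name="Argument.value">client_credentials</stringProp>
</elementProp>
<elementProp name="client_secret" elementType="Argument">
  <stringProp name="Argument.name">client_secret</stringProp>
  <stringProp name="Argument.value">1234567788777123123</stringProp>
</elementProp>
</root>`

// attrValue returns the value of the named attribute, or "" if it is absent.
func attrValue(se xml.StartElement, name string) string {
	for _, a := range se.Attr {
		if a.Name.Local == name {
			return a.Value
		}
	}
	return ""
}

// findSecrets (hypothetical helper) walks the token stream and collects
// Argument.value texts found inside elementProp name="client_secret".
func findSecrets(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var secrets []string
	inSecret := false // true while inside a client_secret elementProp
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return secrets, nil
		}
		if err != nil {
			return secrets, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch {
			case t.Name.Local == "elementProp":
				inSecret = attrValue(t, "name") == "client_secret"
			case inSecret && attrValue(t, "name") == "Argument.value":
				var text string
				// DecodeElement reads the element's text up to its end tag.
				if err := dec.DecodeElement(&text, &t); err != nil {
					return secrets, err
				}
				if v := strings.TrimSpace(text); valueRe.MatchString(v) {
					secrets = append(secrets, v)
				}
			}
		case xml.EndElement:
			if t.Name.Local == "elementProp" {
				inSecret = false
			}
		}
	}
}

func main() {
	secrets, err := findSecrets(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(secrets) // prints [1234567788777123123]
}
```

The parser tracks element boundaries itself, so the question of a match running into the next element does not arise, at the cost of requiring well-formed input.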