Extract/print the only fields/string included in square brackets

Status
Not open for further replies.

Andtea

Technical User
Aug 8, 2017
IT
Hi All,

As reported in the subject, I need to extract only the fields inside this XML file, also other included in square brackets not the other. Is there a way?

[nz@mi-e2bi-prod-3 XML]$ cat Agenti.xml
<ImpactRequest><SearchFor><ModelItem><Path>[CRM - Agenti (Analisi Relational)]</Path></ModelItem></SearchFor><SearchIn><CMSelect>/content/folder[@name=&apos;E2BI&apos;]/folder[@name=&apos;CRM&apos;]</CMSelect></SearchIn><Params><URL> CMPath="/content/folder[@name=&apos;E2BI&apos;]/folder[@name=&apos;CRM&apos;]/folder[@name=&apos;Package&apos;]/package[@name=&apos;E2BI - CRM&apos;]/report[@name=&apos;A260316 Data Base Agenti&apos;]" Name="A260316 Data Base Agenti" Path="Cartelle pubbliche &gt; E2BI &gt; CRM &gt; Package &gt; E2BI - CRM" Type="Report" linkName="A260316 Data Base Agenti"><Item DirectImpact="1" FndIn="expression" QueryName="Query1">[CRM - Agenti (Analisi Relational)].[Agenti].[Cognome Agente]</Item><Item DirectImpact="1" FndIn="expression" QueryName="Query1">[CRM - Agenti (Analisi Relational)].[Agenti].[Nome Agente]</Item><Item DirectImpact="1" FndIn="expression" QueryName="Query1">[CRM - Agenti (Analisi Relational)].[Agenti].[Login Agente]</Item><Item DirectImpact="1" FndIn="expression" QueryName="Query1">[CRM - Agenti (Analisi Relational)].[Agenti].[Tipo Agente]</Item><Item DirectImpact="1" FndIn="expression" QueryName="Query1">[CRM - Agenti (Analisi Relational)].[Agenti].[Codice Fiscale Agente]</Item></CMObject></Results></ImpactRequest>[nz@mi-e2bi-prod-3 XML]$

Thanks

Bye
Andrea
 
What exactly does "also other included in square brackets not the other" mean?

Because it reads as contradictory.



Chris.

 
It means just the fields inside square brackets, not the other fields ("also other" means I have other, different XML files).

Thanks.

Ciao
Andrea
 
I would try to use a regular expression to match the strings inside square brackets, something like this:

andtea.awk
Code:
# Run:
# awk -f andtea.awk andtea.xml

BEGIN {
  all_lines = ""
}

{
  # concatenate all lines into one string
  all_lines = all_lines $0
}

END {
  str = all_lines
  # initialize vars
  delete matches_array
  nr = 0
  # store all matches into an array
  while (str != "" && match(str, /\[[^]]+\]/) > 0 ) {
    matches_array[++nr] = substr(str, RSTART, RLENGTH)
    str = substr(str, RSTART + (RLENGTH ? RLENGTH : 1))
  }
  # print result
  printf("found %2d matches:\n", nr)
  for (i=1; i <= nr; i++) {
    printf("%02d\t%s\n", i, matches_array[i])
  }
}

Result:
Code:
$ awk -f andtea.awk andtea.xml
found 23 matches:
01	[CRM - Agenti (Analisi Relational)]
02	[@name=&apos;E2BI&apos;]
03	[@name=&apos;CRM&apos;]
04	[@name=&apos;E2BI&apos;]
05	[@name=&apos;CRM&apos;]
06	[@name=&apos;Package&apos;]
07	[@name=&apos;E2BI - CRM&apos;]
08	[@name=&apos;A260316 Data Base Agenti&apos;]
09	[CRM - Agenti (Analisi Relational)]
10	[Agenti]
11	[Cognome Agente]
12	[CRM - Agenti (Analisi Relational)]
13	[Agenti]
14	[Nome Agente]
15	[CRM - Agenti (Analisi Relational)]
16	[Agenti]
17	[Login Agente]
18	[CRM - Agenti (Analisi Relational)]
19	[Agenti]
20	[Tipo Agente]
21	[CRM - Agenti (Analisi Relational)]
22	[Agenti]
23	[Codice Fiscale Agente]

As you can see, I first concatenated all lines of the input file into one big string and then used a regex on it to extract the substrings in square brackets. But I am not sure whether this approach would work well for bigger XML files, or whether awk is a suitable tool for this at all. If the XML is very large, you should first parse only the elements where strings in square brackets can occur and then apply the regular expression. In that case (parsing XML + matching regex) I would rather use another tool such as Python or Perl.
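The "parse first, then match" idea could look roughly like the Python sketch below. It is only an illustration under some assumptions: the XML must be well-formed (the file as pasted in the question may need cleaning up first), the SAMPLE string is a shortened, hypothetical stand-in for the real file, and the function name bracketed_fields and the choice of the Item tag are mine, not something from the original file's documentation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample modelled on the elements shown in the question.
SAMPLE = """<ImpactRequest>
  <SearchIn><CMSelect>/content/folder[@name='E2BI']/folder[@name='CRM']</CMSelect></SearchIn>
  <Results>
    <Item QueryName="Query1">[CRM - Agenti (Analisi Relational)].[Agenti].[Cognome Agente]</Item>
    <Item QueryName="Query1">[CRM - Agenti (Analisi Relational)].[Agenti].[Nome Agente]</Item>
  </Results>
</ImpactRequest>"""

# Same pattern as in the awk script: "[", one or more non-"]" characters, "]".
BRACKETED = re.compile(r"\[[^\]]+\]")


def bracketed_fields(xml_text, tags=("Item",)):
    """Return the bracketed strings found in the text of the given elements only."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        # Only look at the text of elements whose tag was asked for.
        if element.tag in tags and element.text:
            found.extend(BRACKETED.findall(element.text))
    return found


if __name__ == "__main__":
    for i, field in enumerate(bracketed_fields(SAMPLE), start=1):
        print(f"{i:02d}\t{field}")
```

Because only the Item elements are searched, the [@name=...] parts of the CMSelect path are skipped. For a very large file, xml.etree.ElementTree.iterparse might be a better fit than reading the whole document into memory at once.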
