Power Query Extract from pdf

(OP)
I am extracting data from a PDF file with Power Query. The data comes with header lines, and I was wondering if there is an easy way to transform it into a data cube in Power Query.
For example, it currently imports like this:
Dept Sales
Jo Bloggs 120
John Smith 100
Dept Sales Total 220

Dept Admin
Ned Kelly 80
Butch Cassidy 200
Dept Admin Total 280

I would like to see it like this, with the dept on the same line as the data and no header or total lines:

Sales Jo Bloggs 120
Sales John Smith 100
Admin Ned Kelly 80
Admin Butch Cassidy 200

Thanks

RE: Power Query Extract from pdf

I'm not sure what the precise format of your input data is. Assuming it is a single text column, do some preprocessing in Excel first:
- add the header 'Data'
- convert the range to a table and name it 'tData'

The M query that converts the input to a 3-column table:

CODE --> M

let
    // Load the single-column table named tData from the current workbook.
    Source = Excel.CurrentWorkbook(){[Name="tData"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Data", type text}}),
    // Header lines contain "Dept" but not "Total". Comparing with =true / =false
    // keeps blank (null) rows from breaking the if test.
    ExtractDeptNotTotal = Table.AddColumn(#"Changed Type", "Dept", each if Text.Contains([Data], "Dept")=true and Text.Contains([Data], "Total")=false then Text.Replace([Data],"Dept ","") else null),
    // Copy each dept name down onto the rows that follow it.
    FillDeptDown = Table.FillDown(ExtractDeptNotTotal,{"Dept"}),
    // The amount is the text after the last space.
    ExtractAmount = Table.AddColumn(FillDeptDown, "Amount", each Text.Reverse(Text.BeforeDelimiter(Text.Reverse([Data]), " "))),
    // The name is what remains once " <amount>" is removed.
    ExtractName = Table.AddColumn(ExtractAmount, "Name", each Text.Replace([Data],Text.Combine({" ",[Amount]}),"")),
    ReorderColumns = Table.ReorderColumns(ExtractName,{"Data", "Dept", "Name", "Amount"}),
    // Drop blank rows before testing [Name], then drop the header and total rows.
    RemoveNullAndEmptyData = Table.SelectRows(ReorderColumns, each [Data] <> null and [Data] <> ""),
    RemoveDeptInData = Table.SelectRows(RemoveNullAndEmptyData, each not Text.StartsWith([Name], "Dept")),
    RemoveData = Table.RemoveColumns(RemoveDeptInData,{"Data"}),
    AmountToNumber = Table.TransformColumnTypes(RemoveData,{{"Amount", type number}})
in
    AmountToNumber
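
To try the idea without a workbook table, here is a minimal sketch that runs in a blank query. It is a variant, not the query above: the inline Source table stands in for 'tData' using the sample rows from the question (with null assumed for the blank line), and the name and amount are split at the last space with Text.BeforeDelimiter / Text.AfterDelimiter instead of the reverse-and-replace steps. The step names are illustrative only.

```m
let
    // Inline stand-in for tData: sample rows from the question,
    // with null assumed for the blank line between departments.
    Source = Table.FromColumns(
        {{
            "Dept Sales", "Jo Bloggs 120", "John Smith 100", "Dept Sales Total 220",
            null,
            "Dept Admin", "Ned Kelly 80", "Butch Cassidy 200", "Dept Admin Total 280"
        }},
        {"Data"}
    ),
    // Drop blank lines first so the text functions never see null.
    NonBlank = Table.SelectRows(Source, each [Data] <> null and [Data] <> ""),
    // Header lines start with "Dept " and are not total lines.
    AddDept = Table.AddColumn(NonBlank, "Dept", each
        if Text.StartsWith([Data], "Dept ") and not Text.Contains([Data], " Total ")
        then Text.AfterDelimiter([Data], "Dept ")
        else null, type text),
    FillDept = Table.FillDown(AddDept, {"Dept"}),
    // Keep only the detail lines (header and total lines both start with "Dept ").
    Details = Table.SelectRows(FillDept, each not Text.StartsWith([Data], "Dept ")),
    // Split at the last space: name on the left, amount on the right.
    AddName = Table.AddColumn(Details, "Name", each
        Text.BeforeDelimiter([Data], " ", {0, RelativePosition.FromEnd}), type text),
    AddAmount = Table.AddColumn(AddName, "Amount", each
        Number.FromText(Text.AfterDelimiter([Data], " ", {0, RelativePosition.FromEnd})), type number),
    Result = Table.SelectColumns(AddAmount, {"Dept", "Name", "Amount"})
in
    Result
```

With the sample rows this should return the four detail lines in the layout asked for (Sales / Jo Bloggs / 120 and so on). Both versions assume no person's name starts with "Dept", since that prefix is what marks the header and total lines.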

combo

RE: Power Query Extract from pdf

(OP)
Thanks, works great.
