Extract all hyperlinks from a Word 2010 Document

pattyjean (TechnicalUser)
3 Aug 12 10:58
I would like to extract all the hyperlinks from a Word document and list them in one document.



macropod (TechnicalUser)
4 Aug 12 1:41
That's nice.

However, unless you can tell us what issues you're having doing this, how the document is formatted, etc, it's hard to be sure if the following will work:
• Use Ctrl-A, then mark all text as hidden. If it disappears, click on the ¶ symbol on the toolbar/ribbon to make it all visible again.
• Using Find/Replace, do a Find for all text in the Hyperlink Style, setting the Replace parameter to 'Not Hidden'
• Using Find/Replace, do a Find for all hidden text, setting the Replace parameter to ^p
• Using a wildcard Find/Replace, delete the 'hidden text' setting and do a Find for [^13]{1,}, setting the Replace parameter to ^p
What you should end up with is a list of all hyperlinks in the document. All of the above assumes your hyperlinks are formatted as such, with the Hyperlink Style.

Cheers
Paul Edstein
[MS MVP - Word]

pattyjean (TechnicalUser)
29 Nov 12 19:50
Thank you Macropod.
I guess I don't work in Word enough to understand what you are asking me for. What do you mean by formatted? It's just a typical Word doc with hyperlinks attached to text.
I am also not sure how to set the Replace to Not Hidden, or how to find hidden text.

In the Find box, how do I do a Find for all text in the Hyperlink Style? Is there a special code? Thank you in advance, as we have over 2000 hyperlinks that we need to index at the end.

• Using Find/Replace, do a Find for all text in the Hyperlink Style, setting the Replace parameter to 'Not Hidden'
• Using Find/Replace, do a Find for all hidden text, setting the Replace parameter to ^p
• Using a wildcard Find/Replace, delete the 'hidden text' setting and do a Find for [^13]{1,}, setting the Replace parameter to ^p
macropod (TechnicalUser)
29 Nov 12 19:57
Formatted: Do your hyperlinks look like & function as hyperlinks?

The rest is simply a matter of learning to use the options available to you on the Find/Replace dialogue. You may need to click on the 'More' button to access them, especially the 'Format' options you'll need to use.

Cheers
Paul Edstein
[MS MVP - Word]

pattyjean (TechnicalUser)
30 Nov 12 8:29
The hyperlinks have nothing in front of them but are blue and underlined. There is no format.
macropod (TechnicalUser)
30 Nov 12 19:15
If they're blue & underlined, and act as hyperlinks when you click on them, then they are formatted as hyperlinks; if they don't act as hyperlinks, then they're not formatted as hyperlinks - they're simply blue underlined text formatted to look like hyperlinks. Another way to test is to press Alt-F9. Do the 'hyperlinks' change their appearance?
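
The same check can be sketched in VBA. This is only an illustration (the macro name is made up): real hyperlinks are fields, so Word lists them in the document's Hyperlinks collection, whereas text that is merely blue and underlined is not counted there.

```vba
Sub CountRealHyperlinks()
    ' Hypothetical helper: reports how many genuine hyperlink fields
    ' Word finds in the active document.
    MsgBox ActiveDocument.Hyperlinks.Count & " hyperlink(s) found in " & _
           ActiveDocument.Name
End Sub
```

A count of 0 in a document full of blue underlined text would suggest the 'hyperlinks' are formatting only.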

Cheers
Paul Edstein
[MS MVP - Word]

pattyjean (TechnicalUser)
3 Dec 12 13:40
Got it! Thank you so much. One question - do I need to do all your steps mentioned in order?

Once we completed all the steps we saved the Word document as an XML document and were able to open it with Excel, so we have a list of the targets (hyperlink as true value, PDFs).

Hope I can make a macro to do all the steps.
Thank you again.
macropod (TechnicalUser)
3 Dec 12 17:00
Hi pattyjean,

Here's a macro to do the job:

CODE --> VBA

Sub ExtractHyperlinks()
With ActiveDocument.Range
  ' Step 1: hide all text in the document
  .Font.Hidden = True
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
    ' Step 2: unhide everything in the Hyperlink Style
    .Style = "Hyperlink"
    .Text = ""
    .Replacement.Text = ""
    .Replacement.Font.Hidden = False
    .Execute Replace:=wdReplaceAll
    ' Step 3: replace the remaining hidden text with paragraph breaks
    .ClearFormatting
    .Font.Hidden = True
    .Replacement.Text = "^p"
    .Execute Replace:=wdReplaceAll
    ' Step 4: collapse runs of paragraph breaks into one.
    ' The [^13]{1,} pattern only works as a wildcard search.
    .ClearFormatting
    .MatchWildcards = True
    .Text = "[^13]{1,}"
    .Execute Replace:=wdReplaceAll
  End With
End With
End Sub

Cheers
Paul Edstein
[MS MVP - Word]

jpadie (TechnicalUser)
4 Dec 12 3:09
Hi,
I'm very far from a Word VBA guru, but would this macro not be a bit simpler? You get a clean Word doc with all the hyperlinks listed in paragraphs.

CODE

Function doHL()
    Dim nd As Document
    Dim a As Document
    Dim h As Hyperlink
    Dim r As Range
    
    Application.ScreenUpdating = False
    
    Set a = ActiveDocument
    Set nd = Documents.Add
    
    For Each h In a.Hyperlinks
        Set r = nd.Range
        r.Collapse
        r.InsertParagraph
        r.InsertAfter (h.Address)
    Next

    nd.Activate
    Application.ScreenUpdating = True
    Application.ScreenRefresh
End Function 
macropod (TechnicalUser)
4 Dec 12 4:07
Hi jpadie,

Your code might be 'simpler', but it's far less efficient once you get beyond a few hyperlinks. FWIW, for all its extra lines, my code does all the extraction, even in a document with 100,000 hyperlinks, in four simple steps. Yours would probably still be running hours after mine has finished.

Cheers
Paul Edstein
[MS MVP - Word]

strongm (MIS)
4 Dec 12 6:10
On the other hand it has the advantage that, with a very minor change, it can show the real target, which pattyjean seems to have suggested is the goal in their post of 3 Dec 12 13:40, for example:

CODE

'Private Declare Function GetTickCount Lib "kernel32" () As Long

Public Sub GetHyperlinks()
    Dim myDoc As Document
    Dim wombat As Hyperlink
'    Dim starttime As Long
    Dim CurrentDoc As Document
    
    Application.ScreenUpdating = False
    Set CurrentDoc = ActiveDocument
    Set myDoc = Application.Documents.Add()

'    starttime = GetTickCount
    For Each wombat In CurrentDoc.Hyperlinks
        myDoc.Range.InsertAfter wombat.TextToDisplay & vbTab & wombat.Address & vbCrLf
    Next
'    Debug.Print GetTickCount - starttime

    Application.ScreenUpdating = True
    myDoc.Range.ParagraphFormat.TabStops.Add CentimetersToPoints(7.5), wdAlignTabLeft, wdTabLeaderSpaces 'basic formatting
End Sub
 


Furthermore, an actual test of your assertion on performance (against a 234 page document with over 8000 hyperlinks) indicates that the contrary is true - performance of jpadie's solution (or at least my variant above) starts to convincingly outstrip the find/replace solution as the number of hyperlinks goes up.
macropod (TechnicalUser)
4 Dec 12 6:23
Setting 'Application.ScreenUpdating = False' makes a fairly fundamental difference. If you're going to use that for a timing comparison, you should use it in both implementations.
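
One way to keep such a comparison even-handed is to run every candidate through the same wrapper. The following is a minimal sketch, not anything posted in this thread: TimeMacro is a hypothetical name, and it relies on VBA's built-in Timer function, which only has fractional-second resolution on Windows and resets at midnight.

```vba
Sub TimeMacro(ByVal macroName As String)
    ' Hypothetical timing wrapper: runs the named macro with screen
    ' updating switched off and prints the elapsed seconds to the
    ' Immediate window, so every macro is measured the same way.
    Dim started As Single

    Application.ScreenUpdating = False
    started = Timer                      ' seconds since midnight
    Application.Run macroName
    Debug.Print macroName & ": " & Format(Timer - started, "0.00") & " s"
    Application.ScreenUpdating = True
End Sub
```

From the Immediate window it could be called as TimeMacro "ExtractHyperlinks". Each run should start from a fresh copy of the test document, since the macros in this thread either alter the document or create a new one.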

Cheers
Paul Edstein
[MS MVP - Word]

jpadie (TechnicalUser)
4 Dec 12 6:25
i suspect that paging between the documents would slow down the script, even with screen updating off.

I tried to experiment with storing the targets in a string and then finally inserting into a new document. I tested on a file with 280000 hyperlinks across 8000 pages and got bored after ten minutes (so force quit the app). In the meantime I wrote a PHP app to open the raw XML and retrieve the hyperlinks. That op takes milliseconds...

i know that VBA is not a real language but i'm still really surprised by how badly optimised it is. Luckily I never have to use it for anything other than the most trivial things.
strongm (MIS)
4 Dec 12 6:52
>If you're going to use that for a timing comparison, you should use it in both implementations.

I did
strongm (MIS)
4 Dec 12 7:27
And this is the test version of your code that I used against the same document as my code:

CODE

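' GetTickCount must be declared at module level
' (on 64-bit Office, add PtrSafe after Declare)
Private Declare Function GetTickCount Lib "kernel32" () As Long

Sub ExtractHyperlinks()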
Dim starttime As Long

Application.ScreenUpdating = False
starttime = GetTickCount
With ActiveDocument.Range
  .Font.Hidden = True
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
    .Style = "Hyperlink"
    .Text = ""
    .Replacement.Text = ""
    .Replacement.Font.Hidden = False
    .Execute Replace:=wdReplaceAll
    .ClearFormatting
    .Font.Hidden = True
    .Replacement.Text = "^p"
    .Execute Replace:=wdReplaceAll
    .ClearFormatting
    .MatchWildcards = True ' the [^13]{1,} pattern needs a wildcard search
    .Text = "[^13]{1,}"
    .Execute Replace:=wdReplaceAll
  End With
End With
Debug.Print GetTickCount - starttime
Application.ScreenUpdating = True
End Sub 
strongm (MIS)
4 Dec 12 7:33
>i'm still really surprised by how badly optimised it is

it isn't really VBA itself that is the culprit with your code, it is the fact that you are using relatively expensive (slow) Word operations: Collapse and InsertParagraph.
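
For illustration, here is a sketch of a variant that writes to the new document only once, so no Word range operation runs inside the loop. It is an assumption-laden example rather than code from this thread: the routine and variable names are made up, it has not been timed against large documents, and whether it helps will depend on the platform.

```vba
Sub ListHyperlinkAddresses()
    ' Hypothetical variant: collect every address in a VBA array first,
    ' then write the whole list to a new document in one assignment.
    Dim src As Document
    Dim dest As Document
    Dim lnk As Hyperlink
    Dim items() As String
    Dim i As Long

    Set src = ActiveDocument
    If src.Hyperlinks.Count = 0 Then Exit Sub

    ReDim items(0 To src.Hyperlinks.Count - 1)
    For Each lnk In src.Hyperlinks
        items(i) = lnk.Address
        i = i + 1
    Next lnk

    Application.ScreenUpdating = False
    Set dest = Documents.Add
    dest.Range.Text = Join(items, vbCr)  ' one paragraph per address
    Application.ScreenUpdating = True
End Sub
```

The loop still has to walk the Hyperlinks collection, so any cost in enumerating the collection itself remains.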
jpadie (TechnicalUser)
4 Dec 12 9:45
I live and learn!

I wrote an alternative that just stored the addresses in a string and didn't write it anywhere (so no 'expensive' calls). I quit the app again after 25 minutes running on the same document (8k pages 200k+ hyperlinks).

Ho hum ...
strongm (MIS)
4 Dec 12 11:15
Well, my admittedly paltry 8000 hyperlinks only took about 6 or 7 seconds on a somewhat ageing 3 GHz Pentium 4.
strongm (MIS)
4 Dec 12 11:47
Now tried it against 48384 links in a 98000 word document. Took about 136 seconds. The find/replace solution is taking somewhat longer. Currently 8 mins and counting. I suspect that memory will be a factor here. Will have to test on my monster at home tonight.
jpadie (TechnicalUser)
4 Dec 12 11:53
Curious. I'm using a 2.53 GHz Core 2 Duo with 4 GB RAM, but I am using MacWord, which may well not have an optimised memory handler or VBA compiler.

macropod (TechnicalUser)
5 Dec 12 4:33
Hi strongm,

I concede your point re the ultimate aim being to extract the addresses (something I hadn't picked up from pattyjean's last post), whereas my code was designed to preserve the hyperlinks as such.

FWIW, I tested a document containing 100,000 hyperlinks amongst 4,735,000 words spread over 19,003 pages. The optimised loop code to extract the addresses to a new document took 00:04:04, whereas the optimised F/R to delete everything except the hyperlinks took 00:07:53. I also tried an optimised loop to copy the hyperlinks to a new document; I gave up waiting after 01:30:00, by which time only a third of them had been processed.

Cheers
Paul Edstein
[MS MVP - Word]

pattyjean (TechnicalUser)
21 Dec 12 14:51
Thanks everyone for the information.

Well, after trial and error I found out that an easy way to do the same thing is to save the Word document as XML; when you open it in Excel it gives you a clean column named target that makes it easy to identify all the linked documents.
Now that I have this part of the process complete, the next step is to match up the link names with the friendly names (Excel formula =HYPERLINK()). It enables me to rename the links to the text name, but copying it back into the Word document is the new challenge for me. Any ideas? I am going to post this in another category, if that makes sense to you all.

macropod (TechnicalUser)
21 Dec 12 16:37
Hi pattyjean,

So you have a set of hyperlinks in Word and, in Excel, a corresponding set of hyperlinks in one column and their 'friendly' names in another, and you want the Word hyperlinks to display the 'friendly' names. Correct? If so, that's easily enough done. A couple of questions, though:
1. Are the hyperlinks in Word & Excel listed in the same order?
2. Are there any duplicates, or instances of the same hyperlink with two or more 'friendly' names?
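
The second question can be checked on the Word side with a short macro. This is a rough sketch under stated assumptions: the macro name is hypothetical, it only compares the display text of links already in the active document (not the Excel list), and it assumes ordinary text hyperlinks. It uses a plain Collection, keyed by address, so that it does not depend on Windows-only libraries.

```vba
Sub ReportConflictingLinkText()
    ' Hypothetical check: prints any address that appears with more
    ' than one display text (address, first text seen, differing text).
    Dim seen As New Collection      ' item = first display text, key = address
    Dim lnk As Hyperlink
    Dim firstText As String

    For Each lnk In ActiveDocument.Hyperlinks
        If Len(lnk.Address) > 0 Then
            On Error Resume Next
            firstText = seen(lnk.Address)   ' raises an error for a new address
            If Err.Number <> 0 Then
                Err.Clear
                firstText = lnk.TextToDisplay
                seen.Add firstText, lnk.Address
            End If
            On Error GoTo 0
            If firstText <> lnk.TextToDisplay Then
                Debug.Print lnk.Address & vbTab & firstText & vbTab & lnk.TextToDisplay
            End If
        End If
    Next lnk
End Sub
```

Collection keys are not case-sensitive, so two addresses that differ only in case are treated as the same target here.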

Cheers
Paul Edstein
[MS MVP - Word]

pattyjean (TechnicalUser)
22 Dec 12 10:23
Paul,
Thanks for the response. To answer your questions I have to give you the whole picture.
There are 91 different Word documents with 2000 attachments in PDF format.
Each Word document contains the hyperlinks, but at the end of the document we want to add a list of evidence with the list of hyperlinks and their friendly names.
All the hyperlinked files are in one folder, with the Word documents outside that folder.
But the final document will be a PDF version with all sets of clickable links. So once the list of evidence is in place, the Word document will be saved as PDF. The hyperlinked files are on a drive and will be saved to flash drives.

1. Are the hyperlinks in Word & Excel listed in the same order? Could be; I haven't set it up yet.

2. Are there any duplicates, or instances of the same hyperlink with two or more 'friendly' names? No. Each hyperlink might appear in multiple documents, but with the same friendly name.

Does that answer some of your questions?
macropod (TechnicalUser)
22 Dec 12 21:37
Hi pattyjean,

If you're hyperlinking to documents, I think you'll find the hyperlinks will have the full filepaths, including drive letters, etc for the target files. So, when you do your PDF conversion, that's what'll be replicated in the PDF. If you then copy the files to a USB stick or CD and open them on another computer, the hyperlinks will still be looking for the original filepaths on your computer and, in all likelihood, will fail.

As for the "list of evidence with the list of hyperlinks and their friendly names", that suggests some form of table, but it's not clear how the 'list of evidence' entries are to be compiled and matched with the hyperlinks. Also, it seems to me you don't need both the 'hyperlinks and their friendly names'. Rather, you should be able to have the hyperlinks displaying only their friendly names.
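
A minimal sketch of that idea follows, with loud caveats: the macro name, the two sample addresses and the two friendly names are invented for illustration, and in practice the lookup data would have to come from the Excel list. It also assumes that each hyperlink's Address matches the lookup key exactly, which has not been verified for relative links.

```vba
Sub ApplyFriendlyNames()
    ' Hypothetical example: changes the display text of each hyperlink
    ' whose address has an entry in the lookup collection.
    Dim friendlyNames As New Collection
    Dim i As Long
    Dim addr As String
    Dim friendly As String

    ' Made-up lookup data: item = friendly name, key = link address.
    friendlyNames.Add "Faculty Handbook", "attachments\handbook.pdf"
    friendlyNames.Add "Strategic Plan", "attachments\plan.pdf"

    With ActiveDocument.Hyperlinks
        For i = .Count To 1 Step -1
            addr = .Item(i).Address
            friendly = ""
            If Len(addr) > 0 Then
                On Error Resume Next     ' an unknown key raises an error
                friendly = friendlyNames(addr)
                On Error GoTo 0
            End If
            If Len(friendly) > 0 Then .Item(i).TextToDisplay = friendly
        Next i
    End With
End Sub
```

Links with no matching entry are left untouched, so the macro can be rerun as the lookup list grows.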

Cheers
Paul Edstein
[MS MVP - Word]

pattyjean (TechnicalUser)
23 Dec 12 7:37
In answer to:
>If you then copy the files to a USB stick or CD and open them on another computer, the hyperlinks will still be looking for the original filepaths on your computer and, in all likelihood, will fail.

The way we linked them, and it works, is to have the folder of attachments on each flash drive, with each link in the Word document written like attachments\filename.pdf. It works for the current links, but the List of Evidence is a different story. I don't want to link each one separately. I used the list of hyperlinks from the Word document, so I already have their names from the other step mentioned above. I match them up with the friendly name for each document in an Excel table and use the HYPERLINK function. The problem is: how do I copy the friendly name with the link to paste into the Word document? It brings over the path of the Excel file instead of the real link. I need some kind of function to keep the PDF together with the friendly name. Any clue? Does this make sense?

macropod (TechnicalUser)
27 Dec 12 1:22
Hi pattyjean,

Even if your hyperlinked files are on a flash drive, by default they'll include the drive's letter in Word. Put the flash drive into another PC where it gets assigned a different drive letter and the hyperlinks will fail.
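
To see which links would be affected, something like the following sketch could list the addresses that look absolute. The macro name is hypothetical, and the three patterns tested (drive letter, UNC path, file:/// URL) are assumptions about what an absolute file address looks like; web addresses and relative paths such as attachments\filename.pdf are skipped.

```vba
Sub ListAbsoluteFileLinks()
    ' Hypothetical check: prints hyperlink addresses that appear to
    ' contain a drive letter, a UNC path or a file:/// URL.
    Dim lnk As Hyperlink
    Dim addr As String

    For Each lnk In ActiveDocument.Hyperlinks
        addr = lnk.Address
        If addr Like "[A-Za-z]:*" _
           Or Left$(addr, 2) = "\\" _
           Or LCase$(Left$(addr, 8)) = "file:///" Then
            Debug.Print addr
        End If
    Next lnk
End Sub
```

An empty Immediate window after running it would suggest that none of the links depend on a particular drive letter.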

It's still not clear what you intend regarding the 'List of Evidence'. It is easy enough to modify the Word hyperlinks so they display the friendly names in the body of the document, rather than the actual paths, whilst hovering over them will display the actual paths. To that end, you don't need a separate 'List of Evidence'. If you want one, though, perhaps what you need is an Index to provide that list.

Cheers
Paul Edstein
[MS MVP - Word]

pattyjean (TechnicalUser)
2 Jan 13 9:35
Yes, I need an index. We got the links to work on the flash drives in Word; now we are converting Word to PDF. I will work on this project later today to figure it out, then repost my questions. Thanks
pattyjean (TechnicalUser)
7 Jan 13 11:52
We have approximately 3000 attachments that we have to hyperlink into different Word documents, with the final product in PDF (maintaining the hyperlinks). The final documentation will go on flash drives to 7 different reviewers.

We can save the Word document as PDF and maintain the links in the PDF, but when we click to open a hyperlink it takes us to the attachment, and on close it closes everything.
For more detail, I found this same issue here: http://forums.adobe.com/message/4005350
There are 3 ways we can do this:
1: Uncheck the 'Open cross-document links in same window' setting in Edit>Preferences>Documents. This works great, but the SACS reviewer would have to follow these steps also. (We are using Adobe X; we don’t know what version they would use.)
2: We can ask the SACS reviewer to hold Ctrl and enter to open the PDF in a new window, or
3: Can we deploy a configuration file (autorun) on the flash drive so they can just click the link and it opens in a new window?

We want to make this as easy as possible for them to review on a flash drive. Can you give us any advice or help with deployment? Is it possible?
Thanks for any information you can provide. This would have to work in a Mac environment as well.
macropod (TechnicalUser)
31 Jan 13 1:44
Hi pattyjean,

You have 3000 attachments, or 3000 links?

It's still not clear to me how the hyperlinked content in the body of a given document is intended to relate to the 'index', which apparently uses the 'friendly' name. Doesn't the 'friendly' name get used as the display text in the body also? If not, how is a user meant to recognise which 'friendly' name in the 'index' relates to a given hyperlink in the body?

It's also still not clear as to how the 'index' is to be compiled. Is the idea to go through all the hyperlinks in the body, find the corresponding entries in the Excel workbook, then insert the 'friendly name' hyperlinks into the 'index'? What happens if the same hyperlink is found more than once? Should the 'index' entries be sorted and, if so, how?

PS: I've been away for a few weeks, hence the delay in replying.

Cheers
Paul Edstein
[MS MVP - Word]
