A lot of that will depend on your specific infrastructure and how you want to conduct the test. For DR planning, we focus on three levels: Circuit, Server, and Site. I can't write a plan for you, but here are some questions and suggestions to get you started.
Circuit Level
How do you handle DIDs? Are there redirection plans, such as duplicate SIP trunks in DC1 and DC2 with automatic redirection? Or is redirection manual, such as AERS carrier redirection from a PRI in DC1 to a PRI in DC2?
Do you have secondary routes out of DC2 or other locations? What about LSP sites?
Server Level
Do you have full redundancy between the two data centers? For example: DC1 = ASM, ACM, sysmgr; DC2 = ASM, ACM, ESS; with LSPs at various locations?
Will you systematically power down each server, or physically disconnect its network connection, to test server-level failover?
Are there any virtual machines that can be rebuilt in DC2? If so, what is the lead time?
Site Level
Will your network team disable network routes to simulate a network outage in the DC? This is a much larger test and will definitely need to be coordinated among all the teams involved (voice, data, development, etc.).
Do you have any sites with gateways only? Do you have Primary, Secondary, and Tertiary registration points?
What outages are acceptable? System Manager, for example, doesn't need to be duplicated. Duplicating it would certainly help, but you may consider a short System Manager outage acceptable.
Write down all the scenarios, their impact on the system, and your action plan to resolve each one. For example, say Trunk Group 1 (TG1) has two PRIs and carries inbound traffic in DC1, and TG2 does the same in DC2. Your scenarios might look something like this:
Single PRI outage: DIDs continue to arrive via TG1, but capacity is cut in half. Action plan: Determine the cause of the outage, then work with the vendor to replace hardware or with the carrier to resolve the circuit issue.
Dual PRI outage: All DIDs on TG1 fail. Action plan: Engage the carrier to point critical DIDs to TG2.
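If you want the scenario matrix in a form you can sort, extend, and reuse for each test, you could keep it as data instead of prose. The following is a minimal Python sketch of the TG1/TG2 example under stated assumptions: it assumes North American T1 PRIs with 23 B-channels each (an E1 PRI carries 30), and the class and function names are hypothetical, not part of any Avaya tool.

```python
from dataclasses import dataclass

# Assumption: T1 PRIs with 23 B-channels each (use 30 for E1 PRIs).
B_CHANNELS_PER_PRI = 23
TG1_TOTAL_PRIS = 2  # TG1 has two PRIs in the example


@dataclass
class Scenario:
    name: str
    tg1_pris_up: int  # PRIs still in service on TG1 during the outage
    action_plan: str


def inbound_capacity(pris_up: int) -> int:
    """Simultaneous inbound calls TG1 can still carry."""
    return pris_up * B_CHANNELS_PER_PRI


def projected_outcome(s: Scenario) -> str:
    """Describe the expected impact of a scenario on inbound DIDs."""
    capacity = inbound_capacity(s.tg1_pris_up)
    if capacity == 0:
        return "All DIDs on TG1 fail"
    full = inbound_capacity(TG1_TOTAL_PRIS)
    return f"DIDs still arrive via TG1 at {capacity}/{full} channels"


# Hypothetical matrix mirroring the two scenarios described above.
matrix = [
    Scenario("Single PRI outage", 1,
             "Find the cause; engage vendor (hardware) or carrier (circuit)"),
    Scenario("Dual PRI outage", 0,
             "Engage carrier to point critical DIDs to TG2"),
]

if __name__ == "__main__":
    for s in matrix:
        print(f"{s.name}: {projected_outcome(s)}. Action plan: {s.action_plan}")
```

Adding a row per scenario (server, site, gateway-only sites, and so on) keeps the projected impact next to the action plan, which makes the later comparison with actual test results easier.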
Once you have a matrix of possible outcomes, determine which situations you want to test. Compare your actual outcome to the projected outcome and figure out why there was a difference. Is the system not programmed correctly, or was your expectation unrealistic? Correct any programming issues and update your documentation.
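One way to track the projected-versus-actual comparison is to record both outcomes per scenario and list the mismatches that need a root cause. This is a hypothetical Python sketch: the sample results are illustrative placeholders, not real test output, and the plain text comparison stands in for what is, in practice, an engineer's judgment.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    scenario: str
    projected: str
    actual: str

    @property
    def matches(self) -> bool:
        # Simplification: case-insensitive text comparison of the outcomes.
        return self.projected.strip().lower() == self.actual.strip().lower()


def discrepancies(results):
    """Return the scenarios whose observed outcome differed from the plan."""
    return [r for r in results if not r.matches]


# Illustrative placeholder data only; replace with what you observe in the test.
results = [
    TestResult("Single PRI outage",
               "DIDs arrive via TG1 at half capacity",
               "DIDs arrive via TG1 at half capacity"),
    TestResult("Dual PRI outage",
               "Critical DIDs reach TG2 after carrier redirect",
               "Critical DIDs still failing"),
]

if __name__ == "__main__":
    for r in discrepancies(results):
        # Each mismatch needs a cause: wrong programming or an unrealistic expectation.
        print(f"{r.scenario}: expected '{r.projected}', saw '{r.actual}'")
```

For each mismatch, note whether the fix was a programming change or a revised expectation, so the documentation and the scenario matrix stay in sync.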
I hope that helps.