The first important step is to describe the problem completely. Without a problem description, you will not know where to start or what you need to investigate.
Basic questions you should ask yourself:
==> What are the symptoms?
==> Where is the problem happening?
==> When does the problem happen?
==> Under which condition does the problem happen?
==> Is the problem reproducible?
What are the symptoms?
When analyzing symptoms, think about the following questions:
- Who or what is reporting the problem?
- What are the error codes and error messages?
- How does it fail?
- How is it affecting the business?
Where is the problem happening?
An important step in resolving a problem is finding its origin, which is not always easy. Network, disks, and drivers are only a few of the components to consider when you investigate a problem.
- Is the problem platform specific, or common to multiple platforms?
- Is the environment running locally on the database server or on a remote machine?
- Is any gateway involved?
- Is the data stored on individual disks or on a RAID disk array?
If a problem is reported on one layer, that layer might not be the only cause or the root cause.
- To identify a problem, you should understand the environment where it exists.
- Always spend time understanding the environment: the OS and its version, all related software and its versions, and the hardware.
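As a starting point for recording the environment, here is a minimal Python sketch that uses only the standard library. The function name environment_snapshot and the chosen fields are assumptions for illustration; a real problem description would also list the database, driver, and gateway versions, which this sketch does not try to detect.

```python
import platform
import sys


def environment_snapshot():
    """Collect basic facts about the machine this script runs on."""
    return {
        "os": platform.system(),           # e.g. Linux, Windows, AIX
        "os_release": platform.release(),
        "os_version": platform.version(),
        "architecture": platform.machine(),
        "hostname": platform.node(),
        "python_version": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in environment_snapshot().items():
        print(f"{key}: {value}")
```

Saving this output next to the error report makes it easier to compare the failing machine with one where the problem does not occur.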
When does the problem happen?
For problem analysis, the timeline is a very important factor. Start at the time the error was reported and work backwards through the available logs and information.
The diagnostic questions to ask are:
- Does the problem happen only at a certain time of day or night?
- How often does it happen?
- What is the sequence of events leading up to the time the problem is reported?
- Does the problem happen after an upgrade or after installing new software or hardware?
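Working backwards from the time of the error can be sketched in a few lines of Python. This is only an illustration: the log format (a "YYYY-MM-DD HH:MM:SS" timestamp at the start of each line), the function name events_before, and the sample log lines are all invented for the example and do not come from any particular product.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed log format, adjust to your logs


def events_before(lines, error_time, window):
    """Return (timestamp, message) pairs inside the window that ends at
    error_time, ordered newest first."""
    start = error_time - window
    selected = []
    for line in lines:
        stamp, message = line[:19], line[20:].strip()
        try:
            when = datetime.strptime(stamp, TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if start <= when <= error_time:
            selected.append((when, message))
    selected.sort(key=lambda item: item[0], reverse=True)
    return selected


# Invented sample data, for illustration only.
sample_log = [
    "2024-01-10 01:55:00 nightly backup started",
    "2024-01-10 02:10:12 software upgrade applied",
    "2024-01-10 02:14:47 connection pool exhausted",
    "2024-01-10 02:15:03 application reported error",
    "2024-01-10 02:20:00 service restarted",
]

error_time = datetime(2024, 1, 10, 2, 15, 3)
recent = events_before(sample_log, error_time, timedelta(minutes=10))
for when, message in recent:
    print(when.strftime(TIMESTAMP_FORMAT), message)
```

Reading the result from the top shows the reported error first, followed by the events that preceded it, which is the order you want when looking for a trigger such as an upgrade.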
Under which condition does the problem happen?
Knowing what else is running at the time of a problem is important. If a problem occurs in a certain environment or under certain conditions, that can be a key indicator of the problem cause.
- Does the problem always occur when performing the same task?
- Does a certain sequence of events need to occur for the problem to surface?
- Do other applications fail at the same time?
Is the problem reproducible?
The "ideal" problem is one that is reproducible. Reproducible problems are usually easier to debug and solve.
If the problem has business impact, you don't want it recurring in production. Recreating the problem in a test or dev environment is often preferable in this case.
- Can the problem be recreated on a test machine?
- Are multiple users or applications encountering the same type of problem?
- Can the problem be recreated by running a single command, a set of commands, or a particular existing application?
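When the problem can be triggered by a single command, a quick way to judge how reproducible it is would be to run that command several times on a test machine and count the failures. The Python sketch below assumes that a non-zero exit status means failure. The helper name count_failures is hypothetical, and the two commands are stand-ins (Python one-liners) so the example runs anywhere; replace them with the real command.

```python
import subprocess
import sys


def count_failures(command, attempts):
    """Run command `attempts` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(attempts):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures


# Stand-in commands: one always exits with status 1, the other always succeeds.
always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
always_passes = [sys.executable, "-c", "pass"]

print(count_failures(always_fails, 3))
print(count_failures(always_passes, 3))
```

A command that fails on every attempt is fully reproducible. One that fails only sometimes points back to the questions about timing and conditions.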