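sqlplus was probably the most heavily used administration tool of the DBA 1.0 era; seasoned DBAs like to mention that they have typed sqlplus tens of thousands of times :). Even so, this bread-and-butter tool sometimes misbehaves: it may fail with a Segmentation fault, or hang completely. Here are a few ways to deal with a sqlplus that no longer works properly:

1. Segmentation fault. This usually means the sqlplus binary has been corrupted. Relinking a new sqlplus binary resolves the problem.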
[oracle@rh2 bin]$ sqlplus
Segmentation fault

/* Use the makefile under $ORACLE_HOME/sqlplus/lib to build a new sqlplus */
[oracle@rh2 ~]$ make -f $ORACLE_HOME/sqlplus/lib/ins_sqlplus.mk newsqlplus

Linking /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
gcc -o /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus -m64 -L/s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/ -L/s01/oracle/product/11.2.0/dbhome_1/lib/ -L/s01/oracle/product/11.2.0/dbhome_1/lib/stubs/ /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/s0afimai.o -lsqlplus -lclntsh `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lmm -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist` -Wl,-rpath,/s01/oracle/product/11.2.0/dbhome_1/lib -lm -lpthread `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist` -ldl -lm -lpthread -L/s01/oracle/product/11.2.0/dbhome_1/lib
/bin/chmod 755 /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
mv -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
/bin/chmod 751 /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/libsqlplus.so
rm -rf /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus

[oracle@rh2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:38:21 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
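2. sqlplus hangs after it is started. There are several possible causes:

1) Instance hang. The database instance itself is hung, so sqlplus cannot log in to that instance, although logging in to other instances still works. From 10g onward you can connect to the instance with the -prelim option. A session opened this way cannot run ordinary SQL statements, but it can use the internal debugging tool oradebug. After collecting the necessary hanganalyze information with oradebug, you can narrow down the cause of the hang and decide on the next step.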
[oracle@rh2 ~]$ sqlplus / as sysdba
.............................we suspend here!!!

[oracle@rh2 ~]$ sqlplus -prelim / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:46:27 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump hanganalyze 4;
Statement processed.
SQL> oradebug dump systemstate 266;
Statement processed.
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_23436.trc   -- where dump resides
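Submit the trace file above to Oracle Support or to an experienced Oracle engineer so that they can work out why the instance hung. Adjusting parameters or fixing the underlying bug can then prevent the same situation from recurring.

2) sqlplus hangs as soon as it is executed, before it has logged in to any database. This usually means something went wrong while reading the sqlplus binary or one of its shared libraries (.so files), or that the actual execve("sqlplus") system call hit an error. A system call tracer such as strace (Linux) or truss (Unix) can be used to analyze this kind of hang: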
/* Unix */
truss -o sqlplus_hang.log sqlplus

/* Linux */
strace -o sqlplus_hang.log sqlplus

head -10 sqlplus_hang.log
execve("/s01/db_1/bin/sqlplus", ["sqlplus"], [/* 28 vars */]) = -1 ENOEXEC (Exec format error)
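A real trace is usually much longer than ten lines, so it can help to filter it down to the system calls that failed. The following is only a minimal sketch, assuming the log was written by strace in its default format, where a failed call ends in "= -1 ERRNAME (description)". The function name failed_syscalls is hypothetical and is not part of strace or of any Oracle tooling.

```shell
#!/bin/sh
# failed_syscalls: hypothetical helper, not part of strace or Oracle tooling.
# Prints every line of a strace log whose call returned -1 together with
# an errno name, e.g. "= -1 ENOEXEC (Exec format error)".
# Assumption: the log uses strace's default output format.
failed_syscalls() {
    grep -E '= -1 E[A-Z0-9]+' "$1"
}

# Example usage against the log file name used in this article:
#   failed_syscalls sqlplus_hang.log
```

Many failed calls in such a listing are harmless, for example ENOENT results while the loader searches several directories for a shared library, so the output is a starting point rather than a diagnosis. For a process that is genuinely hung rather than failing, the last lines of the log (tail sqlplus_hang.log) are typically the more interesting part, since they show the call the process was in when it stopped making progress.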
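The strace record shows that the execve call (execve() executes the program pointed to by filename) that runs the sqlplus program failed with ENOEXEC. This error code means that we are trying to execute a file whose executable format is invalid. The detailed explanation is as follows: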
This error indicates that a request has been made to execute a file which, although it has the appropriate permissions, does not start with a valid magic number. A magic number is the first two bytes in a file, used to determine what type of file it is.

You tried to execute a file that is not in a valid executable format. The most common format for binary programs under linux is called ELF. Note that your shell will run ascii files that have the executable bit set as a shell script (ie run it as shell commands).

You can reproduce this by doing

$ dd if=/dev/random of=myfile bs=1k count=1
$ chmod +x myfile
$ ./myfile
zsh: exec format error: ./myfile

Note that there is a very slight possibility that you could create a valid program that does something bad to your system!!

Note, you can have user defined ways of running programs using Linux's binfmt_misc. See /usr/src/linux/Documentation/binfmt_misc.txt
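The magic-number check described in the quoted explanation can also be done by hand. The following is only a minimal sketch, assuming a Linux host where the sqlplus executable is expected to be an ELF file. An ELF file begins with the four bytes 0x7f 'E' 'L' 'F'. The function name is_elf is hypothetical and is not part of any Oracle or system tooling.

```shell
#!/bin/sh
# is_elf: hypothetical helper, not part of any Oracle or system tooling.
# Returns 0 (success) when the file starts with the ELF magic bytes
# 0x7f 'E' 'L' 'F', and non-zero otherwise, including when the file
# is missing or unreadable.
is_elf() {
    # Read the first 4 bytes and render them as a hex string, e.g. 7f454c46.
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Example usage:
#   if is_elf "$ORACLE_HOME/bin/sqlplus"; then
#       echo "sqlplus starts with a valid ELF header"
#   else
#       echo "sqlplus is not an ELF binary, consider relinking it"
#   fi
```

A passing check only shows that the header begins correctly. A binary that is damaged further in would still pass, so this complements the strace output rather than replacing it.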
to be continued ............