Server hangs and disk errors

Post by CreedOne »

I have a problem with a server. Roughly every 2 or 3 days it hangs, although it still responds to ping. The only thing that helps is a hard restart by cutting the power. During boot, fsck scanned the disk, and the log shows this:

Code:

Log of fsck -C -R -A -y
Fri Jan 16 10:34:52 2009

fsck 1.40-WIP (14-Nov-2006)
e2fsck 1.40-WIP (14-Nov-2006)
/boot: recovering journal
/boot has gone 181 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/boot: 20/26104 files (10.0% non-contiguous), 11995/104388 blocks
e2fsck 1.40-WIP (14-Nov-2006)
/tmp: recovering journal
Clearing orphaned inode 15 (uid=105, gid=105, mode=0100600, size=0)
Clearing orphaned inode 14 (uid=105, gid=105, mode=0100600, size=0)
Clearing orphaned inode 13 (uid=105, gid=105, mode=0100600, size=0)
Clearing orphaned inode 12 (uid=105, gid=105, mode=0100600, size=20)
Clearing orphaned inode 11 (uid=105, gid=105, mode=0100600, size=0)
/tmp: clean, 19/262144 files, 16452/524112 blocks
e2fsck 1.40-WIP (14-Nov-2006)
/usr: recovering journal
Clearing orphaned inode 281830 (uid=0, gid=0, mode=0100644, size=14504)
Clearing orphaned inode 281829 (uid=0, gid=0, mode=0100644, size=508328)
Clearing orphaned inode 281827 (uid=0, gid=0, mode=0100644, size=151252)
Clearing orphaned inode 281825 (uid=0, gid=0, mode=0100644, size=111708)
Clearing orphaned inode 279460 (uid=0, gid=0, mode=0100644, size=1270520)
Clearing orphaned inode 279461 (uid=0, gid=0, mode=0100644, size=253120)
/usr: clean, 37068/919296 files, 183507/1835008 blocks
e2fsck 1.40-WIP (14-Nov-2006)
/var: recovering journal
Clearing orphaned inode 246136 (uid=0, gid=4, mode=0100640, size=516076)
/var: clean, 4172/786432 files, 146209/1572354 blocks
e2fsck 1.40-WIP (14-Nov-2006)
/home: recovering journal
/home has gone 181 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/home: ***** FILE SYSTEM WAS MODIFIED *****
/home: 33484/27901952 files (2.1% non-contiguous), 27776613/55779680 blocks
fsck died with exit status 1

Fri Jan 16 10:39:40 2009
----------------
I can see there is some problem with /home. What do you suggest in this situation? Can it be repaired, or is replacing the disk the only option?
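
A note on the last line of the log: according to the fsck(8) manual page, the exit status is a bit mask, and a value of 1 means that file system errors were found and corrected, not that the check itself crashed. A minimal sketch of decoding such a status follows; the function name decode_fsck_status is made up for illustration.

```shell
#!/bin/sh
# Decode an fsck exit status (a bit mask) into readable lines.
# Bit meanings are taken from the fsck(8) manual page.
# decode_fsck_status is a hypothetical helper name.
decode_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for pair in \
        "1:file system errors corrected" \
        "2:system should be rebooted" \
        "4:file system errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${pair%%:*}   # number before the colon
        msg=${pair#*:}    # text after the colon
        if [ $((status & bit)) -ne 0 ]; then
            echo "$msg"
        fi
    done
}
```

Calling decode_fsck_status 1 prints the single line for bit 1, which matches the "FILE SYSTEM WAS MODIFIED" message for /home above.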

[ Added: 2009-01-16, 15:29 ]
I checked the disk with smartctl -a -d ata /dev/hda:

Code:

serv1:~# smartctl -a -d ata /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDP725025GLAT80
Serial Number:    GE1230RB1NLZAA
Firmware Version: GM2OA42A
User Capacity:    250,059,350,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Not recognized. Minor revision code: 0x29
Local Time is:    Fri Jan 16 15:09:21 2009 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (3981) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  66) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       140
  3 Spin_Up_Time            0x0007   103   103   024    Pre-fail  Always       -       197 (Average 200)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       6041
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       50
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       46 (Lifetime Min/Max 25/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Post by kajoj »

The disk doesn't look bad. I would check the RAM with memtest, because errors on the disk can come from bad reads out of the buffer memory. And faulty RAM would neatly explain the machine hanging.
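
As a rough illustration of what "doesn't look bad" rests on: the raw values of Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable in the table above are all 0. Below is a small sketch that checks the same three rows automatically. The name check_smart_attrs is hypothetical, and the script assumes the ten-column attribute layout shown in the smartctl output above, with a plain number in the RAW_VALUE column.

```shell
#!/bin/sh
# Read SMART attribute rows on stdin and flag non-zero raw values
# for the three sector-health attributes.
# Assumes the layout: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
check_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) {
                printf "WARN %s raw=%s\n", $2, $10
                bad = 1
            } else {
                printf "ok   %s raw=%s\n", $2, $10
            }
        }
        # Exit status 1 if any watched attribute had a non-zero raw value.
        END { exit (bad ? 1 : 0) }
    '
}
```

It could be fed with something like smartctl -A -d ata /dev/hda | check_smart_attrs. Zeros here do not prove the disk is healthy; they only mean the drive has not recorded remapped or unreadable sectors so far.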

Post by CreedOne »

Memtest found nothing. It looks like the memory is perfectly fine.

I should add that the server runs without problems under light load. When I load it to about 40%, it hangs after a while. It may be a CPU temperature problem, but I'm having trouble installing lm-sensors, because sensors-detect fails with an error.

Post by borlus »

What error is that?

Post by CreedOne »

While installing lm-sensors I couldn't install lm-source... it simply wasn't in the repo.

And sensors-detect runs into this problem:

Code:

serv1:~# sensors-detect
# sensors-detect revision 4171 (2006-09-24 03:37:01 -0700)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

We can start with probing for (PCI) I2C or SMBus adapters.
Do you want to probe now? (YES/no): y
Probing for PCI bus adapters...
Use driver `i2c-viapro' for device 0000:00:11.0: VIA Technologies VT8237 South Bridge

We will now try to load each adapter module in turn.
Load `i2c-viapro' (say NO if built into your kernel)? (YES/no): y
FATAL: Could not load /lib/modules/2.6.24.5-grsec-xxxx-grs-ipv4-32/modules.dep: No such file or directory
Loading failed... skipping.
If you have undetectable or unsupported adapters, you can have them
scanned by manually loading the modules before running this script.

To continue, we need module `i2c-dev' to be loaded.
Do you want to load `i2c-dev' now? (YES/no): y
FATAL: Could not load /lib/modules/2.6.24.5-grsec-xxxx-grs-ipv4-32/modules.dep: No such file or directory
Loading failed, expect problems later on.

We are now going to do the I2C/SMBus adapter probings. Some chips may
be double detected; we choose the one with the highest confidence
value in that case.
If you found that the adapter hung after probing a certain address,
you can specify that address to remain unprobed.

Some chips are also accessible through the ISA I/O ports. We have to
write to arbitrary I/O ports to probe them. This is usually safe though.
Yes, you do have ISA I/O ports even if you do not have any ISA slots!
Do you want to scan the ISA I/O ports? (YES/no): y
/dev/port: Operation not permitted

Post by lis6502 »

Have you tried not scanning the ISA ports?

Post by CreedOne »

I tried that as well:

Code:

serv1:~# sensors-detect
# sensors-detect revision 4171 (2006-09-24 03:37:01 -0700)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

We can start with probing for (PCI) I2C or SMBus adapters.
Do you want to probe now? (YES/no): y
Probing for PCI bus adapters...
Use driver `i2c-viapro' for device 0000:00:11.0: VIA Technologies VT8237 South Bridge

We will now try to load each adapter module in turn.
Load `i2c-viapro' (say NO if built into your kernel)? (YES/no): y
FATAL: Could not load /lib/modules/2.6.24.5-grsec-xxxx-grs-ipv4-32/modules.dep: No such file or directory
Loading failed... skipping.
If you have undetectable or unsupported adapters, you can have them
scanned by manually loading the modules before running this script.

To continue, we need module `i2c-dev' to be loaded.
Do you want to load `i2c-dev' now? (YES/no): y
FATAL: Could not load /lib/modules/2.6.24.5-grsec-xxxx-grs-ipv4-32/modules.dep: No such file or directory
Loading failed, expect problems later on.

We are now going to do the I2C/SMBus adapter probings. Some chips may
be double detected; we choose the one with the highest confidence
value in that case.
If you found that the adapter hung after probing a certain address,
you can specify that address to remain unprobed.

Some chips are also accessible through the ISA I/O ports. We have to
write to arbitrary I/O ports to probe them. This is usually safe though.
Yes, you do have ISA I/O ports even if you do not have any ISA slots!
Do you want to scan the ISA I/O ports? (YES/no): n

Some Super I/O chips may also contain sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no): n

Sorry, no sensors were detected.
Either your sensors are not supported, or they are connected to an
I2C or SMBus adapter that is not supported. See doc/FAQ,
doc/lm_sensors-FAQ.html or http://www.lm-sensors.org/wiki/FAQ
(FAQ #4.24.3) for further information.
If you find out what chips are on your board, check
http://www.lm-sensors.org/wiki/Devices for driver status.

Post by lis6502 »

Have you been here?
By the way, look at this:

Code:

FATAL: Could not load /lib/modules/2.6.24.5-grsec-xxxx-grs-ipv4-32/modules.dep: No such file or directory
Loading failed, expect problems later on. 
Go to /lib/modules/$(uname -r)/kernel/drivers/i2c, list its contents, and post them on the forum. Also add the output of

Code:

lsmod
Then also run, as root:

Code:

depmod

Post by Utumno »

depmod -a.

Your kernel is badly compiled.
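
Before running depmod it may be worth checking what is actually missing, since the error in the log complains about modules.dep under /lib/modules for the running kernel. The sketch below is only an assumption about how one might check that. The name check_modules_dep and its optional second argument (an alternative modules root, handy for trying it out) are invented for illustration.

```shell
#!/bin/sh
# Report whether a kernel release has a module directory and a modules.dep index.
# Usage: check_modules_dep [release] [modules_root]
# Defaults: the running kernel (uname -r) and /lib/modules.
check_modules_dep() {
    release=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    dir="$root/$release"
    if [ ! -d "$dir" ]; then
        # No module tree at all: depmod has nothing to index.
        echo "missing directory: $dir"
        return 2
    fi
    if [ ! -f "$dir/modules.dep" ]; then
        # Modules may exist but the dependency index was never generated.
        echo "missing index: $dir/modules.dep (try: depmod -a)"
        return 1
    fi
    echo "found: $dir/modules.dep"
    return 0
}
```

If the whole directory is missing, the kernel was probably built or installed without its modules, which would fit the "badly compiled kernel" diagnosis; if only modules.dep is missing, depmod -a should regenerate it.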

Post by wojtekz_ »

Utumno wrote: Your kernel is badly compiled.
That would be my bet too. And you haven't actually run a disk test...
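
The smartctl output earlier in the thread says "No self-tests have been logged", and it estimates about 66 minutes for the extended routine. A minimal sketch of starting such a test and reading the result afterwards could look like this. The device path and the -d ata option are copied from the earlier smartctl call; the helper names and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Wrap smartctl calls so they can be previewed without touching the disk.
# With DRY_RUN set to a non-empty value, commands are printed instead of run.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the extended (long) self-test; the drive runs it in the background.
start_long_test() {
    run smartctl -d ata -t long "$1"
}

# Show the self-test log; read it once the test has had time to finish.
show_selftest_log() {
    run smartctl -d ata -l selftest "$1"
}
```

For example, start_long_test /dev/hda as root, then show_selftest_log /dev/hda after roughly the polling time reported by the drive.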

Regards