HLRN System - News Archive 2010


ID     Published     Subject
[1880] Dec 30, 2010  List of potentially corrupted files on /gfs1
[1878] Dec 22, 2010  HLRN Hannover online again
[1875] Dec 20, 2010  Filesystem check on /gfs1 in Hannover
[1873] Dec 17, 2010  Preventive drain initiated in Hannover
[1871] Dec 17, 2010  Newsletter: Tenth edition of HLRN Informationen published
[1869] Dec 16, 2010  File system problems on /gfs1 at site Hannover
[1864] Dec 02, 2010  Hannover HLRN Maintenance ANSYS license server on Dec 7th
[1861] Dec 01, 2010  Hannover HLRN tape system upgrade - finished
[1859] Dec 01, 2010  Finished: Maintenance in Hannover December 3-9
[1855] Nov 29, 2010  Hannover HLRN tape system upgrade
[1852] Nov 19, 2010  Trouble in Hannover with XE and data servers - finished
[1849] Nov 18, 2010  Finished: Maintenance on BICE November 26-29
[1848] Nov 15, 2010  Berlin "perm-servers" available
[1845] Nov 09, 2010  Hannover up again
[1843] Nov 09, 2010  Lustre problem in Hannover
[1839] Nov 08, 2010  Lustre problem in Hannover
[1835] Oct 29, 2010  Finished: HLRN Berlin ICE Network Maintenance
[1833] Oct 26, 2010  NAMD 2.7 available
[1830] Oct 18, 2010  HLRN Berlin Network Maintenance
[1828] Oct 12, 2010  Project proposals - next deadline October 28, 2010
[1826] Oct 12, 2010  VASP 5.2.8 available
[1824] Oct 07, 2010  UV systems available for all HLRN users
[1821] Sep 27, 2010  Resolved: Lustre problem on /gfs1 in Berlin
[1817] Sep 10, 2010  Finished - Maintenance of data servers / PERM / qfs in Hannover
[1815] Sep 09, 2010  Maintenance of data servers / PERM / qfs in Hannover Fri Sep 10th, 11:00 am
[1810] Sep 09, 2010  Default Compilers to be removed from the User Environment (Module PROGenv)
[1812] Sep 08, 2010  CFD package Fluent 6 discontinued, ANSYS CFD 12 replaces Fluent
[1811] Sep 08, 2010  FEM package ABAQUS versions 6.9-2 and 6.10-1 available
[1806] Sep 03, 2010  Finished - Maintenance in Hannover Mon Sep 6th, 09:00 am
[1805] Sep 03, 2010  Maintenance finished in Hannover - SAM-QFS / PERM / qfs1
[1798] Sep 02, 2010  Finished: Downtime and Maintenance Day in Berlin at ZIB, Sep 6-7, 2010
[1801] Sep 01, 2010  HLRN OpenMPI apps will fail
[1799] Sep 01, 2010  Intel Compiler Suite Professional Edition v11.1.073 installed
[1793] Aug 31, 2010  HLRN Batch system upgrade finished
[1789] Aug 27, 2010  HLRN Berlin MPI network outage
[1786] Aug 26, 2010  HLRN Maintenance Mon Aug 30 09:00
[1784] Aug 16, 2010  Maintenance Hannover XE finished
[1782] Aug 16, 2010  Maintenance Hannover finished
[1780] Aug 12, 2010  Maintenance in Hannover on Monday Aug 16th, 11:30 am
[1776] Aug 10, 2010  Resolved: HLRN Berlin not accessible
[1772] Aug 09, 2010  Resolved: HLRN Berlin not accessible
[1770] Aug 06, 2010  HLRN Hannover - Archive filesystem available again
[1768] Aug 06, 2010  HLRN Hannover - Archive file system
[1764] Jul 27, 2010  New msub filter installed
[1761] Jul 23, 2010  Resolved: Maintenance Berlin ICE Monday July 26
[1758] Jul 20, 2010  Power Outage in Berlin, Complex BICE down (Resolved)
[1755] Jul 19, 2010  Berlin ICE2 not available from July 26 to August 01
[1752] Jul 12, 2010  SGI MPT 1.26 available
[1750] Jul 12, 2010  HLRN Hannover - $PERM available again
[1748] Jul 05, 2010  Hannover up again - Maintenance tomorrow cancelled
[1747] Jul 05, 2010  Maintenance Berlin finished
[1743] Jul 03, 2010  Hannover with Lustre problems
[1739] Jul 01, 2010  Scheduled Maintenance in Berlin and Hannover
[1736] Jun 29, 2010  Hannover up
[1734] Jun 29, 2010  Hannover down
[1733] Jun 23, 2010  GROMACS 4.0.7 available at HLRN
[1729] Jun 22, 2010  HLRN - Hannover up
[1724] Jun 16, 2010  Hannover up
[1721] Jun 16, 2010  Hannover up
[1719] Jun 15, 2010  Berlin complex down
[1717] Jun 15, 2010  HLRN - Hannover down
[1714] Jun 14, 2010  Hannover up
[1710] Jun 12, 2010  Berlin complex drained
[1708] Jun 10, 2010  HLRN Hannover maintenance finished
[1705] Jun 01, 2010  Newsletter: Ninth edition of HLRN Informationen published
[1703] May 27, 2010  Environment modules restructured
[1700] May 26, 2010  CP2K available
[1699] May 25, 2010  The HLRN system at the Long Night of the Sciences (Lange Nacht der Wissenschaften) in Berlin
[1697] May 17, 2010  Solved: HLRN Berlin Problem with $WORK
[1695] May 12, 2010  Intel Compiler Suite Professional Edition v11.1.072 installed
[1693] May 11, 2010  Berlin ICE-2 not available from 17 May to 21 May
[1690] May 10, 2010  Solved: HLRN site Berlin down
[1687] May 04, 2010  HLRN maintenance - Downtime Hannover complex finished.
[1685] Apr 29, 2010  HLRN maintenance - Downtime Hannover complex
[1683] Apr 20, 2010  HLRN Hannover - Lustre Problem solved
[1681] Apr 19, 2010  HLRN Hannover - Lustre problems
[1678] Apr 16, 2010  Hannover up again
[1675] Apr 14, 2010  Emergency maintenance HLRN site Berlin Thu Apr 15 finished
[1672] Apr 12, 2010  VASP 4.6.36 and 5.2.2 installed with VTST Tools
[1670] Apr 08, 2010  Ferret Versions 6.5 and 6.6 available
[1664] Mar 31, 2010  MVAPICH2 1.4.1 available
[1660] Mar 18, 2010  Gaussian 09 available
[1657] Mar 12, 2010  Resolved: Lustre problems in Berlin
[1656] Mar 10, 2010  Job posting at TU Hamburg-Harburg
[1654] Mar 09, 2010  Gaussian03 with NBO 5.9 available
[1652] Mar 09, 2010  MOLPRO 2006.1 available at HLRN
[1648] Mar 08, 2010  Database server maintenance
[1645] Mar 05, 2010  HLRN Hannover up again
[1642] Mar 04, 2010  HLRN Hannover ready for use
[1639] Mar 03, 2010  HLRN Hannover - Lustre problems
[1637] Mar 02, 2010  HLRN Hannover - Archive filesystem available again
[1635] Mar 02, 2010  HLRN Hannover - Archive file system
[1634] Feb 26, 2010  Batchsystem Problem after msub -I
[1632] Feb 24, 2010  New ABAQUS license server for ni* users
[1630] Feb 11, 2010  HLRN Berlin ready for use
[1628] Feb 10, 2010  HLRN Hannover ready for use
[1626] Feb 08, 2010  Charge formula for ICE2 nodes changed
[1622] Feb 05, 2010  Maintenance on Wednesday, February 10
[1621] Feb 03, 2010  STAR-CCM+ Version 4.06 is installed
[1620] Feb 01, 2010  TotalView Debugger 8.7 available
[1616] Jan 28, 2010  Quota enforcement on $HOME
[1611] Jan 28, 2010  Archive facilities at HLRN (Hannover site) available.
[1610] Jan 28, 2010  Tape robot maintenance in Hannover Tuesday Feb. 2nd 2010
[1607] Jan 27, 2010  solved: $HOME not available on Berlin XE and data nodes
[1605] Jan 21, 2010  ICE2 memory upgrade finished
[1603] Jan 20, 2010  HLRN Berlin $HOME replication is delayed
[1601] Jan 18, 2010  HLRN Site Berlin available again
[1598] Jan 15, 2010  HLRN Berlin $HOME was not available 14.01.2010
[1596] Jan 15, 2010  Additional HLRN login servers available
[1594] Jan 14, 2010  HLRN Site Berlin not available 18.01.2010
[1592] Jan 12, 2010  HLRN Hannover - Lustre Problem solved
[1590] Jan 12, 2010  HLRN Hannover - Lustre problems
[1589] Jan 08, 2010  Parallel Programming Workshops


List of potentially corrupted files on /gfs1
[1880] Dec 30, 2010

Dear HLRN-User,

as reported before, last week's filesystem checks left numerous files
on /gfs1 corrupted. We have now assembled for each user a list of
potentially affected files(*). Since there is no backup for
/gfs1/work we cannot determine whether a file was actually modified,
hence "potentially" affected files. It is also not guaranteed that
these are the only files that may have been corrupted. Nevertheless
we decided to give you what information we have at this time hoping
it may be useful to some of you.

(*) The list of files is called "hannover_gfs1_scratch_files_null_byte_2010_USERACCOUNT.txt"
and is located in your home directory. If that file is missing you
are lucky and no files showed up corrupted for your account.
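
A minimal sketch of how such a list could be walked to see what state each
file is in today. It assumes the list holds one path per line, which the
announcement does not state, and check_listed_files is a hypothetical
helper, not an HLRN tool. Since many intact binary files contain NUL bytes,
the "has-nul" label is only a hint, not proof of corruption.

```shell
# Classify every path named in a list file.
# Assumption: one path per line (the list format is not documented here).
check_listed_files() {
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue                 # skip blank lines
        if [ ! -e "$path" ]; then
            printf 'missing\t%s\n' "$path"         # no longer present
        elif [ ! -s "$path" ]; then
            printf 'empty\t%s\n' "$path"           # size zero
        else
            # Compare the byte count with and without NUL bytes.
            total=$(( $(wc -c < "$path") ))
            kept=$(( $(tr -d '\000' < "$path" | wc -c) ))
            if [ "$kept" -lt "$total" ]; then
                printf 'has-nul\t%s\n' "$path"     # contains NUL bytes
            else
                printf 'no-nul\t%s\n' "$path"
            fi
        fi
    done < "$1"
}

# Example call (USERACCOUNT is the placeholder used in the announcement):
# check_listed_files "$HOME/hannover_gfs1_scratch_files_null_byte_2010_USERACCOUNT.txt"
```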

Best Wishes for 2011!

The HLRN-Team



Previous Message:
-----------------

HLRN Hannover online again



HLRN Hannover online again
[1878] Dec 22, 2010

Dear HLRN-User,

thanks to SGI for a filesystem check that terminated successfully
well ahead of schedule. Filesystem /gfs1 is back in a consistent
state.

The bad news is that filesystem errors had to be fixed which left a number of files corrupted or cut to size zero. So far, the best heuristic to identify potentially modified files is via:

lfs find --obd lustre-OST0038_UUID /gfs1/work/USERID

While reopening HLRN Hannover for production we continue to work on
this case.

If you need individual assistance related to this outage please let
us know.

The HLRN-Team


Previous Message:
-----------------
Filesystem check on /gfs1 in Hannover



Filesystem check on /gfs1 in Hannover
[1875] Dec 20, 2010

At 14:30h today we will perform another filesystem check on /gfs1 in Hannover. The check will take about 60 hours. After that we will know more.

The check requires that you log out at the Hannover site by 14:30h.

Thank you very much for your cooperation.

The HLRN-Team



Previous Message:
-----------------
Preventive drain initiated in Hannover

It is possible that we will carry out a file system check on HLRN Hannover starting Monday Dec. 20 10.00 a.m. New jobs in Hannover will therefore currently only start if their wall clock limit allows them to finish before that point in time. We cannot yet say how long this new check will take, if it takes place at all. It may be just a short check, or it may take two to three days.

We apologize for this additional inconvenience and the imprecise announcement, but we think it is important to take care that really everything gets fixed as soon as possible.

In addition, since some files currently are inaccessible, it has been decided that the automatic file deletion routine on /gfs1 in Hannover will be stopped next Tuesday (Dec 21st) and get reactivated on Tuesday Jan 4th, 2011.


Gabriel Gaus



Preventive drain initiated in Hannover
[1873] Dec 17, 2010

It is possible that we will carry out a file system check on HLRN Hannover starting Monday Dec. 20 10.00 a.m. New jobs in Hannover will therefore currently only start if their wall clock limit allows them to finish before that point in time. We cannot yet say how long this new check will take, if it takes place at all. It may be just a short check, or it may take two to three days.
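
The start condition described above amounts to a simple comparison. The
sketch below only illustrates that arithmetic; it is not the scheduler's
actual logic, and fits_before_drain and all numbers in it are hypothetical.

```shell
# fits_before_drain NOW WALLTIME DRAIN
# NOW and DRAIN are times in seconds on the same clock; WALLTIME is the
# requested wall clock limit in seconds.
# Returns 0 (success) if the job would end no later than the drain time.
fits_before_drain() {
    [ $(( $1 + $2 )) -le "$3" ]
}

# Hypothetical numbers: the check would start 70 hours from now.
now=0
drain=$(( 70 * 3600 ))
for hours in 12 48 72; do
    if fits_before_drain "$now" $(( hours * 3600 )) "$drain"; then
        echo "${hours}h wall clock limit: may start now"
    else
        echo "${hours}h wall clock limit: does not start before the check"
    fi
done
```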

We apologize for this additional inconvenience and the imprecise announcement, but we think it is important to take care that really everything gets fixed as soon as possible.

In addition, since some files currently are inaccessible, it has been decided that the automatic file deletion routine on /gfs1 in Hannover will be stopped next Tuesday (Dec 21st) and get reactivated on Tuesday Jan 4th, 2011.


Gabriel Gaus



Newsletter: Tenth edition of HLRN Informationen published
[1871] Dec 17, 2010

The tenth edition of the HLRN newsletter HLRN Informationen (in German) is available for download at
https://www.hlrn.de/home/view/NewsCenter/NewsLetter.
This web page also contains instructions for (un-)subscribing to the newsletter mailing list.

From the content:
  • Information about the final configuration of the HLRN-II system
  • Project report on "Solving Two-time Quantum Kinetic Equations"
(wwb)



File system problems on /gfs1 at site Hannover
[1869] Dec 16, 2010

Dear HLRN-User,

both HLRN sites recently performed maintenance checks of their Lustre file systems. The checks in Berlin and on /gfs2 ($WORK2) in Hannover were successful. On /gfs1 in Hannover the check aborted for reasons that are still unclear. Unfortunately, we found numerous files in many accounts that are inaccessible to commands such as ls and cp. It is still unclear whether these files can eventually be recovered. Of course, we and our partner SGI are working to resolve this issue.

Hints point to OST No. 56 of Lustre file system /gfs1 as a potentially faulty component. Rather than closing down all of /gfs1 at this point, we decided to set OST 56 read-only from tomorrow on (Fri 17/12/2010, 10:00h), with the consequence that files residing on this OST can no longer be moved, deleted or modified. This might affect about 3% of all files in /gfs1, i.e. files for which the command "lfs getstripe -q " returns the number 56.
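
For reference, OST number 56 and the target name lustre-OST0038_UUID used
with "lfs find --obd" in announcement 1878 refer to the same component: the
four digits in the name are the OST index in hexadecimal (56 = 0x38). A
small sketch of the conversion, where ost_uuid is a hypothetical helper and
the file system name "lustre" is taken from that target name:

```shell
# Format a decimal OST index as the target name used by "lfs find --obd".
# Assumption: the file system name is "lustre", as in lustre-OST0038_UUID.
ost_uuid() {
    printf 'lustre-OST%04x_UUID\n' "$1"
}

ost_uuid 56    # prints lustre-OST0038_UUID
```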

We sincerely hope that within a few days at most a fix for /gfs1 will be made available to us. Unless that happens, eventually we might have to re-initialise /gfs1, probably around mid-January 2011. We would do this only after a separate announcement and with time to backup your data.

For now we recommend:

1) As a precaution, back up vital data from /gfs1 (Hannover), preferably to $PERM.

2) Be aware that some of your files on /gfs1 may not be readable or writeable. (Newly created files will go to a different component in the file system and should not be affected.)

3) If convenient, use /gfs2 at Hannover (instead of /gfs1) or switch to the HLRN site at Berlin.

We will do what we can to resolve this situation as fast as possible and will keep you informed on further progress. Our apologies for any inconvenience incurred!

HLRN site staff






Previous message was:
---------------------
Maintenance in Hannover extended

The file system check needs more time to run in order to correct (currently only minor) problems. We hope to get back into production Wednesday evening.

We apologize for the delay.

Gabriel Gaus



Hannover HLRN Maintenance ANSYS license server on Dec 7th
[1864] Dec 02, 2010

In preparation for the installation of ANSYS programs version 13.0 (ANSYS, CFX, Fluent) we have to upgrade the ANSYS license server. We will do this on Tuesday, Dec 7th, starting at 9 o'clock. Please note that ANSYS/CFX/Fluent jobs running at this time will presumably be affected.

GB


Hannover HLRN tape system upgrade - finished
[1861] Dec 01, 2010

Hannover tape system upgrade is finished for this week.

Gabriel Gaus


The original message was:
-------------------------
Starting tomorrow, there will be an upgrade of the tape / backup system in Hannover. It is expected that everything will be available again on Friday. The disk cache that buffers requests is not affected.
[...]



Finished: Maintenance in Hannover December 3-9
[1859] Dec 01, 2010

Since Thu Dec 09, 16:45 HLRN site Hannover is available to users, again.

$WORK2 passed the file system check successfully.

Problems remain with some existing files in $WORK; we expect them to be resolved in the near future.

Thank you for your patience.

/Bka


The original announcement was:

A preventive file system check on the Lustre file systems is scheduled for Friday December 3rd, 7:00 a.m. in Hannover. It will probably take until Monday for the check to finish. The complex will be completely unavailable during this time. Berlin should not be affected by this measure.

We expect the system to be available again Monday in the late afternoon and will post a message as soon as you can continue your work in Hannover.

Thank you in advance for your patience.

Gabriel Gaus


Hannover HLRN tape system upgrade
[1855] Nov 29, 2010

Starting tomorrow, there will be an upgrade of the tape / backup system in Hannover. It is expected that everything will be available again on Friday. The disk cache that buffers requests is not affected.

This has the following impact on your work:

  • file system $PERM (/qfs1/perm) will stay available

  • you will be able to copy files to the archive up to the limits of the disk cache (currently about 20 TB are free)

  • you will be able to copy files from the archive as long as they are still online, i.e. cached on disk


If you need a particular file during the maintenance period, you can check its status by using the following command on hd1-3:

sls -D

If you see the keyword offline in the output, the file is not cached anymore. In this case you may either request it to be cached again with the command stage (available on hperm1 and hperm2 only) or simply copy it to your working directory. Please only stage/copy the files you really need. There is limited capacity, and with mass copying you could accidentally prevent other users from accessing their files during the upgrade.
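
A minimal sketch of how this check could be scripted. It relies only on the
keyword "offline" appearing in the "sls -D" output, as described above;
reports_offline is a hypothetical helper and FILE is a placeholder.

```shell
# Read "sls -D" output on stdin; succeed if the keyword "offline" appears.
# Caveat: a file whose name contains the word "offline" would also match,
# because "sls -D" output includes the file name.
reports_offline() {
    grep -qw offline
}

# Example on hd1-3 (FILE is a placeholder for a path in the archive):
# if sls -D FILE | reports_offline; then
#     echo "FILE is not cached; stage or copy it only if really needed"
# fi
```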

Gabriel Gaus



Trouble in Hannover with XE and data servers - finished
[1852] Nov 19, 2010

Starting last night, we had problems with the active home server in Hannover, resulting in $HOME being unavailable on the XE cluster and hd1-3. The problem should be solved now. We apologize for the trouble and will reimburse NPL that were lost due to this problem.

Gabriel Gaus


Finished: Maintenance on BICE November 26-29
[1849] Nov 18, 2010

Since Mon Nov 29, 17:45 HLRN site Berlin is available to users, again.

Thank you for your patience.

/Bka

The original announcement was:

Maintenance will be performed on the Berlin complex BICE from Friday, November 26, 9:00 until Monday, November 29, afternoon. BICE will be unavailable during this time.

Maintenance comprises various firmware and system updates as well as preventive full low- and high-level checks on both Lustre file systems (/gfs1 and /gfs2).

We expect BICE to be back online during Monday afternoon, and we will post a message when the system is fully available again.

Thank you for your patience.

(wwb)


Berlin "perm-servers" available
[1848] Nov 15, 2010

In Berlin two new servers, bperm1 and bperm2, are now available. They
provide direct access to the PERM file system. These "perm-servers" are
running under the Solaris operating system. They offer the full set of
SAM-FS commands, in particular the "squota" command. For more
information see:

https://www.hlrn.de/home/view/System/SamFS

(stb)



Hannover up again
[1845] Nov 09, 2010

HLRN complex Hannover is available again. Jobs in Hannover with a completion time from midnight until now will get a refund for the NPL they used.

Thank you for your patience.

Gabriel Gaus


The original message was:
----------------------------
Due to a problem with the Lustre file system (/gfs2)
the HLRN complex at Hannover is closed.



Lustre problem in Hannover
[1843] Nov 09, 2010

Due to a problem with the Lustre file system (/gfs2)
the HLRN complex at Hannover is closed.

We will post a message when the problem is solved and apologize
for the inconvenience.

Harald Schwier



Lustre problem in Hannover
[1839] Nov 08, 2010

Due to a problem with the Lustre file system (/gfs1)
the HLRN complex at Hannover is closed.

We will post a message when the problem is solved and apologize
for the inconvenience.

Harald Schwier



Finished: HLRN Berlin ICE Network Maintenance
[1835] Oct 29, 2010

Since Thu Nov 4, 16:00 the ICE1 and ICE2 clusters at HLRN site Berlin are available to user jobs, again.


The original announcement was:

A network maintenance will be performed at HLRN site Berlin for the ICE1 and ICE2 clusters on Thursday, November 4 starting at 11:00.

Batch jobs that request the features ice, ice1, ice2, smem, lmem or the queues mediumq, bigq will not be scheduled to start during this time.

Interactive work as well as batch jobs requesting the XE or UV clusters are not influenced.

A note will be posted when usage of the ICE clusters is possible, again.

/Bka


NAMD 2.7 available
[1833] Oct 26, 2010

The molecular dynamics package NAMD version 2.7 is available at HLRN.

See NAMD for a description of the package.

The current module default is still version 2.6. It will be switched to 2.7 on Wed Nov 03. Users who wish to use 2.6 after this date have to load the version-specific module file namd/2.6 explicitly.

/Bka


HLRN Berlin Network Maintenance
[1830] Oct 18, 2010
A network maintenance will be performed on the Berlin complex BICE on Thursday, October 21 from 9:00 until about noon. BICE will be unavailable during this time. Please see the HLRN Status Page for updates on the availability of the Berlin complex.

(wwb)


Project proposals - next deadline October 28, 2010
[1828] Oct 12, 2010

The next deadline for project proposals is October 28, 2010.

Projects that need more than 2000 NPL computing time per quarter have to submit a project proposal ("Großprojektantrag") for review to the Scientific Board ("Wissenschaftlicher Ausschuss") of HLRN. Resources are allocated for one year on a quarterly basis after successful review of the proposal (see the "Application HowTo" and the Scientific Board portal page).

Please contact your local HLRN support person or your HLRN project consultant before submitting the proposal.

(wwb)



VASP 5.2.8 available
[1826] Oct 12, 2010

The quantum chemistry package VASP version 5.2.8 is available at HLRN.

See The Vienna Ab initio Simulation Package for a description of the package.

The current module default is still version 5.2.2. It will be switched to 5.2.8 on Monday Oct 18. Users who wish to use 5.2.2 after this date have to load the version-specific module file vasp-5.2/2 explicitly.

Users that are interested in calculating dispersion corrections with VASP are encouraged to read the corresponding topic.

/Bka


UV systems available for all HLRN users
[1824] Oct 07, 2010

The UltraViolet systems (UV) have been installed at HLRN.
Since Thu Oct 07, 14:30 these systems are available to all HLRN users.

See Using the UV systems for quickstart instructions.

The very basic instructions are:
  • The UV is requested e.g. using the resource specification -l feature=uv.
  • UV jobs are not charged to your HLRN account until October 31st 2010.

(ml)



Resolved: Lustre problem on /gfs1 in Berlin
[1821] Sep 27, 2010

Since 7:50 today (Tue Sep 28) the Berlin complex is fully available again. Thank you for your patience.

The original message was:

Due to a problem with the Lustre file system /gfs1 the Berlin complex is closed since 11:50 on Mon, Sep 27.

We will post a message when the problem is resolved.

We apologize for the inconvenience.

(wwb)


Finished - Maintenance of data servers / PERM / qfs in Hannover
[1817] Sep 10, 2010

PERM (qfs1) is available again in Hannover.

Thank you for your patience.

Harald Schwier





Maintenance of data servers / PERM / qfs in Hannover Fri Sep 10th, 11:00 am
[1815] Sep 09, 2010

We will have a maintenance of the data servers in Hannover on Fri Sep 10th starting 11:00 am. During this time:


  • all servers hd and hperm are unavailable and users on these machines will be logged out

  • Hannover file systems /qfs (PERM) are unavailable



A message will be posted when the maintenance is finished.

As always, we apologize for the inconvenience.

Gabriel Gaus



Default Compilers to be removed from the User Environment (Module PROGenv)
[1810] Sep 09, 2010

In preparation for the additional UltraViolet SMP system, to be accessible in the near future, HLRN has decided to remove the current default Intel compilers from the default HLRN environment. The default compilers are outdated and do not make use of the better optimization features of the new compiler versions.

On Wednesday, September 22 the module PROGenv will be removed from the default user environment. This means in particular: No Intel compiler is loaded.

We strongly recommend testing your batch jobs and your interactive work in the modified environment right away, before that date. To do so, call module unload PROGenv in your interactive shell and also include that command in your shell setup and your batch jobs.
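
A sketch of such a test in an interactive shell: unload PROGenv, then list
which Intel compiler commands are still found. It assumes the compiler
commands are named icc and ifort; report_compilers is a hypothetical helper.

```shell
# Unload PROGenv if the "module" command is defined in this shell
# (on machines without environment modules this step is skipped).
if command -v module >/dev/null 2>&1; then
    module unload PROGenv
fi

# Report whether the (assumed) compiler commands are still on PATH.
report_compilers() {
    for c in icc ifort; do
        if command -v "$c" >/dev/null 2>&1; then
            printf '%s: %s\n' "$c" "$(command -v "$c")"
        else
            printf '%s: not on PATH\n' "$c"
        fi
    done
}

report_compilers
```

If a compiler is still reported after the unload, some other module or a
setting in your shell setup is providing it.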

If needed (e.g. your program ran fine before PROGenv was removed and now complains about program symbols or libraries), the present environment may be reestablished after September 22 by loading the module PROGenv yourself.

To see the new recommended compilation workflow, please visit the HLRN Programming web page, see also the Quickstart Guide.

(ml/wwb)



CFD package Fluent 6 discontinued, ANSYS CFD 12 replaces Fluent
[1812] Sep 08, 2010

As of September 1, 2010 the availability of the CFD package Fluent has been discontinued on the HLRN system. The licenses have been transferred to ANSYS CFD Version 12, which is immediately available on both complexes. Please see the updated HLRN web pages for details.

(wwb)


FEM package ABAQUS versions 6.9-2 and 6.10-1 available
[1811] Sep 08, 2010

Versions 6.9-2 and 6.10-1 of the finite element analysis package ABAQUS are available on the HLRN system.

Version 6.8-1 will remain the default until October 12, 2010, when the default will be switched to the latest version, 6.10-1. Until then the module load abaqus command points to 6.8-1.

See the HLRN web pages for an updated documentation of ABAQUS usage at HLRN.

(wwb)


Finished - Maintenance in Hannover Mon Sep 6th, 09:00 am
[1806] Sep 03, 2010

Since Mon Sep 06, 16:15 batch system work at HLRN site Hannover has been resumed.

The issue described in Announcement 1801 has been resolved, so OpenMPI applications can be run again.

Note the upcoming maintenance at HLRN Berlin starting this evening, see Announcement 1798.

/Bka


The original announcement was:

The system in Hannover is reserved for an update of the batch system on Mon Sep 6th starting 09:00 a.m.

Batch processing will not be available at this time, but interactive work should be possible. We expect the maintenance to finish on the same day.

A message will be posted when the maintenance is finished.

Gabriel Gaus



Maintenance finished in Hannover - SAM-QFS / PERM / qfs1
[1805] Sep 03, 2010

PERM / qfs1 is available again in Hannover.
Batch processing on the data nodes has been resumed.

Thank you for your patience.

Gabriel Gaus


The original message was:
-------------------------

Hannover will have a maintenance of the complete tape system (including disk cache) starting tomorrow 11:30 a.m. The maintenance may last until Friday night.

[...]


Finished: Downtime and Maintenance Day in Berlin at ZIB, Sep 6-7, 2010
[1798] Sep 02, 2010

Since Wed Sep 08, 02:45 HLRN site Berlin is available to users, again.

We thank you for your patience.

/Bka


The original announcement was:

The annual infrastructure maintenance will be performed at ZIB from 18:00 on Monday, September 6 until Tuesday, September 7 evening. During this time the Berlin complex BICE and some HLRN services will be partially or fully unavailable for ALL users. The schedule concerning user service during the downtime is as follows:

On Monday, September 6:
  • At 18:00 End of user service on BICE; interactive sessions and all running batch jobs on BICE are terminated.
On Tuesday, September 7:
  • Between 8:00 and approx. 17:00:
    • ZIB may be partially or entirely cut off from electric power and from the Internet
    • ZIB personnel can be contacted by phone, only
    • Email traffic to/from ZIB will be interrupted, but mail will not get lost
    • The web server of the Scientific Council including the user and project database server (zulassung.hlrn.de) will be unavailable
    • The license server serving the ZIB licenses for ABAQUS, ANSYS, and TotalView will be unavailable
  • Tuesday evening: Production on BICE and the services are expected to resume.
All other services, e.g. the HLRN web and mail servers will remain available during the maintenance.

Please plan accordingly.

(wwb)



HLRN OpenMPI apps will fail
[1801] Sep 01, 2010

The batch system upgrade done on HLRN machines on Mon, Aug 30 introduced a bug which prevents OpenMPI applications from starting successfully on more than one node.

The bug affects the internals of the OpenMPI-mpiexec and related starters. The pbsdsh command is affected as well by the same bug.

In most cases the OpenMPI-mpiexec command will simply hang until the job wallclock time is reached. No output or error messages will be seen.

Currently there is no workaround. We expect the bug to be fixed soon. However, a fixed Torque version will not be installed before next week.

Independently of the fix for the problem, HLRN recommends that OpenMPI users port their codes to one of the other MPI libraries available at HLRN, if this is possible. See MPI Implementations at HLRN for an overview. The preferred choice is SGI-MPT, which is fully supported by our current system vendor via our maintenance contract.

It should also be noted that OpenMPI shows poor communication performance on HLRN machines for the majority of applications when compared with SGI-MPT, IntelMPI or MVAPICH2.
Furthermore, MPI libraries other than SGI-MPT will not be supported on the upcoming UltraViolet architecture. Users of any other MPI library will therefore not be able to try this promising architecture at all.

/Bka



Intel Compiler Suite Professional Edition v11.1.073 installed
[1799] Sep 01, 2010

The Intel Compiler suite v11.1.073 is available at HLRN.

It includes the recent versions of
  • C/C++ compiler
  • Fortran compiler
  • Debugger
  • Math Kernel Library (MKL)
  • Integrated Performance Primitives Library (IPP)
  • Threading Building Blocks C++ Template Library (TBB)
The module file intel.compiler/11.1.073 sets the necessary environment.

The HLRN login environment still loads Intel compiler v10.1.015 as system default until further notice. To switch to another version while compiling or running executables, it is recommended to unload any loaded icc, ifort and mkl module files, load a compiler module of your choice for compiling and linking only, and prepare the executables/libraries to work without loading any compiler module file.

See Intel compiler documentation at HLRN for usage instructions.

Visit online documentation provided by Intel for details.

Release notes can be found on the HLRN ISV documentation server.

/Bka


HLRN Batch system upgrade finished
[1793] Aug 31, 2010

Since Tuesday Aug 31, 00:30 batch system work is resumed on both HLRN sites.

The upgrade includes the following major policy changes:

  1. The qsub command has been disabled completely. Even interactive jobs have to be submitted with msub. Thus, all policy settings apply equally to normal and interactive jobs. Explicitly specifying a queue for interactive jobs is no longer mandatory.
    See the updated user documentation for further information.

  2. STDOUT and STDERR of a running batch job are spooled under the final name that the output and error files will receive, instead of having an intermediate file in $HOME and then copying the result to the final file when the job has completed.
    See the updated user documentation for further information.


We apologize for the delay.

/Bka



HLRN Berlin MPI network outage
[1789] Aug 27, 2010

On Fri Aug 27 at 09:37 there was a network outage in the MPI communication network at HLRN site Berlin, caused by repair actions.

A number of running jobs aborted because of that.

The problem has been solved now. We apologize for the inconvenience.

/Bka


HLRN Maintenance Mon Aug 30 09:00
[1786] Aug 26, 2010

On Monday, August 30 09:00 both HLRN complexes will undergo a short maintenance to update the batch system software stack.

Processing of batch jobs will be stopped. Interactive work will be allowed, but submission of new batch jobs will fail during the installation process.

We expect to be able to resume batch system work the same day before noon.

/Bka




Maintenance Hannover XE finished
[1784] Aug 16, 2010

The XE (smallq) in Hannover is open again.

Gabriel Gaus


Maintenance Hannover finished
[1782] Aug 16, 2010

Hannover ICE 1 and 2 are available again and batch processing has resumed. The XE (smallq) still needs some work.

Thank you for your patience,

Gabriel Gaus


The original message was:
-------------------------
To improve Lustre stability and to prepare for the integration of the new UV servers, maintenance is scheduled in Hannover for next Monday, Aug 16th, starting 11:30.
[...]

As a second measure, hicegate2 will get an upgrade to SLES 11. [...]




Maintenance in Hannover on Monday Aug 16th, 11:30 am
[1780] Aug 12, 2010

To improve Lustre stability and to prepare for the integration of the new UV servers, maintenance is scheduled in Hannover for next Monday, Aug 16th, starting 11:30. Login sessions still active at this time will be closed and batch processing will be suspended. We hope the maintenance only takes a few hours. Please plan accordingly.

As a second measure, hicegate2 will get an upgrade to SLES 11. Please use hg0 and hg1 (or hd1, 2, 3) until this installation is finished and verified.

Messages will be posted when the systems are ready again.

Gabriel Gaus


Resolved: HLRN Berlin not accessible
[1776] Aug 10, 2010

HLRN Berlin has been accessible to users again since Tue Aug 10, 21:15.
Batch processing on the XE cluster resumed on Tue Aug 10, 23:15.


The original announcement was:

Since Tue Aug 10, 13:15 there has again been no access to the $WORK2 filesystems at HLRN site Berlin.

Running batch jobs may abort. Login to the Berlin login and data servers has been disabled. Open interactive sessions may be terminated.

We hope to solve the problem soon.
We apologize for the inconvenience.

/Bka


Resolved: HLRN Berlin not accessible
[1772] Aug 09, 2010

HLRN site Berlin has been open for users again since Tue Aug 10, 01:00.
Batch system work on the ICE1 and ICE2 clusters has resumed.

The Berlin XE cluster still needs maintenance. Batch jobs requesting feature=xe (or one of the classes smallq or serialq) will therefore not be scheduled until further notice.


The original announcement was:

Since the evening of Sat Aug 07 there has been no access to the $WORK and $WORK2 filesystems at HLRN site Berlin.

Running batch jobs have been aborted. Login to the Berlin login and data servers has been disabled.

HLRN site Hannover is not influenced by this event.

We hope that the situation will be resolved during Tue Aug 10.

We apologize for the inconvenience.

/Bka


HLRN Hannover - Archive filesystem available again
[1770] Aug 06, 2010

Dear HLRN Users,
the data nodes and archive servers (hdx, hpermx) at Hannover
are available again.

Harald Schwier





HLRN Hannover - Archive file system
[1768] Aug 06, 2010

Dear HLRN Users,
due to technical problems concerning our tape robot a
short term maintenance is necessary. Access to the data nodes
and archive servers (hdx, hpermx) at Hannover is closed and
active login sessions had to be killed.

We apologize for the inconveniences caused by that.

The maintenance should require about an hour.

Harald Schwier



New msub filter installed
[1764] Jul 27, 2010

To be prepared for the upcoming installation of SGI UltraViolet systems, the HLRN submit filter functionality was extended.

The HLRN submit filter is called during execution of the msub command transparently to the user, and checks/applies site policies to a job script before submission.

The new msub filter gives hints and, in some cases, rejects jobs with wrong resource requests at submit time, instead of letting them sit in the queue indefinitely.

Jobs with correct setups are not affected, but additional INFO messages regarding application of HLRN policies may be seen on STDERR. This is intended and not an error.

If you feel that your job scripts are not handled appropriately, please report this immediately via the Error report form.

/Bka


Resolved: Maintenance Berlin ICE Monday July 26
[1761] Jul 23, 2010

Since Monday July 26, 12:10 the following issue has been resolved:

A short maintenance on Berlin ICE1 and ICE2 clusters is necessary
to repair network components that are responsible for connecting from ICE compute nodes to outside HLRN Berlin.

Batch system work will be stopped on Berlin ICE clusters on
Monday July 26 08:00.

We expect to continue batch system work before noon. Note that after the maintenance the Berlin ICE2 cluster remains reserved for a big batch job (see Announcement 1755).

Interactive work as well as processing of batch jobs on XE nodes is not influenced.

Until the problem has been solved, the following will not work:
* scp or ssh out of batch jobs running on ICE to machines outside HLRN
* graphical debugging with totalview within running batch jobs

/Bka



Power Outage in Berlin, Complex BICE down (Resolved)
[1758] Jul 20, 2010
Update Jul 21, 17:45: After successful hardware repair action the Berlin complex BICE is back online and in full production since Jul 21, 17:45. NPLs that were lost due to job crashes yesterday have been reimbursed.

Thank you for your patience.

(wwb)

Update Jul 20, 15:25: Due to serious hardware damage in a central network switch BICE will remain unavailable until at least Wednesday, Jul 21 afternoon.

The original announcement was:

Due to a power outage in the Berlin-Dahlem area today (Tuesday, July 20 10:30) the HLRN complex BICE is down. We expect to be back online during this afternoon. For the current HLRN status see our Status web page.

We apologize for this inconvenience and thank you for your patience.

(wwb)


Berlin ICE2 not available from July 26 to August 01
[1755] Jul 19, 2010

The ICE2 cluster (bigq) in Berlin is reserved for a very big run from

July 26 2010 noon until August 01 2010 noon.

Only small jobs requesting 16 to 32 nodes will be scheduled to run with a low throughput in queue bigq on ICE2 during this time. Other jobs requesting feature=ice2 and/or queue bigq will wait in the queue until the reservation has finished.
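
The scheduling rule above can be sketched as a small check. This is only an illustration under stated assumptions: the helper name may_start_in_bigq is hypothetical, the times are taken as local time, and the real scheduler applies further policies that are not modelled here.

```python
from datetime import datetime

# Reservation window from the announcement (noon to noon, local time assumed).
RESERVATION_START = datetime(2010, 7, 26, 12, 0)
RESERVATION_END = datetime(2010, 8, 1, 12, 0)


def may_start_in_bigq(nodes: int, when: datetime) -> bool:
    """Hypothetical helper: may a bigq job on Berlin ICE2 start at `when`?"""
    in_reservation = RESERVATION_START <= when < RESERVATION_END
    if not in_reservation:
        # Outside the window the reservation itself imposes no limit.
        return True
    # During the reservation only small jobs (16 to 32 nodes) are scheduled.
    return 16 <= nodes <= 32
```

Jobs for which this returns False would simply wait in the queue until the reservation has finished.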

We recommend using the ICE2 at HLRN site Hannover during this time.

/Bka



SGI MPT 1.26 available
[1752] Jul 12, 2010

Version 1.26 of the SGI Message Passing Toolkit has been installed at HLRN.

This version includes a patch that makes MPT compatible with the behaviour of MPICH-related MPI implementations such as MVAPICH2 or IntelMPI regarding the interpretation of the MPI standard for MPI_COMM_CREATE.
In the past, this incompatibility made it impossible to use MPT for certain applications that relied on the behaviour of MPICH when creating new communicators.

HLRN will switch the default MPT module version to 1.26 on Thu July 15. The current default version is mpt/1.22. Moreover, the temporary test releases mpt/1.24-beta, mpt/1.24fx and mpt/1.26-beta will be removed.

See SGI MPT at HLRN for further information.

/Bka


HLRN Hannover - $PERM available again
[1750] Jul 12, 2010

$PERM is available again

Gabriel Gaus


The original message was:
-------------------------
Due to problems concerning high temperature the
$PERM filesystem is not available in Hannover.

[...]



Hannover up again - Maintenance tomorrow cancelled
[1748] Jul 05, 2010

Hannover finally is up again.

Maintenance for tomorrow has been cancelled since the necessary tasks have already been carried out today.

Please accept our apologies for the prolonged downtime. As always, please contact your consultant if you lost NPL due to system-related job crashes.

Thank you for your patience,

Gabriel Gaus


The original message was:
-------------------------
Hannover has a Lustre problem requiring a file system check.

[...]



Maintenance Berlin finished
[1747] Jul 05, 2010

HLRN site Berlin has been available to users again since Mon July 05, 17:00.
Processing of batch jobs has been resumed.

During the scheduled maintenance additional fixes of known problems were applied, which caused a delay in resuming normal production mode.

Thank you for your understanding.
/Bka


Hannover with Lustre problems
[1743] Jul 03, 2010

Hannover has a Lustre problem requiring a file system check.

Login is still open in view of the planned maintenances, but no new jobs will start. You may experience "access denied" messages when trying to write to some of your already existing files. Read access should be ok. The situation will probably remain as is until at least Monday since SGI will need time to analyze the problem.

As always, we apologize for the situation and will post a message when the situation improves...

Gabriel Gaus




Scheduled Maintenance in Berlin and Hannover
[1739] Jul 01, 2010

The following maintenances are scheduled for the two HLRN sites:

Berlin: Monday July 5th, starting 10:30 am
Hannover: Tuesday July 6th, starting 09:30 am

During this time, batch processing will pause and login will be closed on the site having the maintenance. Existing login sessions may be terminated. Please plan accordingly.

By this measure we hope to improve the stability of the Lustre file system (WORK/WORK2).

A message will be posted when the respective complex is available again. We intend to be up again within two hours.

Gabriel Gaus




Hannover up
[1736] Jun 29, 2010

Hannover is up again.

Gabriel Gaus


The original message was:
-------------------------
Hannover has a Lustre problem requiring a file system check.
[...]



Hannover down
[1734] Jun 29, 2010

Hannover has a Lustre problem requiring a file system check.

We will post a message when the problem is solved and apologize for the inconvenience.

Harald Schwier



GROMACS 4.0.7 available at HLRN
[1733] Jun 23, 2010

The molecular dynamics simulation package GROMACS version 4.0.7 is now available at HLRN.

See https://www.hlrn.de/home/view/System/GROMACS for a description of the package.

(BB)



HLRN - Hannover up
[1729] Jun 22, 2010

Hannover is up again.

Gabriel Gaus


The original message was:
-------------------------
Hannover has a Lustre problem requiring a file system check.
[...]


Hannover up
[1724] Jun 16, 2010

Hannover is up.

Gabriel Gaus


The original message was:
-------------------------
Hannover has a Lustre problem requiring a file system check...



Hannover up
[1721] Jun 16, 2010

Hannover is ready again.

Harald Schwier


The original message was:
--------------------------------------
Hannover has a Lustre problem requiring a file system check.


Berlin complex down
[1719] Jun 15, 2010

Update June 16, 11:40: Since 11:35 the Lustre problem is resolved. Batch production and interactive work have been resumed.

Thank you for your patience.

(wwb)

The original announcement was:

Due to a repeated problem with the Lustre file system the Berlin complex is closed.

We will post a message when the problem is resolved.

We apologize for the inconvenience.

(wwb)


HLRN - Hannover down
[1717] Jun 15, 2010

Hannover has a Lustre problem requiring a file system check.

We will post a message when the problem is solved and apologize for the inconvenience.

Harald Schwier



Hannover up
[1714] Jun 14, 2010

Hannover is ready again.

Gabriel Gaus


The original message was:
--------------------------------------
Hannover has a Lustre problem requiring a file system check.




Berlin complex drained
[1710] Jun 12, 2010

Update June 14, 14:00: The Lustre problem is resolved, batch production has been resumed.

The original announcement was:

Due to a problem with the Lustre file system the batch system for the Berlin complex is draining since Saturday, June 12, morning.

We will post a message when the problem is resolved.

Sorry for the inconvenience.

(wwb)


HLRN Hannover maintenance finished
[1708] Jun 10, 2010

Hannover is up and running again.

Thank you for your patience.

Gabriel Gaus


The original message was:
---------------------------------------------
Hannover has maintenance scheduled for Tue June 8th, starting at 08:30. Batch processing will stop and active login sessions will be terminated.



Newsletter: Ninth edition of HLRN Informationen published
[1705] Jun 01, 2010
The ninth edition of the HLRN newsletter HLRN Informationen (in German) is available for download at
https://www.hlrn.de/home/view/NewsCenter/NewsLetter.
This web page also contains instructions for (un-)subscribing to the newsletter mailing list.

From the content:
  • News about the HLRN-II system
  • Project report on "Zukunftsszenarien für die Ostsee"
  • Managing your login environment with environment Modules
(wwb)


Environment modules restructured
[1703] May 27, 2010
We have restructured the environment modules. Modules have been sorted into categories that are shown in all upper-case letters when the available modules are listed with module avail. The new structure is immediately active for all new logins. Existing sessions still use the old structure. Nothing has to be done on the user's side.

See our topic on modules usage for further information.

(wwb)


CP2K available
[1700] May 26, 2010

The quantum chemistry package CP2K is available at HLRN

See CP2K at HLRN for usage instructions and additional information.

/Bka


The HLRN system at the Lange Nacht der Wissenschaften (Long Night of the Sciences) in Berlin
[1699] May 25, 2010
Do you work with the HLRN and have always wanted to see the machine? You can visit it during the Lange Nacht der Wissenschaften in Berlin on Saturday, June 5, 2010, from 17:00 to 01:00. Further information, including details of the guided tours of the machine, can be found in the programme of the Lange Nacht 2010.

(wwb)



Solved: HLRN Berlin Problem with $WORK
[1697] May 17, 2010

$WORK at HLRN site Berlin has been available again since Mon May 17, 13:15. Batch system work has resumed.

Thank you for your patience.


The original announcement was:

On Sat May 15 parts of the Lustre filesystem serving $WORK at HLRN site Berlin ran into problems.

Batch system work at HLRN site Berlin has been stopped. Running jobs are allowed to finish.

Interactive work is still possible, but write access to files in $WORK may fail. Read access of files in $WORK is possible. Access to all files in $WORK2 is possible, too.

SGI is analyzing and fixing the problem. Batch system work is expected to resume this evening (Mon May 17). Interactive sessions may be terminated at short notice this afternoon.

A note will be posted when the problem has been solved.

We apologize for the trouble.

/Bka


Intel Compiler Suite Professional Edition v11.1.072 installed
[1695] May 12, 2010

The Intel Compiler suite v11.1.072 is available at HLRN.

It includes the recent versions of
  • C/C++ compiler
  • Fortran compiler
  • Debugger
  • Math Kernel Library (MKL)
  • Integrated Performance Primitives Library (IPP)
  • Threading Building Blocks C++ Template Library (TBB)
The module file intel.compiler/11.1.072 sets the necessary environment.

The HLRN login environment still loads Intel compiler v10.1.015 as system default until further notice. To switch to another version while compiling or running executables, it is recommended to unload any loaded icc, ifort and mkl module files, load a compiler module of your choice for compiling and linking only, and prepare the executables/libraries to work without loading any compiler module file.

See Intel compiler documentation at HLRN for usage instructions.

Visit online documentations provided by Intel for details.

Look into /sw/intel/Compiler/11.1/072/Documentation/en_US/ for release notes.

/Bka


Berlin ICE-2 not available from 17 May to 21 May
[1693] May 11, 2010

The ICE-2 in Berlin is reserved for a very big run from
17 May 2010 noon to 21 May 2010 noon.

We apologise for the inconvenience.

(stb)



Solved: HLRN site Berlin down
[1690] May 10, 2010

HLRN site Berlin has been available again since Mon May 10, 12:45.


The original announcement was:

Over the last weekend HLRN site Berlin experienced problems related to access to Lustre file systems.

Login has been disabled. Batch system work has been stopped.
SGI and HLRN are analyzing the problem.

A note will be posted when there is anything new.

We apologize for the inconvenience.

/Bka


HLRN maintenance - Downtime Hannover complex finished.
[1687] May 04, 2010

The Hannover complex is available again for users.

(CPR)


The Original message was:
-------------------------

HLRN maintenance - Downtime Hannover complex

For the next phase of the HLRN installation the infrastructure has to be prepared for the new SMP system (SGI UltraViolet) planned for the second half of this year. The actions by SGI include hardware and software measures.

The Hannover complex has been reserved for SGI:
  • May 4, 2010 user access will be closed after 8 a.m.
The system will not be accessible for at least one day. We expect the Hannover complex to be available to users again early on May 5, 2010.

A concluding news message will inform about the end of the downtime period for the Hannover complex. We hope that the work will go on as scheduled and apologize for any inconvenience with this downtime.

(CPR)



HLRN maintenance - Downtime Hannover complex
[1685] Apr 29, 2010

For the next phase of the HLRN installation the infrastructure has to be prepared for the new SMP system (SGI UltraViolet) planned for the second half of this year. The actions by SGI include hardware and software measures.

The Hannover complex has been reserved for SGI:
  • May 4, 2010 user access will be closed after 8 a.m.
The system will not be accessible for at least one day. We expect the Hannover complex to be available to users again early on May 5, 2010.

A concluding news message will inform about the end of the downtime period for the Hannover complex. We hope that the work will go on as scheduled and apologize for any inconvenience with this downtime.

(CPR)



HLRN Hannover - Lustre Problem solved
[1683] Apr 20, 2010

HLRN Hannover is available for users again.

Harald Schwier




HLRN Hannover - Lustre problems
[1681] Apr 19, 2010

Login had to be closed temporarily in Hannover because of Lustre
(/gfs1) errors.

We are sorry for the inconvenience,
Harald Schwier



Hannover up again
[1678] Apr 16, 2010

Hannover is back in production.
Thank you for your patience.

Gabriel Gaus


The Original message was:
---------------------------
Production had to be stopped in Hannover due to access problems on the Lustre file systems (/gfs1, possibly /gfs2).
[...]



Emergency maintenance HLRN site Berlin Thu Apr 15 finished
[1675] Apr 14, 2010

HLRN site Berlin has been available to users again since Thu Apr 15, 16:45.

Batch system work has been resumed. Login to login and data servers (except bd2 and bd4) has been opened.

We thank you for your patience.

/Bka



The original announcement was:

A maintenance has been scheduled at HLRN site Berlin starting

Thu Apr 15 12:00

The goal of the maintenance is to detect and correct low-level inconsistencies in Lustre file systems, which accumulated since the beginning of this year. Data loss or damage is not expected.

From now on, no new batch jobs are started. Batch jobs that are still running Thu noon will be aborted. Login sessions that are still open at this time will be closed. Login to Berlin login and data servers will be disabled.

Until then, processes that try to write into existing files that are stored in selected parts of /gfs1 and /gfs2 may abort.
Reading existing files as well as creation of new files is not influenced.

We expect the maintenance to finish late on Thursday afternoon.

HLRN site Hannover is not affected.

We apologize for the inconvenience.

/Bka


VASP 4.6.36 and 5.2.2 installed with VTST Tools
[1672] Apr 12, 2010

The VASP software package v4.6 has been updated from v4.6.31 to v4.6.36.
The HLRN module file vasp-4.6 loads the new version.

The VASP v4.6.36 and v5.2.2 installations at HLRN now contain additional executables, which provide the VASP Transition State Tools v2.03d as of Feb 18, 2009. These are available when running the executable vasp_tst. Gamma-point-only versions are available, too. The VTST tool scripts are in $PATH after loading a VASP module file.

See the HLRN VASP documentation for details.

/Bka


Ferret Versions 6.5 and 6.6 available
[1670] Apr 08, 2010

Ferret Versions 6.5 and 6.6 are available at HLRN.

Version 6.5 was released to fix several bugs.
Version 6.6 supports the same features as 6.5 and adds support for the netCDF-4/HDF5 data formats.

Refer to the 6.5 release notes and 6.6 release notes for details.

The HLRN default version has been raised to 6.4; version 6.5 is recommended. Users are encouraged to test the new features included in 6.6.

See the HLRN Ferret documentation for installation details of this package.

M. Schmidt


MVAPICH2 1.4.1 available
[1664] Mar 31, 2010

Version 1.4.1 of the MPI implementation MVAPICH2 is available at HLRN.

For a complete listing of features, see

http://mvapich.cse.ohio-state.edu/overview/mvapich2/features.shtml

For the log of changes, see

http://mvapich.cse.ohio-state.edu/download/mvapich2/changes.shtml

v1.4.0 installed at HLRN includes patches that were developed and implemented by HLRN. Because v1.4.1 includes the official versions of these patches, the HLRN default will be switched from v1.4.0 to v1.4.1 on Wed, Apr 7.

Switching will be done by modifying the default mvapich2 module file to point to mvapich2/1.4.1-intel. Executables that were linked with MVAPICH2 shared libraries will use this version from then on.

The HLRN MVAPICH2 documentation at

https://www.hlrn.de/home/view/System/MVAPICH2

will reflect this change in time.

The current MVAPICH2 default version is mvapich2/1.4.0-intel.

Users who wish to continue using this version instead of the new default will have to load this module file explicitly in the future.

/Bka



Gaussian 09 available
[1660] Mar 18, 2010

The quantum chemistry package Gaussian 09 is available at HLRN

See G09 at HLRN for usage of Gaussian 09 at HLRN.

Registered Gaussian users will also be informed in a separate e-mail.

/Bka


Resolved: Lustre problems in Berlin
[1657] Mar 12, 2010

HLRN site Berlin has been available to users again since Fri Mar 12, 22:30.

Interactive work is possible, again. Batch system work has been resumed.

Unfortunately, all batch jobs that were still running at 21:20 on the Berlin ICE clusters had to be aborted to allow a clean remount of the Lustre filesystems.

Interactive work is still not possible on login node bicegate2.
Please watch HLRN Status for updates.

Jobs that were running on Mar 12 from about 10:00 onwards and that used $WORK, $WORK2 and/or $TMPDIR probably hung in I/O operations. Please check any data produced during this time.
Also inspect orphaned *.OU and *.ER files in your $HOME. These contain the STDOUT/STDERR of jobs that could not be copied to their final destination because Lustre was inaccessible at job end.
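
One way to list such leftover files is sketched below. It is a minimal example that assumes the files sit directly in the top level of $HOME; the function name find_orphaned_job_output is made up for illustration and is not an HLRN tool.

```python
from pathlib import Path


def find_orphaned_job_output(home: Path) -> list[Path]:
    """List *.OU and *.ER files directly inside the given directory."""
    found = [
        path
        for pattern in ("*.OU", "*.ER")
        for path in home.glob(pattern)
        if path.is_file()
    ]
    return sorted(found)


if __name__ == "__main__":
    # Print candidates only; deciding what to keep is left to the user.
    for path in find_orphaned_job_output(Path.home()):
        print(path)
```

The sketch only reports file names. Whether a file belongs to a job that ran during the outage still has to be checked by hand, for example from its modification time.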

HLRN will refund charges for jobs that were running in Berlin on Mar 12 after 10:00. Refunds will appear on Mon Mar 15.

We apologize for the inconvenience.

/Bka


The original message was:

There are currently Lustre problems (/gfs1) on the Berlin complex.

(stb)


Job opening at TU Hamburg-Harburg
[1656] Mar 10, 2010
A permanent position is available with immediate effect in the computing centre of the Technische Universität Hamburg-Harburg, see http://www.tu-harburg.de/aktuell/stellen/S-10-08.html.

(it/wwb)


Gaussian03 with NBO 5.9 available
[1654] Mar 09, 2010

The quantum chemistry package Gaussian03 is originally shipped with
the NBO (Natural Bond Orbital) program version 3.1.

At HLRN, Gaussian03 with NBO 5.9 is installed.

Please visit the NBO 5* home page for additional information.

See G03 at HLRN for usage of Gaussian03 at HLRN.

/Bka


MOLPRO 2006.1 available at HLRN
[1652] Mar 09, 2010

The MOLPRO quantum chemistry package has been made available at HLRN.

The current version is 2006.1 patchlevel 160.

Visit the MOLPRO home page for a complete description of the package.

See MOLPRO at HLRN for usage instructions at HLRN.

Registered MOLPRO users will be informed by a separate e-mail.

/Bka



Database server maintenance
[1648] Mar 08, 2010
Hardware and system maintenance will be performed on the user and project database server zulassung.hlrn.de on March 09, 2010, from 7:00 until 17:00. During that time the web pages for new user accounts, project applications, and administration of existing accounts and projects will be unavailable.

Since the services move to a new network address it may take until Thursday morning for the domain name service DNS to point to the new address.

(wwb)


HLRN Hannover up again
[1645] Mar 05, 2010

HLRN Hannover is available again. Additional work
to improve system stability will follow next week.

We apologize for the inconvenience and thank
you for your patience.

Gabriel Gaus




HLRN Hannover ready for use
[1642] Mar 04, 2010

HLRN Hannover is available again.
Thank you for your patience.

Harald Schwier



HLRN Hannover - Lustre problems
[1639] Mar 03, 2010

Login had to be closed temporarily in Hannover because of Lustre errors.

We are sorry for the inconvenience,
Harald Schwier



HLRN Hannover - Archive filesystem available again
[1637] Mar 02, 2010

Dear HLRN Users,
the data nodes and archive servers (hdx, hpermx) at Hannover are available again.

We are sorry for the inconvenience.

Harald Schwier



HLRN Hannover - Archive file system
[1635] Mar 02, 2010

Dear HLRN Users,
due to technical problems concerning our tape robot a short term maintenance is necessary. Access to the data nodes and archive servers (hdx, hpermx) at Hannover will be closed today (02.03.2010) at 1.00 pm.

The maintenance should require half an hour.

Harald Schwier






Batchsystem Problem after msub -I
[1634] Feb 26, 2010

After installing the latest version of the Moab batch system scheduler, a problem was observed when users submit interactive batch jobs with msub -I.

Doing so can crash a Moab server daemon in certain circumstances.

To ensure normal operation of the batch system at HLRN, submit requests using msub -I are now rejected by the HLRN msub filter. Interactive jobs must now be submitted with qsub -I only. The syntax and behaviour of this command are the same as for msub.

Note that regular (non-interactive) batch jobs still have to be submitted with msub. For this case, qsub is disabled.
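
For users who wrap job submission in their own scripts, the rule can be summarized in a few lines. This is only a sketch: the function submit_command is hypothetical, and it returns just the command prefix stated in this announcement without adding any further options.

```python
def submit_command(interactive: bool) -> list[str]:
    """Return the command prefix to use for a job under the rule above."""
    # Interactive jobs: qsub -I (msub -I is rejected by the HLRN msub filter).
    # Regular batch jobs: msub (qsub is disabled for this case).
    return ["qsub", "-I"] if interactive else ["msub"]
```

Resource requests and the job script name would be appended to the returned list as usual.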

If you have questions, please contact your HLRN consultant or support@hlrn.de.

/Bka


New ABAQUS license server for ni* users
[1632] Feb 24, 2010

The ABAQUS license server for users from Niedersachsen (account names "ni*") has changed:
abaquslm_license_file="27001@lizenzserv.rrzn.uni-hannover.de"
See our ABAQUS web page for details.

(gb)


HLRN Berlin ready for use
[1630] Feb 11, 2010

HLRN Berlin has been available to users again since Thu Feb 11, 00:00.

Thank you for your patience.

/Bka




HLRN Hannover ready for use
[1628] Feb 10, 2010

HLRN Hannover is available again.

Gabriel Gaus




Charge formula for ICE2 nodes changed
[1626] Feb 08, 2010

Before installation of ICE2 nodes with Intel Xeon Gainestown (Nehalem) CPUs at HLRN last summer, the charge for batch jobs running on this architecture was defined to be 1 NPL for one ICE2 node per hour.

This formula was estimated having the memory bandwidth of Gainestown CPUs in mind, which is about three times higher compared to Intel Xeon Harpertown CPUs (Penryn, ICE1 and XE nodes).

Detailed performance measurements have shown that real-world HLRN applications do not, on average, achieve a performance gain of a factor of three when moving from ICE1 or XE nodes to ICE2 nodes.

Thus, HLRN decided to adapt the charge formula for ICE2 nodes. The charge for batch jobs running on ICE2 nodes is redefined to be 2 NPL for three ICE2 nodes per hour, which corresponds to an averaged performance gain of factor two compared to ICE1 and XE nodes.

The modified charge formula is in effect for all batch jobs since 01.01.2010.
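
As a rough illustration of what the change means for a single job, the two ICE2 rates can be compared as follows. This sketch assumes that the charge scales linearly with the number of nodes and the wallclock hours; the helper name ice2_charge and the example job size are made up, and the Accounting documentation remains authoritative.

```python
from fractions import Fraction

# Rates per ICE2 node-hour as stated in this announcement.
OLD_ICE2_RATE = Fraction(1, 1)  # 1 NPL for one ICE2 node per hour
NEW_ICE2_RATE = Fraction(2, 3)  # 2 NPL for three ICE2 nodes per hour


def ice2_charge(nodes: int, hours: Fraction, rate: Fraction = NEW_ICE2_RATE) -> Fraction:
    """Hypothetical helper: NPL charge assuming linear scaling in nodes and hours."""
    if nodes < 0 or hours < 0:
        raise ValueError("nodes and hours must be non-negative")
    return rate * nodes * hours


# Example: a 96-node job running for 12 hours.
old = ice2_charge(96, Fraction(12), OLD_ICE2_RATE)
new = ice2_charge(96, Fraction(12))
print(f"old formula: {old} NPL, new formula: {new} NPL")
```

Exact fractions are used so that the rate of 2/3 NPL per node-hour does not introduce rounding errors.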

Approved NPL quotas were not changed. Thus these may last a bit longer. However, HLRN expects that new project applications are submitted considering the changed ICE2 charge rate.

See Accounting regarding details of charging batch jobs at HLRN.

/Bka


Maintenance on Wednesday, February 10
[1622] Feb 05, 2010
System maintenance will be performed on Wednesday, February 10, 2010 on both complexes as follows:
  • on BICE in Berlin from 9:00 until 16:00
  • on HICE in Hannover from 9:00 until 11:00
During these times the complexes will be unavailable for interactive work and batch jobs. As the HLRN mail and web servers will be affected by the network maintenance activity, mail to and from hlrn.de addresses will be delayed, and the web pages will be inaccessible at times.

We will post a note when the complexes are available again.

(wwb)


STAR-CCM+ Version 4.06 is installed
[1621] Feb 03, 2010

The CFD package STAR-CCM+ version 4.06 is available on the HLRN system. See our STAR-CCM+ web page for details on how to access and use the package and the online documentation.

(gb)


TotalView Debugger 8.7 available
[1620] Feb 01, 2010
The TotalView debugger 8.7 is now available and has also been made the default version on the HLRN system. To use it type module load totalview. Older versions of TotalView are still available by loading the appropriate module file.

The Totalview documentation topic describes the details of the installation at HLRN.

(wwb)


Quota enforcement on $HOME
[1616] Jan 28, 2010

To improve system stability, we had to activate quotas on $HOME.

  • Up to 20 gigabytes in $HOME are permitted (soft quota).

  • Up to 50 gigabytes may be used temporarily (hard quota) for up to 14 days (grace period).

  • Use the command quota -s to view your current usage.

  • No quota has yet been activated for $WORK or $WORK2.
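
How the three numbers interact can be sketched as below. The sketch assumes the usual soft/hard quota semantics (usage above the soft limit is tolerated only during the grace period, and the hard limit cannot be exceeded); the function home_quota_state is hypothetical and does not replace quota -s.

```python
SOFT_LIMIT_GB = 20  # soft quota from this announcement
HARD_LIMIT_GB = 50  # hard quota from this announcement
GRACE_DAYS = 14     # grace period from this announcement


def home_quota_state(usage_gb: float, days_over_soft: int = 0) -> str:
    """Classify $HOME usage, assuming standard soft/hard quota behaviour."""
    if usage_gb >= HARD_LIMIT_GB:
        return "hard limit reached"
    if usage_gb > SOFT_LIMIT_GB:
        if days_over_soft > GRACE_DAYS:
            return "grace period expired"
        return "within grace period"
    return "ok"
```

For example, 30 gigabytes held for three days would still be within the grace period, while the same usage after 15 days would not.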


Non-optimal use of $HOME occurred quite frequently. Please use $WORK for batch jobs and hpn-scp to copy files between the complexes. Use $HOME for small configuration files or to edit text files.

See also:
https://www.hlrn.de/home/view/System/DataHandling


Individual quotas have been set for a few accounts that already would have reached their hard limit. They will be adjusted within the next two weeks.


Gabriel Gaus



Archive facilities at HLRN (Hannover site) available.
[1611] Jan 28, 2010

Dear HLRN Users!

The archive facilities at HLRN (Hannover site) are available.

hperm1 and hperm2 are accessible from the gateway and data servers.

Please remember:
Only the archive servers hperm1 and hperm2 provide the full set of user commands.

For further information on archive user commands and data handling please see:
https://www.hlrn.de/home/view/System/SamFS
https://www.hlrn.de/home/view/System/DataHandling

Archive facilities at HLRN (Berlin site) are expected to follow soon.

Yours sincerely,
your HLRN-Team

(CPR) (AG)


Tape robot maintenance in Hannover Tuesday Feb. 2nd 2010
[1610] Jan 28, 2010

The tape robot in Hannover will be down for a maintenance / upgrade next Tuesday Feb. 2nd starting 10.00 a.m. During the maintenance, functionality of PERM (/qfs1) in Hannover is limited. Only files available in the disk cache are accessible. Within the capacity limits of the disk cache, copy to $PERM is also still possible.

Gabriel Gaus



solved: $HOME not available on Berlin XE and data nodes
[1607] Jan 27, 2010

$HOME has been available again on Berlin XE nodes, data and login servers since the early morning of Thu Jan 28. Processing of batch jobs has resumed.

/Bka


ICE2 memory upgrade finished
[1605] Jan 21, 2010

The memory upgrade of ICE2 nodes announced in this news item (https://www.hlrn.de/home/view/NewsCenter/Archive2009#ID-1579) has been finished in time.

All 1920 ICE2 nodes (960 nodes per HLRN site) are now equipped with 48 GByte RAM per node (6 GByte per core). SGI technicians and HLRN staff had to plug nearly 8000 new DIMMs of 4 GByte each during the last weeks to achieve this.
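
The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below uses only the numbers given here; since the announcement says "nearly 8000" DIMMs, the derived amount of added memory is an upper estimate rather than an exact value.

```python
NODES = 1920          # ICE2 nodes across both sites
RAM_PER_NODE_GB = 48  # after the upgrade
RAM_PER_CORE_GB = 6   # after the upgrade
DIMM_SIZE_GB = 4
NEW_DIMMS = 8000      # "nearly 8000", so treat as an upper bound

cores_per_node = RAM_PER_NODE_GB // RAM_PER_CORE_GB  # 8 cores per node
total_ram_gb = NODES * RAM_PER_NODE_GB               # 92160 GByte in total
added_ram_gb = NEW_DIMMS * DIMM_SIZE_GB              # at most 32000 GByte added
added_per_node_gb = added_ram_gb / NODES             # roughly 16.7 GByte per node

print(cores_per_node, total_ram_gb, added_ram_gb, round(added_per_node_gb, 1))
```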

The node feature mmem does not exist any longer. Batch jobs requesting this feature will not be scheduled. The node feature lmem is now a simple alias for the node feature ice2.

The HLRN documentation has been updated to reflect the changes.

/Bka



HLRN Berlin $HOME replication is delayed
[1603] Jan 20, 2010

The replication of $HOME from HLRN site Berlin to HLRN site Hannover has been delayed since Tue Jan 19. This is due to massive file operations in $HOME in Berlin.

We expect that $HOME in Hannover is in sync with $HOME in Berlin again at the end of this week. Replication of $HOME from Hannover to Berlin is not affected.

HLRN recommends the following:
  1. Avoid working interactively on both HLRN sites at the same time.
  2. Do not submit batch jobs that request the remote or either HLRN partition, if the jobs depend on the content of $HOME.
  3. Avoid huge file operations in $HOME. Hold data in $WORK and/or $WORK2, instead. Move data to $PERM for long-term storage.


See this document for details about handling data at HLRN.

/Bka



HLRN Site Berlin available again
[1601] Jan 18, 2010

HLRN site Berlin has been available to users again since Mon Jan 18, 16:30. Batch system work has resumed.

Data servers bd3 and bd4 are not yet available for interactive log in. Please watch the HLRN status pages for availability of servers for interactive work.

/Bka


HLRN Berlin $HOME was not available 14.01.2010
[1598] Jan 15, 2010

The $HOME filesystem was unavailable on all machines of HLRN site Berlin on Thu Jan 14 from about 03:00 to about 10:30.

Batch jobs running during this time may have been aborted due to missing access to files or because they exceeded their wallclock limit. Interactive work was not possible either.

HLRN has refunded the charges for a number of batch jobs that were aborted by the batch system for exceeding their wallclock limit during this time. Please contact your HLRN consultant if you believe that you lost more NPL due to this event than HLRN refunded.

We are sorry for the inconvenience.

/Bka


Additional HLRN login servers available
[1596] Jan 15, 2010

Since Thu Jan 14, one additional login server per HLRN site has been available to HLRN users. There are now three distinct login servers per HLRN site.

The new servers are named bicegate0.hlrn.de (short hostname bg0) and hicegate0.hlrn.de (short hostname hg0). In addition, the generic aliases bicegate.hlrn.de and hicegate.hlrn.de point to these servers, making them the default login servers for the two HLRN sites.

The new servers are each equipped with 8 physical Intel Gainestown (Nehalem) cores and 148 GByte physical RAM. SMT is switched on, yielding 16 virtual cores per server. The older login servers bg1, bg2, hg1, and hg2 each have 8 physical Intel Harpertown (Penryn) cores and 32 GByte RAM. The remaining configuration, including user environment, access to file systems, and software availability, does not differ.

HLRN users are encouraged to make use of all available login servers to balance load. However, users who frequently need to run short, memory-intensive pre- or postprocessing work should prefer the new servers.

Regular massive pre- and postprocessing must still be done via batch jobs, so that the HLRN login servers remain available to all users for usual interactive work such as file editing, compiling, or controlling batch jobs. We hope that the additional servers help to improve the quality of HLRN service in this respect.

/Bka


HLRN Site Berlin not available 18.01.2010
[1594] Jan 14, 2010

Due to reconfiguration of network components, a downtime is scheduled for HLRN site Berlin starting Mon Jan 18 2010, 10:00. Service is expected to resume in the afternoon of Mon Jan 18 2010.

Batch system work will be stopped. Login to Berlin login and data servers will be disabled. Active interactive sessions will be terminated.

Delivery of HLRN mails will be delayed during the downtime. The WWW servers https://www.hlrn.de and https://zulassung.hlrn.de will not be available.

A note will be posted when HLRN site Berlin is available to users again.

We apologize for the inconvenience.

/Bka


HLRN Hannover - Lustre Problem solved
[1592] Jan 12, 2010

HLRN Hannover is available for users again.

Harald Schwier


HLRN Hannover - Lustre problems
[1590] Jan 12, 2010

Login had to be closed temporarily in Hannover because of Lustre errors.
We are sorry for the inconvenience.
Harald Schwier




Parallel Programming Workshops
[1589] Jan 08, 2010
HLRN has scheduled workshops for parallel programming with MPI and OpenMP:
  • January 18-22, 2010 in Bremen, organized by BremHLR (workshop language is English)
  • February 15-19, 2010 in Berlin, organized by ZIB (workshop language is German)
For details see the workshop websites. Both workshops are open to all HLRN users.

(wwb)



 
Norddeutscher Verbund für Hoch- und Höchstleistungsrechnen