HLRN Support and Consultancy

– How to check system status, determine problems, and get support –

Abstract

This document describes the structure of HLRN Support and how to use it. It gives hints for determining and analysing problems yourself, and lists the information you should gather before contacting HLRN Support. That information enables the recipients of your problem report to analyse and resolve your problem quickly.

HLRN Support Organisation

HLRN Support is organised as a network of consultants distributed over the participating North German states and their local computing centers.

Each project and each user is assigned an individual consultant as the primary contact for support. The consultant ("Fachberater") is usually a scientist with experience in the user's field of expertise who is geographically close to the user or project. This person should be contacted for assistance concerning any problems related to HLRN, as well as any other technical, organisational or work-related questions arising during the project. This includes project application, mathematical methods, programming, optimisation, parallelisation, application software, batch job handling, data handling and storage, resource allocations, quotas, etc.

If geographical proximity is not feasible, a second support contact is assigned: the local advisor ("lokaler Berater"), who is usually affiliated with the computing center of the user's university. This person should be contacted for assistance concerning technical access to HLRN as well as general technical questions, e.g. regarding applications for accounts and projects. The local advisor may also be contacted concerning any other problems and questions related to HLRN.

Both support contacts are assigned to the project or user when the project or account is established. Contact information can be found on the HLRN Contact web page. The preferred method of contact is email. In any case, please follow the guidelines on problem reports.

In addition to these support contacts, a number of administrative consultants are located on site at the HLRN complexes in Berlin and Hannover. They have operational and administrative rights on the systems and may be contacted, in addition to the regular support personnel, by email to <support@hlrn.de>. Contact this list if immediate operational intervention is required or if you have any general remarks or suggestions concerning HLRN.

HLRN also maintains a number of additional email addresses that are relevant for users.

<benutzerverwaltung@hlrn.de> Questions concerning account administration
<zulassung@hlrn.de> Office of the Scientific Board of HLRN; project administration
<sw-request@hlrn.de> Suggestions and requests concerning software products (commercial, open source etc.) on the HLRN system
<support@hlrn.de> Fallback for any HLRN-related inquiries.

System Maintenance

The systems may be down periodically for software and hardware upgrades. Users are usually notified of such times in advance by email to the announcement mailing list and by an entry in HLRN System News. Unscheduled downtimes are unusual but do occur. In such cases, timely notification of users may not be possible. If you cannot access the system, please check the HLRN System Status web page before you contact us.

Information about HLRN System and Activities

HLRN offers various sources of information about the system relevant for problem determination and analysis.

HLRN System Status

The HLRN System Status web page contains a brief status overview. It is generated automatically and updated every 5 minutes (note also the disclaimer). It may also contain brief notes on news for users. For details about the individual complexes, the Moab/Torque batch system and the file systems there is also a long version of this page available. Please consult this page first if you have problems accessing the system or running jobs.

HLRN News

The HLRN News Center web page is a collection of recent announcements concerning all aspects of HLRN. Older news articles and outdated information are moved to the HLRN News Archive for later reference. Consult the News web page regularly for announcements if you are not on the announcement mailing list or if you did not receive announcements distributed to all users.

HLRN News and Announcement Mailing List

To disseminate relevant information about the HLRN system to interested users in a timely manner, we have introduced a special mailing list. Since this is an opt-in mailing list, users need to subscribe to it via the web form "Ändern von Attributen einer Benutzerkennung" (change attributes of a user account). This form may also be used to unsubscribe or re-subscribe at any time (a valid user account is required).

How to determine and report problems

Problem severity

Severity 1 problems affect ALL users and significantly impair their ability to access HLRN, compile programs, run jobs, and access and analyse data. If it appears that one or both HLRN complexes, file systems, network, or critical software are malfunctioning and preventing you from doing your work, then it is most likely a severity 1 problem. Please call your consultant or send email to <support@hlrn.de> immediately. Be prepared to provide detailed information regarding the problem as given below.

If the problems you are experiencing are individual in nature, intermittent, unrepeatable, or application specific, then it is most likely not a severity 1 problem. Please document your problem (see below) and email the description to your consultant or to HLRN Support for later action.

Guidelines for problem description

How you can help

You can greatly speed up the process of isolating and correcting errors you encounter while using HLRN by providing complete and detailed information. If you report a problem to HLRN, be prepared to answer a number of questions and to have certain information ready to give by phone or send by email. Ideally, this information puts the support person in a position to analyse and resolve your problem immediately without asking any further questions.

General questions and information

  • Who? - User identification: name and user id (login); for problems with project allocations: project id; your phone number
  • When? - When did the problem occur? When did you notice it? This helps us to locate important information from e.g. log files quickly.
  • Where? - Where did the problem happen? Which complex (Berlin, Hannover)? Which component (MPP, SMP)? Which node (login, data)? Which compute node(s)?
  • What? - What happened? What exactly were you doing or trying to do?
  • Which errors or problems occurred? Which symptoms? Which error messages did you get? Please report system or software messages literally and exactly, e.g. by copying them with the mouse into an email.
    Note: Please do not submit large data or log files by email. Instead, give the file name(s) and full path, including the complex, so we can inspect those files directly if you agree to that practice.
  • Is the problem reproducible? We can then try to repeat your case with different debug options or in a different environment and debug further.
  • What did you check, test, and reproduce already?
  • Were there any recent changes in your environment that could have triggered the problem? What did you change?
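
The questions above can be collected into a plain-text report before you write the email. The following is only a minimal sketch in Python, not an official HLRN tool: the class `ProblemReport`, its field names, and all sample values (including the login name) are hypothetical and merely mirror the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemReport:
    """Hypothetical container mirroring the Who/When/Where/What questions."""
    name: str
    user_id: str
    when: str            # when the problem occurred or was noticed
    complex_name: str    # e.g. "Berlin" or "Hannover"
    node: str            # e.g. "login", "data", or compute node name(s)
    what: str            # what you were doing or trying to do
    messages: List[str] = field(default_factory=list)  # literal error messages
    reproducible: bool = False
    project_id: str = ""  # only needed for problems with project allocations
    phone: str = ""

    def render(self) -> str:
        """Return the report as plain text suitable for an email body."""
        lines = [
            f"Who:   {self.name} (login: {self.user_id})",
            f"When:  {self.when}",
            f"Where: {self.complex_name}, node: {self.node}",
            f"What:  {self.what}",
            f"Reproducible: {'yes' if self.reproducible else 'no'}",
        ]
        # Optional details are placed directly below the "Who" line.
        if self.project_id:
            lines.insert(1, f"Project: {self.project_id}")
        if self.phone:
            lines.insert(1, f"Phone: {self.phone}")
        if self.messages:
            lines.append("Messages (copied literally):")
            lines.extend(f"  {m}" for m in self.messages)
        return "\n".join(lines)


# Sample values are invented for illustration only.
report = ProblemReport(
    name="Jane Doe", user_id="xxjdoe00", when="2016-06-14 09:57",
    complex_name="Hannover", node="login", what="Compiling my program",
    messages=["example error text"], reproducible=True,
)
print(report.render())
```

Keeping error messages as a list of literal strings reflects the request to report them exactly as they appeared rather than paraphrased.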

Special types of problems

  • Access problems:
    • Is a scheduled maintenance announced? Have you checked the HLRN System Status page first?
    • From which computer did you try to access HLRN (hostname and domain, IP address, type of operating system)? Which Secure Shell version (ssh -V)?
    • Which HLRN complex are you trying to access? Which node (login, data)?
  • Compiler or Run-time Errors:
    • How did you compile and run the program (command line, input parameters or file(s))? Provide a makefile if possible.
    • Which software versions (compiler, software package) did you use?
    • If the program crashes, provide traceback information if available and appropriate.
    • If you are getting incorrect results, include an example of the correct results.
      Note: HLRN staff may not be familiar with your algorithms and techniques.
    • Make every attempt to reproduce your error in a short program. Isolating the section of your long program that is creating the problem(s) will give you better turn-around when you request assistance from the consultants.
  • I/O problems:
    • Which file system do you use? Which directory? Which file(s)?
    • Have you hit your file or inode quota?
    • For files on PERM: are the files possibly migrated?
    • Corrupt files: do not remove or overwrite possibly corrupt files. Move them to another place on the same file system instead (using the mv command). Provide the following detailed information:
      • Where was the file created (Berlin/Hannover; which node(s): login, data, compute)?
      • Exact path for the file
      • When was the file created? (Creation date and time of the file)
      • How was the file created (software)?
      • Was the file moved or copied since creation?
      • Was the file modified since creation?
      • Date and time of the last successful access to the file
      • Location of last successful access (Berlin/Hannover, which node(s))?
      • Location where the file was usually used until the error occurred (Berlin/Hannover/both)?
      • Location where the error occurred first (Berlin/Hannover)?
  • Batch system problems (Moab/Torque): please provide the
    • PBS job script
    • job output with runtime error messages
    • Moab and/or Torque job id; Job protocol with information about job submit, dispatch and finish time (this information is usually contained in the HLRN accounting receipt on the standard error file of the job).
  • HLRN web server: With which pages (URL?) do you have a problem? Which version of which browser do you use?
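
For the corrupt-file case above, the advice is to move the file within the same file system rather than remove or overwrite it. The following is a minimal sketch of that step in Python, assuming a POSIX system; it uses `os.rename` in place of the `mv` command, and the function name `quarantine_file` and the returned keys are hypothetical. It records a few of the requested details before moving the file. Note that `os.stat` does not report a creation time on Linux, so only the last modification time is captured.

```python
import os
import time


def quarantine_file(path: str, quarantine_dir: str) -> dict:
    """Move a possibly corrupt file aside without copying its data.

    Hypothetical helper: collects facts for the problem report, then
    renames the file into quarantine_dir on the same file system.
    """
    path = os.path.abspath(path)
    st = os.stat(path)
    # Differing device ids mean the target is on another file system,
    # in which case a move would copy the data; refuse to do that.
    if st.st_dev != os.stat(quarantine_dir).st_dev:
        raise ValueError("quarantine directory is on a different file system")
    target = os.path.join(os.path.abspath(quarantine_dir), os.path.basename(path))
    if os.path.exists(target):
        raise FileExistsError(f"refusing to overwrite {target}")
    info = {
        "original_path": path,
        "new_path": target,
        "size_bytes": st.st_size,
        # Last modification time, not creation time (see lead-in).
        "last_modified": time.strftime(
            "%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)
        ),
    }
    os.rename(path, target)  # renames in place; the file data is not rewritten
    return info
```

The returned dictionary can be pasted into the problem report together with the remaining answers (how the file was created, where it was last read successfully, and so on), which the script cannot determine by itself.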

Last but not least: Did you read the relevant documentation (HLRN documentation, manuals, man pages, release notes)? Did you use our search function for the content of the HLRN website? Did you check the HLRN FAQ list? Did you use an Internet search engine? It is surprising what information and help may show up with the proper search!

Problem report form

To report a problem to HLRN, go to the problem report form.


Last modification: GuidoLaubender - 14 Jun 2016 09:57 (Version: 16)
2001-2017 © Norddeutscher Verbund für Hoch- und Höchstleistungsrechnen (HLRN)