
Windows Compute Cluster Server 2003

[[Windows Server 2008 HPC]]
  
== Server usage ==

The server will be operated by the Silicon Hill club, which has rich experience with running servers. Calculations will be carried out in cooperation with the Department of Electroenergetics of the Faculty of Electrical Engineering at CTU, and in the future we will collaborate with other departments and faculties. We will also develop applications for computing clusters, such as codecs, with the support of the Department of Telecommunications Engineering.


== Project partners ==

The major partners of the project are:


[[Soubor:Microsoft.png]]
  
Many thanks go to Microsoft for providing the software licenses and, above all, for lending the Tyan PSC T-630 cluster.


[[Soubor:Intel.png]]

Intel has lent a machine that serves as the Terminal Server and has also provided a suite of its development tools.


== About the system ==

The whole system consists of two computers. One serves users as a Terminal Server (itest.siliconhill.cz), where the Intel and Microsoft development tools are available. The other one serves as the compute cluster (mscluster.siliconhill.cz).


=== ITEST ===

* OS: Microsoft Windows Server 2003 R2 x64, Enterprise Edition
* SW: Microsoft Visual Studio 2005 Team Suite Edition, Intel Software Development Tools (Fortran and C++ compilers, VTune analyzer, Thread Checker, ...)
* CPU: 2x Quad-Core Intel Xeon E5320, 1866 MHz (7 x 267)
* Motherboard: [http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DB8.cfm Supermicro X7DB8] (3 PCI-E x8, 3 PCI-X, 1 SEPC, 8 FB-DIMM, Video, Dual Gigabit LAN, SCSI), Intel Blackford 5000P chipset
* 8 GB ECC DDR2-667 RAM (333 MHz)
* HDD: [http://www.wdc.com/en/products/Products.asp?DriveID=138 WD Caviar SE 120GB]

[[Soubor:TyanPSC.JPG|thumb|right|MSCLUSTER, Tyan PSC T-630]]

=== MSCLUSTER ===

[[Soubor:TyanPSC parametry.JPG|thumb|right|Tyan PSC T-630 parameters]]

Tyan PSC T-630

* 5 nodes: 1 head node, 4 compute nodes
* OS: 1x Microsoft Windows Compute Cluster Server 2003, 4x Microsoft Windows Server Standard
* CPU: 10x Intel Dual-Core Xeon E5100, 2330 MHz
* 10 GB RAM


== Computing tasks ==

=== Department of Electroenergetics projects ===

'''Mathematical modeling of coupled problems of the thermal and electromagnetic fields in non-linear materials with hysteresis. Design of numerical algorithms for these methods.'''

Mathematical modeling of simultaneously acting physical fields (electromagnetic, thermal, stress, etc.) is one of the current topics of research. For so-called linear materials, algorithms are known and commercial software is available. Unfortunately, ferrous alloys, which are important for nearly every device, have a non-linear, or, even worse for field calculation, ambiguous dependence of flux density on magnetic field intensity, described by the so-called hysteresis curve. Research into mathematical and numerical models and algorithms is important for a better understanding of the behavior of transformers, ballasts, induction-heating systems, etc. These models and algorithms are studied at the Department of Electrical Power Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague.

The aim of the research is to study the accuracy of the developed methods and their computational speed.


'''Research of effective methods of fault location using H-matrices.'''

In the case of a fault in the power grid, the position of the fault has to be found as soon as possible, and the error of the position estimate should be minimal. Present fault locators use relatively old algorithms, and the position error can be several kilometers.

Nowadays, synchronous measurement of phasors is possible, so new fault-location methods can be used.

The proposed use of H-matrices leads to a combined discrete (the position of the fault) and continuous complex (the impedance of the fault) optimization problem, and for practical tests of the developed algorithms the computation speed is essential.


'''Mathematical modeling of the drying of flooded books.'''

It is a sad fact that floods are not a rare phenomenon in the Czech Republic. A relatively large number of old books damaged by floods is waiting in deep freezers for preservation (drying, disinfection, acclimatization, etc.). Mr. Kyncl and Mr. Kubín from the Department of Electrical Power Engineering worked on the design of a multi-purpose vacuum chamber for the preservation of such books.

The processes taking place in the books during preservation consist mainly of diffusion and heat transfer. The aim is to validate the developed simulation software and to determine the optimal heating power and the duration of the preservation process.


=== The FAKE GAME project ===

'''Overview'''

Keywords like data mining (DM) and knowledge discovery (KD) have appeared in several thousand articles in recent years. Such popularity is driven mainly by the demand of private companies: they need to analyze their data effectively to obtain new, useful knowledge that can be capitalized on. This process is called knowledge discovery, and data mining is a crucial part of it. Although several methods and algorithms for data mining have been developed, there are still many gaps to fill. The problem is that real-world data are so diverse that no universal algorithm has been developed to mine all data effectively. Moreover, the stages of the knowledge discovery process require the full-time assistance of an expert in data preprocessing, data mining and knowledge extraction.

These problems can be solved by a KD environment capable of automatic data preprocessing, generating regression and predictive models and classifiers, automatically identifying interesting relationships in data (even in complex and high-dimensional data sets) and presenting the discovered knowledge in a comprehensible form. In order to develop such an environment, the thesis focuses on research into methods in the areas of data preprocessing, data mining and information visualization.

The Group of Adaptive Models Evolution (GAME) is a data mining engine able to adapt itself and perform optimally on a large (but still limited) group of real-world data sets. The Fully Automated Knowledge Extraction using GAME (FAKE GAME) framework is proposed to automate the KD process and to eliminate the need for the assistance of a data mining expert.

The GAME engine is the only GMDH-type algorithm capable of solving very complex problems (as demonstrated on the Spiral data benchmark problem). It can handle irrelevant inputs and short, noisy data samples. It uses an evolutionary algorithm to find the optimal topology of models, and ensemble techniques are employed to estimate the quality and credibility of GAME models.

Within the FAKE framework we have designed and implemented several modules for data preprocessing, knowledge extraction and visual knowledge discovery.


'''Goals'''

We are developing the open-source software FAKE GAME. This software should be able to automatically preprocess various data, generate regression and predictive models and classifiers (by means of the GAME engine), automatically identify interesting relationships in data (even high-dimensional data) and present the discovered knowledge in a comprehensible form. The software should fill gaps that are not covered by the existing open-source data mining environments WEKA and YALE.


'''Experiments on the cluster'''

We currently lack computational resources for experiments with various optimization methods used to adjust the parameters of GAME units. These methods, particularly nature-inspired methods such as Continuous Ant Colony Optimization, Particle Swarm Optimization, etc., are very demanding, and several days are needed to finish problems of medium complexity on standard computers. With a cluster of 32 cores we can run our processes in several configurations, and the best configuration can be identified in a fraction of the time required at present.

We also plan to design novel methods of parallelization. A special "niching" genetic algorithm is used in GAME and should be parallelized. We will also explore the ability to maintain diversity in populations distributed over the cluster.

The cluster can also be utilized for the evolution of neural networks (the NEAT approach) and for computational experiments within the course Neural Networks and Neurocomputers at the Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague.

For more information about the FAKE GAME project, see Mr. Kordík's [http://neuron.felk.cvut.cz/~kordikp/dissertation/kordik-dissertation-fake-game.pdf thesis].


== How to use it ==

Users access only the machine itest.siliconhill.cz, via Remote Desktop (mstsc, ...). There they have the Microsoft and Intel tools at their disposal and can build their application for the cluster on this machine.

After logging in, every user automatically gets the network drive '''I:''' mapped; it is located on the cluster, and its network path is '''\\mscluster\username$\'''. This is the place to store the data that the cluster will access.

Once you have built your application and copied it to your network drive, you can use the Compute Cluster Job Manager tool to run the job directly on the cluster (MSCLUSTER.SH.CVUT.CZ). How to do this is described [http://technet2.microsoft.com/windowsserver/en/technologies/featured/ccs/default.mspx here] and in more detail [http://technet2.microsoft.com/windowsserver/en/library/dce5123f-8af4-47c2-9192-9075998e24c71033.mspx?mfr=true here].
== Example ==

A small example: I have a task written in C++ that uses MPI-2, and I want to run it on the cluster three times as a test: once on a single processor (for comparison, say), then on 10 and then on 20 processors. In CCS terminology these are three "tasks" that will run one after another within a single "job".
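The source of the program itself is not part of this page; purely as an illustration, a minimal MPI-2 C++ program of the kind assumed in this example (a hypothetical "mpitest.cpp") could look like this:

 // mpitest.cpp - hypothetical example: each process reports its rank,
 // and rank 0 additionally prints the sum of all ranks. Links against msmpi.lib.
 #include <mpi.h>
 #include <cstdio>
 int main(int argc, char** argv)
 {
     MPI_Init(&argc, &argv);                 // start the MPI runtime
     int rank = 0, size = 0;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // number of this process
     MPI_Comm_size(MPI_COMM_WORLD, &size);   // total number of processes
     std::printf("Hello from process %d of %d\n", rank, size);
     int sum = 0;                            // a trivial piece of collective work
     MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
     if (rank == 0)
         std::printf("Sum of ranks: %d\n", sum);
     MPI_Finalize();                         // shut down MPI
     return 0;
 }

Such a program is built with the project settings from step 1 below and started on the cluster via mpiexec, exactly as in the job commands further down.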
  
1. '''Build the project in Visual Studio.''' Do not forget to add:

* the directory "C:\Program Files\Microsoft Compute Cluster Pack\Include" to "Additional Include Directories"
* the directory "C:\Program Files\Microsoft Compute Cluster Pack\Lib\i386" to "Additional Library Directories" (or the directory matching your binary target)
* the library "msmpi.lib" to "Additional Dependencies"

2. The build produces the binary '''"mpitest.exe"''', which I place in the network directory '''"i:\"''' (the mapped share '''"\\mscluster\user$\"'''). This will be the working directory of my distributed MPI application, so I also copy any input files there. In this example no inputs are used; they could be handled quite easily, for instance via '''stdin'''.

3. For '''submitting jobs and tasks''' you can use the GUI tool '''Compute Cluster Job Manager''', but here I will describe the command-line '''job''' command (note that it is actually a batch file, so from other batch files it has to be invoked via "call"). The reference for all job-manager command-line commands can be found [http://technet2.microsoft.com/windowsserver/en/library/dce5123f-8af4-47c2-9192-9075998e24c71033.mspx?mfr=true here].

* creating a new job:
  job new /scheduler:mscluster /numprocessors:20
  ... this prints the JobID, which I will use below as ID:
* adding three tasks that depend on each other:
  job add ID /name:1           /numprocessors:1  /workdir:\\mscluster\user$\ /stdout:out01.txt /scheduler:mscluster mpiexec mpitest.exe
  job add ID /name:2 /depend:1 /numprocessors:10 /workdir:\\mscluster\user$\ /stdout:out10.txt /scheduler:mscluster mpiexec mpitest.exe
  job add ID /name:3 /depend:2 /numprocessors:20 /workdir:\\mscluster\user$\ /stdout:out20.txt /scheduler:mscluster mpiexec mpitest.exe
* submitting the job:
  job submit /id:ID /scheduler:mscluster

In the GUI Job Manager I can then watch online how my job is being processed. In the end the directory '''"i:\"''' contains three output files: "out01.txt", "out10.txt" and "out20.txt".
  
== Physics projects ==

In cooperation with the Department of Physics of the Faculty of Electrical Engineering, CTU, we have a unique opportunity to use the cluster in the following areas:

The most time- and memory-consuming calculations are molecular dynamics simulations, in which particle-particle interactions are evaluated, so the simulations have N<sup>2</sup> complexity. Such simulations are used, for example, in materials science, biochemistry, biophysics, meteorology and cosmology. In physics, molecular dynamics is used to examine the dynamics of atomic-level phenomena that cannot be observed directly. Such simulations can be carried out only on large computer clusters.
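To illustrate where the N<sup>2</sup> comes from: the force on each particle is accumulated over all other particles, i.e. two nested loops over the particle array. A purely illustrative C++ sketch (not code from this project) of such a pairwise evaluation:

 // Illustrative only: naive O(N^2) pairwise force evaluation with a gravity-like 1/r^2 law.
 #include <vector>
 #include <cmath>
 struct Particle { double x, y, z, fx, fy, fz; };
 void computeForces(std::vector<Particle>& p)
 {
     for (auto& a : p) { a.fx = a.fy = a.fz = 0.0; }        // reset accumulated forces
     for (std::size_t i = 0; i < p.size(); ++i) {
         for (std::size_t j = i + 1; j < p.size(); ++j) {   // every pair visited exactly once
             double dx = p[j].x - p[i].x;
             double dy = p[j].y - p[i].y;
             double dz = p[j].z - p[i].z;
             double r2 = dx*dx + dy*dy + dz*dz + 1e-12;     // softening avoids division by zero
             double w  = 1.0 / (r2 * std::sqrt(r2));        // |F| ~ 1/r^2, direction obtained via /r
             p[i].fx += dx*w; p[i].fy += dy*w; p[i].fz += dz*w;
             p[j].fx -= dx*w; p[j].fy -= dy*w; p[j].fz -= dz*w;   // Newton's third law
         }
     }
 }

Distributing such loops over the nodes (e.g. each MPI process handling a block of the outer index) is the typical way these simulations make use of a cluster.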
 
  
Other large calculations requiring parallel computation are algorithms of order N log N, such as tree codes or Particle-in-Cell methods, commonly used for example in plasma physics for the simulation of nonlinear wave phenomena, the onset of helical and turbulent structures, magnetic reconnection processes, etc. The same methods are used for simulations of ocean currents and of the time evolution of many-particle structures such as dust storms, spiral arms in galaxies, and others.

== Links ==

*[http://technet2.microsoft.com/windowsserver/en/technologies/featured/ccs/default.mspx Technet]
*[http://www.microsoft.com/windowsserver2003/ccs/default.aspx CCS Home]
*[http://windowshpc.net/ CCS Community]
  
== Software ==

=== gridMathematica ===

In cooperation with the Department of Electroenergetics of CTU we will receive the gridMathematica 2 software from Wolfram Research. It is scientific computing software used by top universities all around the world, and it delivers an optimized parallel Mathematica environment for modern multiprocessor machines, clusters, grids, and supercomputers.

=== Windows Compute Cluster Server 2003 ===

An important partner in this project is Microsoft, which will provide the project with its state-of-the-art system for computing clusters.

Windows Compute Cluster Server 2003 can be deployed easily and quickly using standard Windows deployment technologies, and additional compute nodes can be added to the cluster simply by plugging them in and connecting them. The Microsoft Message Passing Interface (MS-MPI) implementation is fully compatible with the MPICH2 reference implementation. Integration with Active Directory enables role-based security for administrators and users, and the Microsoft Management Console provides a familiar administrative and scheduling interface.

This diagram represents a typical Windows Compute Cluster Server 2003 network:

[[Soubor:Cce overview 1.jpg]]

'''Core Technologies'''

Windows Compute Cluster Server 2003 supports the following core technologies:

* x64-based host and cluster nodes
* Message Passing Interface v2 (MPI2)
* Gigabit Ethernet, Ethernet over Remote Direct Memory Access (RDMA), InfiniBand, and Myrinet networking technologies
* Third-party compilers and libraries

http://www.microsoft.com/windowsserver2003/ccs/overview.mspx

== Ongoing projects ==
== Staff ==

The following people will take part in the project:

For CTU:
* Doc. Dr. Ing. Jan Kyncl
* Ing. Petr Kubín
* Ing. Tomáš Novotný
*: CTU-FEE, Department of Electroenergetics (13115)
* Ing. Pavel Kordík
*: CTU-FEE, Department of Computer Science and Engineering (13136)
* prof. RNDr. Petr Kulhánek, CSc.
*: CTU-FEE, Department of Physics (13102)

For Silicon Hill:
* Jaromír Kašpar
* Zbyněk Čech
* Jan Fleišmann

For Microsoft:
* Dr. Dalibor Kačmář
* Ing. Jan Toman

For HP:
* Jan Kučera

For Intel:
* MUDr. Pavel Kubů

[[Category:Projekty]]
