2010/12/17

hadoop-streaming.jar [options]

Usage: $HADOOP_HOME/bin/hadoop jar \
          $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
  -input    <path>     DFS input file(s) for the Map step
  -output   <path>     DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName>      The streaming command to run
  -combiner <JavaClassName> Combiner has to be a Java class
  -reducer  <cmd|JavaClassName>      The streaming command to run
  -file     <file>     File/dir to be shipped in the Job jar file
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
  -partitioner JavaClassName  Optional.
  -numReduceTasks <num>  Optional.
  -inputreader <spec>  Optional.
  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
  -mapdebug <path>  Optional. To run this script when a map task fails
  -reducedebug <path>  Optional. To run this script when a reduce task fails
  -verbose
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

For more details about these options:
Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info
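
[streaming example]

A minimal sanity check, reusing shell tools as the mapper and reducer. The jar path follows the Usage line above (in a 0.20.2 tarball the streaming jar may instead sit under contrib/streaming/), and the paths are assumptions: pg5000-data-in is the DFS directory created in the steps further down, pg5000-streaming-out is just a fresh output directory that must not already exist.

$HADOOP_HOME/bin/hadoop jar \
    $HADOOP_HOME/hadoop-streaming.jar \
    -input   pg5000-data-in \
    -output  pg5000-streaming-out \
    -mapper  /bin/cat \
    -reducer /usr/bin/wc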

2010/12/15

Pure Mpi.NET

PureMpi.NET is a completely managed implementation of the Message Passing Interface (MPI). The object-oriented API is simple and easy to use for parallel programming. It is built on recent .NET technologies, including Windows Communication Foundation (WCF), which lets you declaratively specify the binding and endpoint configuration for your environment and performance needs. When using the SDK, a programmer will see the MPI character of the interfaces come through and can take full advantage of .NET features, including generics, delegates, asynchronous results, exception handling, and extensibility points.
PureMpi.NET lets you build high-performance, production-quality parallel systems with all the benefits of .NET.

Hadoop?


What Is Hadoop?

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes these subprojects:
  • Hadoop Common: The common utilities that support the other Hadoop subprojects.
  • HDFS: A distributed file system that provides high throughput access to application data.
  • MapReduce: A software framework for distributed processing of large data sets on compute clusters.
  • ZooKeeper: A high-performance coordination service for distributed applications.
Other Hadoop-related projects at Apache include:
  • Avro: A data serialization system.
  • Chukwa: A data collection system for managing large distributed systems.
  • HBase: A scalable, distributed database that supports structured data storage for large tables.
  • Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout: A scalable machine learning and data mining library.
  • Pig: A high-level data-flow language and execution framework for parallel computation.

nic

#CentOS
#Fedora
[interfaces]
sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
#static
DEVICE=eth0                 
BOOTPROTO=static            
IPADDR=192.168.1.1          
NETMASK=255.255.255.0       
NETWORK=192.168.1.0         
GATEWAY=192.168.1.254       
BROADCAST=192.168.1.255     
HWADDR=XX:XX:XX:XX:XX:XX  
ONBOOT=yes                  
#dhcp
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
[dns]
sudo vi /etc/resolv.conf
search domainname            # e.g. test.domainname
nameserver 168.95.1.1      
nameserver 208.67.220.220  
[up&down]
sudo ifup eth0
sudo ifdown eth0
sudo /etc/init.d/network restart
[virtual nic-CentOS]
sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0:0
DEVICE=eth0:0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.2.254
NETMASK=255.255.255.0
NETWORK=192.168.2.0
BROADCAST=192.168.2.255
sudo ifup eth0:0
[virtual nic-Ubuntu]
vi /etc/network/interfaces
#static
auto eth0
iface eth0 inet static
address 192.168.1.1
netmask 255.255.255.0
gateway 192.168.1.254
#dhcp
auto eth0
iface eth0 inet dhcp
/etc/init.d/networking restart
auto eth0:0               
iface eth0:0 inet static  
address 192.168.2.254
netmask 255.255.255.0
broadcast 192.168.2.255
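
[check]

To confirm an interface or alias actually came up after ifup or a network restart, something like the following (addresses taken from the static examples above):

ifconfig -a
ip addr show eth0
ping -c 3 192.168.1.254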

Red Hat Enterprise Linux Deployment Guide

Red Hat Enterprise Linux 5: Red Hat Enterprise Linux Deployment Guide
CentOS-5.5

Mono & MonoDevelop

Mono
Mono is a software platform designed to allow developers to easily create cross platform applications. Sponsored by Novell, Mono is an open source implementation of Microsoft's .NET Framework based on the ECMA standards for C# and the Common Language Runtime. A growing family of solutions and an active and enthusiastic contributing community is helping position Mono to become the leading choice for development of Linux applications.
MonoDevelop
MonoDevelop is an IDE primarily designed for C# and other .NET languages. MonoDevelop enables developers to quickly write desktop and ASP.NET Web applications on Linux, Windows and Mac OSX. MonoDevelop makes it easy for developers to port .NET applications created with Visual Studio to Linux and to maintain a single code base for all platforms.

check version

:~$ uname -a
:~$ lsb_release -a
:~$ cat /proc/version
:~$ cat /etc/issue

2010/12/13

multi-thread

java
multi-thread
UncaughtExceptionHandler
Lock & Condition
BlockingQueue
Callable & Future
Executors

hosts.deny

#CentOS

/etc/hosts.deny
[1]
sshd: a.b.c.0/255.255.255.0
sshd: d.e.f.0/255.255.255.0
sshd: g.h.i.0/255.255.255.0
[2]
sshd: a.b.c.0/255.255.255.0, d.e.f.0/255.255.255.0, g.h.i.0/255.255.255.0
/etc/rc.d/init.d/sshd restart
# /etc/rc.d/init.d/xinetd restart
# /etc/rc.d/init.d/network restart

2010/12/12

hadoop

#ubuntu-10.10-server-amd64.iso   
#jdk-6u23-linux-x64.bin
#hadoop-0.20.2.tar.gz



[ip]

sudo vim.tiny /etc/network/interfaces


# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.31.101
netmask 255.255.255.0
network 192.168.31.0
broadcast 192.168.31.255
gateway 192.168.31.1
dns-nameserver 168.95.1.1

[update-ubuntu]

sudo apt-get update ; sudo apt-get dist-upgrade ;

[install SSH & rsync]

sudo apt-get install ssh rsync ;

[cp]

scp /tmp/jdk-6u23-linux-x64.bin hadoop@192.168.31.101:~/

scp /tmp/hadoop-0.20.2.tar.gz hadoop@192.168.31.101:~/

[install java]

sh jdk-6u23-linux-x64.bin
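
[check java]

To confirm the JDK unpacked into the directory the JAVA_HOME below expects (the .bin extracts into the current directory, so run it from /home/hadoop/):

/home/hadoop/jdk1.6.0_23/bin/java -version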

[install hadoop]

tar zxvf hadoop-0.20.2.tar.gz -C /home/hadoop/

mv /home/hadoop/hadoop-0.20.2/* /home/hadoop/

sudo chown -R hadoop:hadoop /home/hadoop/

[setup java environment]

vim.tiny /home/hadoop/conf/hadoop-env.sh
>>>
export JAVA_HOME=/home/hadoop/jdk1.6.0_23

[before-clone-1/4]

vim.tiny /home/hadoop/conf/core-site.xml
>>>
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop01:9000</value>
</property>

[before-clone-2/4]
vim.tiny /home/hadoop/conf/hdfs-site.xml
>>>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

[before-clone-3/4]
vim.tiny /home/hadoop/conf/mapred-site.xml
>>>
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop02:9001</value>
</property>

[before-clone-4/4]
mkdir /home/hadoop/.ssh/

[make a copy]

virt-clone \
     --original fog \
     --name hadoop18 \
     --file /var/lib/xen/images/hadoop18.img
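
[clone the rest]

Cloning one guest at a time gets tedious; a loop along the same lines works if the guest names and image paths follow the hadoop01..hadoop08 naming used in /etc/hosts below (the names here are an assumption, adjust to your own scheme):

for i in 02 03 04 05 06 07 08; do
    virt-clone \
        --original fog \
        --name hadoop$i \
        --file /var/lib/xen/images/hadoop$i.img
done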

[@masters]
sudo vim.tiny /etc/hosts

192.168.31.101 192.168.31.101 hadoop01
192.168.31.102 hadoop02
192.168.31.103 hadoop03
192.168.31.104 hadoop04
192.168.31.105 hadoop05
192.168.31.106 hadoop06
192.168.31.107 hadoop07
192.168.31.108 hadoop08

[@slaves]
sudo vim.tiny /etc/hosts

192.168.31.101 hadoop01
192.168.31.102 hadoop02
192.168.31.103 192.168.31.103 hadoop03
192.168.31.104 192.168.31.104 hadoop04
192.168.31.105 192.168.31.105 hadoop05
192.168.31.106 192.168.31.106 hadoop06
192.168.31.107 192.168.31.107 hadoop07
192.168.31.108 192.168.31.108 hadoop08

[gen public&private key]

ssh-keygen -t dsa -P "" -f /home/hadoop/.ssh/id_dsa

[copy public key to authorized_keys]

cp /home/hadoop/.ssh/id_dsa.pub /home/hadoop/.ssh/authorized_keys

[share public key]

scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.31.102:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.31.103:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.31.104:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.31.105:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.31.106:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.31.107:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.31.108:/home/hadoop/.ssh/

[share private key]

scp /home/hadoop/.ssh/id_dsa hadoop@192.168.31.102:/home/hadoop/.ssh/
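
[test passwordless ssh]

Before starting the daemons, it is worth confirming that key-based login works from the master to the other nodes (hostnames per the /etc/hosts entries above); the first connection to each host will ask to accept its host key:

ssh hadoop02 hostname
ssh hadoop03 hostname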

[@masters]

vim.tiny /home/hadoop/conf/masters

hadoop01
hadoop02

vim.tiny /home/hadoop/conf/slaves

hadoop03
hadoop04
hadoop05
hadoop06
hadoop07
hadoop08

[format]

/home/hadoop/bin/hadoop namenode -format

hadoop namenode -format

[start-start-dfs]

/home/hadoop/bin/start-dfs.sh

[start-start-mapred]

/home/hadoop/bin/start-mapred.sh
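
[check daemons]

To verify the daemons actually started, jps (shipped in the JDK's bin directory) should list NameNode / SecondaryNameNode / JobTracker on the masters and DataNode / TaskTracker on the slaves:

/home/hadoop/jdk1.6.0_23/bin/jps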

[copy file from host to guest]

scp ./pg5000.txt hadoop@192.168.31.101:~/pg5000-data

[copy file from guest to dfs]

hadoop dfs -copyFromLocal pg5000-data pg5000-data-in

[dfs-ls]

hadoop dfs -ls

[dfs-mkdir]

hadoop dfs -mkdir /tmp/input
hadoop dfs -mkdir /out

[dfs-del]

hadoop fs -rmr <path_to_folder_or_file>

[wordcount]

hadoop jar hadoop-0.20.2-examples.jar wordcount pg5000-data-in pg5000-data-ans

[get the file from dfs]

hadoop dfs -cat pg5000-data-ans/part-r-00000 >> pg5000-data-ans-out

[scp file from guest to host]

scp /home/hadoop/pg5000-data-ans-out hadoop@192.168.122.1:/home/hadoop/pg5000-data-ans-out



///

scp ./pg20417.txt hadoop_001@192.168.31.101:~/pg20417-data

./exec-hadoop dfs -copyFromLocal pg20417-data pg20417-data-in

./exec-hadoop dfs -ls

./exec-hadoop jar hadoop-0.20.2-examples.jar wordcount pg20417-data-in pg20417-data-ans