write down,forget
  • Mahout Installation and Configuration

    <Category: Mahout, Personalized Recommendation, Machine Learning>

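    Apache Mahout is a machine learning framework built on Hadoop that supports processing large-scale datasets. The latest version at the time of writing is 0.4.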

    Introducing Apache Mahout
    http://www.ibm.com/developerworks/cn/java/j-mahout/

    Building a social recommendation engine with Apache Mahout
    http://www.ibm.com/developerworks/cn/java/j-lo-mahout/

    Taste:
    http://taste.sourceforge.net

    Mahout currently has

    • Collaborative Filtering
    • User and Item based recommenders
    • K-Means, Fuzzy K-Means clustering
    • Mean Shift clustering
    • Dirichlet process clustering
    • Latent Dirichlet Allocation
    • Singular value decomposition
    • Parallel Frequent Pattern mining
    • Complementary Naive Bayes classifier
    • Random forest decision tree based classifier
    • High performance java collections (previously colt collections)
    • A vibrant community
    • and much more cool stuff to come this summer, thanks to Google Summer of Code

    Installing Mahout (CentOS)

    cd /usr/local
    sudo mkdir mahout
    sudo svn co http://svn.apache.org/repos/asf/mahout/trunk mahout

    Install Maven 3
    cd /tmp
    sudo wget http://apache.etoak.com//maven/binaries/apache-maven-3.0.2-bin.tar.gz
    tar vxzf apache-maven-3.0.2-bin.tar.gz
    sudo mv apache-maven-3.0.2 /usr/local/maven

    vi ~/.bashrc

    Add the following two lines:
    export M3_HOME=/usr/local/maven
    export PATH=${M3_HOME}/bin:${PATH}

    Run . ~/.bashrc to apply the settings (or log out and log back in).
    Check the Maven version to confirm that the installation succeeded:
    mvn -version
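If `mvn -version` is not found, the usual cause is that the two exported variables did not take effect. A minimal sketch of that check follows; `check_maven_env` is a hypothetical helper name, not part of Maven.

```shell
#!/bin/sh
# Sketch: confirm that M3_HOME is set and that its bin directory is on PATH.
# check_maven_env is a hypothetical helper, not something Maven provides.
check_maven_env() {
    m3_home="$1"   # value of M3_HOME to check
    path="$2"      # value of PATH to check
    if [ -z "$m3_home" ]; then
        echo "M3_HOME is not set"
        return 1
    fi
    # Wrap PATH in colons so the match works at the start, middle, or end.
    case ":$path:" in
        *":$m3_home/bin:"*) echo "ok" ;;
        *) echo "$m3_home/bin is not on PATH"; return 1 ;;
    esac
}
```

Calling `check_maven_env "$M3_HOME" "$PATH"` in a fresh shell should print `ok` once ~/.bashrc has been sourced.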

    Install Mahout
    cd /usr/local/mahout
    sudo mvn install

    If you get "JAVA_HOME is not set" and you are running under sudo, check root's Java settings:
    vi /etc/profile
    export JAVA_HOME=/usr/local/jdk1.6/
    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    export PATH=$PATH:$JAVA_HOME/bin
    Run . /etc/profile, then run mvn clean install -DskipTests=true (this skips the tests for a faster build)
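Because sudo runs with root's environment, it can help to confirm what JAVA_HOME actually points at before starting a long build. The following is a small sketch; `java_home_ok` and `report_java_home` are hypothetical helper names.

```shell
#!/bin/sh
# java_home_ok is a hypothetical helper: it succeeds only if the given
# directory is non-empty and contains an executable bin/java.
java_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Print a one-line verdict for a JAVA_HOME candidate.
report_java_home() {
    if java_home_ok "$1"; then
        echo "JAVA_HOME ok: $1"
    else
        echo "JAVA_HOME missing or invalid: ${1:-<unset>}"
        return 1
    fi
}
```

Running `report_java_home "$JAVA_HOME"` once as your own user and once in a root shell shows whether the two environments differ.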

    Prepare the data
    cd /tmp
    wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data

    hadoop fs -mkdir testdata
    hadoop fs -put synthetic_control.data testdata
    hadoop fs -lsr testdata
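Before uploading, a quick local look at the file can catch a truncated download. The sketch below counts rows and checks that every row has the same number of whitespace-separated values; `summarize_data` is a hypothetical helper name, and the whitespace-separated layout is an assumption about the file.

```shell
#!/bin/sh
# Sketch: report the shape of a whitespace-separated numeric data file.
# summarize_data is a hypothetical helper; $1 is the local file path.
summarize_data() {
    awk '
        NF > 0 { rows++; cols[NF]++ }          # tally rows by column count
        END {
            n = 0
            for (c in cols) { n++; width = c } # n = number of distinct widths
            if (n == 1) printf "%d rows x %d columns\n", rows, width
            else        printf "%d rows, inconsistent column counts\n", rows
        }' "$1"
}
```

As far as I recall, UCI describes this dataset as 600 series of 60 points each, so `summarize_data synthetic_control.data` would be expected to report that shape if the download is complete.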

    If you get an error saying the HADOOP_HOME environment variable is not set,
    run sudo vi /etc/profile and add:
    export HADOOP_HOME=/usr/lib/hadoop-0.20/

    Run the clustering algorithms on the Hadoop cluster
    cd /usr/local/mahout

    bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job
    bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
    bin/mahout org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job
    bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    bin/mahout org.apache.mahout.clustering.syntheticcontrol.meanshift.Job

    If a job succeeds, you should see the results under /user/dev/output in HDFS.
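The five jobs above differ only in the package name, so they can be driven from a loop that stops at the first failure. This is a sketch under assumptions: `run_clustering_jobs` and the `MAHOUT_BIN` variable are hypothetical names, and the loop does not try to preserve each job's output between runs.

```shell
#!/bin/sh
# Path to the mahout launcher; override MAHOUT_BIN if it lives elsewhere.
MAHOUT_BIN="${MAHOUT_BIN:-bin/mahout}"

# Sketch: run each synthetic-control clustering example in turn and stop
# at the first one that fails. run_clustering_jobs is a hypothetical helper.
run_clustering_jobs() {
    for algo in canopy kmeans fuzzykmeans dirichlet meanshift; do
        job="org.apache.mahout.clustering.syntheticcontrol.$algo.Job"
        echo "running $job"
        "$MAHOUT_BIN" "$job" || { echo "failed: $job" >&2; return 1; }
    done
}
```

Run it from /usr/local/mahout so that the default `bin/mahout` path resolves, and inspect the HDFS output after each job if you want to compare algorithms.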
    GroupLens Data Sets
    http://www.grouplens.org/node/12, which includes the MovieLens Data Sets, Wikilens Data Set, Book-Crossing Data Set, Jester Joke Data Set, and EachMovie Data Set

    Download the 1M ratings data

    mkdir 1m_rating
    wget http://www.grouplens.org/system/files/million-ml-data.tar__0.gz
    tar vxzf million-ml-data.tar__0.gz
    rm million-ml-data.tar__0.gz
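A quick summary of the ratings file confirms that the archive unpacked correctly. The sketch assumes the MovieLens 1M layout of `UserID::MovieID::Rating::Timestamp` per line in a file named ratings.dat; `summarize_ratings` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count ratings, distinct users, and distinct movies in a file
# whose lines look like UserID::MovieID::Rating::Timestamp (assumed format).
# summarize_ratings is a hypothetical helper; $1 is the ratings file path.
summarize_ratings() {
    awk -F'::' '
        NF == 4 { n++; users[$1] = 1; movies[$2] = 1 }
        END {
            u = 0; for (k in users) u++
            m = 0; for (k in movies) m++
            printf "%d ratings, %d users, %d movies\n", n, u, m
        }' "$1"
}
```

Lines that do not split into exactly four fields are ignored, so a count far below what the dataset name suggests points to a damaged or differently formatted file.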

    Copy the data into the grouplens example code directory so we can first try out Mahout locally:
    cp *.dat /usr/local/mahout/examples/src/main/java/org/apache/mahout/cf/taste/example/grouplens

    cd /usr/local/mahout/examples/
    Run:
    mvn -q exec:java -Dexec.mainClass="org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner"
    If you would rather not copy the files as above, just specify the input file location instead:
    mvn -q exec:java -Dexec.mainClass="org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner" -Dexec.args="-i <input-file>"
    Upload to HDFS:
    hadoop fs -copyFromLocal 1m_rating/  mahout_input/1mrating

    Additional notes

    Mahout SVN address: https://svn.apache.org/repos/asf/mahout/trunk

    https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html

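    Convert Lucene index data into text vectors: specify the index directory ~/index, the field name Name, the dictionary output file ~/dict.txt, and the final output file output.txt, limit the number of vectors to 50, and normalize with the 2-norm
    $ /usr/local/mahout/bin/mahout lucene.vector --dir ~/index --field Name --dictOut ~/dict.txt --output output.txt --max 50 --norm 2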

    Look at the contents of the dict file:
    $ head -n 7 dict.txt
    10225
    #term doc freq idx
    Michale 67 0
    medcl 1 1
    jack 3 2
    lopoo 2 3
    003 2 4

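    As the data above shows, dict.txt holds the index information for the Name field we specified: a term count on the first line, a header line, and then one term per line with its document frequency and index.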
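Given that layout, the most frequent terms can be pulled out with a short pipeline. This is a sketch that assumes the three whitespace-separated columns shown in the listing (term, document frequency, index) and terms without embedded spaces; `top_terms` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the N terms with the highest document frequency.
# top_terms is a hypothetical helper.
#   $1 = dictionary file, $2 = number of terms to show
top_terms() {
    # Skip the count line and the "#term doc freq idx" header, then print
    # "frequency term" so that sort can order numerically on column 1.
    awk 'NR > 2 { print $2, $1 }' "$1" | sort -k1,1nr | head -n "$2"
}
```

For example, `top_terms ~/dict.txt 10` prints the ten most common terms in the Name field, each preceded by its document frequency.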

    Use taste-web to quickly build a movie recommendation system based on the GroupLens dataset

    $ cd taste-web/
    Copy the grouplens recommender jar into taste-web's lib directory. If the jar does not exist yet, change to that directory and run mvn install.
    $ cp examples/target/grouplens.jar taste-web/lib/

    $ vi recommender.properties
    Uncomment this line so that the grouplens recommender is used, as follows:
    recommender.class=org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender
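The same edit can be scripted instead of done by hand in vi. The sketch below assumes the line is commented out with a leading `#`, which is the usual convention in Java properties files; `enable_grouplens_recommender` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: uncomment the GroupLensRecommender line in a properties file.
# enable_grouplens_recommender is a hypothetical helper; $1 is the file path.
enable_grouplens_recommender() {
    # Strip any leading '#' and whitespace in front of this exact setting;
    # other commented-out recommender.class lines are left alone.
    sed 's/^[#[:space:]]*\(recommender\.class=org\.apache\.mahout\.cf\.taste\.example\.grouplens\.GroupLensRecommender\)/\1/' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Running `enable_grouplens_recommender recommender.properties` from the taste-web directory has the same effect as the manual edit, and running it twice is harmless.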

    Start Jetty. If everything is working, visit port 8080 and you should see a web service at http://platformb:8080/RecommenderService.jws
    mvn jetty:run-war

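    Open the following URL to see the recommendations: http://platformb:8080/RecommenderServlet?userID=1
    See screenshots 1 and 2. The first column of the result is the recommendation score and the second is the movie ID. A few simple steps give you a working recommender, which is pretty impressive.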
    [Screenshot 1] [Screenshot 2]
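The servlet response can also be fetched and tidied from the command line. The sketch below relies only on the column order described above (score first, movie ID second) and assumes the columns are whitespace-separated; `format_recommendations` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn "score movieID" lines into a more readable listing.
# format_recommendations is a hypothetical helper that reads from stdin.
format_recommendations() {
    awk 'NF >= 2 { printf "movie %s (score %s)\n", $2, $1 }'
}

# Usage (needs the Jetty server from the text to be running):
#   curl -s "http://platformb:8080/RecommenderServlet?userID=1" | format_recommendations
```

Joining the movie IDs against the dataset's movie titles would be a natural next step, but that depends on file details not covered here.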

    Source: Mahout Installation and Configuration

    

    4 comments

    夏小兵

    I don't quite understand this yet. Could you explain it? QQ332817520

    medcl Reply:

    @夏小兵, do you mean the installation? I've added you on QQ.

    晴天

    I can't get it to work. Could you add me on QQ? 362027200 Many thanks.

    heipark

    Nice, a solid article.
