write down,forget
Tag: trendingtopics

Trending topics: handling dates and empty directories

<Category: Hadoop, Linux>

 

First check how many files are in the Hadoop directory, then decide whether to add that directory to the job input.
[dev@platformB dailyrawdata]$  hadoop fs -ls /trendingtopics |wc -l
3
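
Note that `hadoop fs -ls` normally prints a `Found N items` header before the entries, so the raw line count is one higher than the number of entries when that header is present. A minimal sketch of turning the check into a number you can test, assuming that header format; the function name `count_listing_entries` is made up for this example:

```shell
# Count the entries in `hadoop fs -ls` output read from stdin.
# Assumption: a non-empty listing starts with a "Found N items" header line,
# which is skipped here. Blank lines are skipped as well.
count_listing_entries() {
  awk '!/^Found [0-9]+ items$/ && NF { n++ } END { print n + 0 }'
}

# Usage on a machine with the hadoop client (not run here):
#   n=$(hadoop fs -ls /trendingtopics 2>/dev/null | count_listing_entries)
#   if [ "$n" -gt 0 ]; then
#     echo "add /trendingtopics to the job input"
#   fi
```

Reading the listing from stdin keeps the counting logic separate from the cluster call, so it can be tried on saved output.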

How to compute dates
[dev@platformB dailyrawdata]$ lastdate=20110619
[dev@platformB dailyrawdata]$ echo $lastdate
20110619
[dev@platformB dailyrawdata]$ echo $(date --date "$lastdate +1 day" +"%Y%m%d")
20110620

[dev@platformB dailyrawdata]$ echo D9=$(date --date "now -20 day" +"%Y%m%d")
D9=20110530

 

[dev@platformB dailyrawdata]$ D1=$(date --date "now" +"%Y/%m/%d")
[dev@platformB dailyrawdata]$ echo $D1
2011/06/20
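
The date arithmetic above can be wrapped in two small helpers. This is a sketch that assumes GNU `date` (BSD `date` has no `--date` option); the function names `shift_day` and `to_dir` are hypothetical:

```shell
# Shift a YYYYMMDD date by a number of days (negative values go back in time).
# Assumes GNU date, which accepts strings such as "20110619 +1 day".
shift_day() {
  date --date "$1 $2 day" +"%Y%m%d"
}

# Convert YYYYMMDD into the YYYY/MM/DD layout used for the daily directories.
to_dir() {
  date --date "$1" +"%Y/%m/%d"
}

shift_day 20110619 +1    # prints 20110620
to_dir 20110620          # prints 2011/06/20
```

Passing the base date explicitly, rather than relying on "now", makes it easier to re-run a job for an earlier day.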

Note: there must be no space after the equals sign. For example:

[dev@platformB dailyrawdata]$ D1= $(date --date "now" +"%Y/%m/%d")
-bash: 2011/06/20: No such file or directory
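
Why this fails: with a space after `=`, the shell treats `D1=` as a temporary empty assignment for the command that follows, and the output of `date` becomes the name of that command. A small sketch that reproduces both cases with a fixed date, assuming GNU `date`:

```shell
unset D1
status=0

# With a space: D1 is set to an empty string only for the command that
# follows, and that command is the text "2011/06/20", which does not exist.
D1= $(date --date "20110620" +"%Y/%m/%d") 2>/dev/null || status=$?
echo "exit status: $status, D1 is '${D1-unset}'"

# Without a space: the output of date is assigned to D1.
D1=$(date --date "20110620" +"%Y/%m/%d")
echo "D1 is '$D1'"
```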

 

Copy today's files to the target directory

DAYSTR=$(date --date "now" +"%Y/%m/%d")

hadoop fs -copyFromLocal dailyrawdata/* /trendingtopics/data/raw/$DAYSTR
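
A slightly more defensive sketch of the copy step, which skips the upload when the local directory has no files. `HADOOP_CMD` and `upload_day` are names invented for this example so that the command can be dry-run with `echo`; the target path is the one used above.

```shell
# HADOOP_CMD is a hypothetical override; set it to "echo" for a dry run.
HADOOP_CMD=${HADOOP_CMD:-hadoop}

# Upload every file in a local directory to the raw-data directory of one day.
# $1: local directory, $2: day as YYYY/MM/DD
upload_day() {
  local src="$1"
  local daystr="$2"
  # Replace the positional parameters with the files in the directory.
  set -- "$src"/*
  # An unmatched glob stays literal, so test that the first entry exists.
  if [ ! -e "$1" ]; then
    echo "no local files in $src, nothing to copy" >&2
    return 1
  fi
  $HADOOP_CMD fs -copyFromLocal "$@" "/trendingtopics/data/raw/$daystr"
}

# Usage (on a machine with the hadoop client):
#   upload_day dailyrawdata "$(date --date "now" +"%Y/%m/%d")"
```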

 

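Hold on: when the directory is empty, the Hadoop Streaming job cannot find any files matching the input pattern you gave it and throws an exception, so the job fails.

I searched for quite a while without finding a good solution (if you know a better one, please tell me). For now I check whether the directory is empty, and if it is, I point the input at an empty file instead.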

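#create an empty placeholder file on HDFS
BLANK="/your folder/temp/blank"
hadoop fs -touchz "$BLANK"

#define a function to check hdfs files
#$1: HDFS path or pattern, $2: name of the variable to overwrite
function check_hdfs_files(){

#run hdfs command to check the files
hadoop fs -ls "$1" &>/dev/null
#save the exit status before other commands overwrite $?
local status=$?

#if nothing matches, point the named variable at the blank file
if [ $status -ne 0 ]
then
eval "$2=\$BLANK"
echo "can't find any files, using the blank file instead"
fi

#return the status of the listing, not of the if statement
return $status
}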

 

D0=$(date --date "now" +"/your folder/%Y/%m/%d/${APPNAME}-${TENANT}*")
D1=$(date --date "now -1 day" +"/your folder/%Y/%m/%d/${APPNAME}-${TENANT}*")

#check file exists
check_hdfs_files "$D0" "D0"
check_hdfs_files "$D1" "D1"
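
A variant of the check that avoids `eval`: it prints the path to use, and the caller captures it with command substitution. This is only a sketch. `HDFS_LS` and `resolve_input` are names assumed for this example, and like the function above it relies on `hadoop fs -ls` exiting non-zero when nothing matches.

```shell
# HDFS_LS is a hypothetical override so the logic can be tried without a
# cluster, for example HDFS_LS=true or HDFS_LS=false.
HDFS_LS=${HDFS_LS:-"hadoop fs -ls"}

# Print $1 if the listing succeeds, otherwise print the fallback path $2.
resolve_input() {
  if $HDFS_LS "$1" >/dev/null 2>&1; then
    printf '%s\n' "$1"
  else
    echo "can't find any files for $1, using $2 instead" >&2
    printf '%s\n' "$2"
  fi
}

# Usage:
#   D0=$(resolve_input "$D0" "$BLANK")
#   D1=$(resolve_input "$D1" "$BLANK")
```

Keeping the message on stderr means the captured value contains only the path.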

This post is from: Trending topics: handling dates and empty directories

Setting up trendingtopics

<Category: Tidbits>

https://github.com/datawrangling/trendingtopics
https://github.com/datawrangling/spatialanalytics

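Steps for setting up trendingtopics.

Prepare the environment

Configuration files

Installation

If you get the error: undefined local variable or method `version_requirements'
vi config/environment.rb
Add the following at the top:

Install the MySQL client and the mysql gem

Configure the database connection

Set up the database

Generate 100 articles to use as demo data

Once the server has started, open http://localhost:3000/

Error: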

Create the table:
CREATE TABLE raw_daily_stats_table1 (redirect_title STRING, dates STRING, pageviews STRING, total_pageviews BIGINT, monthly_trend DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

Load the data:
LOAD DATA INPATH '/home/dev/finalresult-a' INTO TABLE raw_daily_stats_table;
// The file path is a Hadoop (HDFS) path; the path above corresponds to hdfs://platformB/home/dev/finalresult-a

If the load reports a failure, check your HDFS: you will find a file named after your file with _copy_1 appended. Load that file instead and it works.

hive> show tables
    > ;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> cat derby.log
============= begin nested exception, level (3) ===========
ERROR XSDB6: Another instance of Derby may have already booted the database /var/lib/hive/metastore/metastore_db.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)

It turned out that an abnormal exit had left the earlier process that was accessing Derby still running. Derby is file-based storage and only one process can open it at a time, so, you get the idea. For a production environment, MySQL is the way to go. Open the configuration file hive-default.xml
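
Before starting another Hive CLI it can help to check whether something still holds the embedded Derby metastore. A rough sketch, assuming Derby's usual `db.lck` lock file inside the database directory and the metastore path from the error above; `derby_lock_present` is a name made up here. A leftover lock file alone does not prove that a process is still running, so treat it only as a hint.

```shell
# Return 0 if a Derby lock file exists in the given database directory.
# Assumption: embedded Derby keeps a file named db.lck there while booted.
derby_lock_present() {
  [ -e "$1/db.lck" ]
}

# Usage:
#   if derby_lock_present /var/lib/hive/metastore/metastore_db; then
#     echo "metastore may still be in use; look for a leftover hive process"
#     ps aux | grep -i '[h]ive'
#   fi
```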

Hive queries and sorting:
select * from raw_daily_stats_table sort by monthly_trend;
select * from raw_daily_stats_table sort by monthly_trend desc limit 10;

http://www.fuzhijie.me/?p=377
http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin

This post is from: Setting up trendingtopics