Category: Hadoop

Modifying cloudera-manager to use a custom repository

<Category: Hadoop> Comments Off on Modifying cloudera-manager to use a custom repository

We use cloudera-manager to manage our Hadoop cluster, but the official repository is far too slow, so we set up a local mirror. The repo address is also hard-coded inside the package, so unpack the package, edit it, and replace the repository address with that of the local mirror.
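The unpack-and-edit step might look like the following sketch; note that the repo file name, layout, and mirror URL below are placeholders, not the actual contents of the cloudera-manager package.

```shell
# Hedged sketch: repo file name and URLs are placeholders, not the
# real cloudera-manager package contents.
workdir=$(mktemp -d)
cat > "$workdir/cloudera-cdh3.repo" <<'EOF'
[cloudera-cdh3]
name=Cloudera CDH3
baseurl=http://archive.cloudera.com/redhat/cdh/3/
gpgcheck=0
EOF
# swap the official repository for the local mirror
sed -i 's#http://archive.cloudera.com#http://mirror.example.local#' "$workdir/cloudera-cdh3.repo"
grep '^baseurl' "$workdir/cloudera-cdh3.repo"
```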
Read the rest of this entry »

From: Modifying cloudera-manager to use a custom repository

A China-hosted mirror for Cloudera CDH3

<Category: Hadoop> Comments Off on A China-hosted mirror for Cloudera CDH3

Contributing a Cloudera CDH3 mirror hosted in China. # How do you use it?

Read the rest of this entry »

From: A China-hosted mirror for Cloudera CDH3

How to run a Hadoop streaming job over Brisk

<Category: Hadoop> Comments Off on How to run a Hadoop streaming job over Brisk

--- error ---
[root@platformD testmr]# ./job.sh
rmr: cannot remove /test_output: No such file or directory.
File: /tmp/testmr/-Dbrisk.job.tracker=10.129.6.36:8012 does not exist, or is not readable

Read the rest of this entry »

From: How to run a Hadoop streaming job over Brisk

Brisk debugging and deployment: full notes

<Category: cassandra, Hadoop, nosql> Comments Off on Brisk debugging and deployment: full notes

Quick notes from a Brisk test run.
Reference:
http://www.datastax.com/docs/0.8/brisk/about_pig
Read the rest of this entry »

From: Brisk debugging and deployment: full notes

What is stream computing?

<Category: Hadoop, Distributed> Comments Off on What is stream computing?

Stream computing seems to be all the rage right now. Stream (or streaming) computation is mainly used for real-time data analysis: live transaction data, advertising, queries, and so on.

Offline analysis with Hadoop always carries some latency: you have to wait for the data to be collected and run through a whole series of processing steps, and by the time the report comes out it is old news. Stream computing fills exactly this gap by analyzing the data produced by events as they happen. FlumeBase is one such project: it is built on top of Flume (Cloudera's distributed log-collection system) and offers an SQL-like query language (rtsql).

FlumeBase lets users dynamically insert queries into a Flume log-collection environment. These queries inspect the incoming logs, and whatever matches the query conditions is handled accordingly: continuous monitoring, data-format conversion, filtering, and other such tasks.

https://github.com/cloudera/flume

https://github.com/flumebase/flumebase

http://blog.flumebase.org/?p=14

http://flumebase.org/documentation/0.2.0/UserGuide.html#d0e7

http://www.docin.com/p-152156266.html

A similar open-source stream-computing framework is Yahoo's S4, which looks considerably more mature than FlumeBase; both are worth keeping an eye on.

http://s4.io/

S4 was originally developed for Yahoo's personalized-advertising products and is claimed to handle thousands of events per second: http://docs.s4.io/manual/overview.html

From: What is stream computing?

Hadoop and MapReduce: Big Data Analytics [gartner]

<Category: Hadoop> Comments Off on Hadoop and MapReduce: Big Data Analytics [gartner]

Bookmarking this. Download: http://dl.medcl.com/get.php?id=29&path=books%2Fgartner%2CHadoop+and+MapReduce+Big+Data+Analytics.7z

Read the rest of this entry »

From: Hadoop and MapReduce: Big Data Analytics [gartner]

Hive Derby lock and directory-permission errors

<Category: Hadoop> Comments Off on Hive Derby lock and directory-permission errors

FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
NestedThrowables:
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Hive history file=/tmp/dev/hive_job_log_dev_201107062337_381665684.txt
FAILED: Error in semantic analysis: line 1:83 Exception while processing raw_daily_stats_table: Unable to fetch table raw_daily_stats_table

Check the Hive config file /etc/hive/conf/hive-default.xml to find where your metastore data lives.

Opening the HDFS directory
/user/hive/warehouse

I found that the raw_daily_stats_table directory was now owned by root, while I was executing as the dev user.

Ran the fix:

...only to find it still threw the same error. Good grief:

FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
NestedThrowables:
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Opening the config file /etc/hive/conf/hive-site.xml revealed the following node

Then I went to the corresponding directory

and killed db.lck, killed dbex.lck.

Re-ran the Hadoop scripts: OK!
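The lock cleanup can be sketched like this; the metastore path here is simulated in a temp directory, since the real location depends on your javax.jdo.option.ConnectionURL setting in hive-site.xml.

```shell
# Hedged sketch: simulates the Derby metastore directory in a temp dir;
# the real path comes from hive-site.xml (javax.jdo.option.ConnectionURL).
metastore=$(mktemp -d)/metastore_db
mkdir -p "$metastore"
touch "$metastore/db.lck" "$metastore/dbex.lck"   # stand-ins for stale locks
rm -f "$metastore/db.lck" "$metastore/dbex.lck"   # the actual fix
ls "$metastore" | wc -l
```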

From: Hive Derby lock and directory-permission errors

Trending topics: handling dates and empty directories

<Category: Hadoop, Linux> Comments Off on Trending topics: handling dates and empty directories

First check how many files are in the Hadoop directory, then decide whether to add that directory to the input:
[dev@platformB dailyrawdata]$  hadoop fs -ls /trendingtopics |wc -l
3
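Note that "hadoop fs -ls" prints a "Found N items" header line before the entries, so the wc -l count is one higher than the number of entries (the 3 above means two entries). A local simulation of that off-by-one:

```shell
# Hedged sketch: simulating `hadoop fs -ls` output locally to show the
# "Found N items" header inflating the wc -l count by one.
listing='Found 2 items
/trendingtopics/a
/trendingtopics/b'
count=$(printf '%s\n' "$listing" | wc -l)
files=$((count - 1))   # drop the "Found N items" header line
echo "$files"
```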

How to compute dates:
[dev@platformB dailyrawdata]$ lastdate=20110619
[dev@platformB dailyrawdata]$ echo $lastdate
20110619
[dev@platformB dailyrawdata]$ echo $(date --date "$lastdate + 1 day" +"%Y%m%d")
20110620

[dev@platformB dailyrawdata]$ echo D9=$(date --date "now -20 day" +"%Y%m%d")
D9=20110530

[dev@platformB dailyrawdata]$ D1=$(date --date "now" +"%Y/%m/%d")
[dev@platformB dailyrawdata]$ echo $D1
2011/06/20

Note: there must be no space after the equals sign, e.g. the following fails:

[dev@platformB dailyrawdata]$ D1= $(date --date "now" +"%Y/%m/%d")
-bash: 2011/06/20: No such file or directory


Copy today's files to the target directory:

DAYSTR=$(date --date "now" +"%Y/%m/%d")

hadoop fs -copyFromLocal dailyrawdata/* /trendingtopics/data/raw/$DAYSTR


But wait: when the directory is empty, the Hadoop streaming job cannot find any files matching the given input pattern, throws an exception, and the whole job fails.

After much searching I couldn't find a good solution (if you know a better way, please do share!), so for now I check whether the directory is empty first, and if so redirect the pattern to an empty file.

#touch blank file
BLANK="/your folder/temp/blank"
hadoop fs -touchz "$BLANK"

#define a function to check hdfs files
function check_hdfs_files(){

#run the hdfs command to check for matching files
hadoop fs -ls "$1" &>/dev/null

#if nothing matched, fall back to the blank file
if [ $? -ne 0 ]
then
eval "$2=\$BLANK"
echo "can't find any files, using blank file instead"
fi
}


D0=$(date --date "now" +"/your folder/%Y/%m/%d/${APPNAME}-${TENANT}*")
D1=$(date --date "now -1 day" +"/your folder/%Y/%m/%d/$APPNAME-$TENANT*")

#check file exists
check_hdfs_files "$D0" "D0"
check_hdfs_files "$D1" "D1"
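A way to exercise this fallback logic without a cluster is to stub out the hadoop command; the stub and the paths below are test scaffolding, not real commands or data.

```shell
# Hedged sketch: stub out `hadoop` so the fallback logic can be
# exercised locally; paths are made up.
BLANK="/tmp/blank_demo"
hadoop() { return 1; }   # stub: pretend the ls never matches anything

check_hdfs_files() {
  hadoop fs -ls "$1" &>/dev/null
  if [ $? -ne 0 ]; then
    eval "$2=\$BLANK"
    echo "can't find any files, using blank file instead"
  fi
}

D0="/data/2011/06/20/app-tenant*"
check_hdfs_files "$D0" "D0"
echo "$D0"
```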

From: Trending topics: handling dates and empty directories

hadoop thrift client

<Category: Hadoop> Comments Off on hadoop thrift client

http://code.google.com/p/hadoop-sharp/
Doesn't look promising; pass.

http://wiki.apache.org/hadoop/HDFS-APIs
http://wiki.apache.org/hadoop/MountableHDFS
http://wiki.apache.org/hadoop/Hbase/Stargate
http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfsproxy.html

None of these really cut it, so Thrift it is. Looking through the svn tree, ready-made bindings exist for Cocoa and the like, so why is there none for C#? Faint.
Read the rest of this entry »

From: hadoop thrift client

Hive Installation Tips

<Category: Hadoop> Comments Off on Hive Installation Tips

Installing Hive

Download:
http://hive.apache.org/releases.html
Read the rest of this entry »

From: Hive Installation Tips