<Date: 2011-11-29>
<Author: medcl>
<Category: Tidbits>
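I finally could not put up with 84's constant hiccups any longer, so I switched servers. You really do get what you pay for. I strongly advise against buying a VPS from burst.net: they oversell far too aggressively and server performance is dreadful, mainly the disk. In vmstat the cs (context switches) column sat above 10,000 all the time, and the host machine was rebooted several times a day. In a word: rubbish. Enough of that.
The move went quickly. A few lessons: pick the same OS version, use the same software with the same configuration, and it is basically copy and paste. The site files and databases were synced straight over with rsync, then started up, and everything was OK! The new VPS was bought from 网络驿站 (netdak), on a California T2 line, and the speed is good. For reference, here is my referral link: http://member.netdak.com/aff.php?aff=016
The server has now moved to a host in China...
Source: Escaping burst.net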
<Date: 2011-11-16>
<Author: medcl>
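<Category: Tidbits>
I have created two QQ groups; everyone is welcome to join and discuss anything related to elasticsearch.
Group 1: 190605846
Group 2: not open yet
In addition, I have just registered the elasticsearch.cn domain and plan to build a community for elasticsearch users in China, collecting and organizing related material and documentation to make it easier for newcomers to learn elasticsearch and to promote its adoption in China.
As a first step I plan to translate the documentation on the official site, since there is currently nothing more complete. There is a lot of it, so I am hoping to recruit like-minded people to help finish this great undertaking.
If you have any ideas, please leave a comment or join the QQ group.
You know, for search :)
Source: elasticsearch discussion groups, welcome to join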
<Date: 2011-08-06>
<Author: medcl>
<Category: Tidbits>

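I put together a small tool to monitor a site's nginx status information, since I could not find a ready-made (lightweight) one. If you need it, download it here: NginxStatusMonitor
It has a simple configuration file with two parameters: the status URL and the refresh interval.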
<?xml version="1.0"?>
<configuration>
  <appSettings>
    <add key="address" value="http://medcl.net/status"/>
    <add key="interval" value="1000"/>
  </appSettings>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0,Profile=Client"/>
  </startup>
</configuration>
For how to enable the status page in nginx, see:
http://wiki.nginx.org/HttpStubStatusModule
A brief explanation of the status fields:
active connections -- number of all open connections, including connections to backends
server accepts handled requests -- nginx accepted 16630948 connections, handled 16630948 connections (none were closed right after being accepted), and handled 31070465 requests (roughly 1.87 requests per connection)
reading -- nginx is reading the request header
writing -- nginx is reading the request body, processing the request, or writing the response to a client
waiting -- keep-alive connections; equal to active - (reading + writing)
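To make the field descriptions above concrete, here is a minimal Python sketch that parses the plain-text body of a stub_status page. It is not how NginxStatusMonitor itself works (that is a .NET tool); the function name `parse_stub_status` and the sample text are made up for illustration, with only the three counters taken from the explanation above.

```python
import re

# Illustrative sample body; only the accepts/handled/requests counters come
# from the explanation above, the other numbers are invented for the example.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Parse the text served by nginx's stub_status page into a dict."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    # The counters sit on a line of their own that holds exactly three integers.
    accepts, handled, requests = map(
        int, re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.M).groups()
    )
    reading, writing, waiting = map(
        int,
        re.search(
            r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
        ).groups(),
    )
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        # Average number of requests served per handled connection.
        "requests_per_connection": requests / handled if handled else 0.0,
    }


if __name__ == "__main__":
    stats = parse_stub_status(SAMPLE)
    for key, value in stats.items():
        print(key, value)
```

A poller would fetch the configured status URL once per refresh interval and feed the response body to this function.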
Source: nginx server status monitor
<Date: 2011-06-13>
<Author: medcl>
<Category: Tidbits>
Interix (SUA): "Interix" is the name the system reports when you run the command 'uname -a'. Whether you install SUA (Subsystem for UNIX-based Applications) or SFU (Services for UNIX), the underlying system is still Interix. Softway began shipping a POSIX subsystem for Windows NT called OpenNT in 1996 to support UNIX environments; it was acquired by Microsoft in 1999 and renamed Interix. You may ask what SUA is: it is short for Subsystem for UNIX-based Applications. Well...
C:\Users\Medcl>uname -a
Windows Medcl-THINK 6.1 SP1 genuineintel Intel64_Family_6_Model_42_Stepping_7
There are different versions of Interix: Interix 3.5 ships with SFU 3.5, Interix 5.2 with SUA on Windows Server 2003/R2, Interix 6.0 with SUA on Vista and Server 2008, and Interix 6.1 with Windows 7 (Ultimate and Enterprise editions) and Server 2008 R2.
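Today I'll show how to install SUA packages and extensions as packages, the same way apt-get, yum, and easy_install work. It is very convenient. Sweet.
http://www.suacommunity.com/pkg_install.htm
Before starting: open Control Panel, go to "Turn Windows features on or off", install SUA, and then install the package Utilities and SDK for Subsystem for UNIX-based Applications. (I strongly recommend installing only as the built-in Administrator account. Only Administrator acts as a true root; other members of the Administrators group do not.)
There are two ways.
The first is to install an Add-on Bundle, which contains many commonly used third-party packages. Download: http://www.suacommunity.com/SUA.aspx#bundles
Complete bundle: ftp://ftp.interopsystems.com/pkgs/bundles/pkg-current-bundlecomplete60x64.exe
The second is to install manually, which is more tedious (haha, I have not tried it).
1. Install the bootstrap
Download it (you need a proxy to get past the firewall).
For 64-bit Windows 7, install this one: ftp://ftp.interopsystems.com/pkgs/bootstrap/pkg-current-bootstrap60x64.exe
Note that this file does not have execute permission by default; I won't go into the details here.
Once it is installed you can use pkg_update to install software. Cool.
pkg_update -La: check for updates
pkg_update -L {name}: install the named package
pkg_update -L pkg: update the package tool itself
pkg_update -LH {name}: download and install the package over HTTP (the default is FTP)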
DISPLAY=localhost:0.0
bash-3.2$ pkg_update -L wget
Starting checks for updates
Percent download complete/package:
100% |**************************************************| wget
Done.
bash-3.2$ wget
wget: missing URL
Usage: wget [OPTION]... [URL]...
Try `wget --help' for more options.
bash-3.2$
Related site: http://www.suacommunity.com/
SUA introduction video: http://www.interopsystems.com/SUAfamiliarization-02.wmv
SUA installation video: http://www.suacommunity.com/SUAinstallation-03.wmv
SUA installation guide: http://www.interopsystems.com/Download/Installing_SUA.pdf
Source: Using pkg under Interix (SUA)
<Date: 2011-04-13>
<Author: medcl>
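<Category: Tidbits, Data Warehouse, Data Mining>
I had some free time this afternoon, so I took a look at the dataset for the #First National College Student Data Mining Invitational# and am writing a quick, casual post about it.
The dataset is copyright 上海花千树信息科技有限公司, operator of the 世纪佳缘 dating site, http://www.love21cn.com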
Source: First National College Student Data Mining Invitational: dataset analysis
<Date: 2011-04-06>
<Author: medcl>
<Category: Python, Tidbits>
Left-pad a string (e.g. '3', '32') with zeroes (e.g. '003', '032').
HOWTO:
string1 = '32'
# "%03d" formats the integer at width 3 with leading zeroes; the result is a str
padded = "%03d" % int(string1)
print(padded)
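As an alternative to the `%` formatting above, the standard library's `str.zfill` does the same padding without converting to an integer first. A small sketch follows; the helper name `pad_left` is made up for illustration.

```python
def pad_left(value, width=3):
    """Left-pad a numeric string with zeroes up to `width` characters.

    Strings that are already `width` characters or longer are returned
    unchanged, which matches the behaviour of "%03d" for larger numbers.
    """
    return str(value).zfill(width)


if __name__ == "__main__":
    for s in ("3", "32", "1234"):
        print(pad_left(s))
```

`zfill` works on the string directly, so it also accepts input that `int()` would reject; whether that is desirable depends on how strict the caller wants to be.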
Source: python string padding left
<Date: 2011-03-08>
<Author: medcl>
<Category: Tidbits>
https://github.com/datawrangling/trendingtopics
https://github.com/datawrangling/spatialanalytics
Steps for setting up trendingtopics.
Prepare the environment
sudo apt-get install ruby
sudo gem install rails --include-dependencies
/home/cloudera/.gem/ruby/1.8/bin
git clone git://github.com/datawrangling/trendingtopics.git
Configuration files
cd trendingtopics
cp config/config.yml.example config/config.yml
cp config/database.yml.example config/database.yml
Installation
If you get the error: undefined local variable or method `version_requirements'
vi config/environment.rb
Add the following at the top of the file:
if Gem::VERSION >= "1.3.6"
  module Rails
    class GemDependency
      def requirement
        r = super
        (r == Gem::Requirement.default) ? nil : r
      end
    end
  end
end
Install the MySQL client and the mysql gem
//unpack the MySQL source package
./configure
//./configure --prefix=/usr/local/mysql
make install
//error: sys/ttydefaults.h: No such file or directory
//http://phaseshiftllc.com/archives/2008/10/26/installing-mysql-gem-on-windows-cygwin-for-rails
//make distclean
//./configure --without-readline CFLAGS=-O2
//./configure --prefix=/usr/local/mysql --without-server --without-readline --without-libedit CFLAGS=-O2 CXXFLAGS=-O2
//make install
//gem install mysql
cp support-files/my-medium.cnf /etc/my.cnf
cd /usr/local/mysql
vi /etc/my.cnf
Add protocol=TCP under [client]
//ref:http://www.phpvim.net/os/windows/build-mysql-client-on-cygwin.html
mysql -h localhost -u root
gem install mysql --include-dependencies
mysqld_safe --user=mysql &
mysql -h localhost -u root -p
Configure the database connection (in config/database.yml)
replace
socket: /tmp/mysql.sock
with
username: root
password: 555555
host: localhost
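Set up the database
rake db:create
rake db:migrate
Generate 100 articles to use as demo data
//start the server
script/server
//if it fails because of missing gems, run the following
rake gems:install
gem sources -a http://gems.github.com
gem install jpignata-bossman
//then run it again
script/server
Once the server is up, open http://localhost:3000/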
$ script/server
=> Booting WEBrick
=> Rails 2.3.2 application starting on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
[2011-03-24 13:50:11] INFO WEBrick 1.3.1
[2011-03-24 13:50:11] INFO ruby 1.8.7 (2008-08-11) [i386-cygwin]
[2011-03-24 13:50:11] INFO WEBrick::HTTPServer#start: pid=4760 port=3000
Error:
242093716 [main] bash 4404 exception::handle: Exception: STATUS_ACCESS_VIOLATION
242095173 [main] bash 4404 open_stackdumpfile: Dumping stack trace to bash.exe.stackdump
242305333 [main] bash 2116 exception::handle: Exception: STATUS_ACCESS_VIOLATION
242306088 [main] bash 2116 open_stackdumpfile: Dumping stack trace to bash.exe.stackdump
242617570 [main] bash 4032 exception::handle: Exception: STATUS_ACCESS_VIOLATION
242619190 [main] bash 4032 open_stackdumpfile: Dumping stack trace to bash.exe.stackdump
243121910 [main] bash 3596 exception::handle: Exception: STATUS_ACCESS_VIOLATION
243123323 [main] bash 3596 open_stackdumpfile: Dumping stack trace to bash.exe.stackdump
243458891 [main] bash 4968 fork: child -1 - died waiting for longjmp before initialization, retry 0, exit code 0x600, errno 11
bash: fork: Resource temporarily unavailable
//Cause: temp was on a virtual disk and cygwin did not have sufficient permission to access it
//If the cause is something else, try the following:
rebaseall: only ash processes are allowed during rebasing
Exit all Cygwin processes and stop all Cygwin services.
Execute ash from Start/Run... or a cmd or command window.
Execute '/bin/rebaseall' from ash.
from:http://cygwin.com/ml/cygwin/2005-09/msg00919.html
Create the table:
CREATE TABLE raw_daily_stats_table (redirect_title STRING, dates STRING, pageviews STRING, total_pageviews BIGINT, monthly_trend DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
Load the data:
LOAD DATA INPATH '/home/dev/finalresult-a' INTO TABLE raw_daily_stats_table;
//The file path is an HDFS path; the path above corresponds to hdfs://platformB/home/dev/finalresult-a
hive> LOAD DATA INPATH '/home/dev/finalresult-a' INTO TABLE raw_daily_stats_table;
Loading data to table raw_daily_stats_table
OK
Time taken: 4.927 seconds
If the load reports a failure, check your HDFS: you will find a file named after your file with _copy_1 appended. Load that file instead and it works.
hive> show tables
> ;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> cat derby.log
============= begin nested exception, level (3) ===========
ERROR XSDB6: Another instance of Derby may have already booted the database /var/lib/hive/metastore/metastore_db.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
It turned out that after an abnormal exit, the earlier process that had been accessing Derby was still alive. Derby is file-based storage and can only be opened by one process at a time, so you can guess the rest. For production, MySQL is clearly the way to go. Open the configuration file hive-default.xml and set:
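<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive_remote/warehouse</value>
</property>
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/hive_remote?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>dandan</value>
</property>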
Querying and sorting in Hive:
select * from raw_daily_stats_table sort by monthly_trend;
select * from raw_daily_stats_table sort by monthly_trend desc limit 10;
http://www.fuzhijie.me/?p=377
http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
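For a quick sanity check of the top-10 query without a cluster, the same ordering can be sketched locally in Python over the tab-delimited file. This is only an illustration under assumptions: the column order is taken from the CREATE TABLE statement in this post, while the function name `top_by_trend` and the sample rows are invented.

```python
import csv
import io

# Column order as declared in the CREATE TABLE statement of this post.
COLUMNS = ["redirect_title", "dates", "pageviews", "total_pageviews", "monthly_trend"]

# Invented sample rows, tab-delimited like the Hive TEXTFILE table.
SAMPLE = (
    "Page_A\t[20110301]\t[10]\t10\t1.5\n"
    "Page_B\t[20110301]\t[40]\t40\t7.25\n"
    "Page_C\t[20110301]\t[5]\t5\t-2.0\n"
)


def top_by_trend(tsv_text, limit=10):
    """Return up to `limit` rows ordered by monthly_trend, highest first."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip malformed lines that do not have exactly one field per column.
    rows = [dict(zip(COLUMNS, fields)) for fields in reader if len(fields) == len(COLUMNS)]
    rows.sort(key=lambda row: float(row["monthly_trend"]), reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for row in top_by_trend(SAMPLE, limit=2):
        print(row["redirect_title"], row["monthly_trend"])
```

One caveat worth knowing: in Hive, SORT BY only orders rows within each reducer, so with more than one reducer the result is not globally sorted. ORDER BY is the clause that guarantees a total order, which is what this local sketch computes.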
Source: Setting up trendingtopics
<Date: 2011-02-26>
<Author: medcl>
<Category: Tidbits>
Jekyll is a simple, blog aware, static site generator. It takes a template directory (representing the raw form of a website), runs it through Textile or Markdown and Liquid converters, and spits out a complete, static website suitable for serving with Apache or your favorite web server. This is also the engine behind GitHub Pages, which you can use to host your project’s page or blog right here from GitHub.
Source: static site generator:jekyll
<Date: 2011-02-19>
<Author: medcl>
<Category: Tidbits, Resource Sharing>
<Date: 2011-01-11>
<Author: medcl>
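<Category: .NET, Tidbits, Search, Resource Sharing>
The ElasticSearch.Client client has been updated and now supports index templates. Cool~ Download: http://github.com/medcl/ElasticSearch.Net
ElasticSearch IndexTemplate help documentation.
Create an index template: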
var tempkey = "test_template_key1";
var template = new TemplateSetting(tempkey);
template.Template = "business_*";//wildcards are supported; here every index whose name starts with business automatically uses the index settings below (settings and mappings)
template.IndexSetting = new IndexSetting(3, 2);
var type1 = new TypeSetting("mytype") { };
type1.CreateNumField("identity", NumType.Float);
type1.CreateDateField("datetime");
var type2 = new TypeSetting("mypersontype");
type2.CreateStringField("personid");
type2.SourceSetting = new SourceSetting();
type2.SourceSetting.Enabled = false;
template.AddTypeSetting(type1);
template.AddTypeSetting(type2);
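var result = ElasticSearchClient.Instance.CreateTemplate(tempkey, template);
Create an index based on the template (no more repeating the index mapping steps, yeah):
result = ElasticSearchClient.Instance.CreateIndex("business_111");//create the index; it picks up the index settings and mappings automatically
Get the template information: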
var temp = ElasticSearchClient.Instance.GetTemplate(tempkey);
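To show roughly what such a template looks like on the wire, here is a Python sketch that builds the JSON body Elasticsearch's index template endpoint accepted at the time (`template`, `settings`, `mappings`). It is not taken from the ElasticSearch.Net source. I am assuming that `IndexSetting(3, 2)` means 3 shards and 2 replicas, and the helper names `build_template_body` and `template_applies` are made up for illustration.

```python
import fnmatch
import json


def build_template_body(pattern, shards, replicas):
    """Build a template body mirroring the C# example above.

    Assumption: IndexSetting(3, 2) maps to number_of_shards=3 and
    number_of_replicas=2; the original post does not spell this out.
    """
    return {
        "template": pattern,
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            "mytype": {
                "properties": {
                    "identity": {"type": "float"},
                    "datetime": {"type": "date"},
                }
            },
            "mypersontype": {
                "_source": {"enabled": False},
                "properties": {"personid": {"type": "string"}},
            },
        },
    }


def template_applies(pattern, index_name):
    """Approximate the wildcard check: does the index name match the pattern?"""
    return fnmatch.fnmatchcase(index_name, pattern)


if __name__ == "__main__":
    body = build_template_body("business_*", 3, 2)
    print(json.dumps(body, indent=2))
    print(template_applies("business_*", "business_111"))
```

The `template_applies` helper only illustrates why creating `business_111` picks up the template; the actual matching is done by the Elasticsearch server.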
For more detailed operations, see the test cases I wrote: https://github.com/medcl/ElasticSearch.Net/tree/master/ElasticSearchTests
Source: ElasticSearch.Client Updated!