write down,forget

Cassandra 0.7 Is Ready to Launch

<Category: nosql, distributed systems, rumors>

The 0.7 beta is already out, and the final 0.7 release will probably follow soon. Let's first look at what 0.7 changes.

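1. Secondary index support: columns under a SuperColumn can finally be queried directly. (Reading a SuperColumn holding 100,000 columns used to take more than 10 minutes. Ouch!)
2. Rows no longer need to be read into memory during compaction
3. Support for large rows
4. Keyspace and related definitions can be modified live, without restarting the service
5. Configuration file change: storage-conf.xml -> cassandra.yaml
6. truncate clears all data in a ColumnFamily in one call
7. Hadoop OutputFormat support
8. Up to 8x faster reads from the row cache (very important)
9. A new partitioner: ByteOrderedPartitioner
10. A new data type: IntegerType
11. New preload_row_cache option
12. Framed transport is now the default (about time)
13. Optimized range slice queries, plus multi_get_count support
14. Row keys are now bytes, which improves performance
15. sstable versioning
And so on.
Other optimizations and adjustments are listed in the details below.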

Also worth keeping an eye on: a JPA 1.0 ORM library for the Cassandra database.
Other related news: two good slide decks from Cassandra Summit 2010, via DBTHINK.
Source: https://svn.apache.org/repos/asf/cassandra/trunk/NEWS.txt
0.7.0
=====

Features
--------
- Secondary indexes (indexes on column values) are now supported
- Row size limit increased from 2GB to 2 billion columns. Rows
  are no longer read into memory during compaction.
- Keyspace and ColumnFamily definitions may be added and modified live
- Streaming data for repair or node movement no longer requires an
  anticompaction step first
- NetworkTopologyStrategy (formerly DatacenterShardStrategy) is ready for
  use, enabling ConsistencyLevel.DCQUORUM and DCQUORUMSYNC. See comments
  in cassandra.yaml.
- Optional per-Column time-to-live field allows expiring data without
  having to issue explicit remove commands
- truncate thrift method allows clearing an entire ColumnFamily at once
- Hadoop OutputFormat support
- Up to 8x faster reads from row cache
- A new ByteOrderedPartitioner supports bytes keys with arbitrary content,
  and orders keys by their byte value. This should be used in new
  deployments instead of OrderPreservingPartitioner.
- Optional round-robin scheduling between keyspaces for multitenant
  clusters
- Dynamic endpoint snitch mitigates the impact of impaired nodes
- New IntegerType, faster than LongType and allows integers of
  both less and more bits than Long's 64
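
To make "orders keys by their byte value" concrete, here is a minimal Python sketch. It is not Cassandra code; it only assumes that the ordering is a plain lexicographic comparison of unsigned bytes, which is how Python compares `bytes` objects. The function name `byte_order_sort` is made up for this illustration.

```python
# Illustrative only: mimics "order keys by their byte value" in plain Python.
# byte_order_sort is a hypothetical helper, not part of Cassandra.

def byte_order_sort(keys):
    """Sort raw keys lexicographically by unsigned byte value."""
    # Python compares bytes objects byte by byte as unsigned integers,
    # so the built-in sort already gives a byte-value ordering.
    return sorted(keys)

# Keys with arbitrary content, including bytes that are not valid UTF-8.
keys = [b"banana", b"\xff\x00", b"apple", b"\x00\x01", b"Zebra"]
ordered = byte_order_sort(keys)
print(ordered)
```

Note that `b"Zebra"` sorts before `b"apple"` here, because uppercase ASCII letters have smaller byte values than lowercase ones.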

Upgrading
---------
The Thrift API has changed in incompatible ways; see below, and refer
to http://wiki.apache.org/cassandra/ClientOptions for a list of
higher-level clients that have been updated to support the 0.7 API.

The Cassandra inter-node protocol is incompatible with 0.6.x releases,
meaning you will have to bring your cluster down prior to upgrading;
you cannot mix 0.6 and 0.7 nodes.

Keyspace and ColumnFamily definitions are stored in the system
keyspace, rather than the configuration file.

The process to upgrade is:
1) run "nodetool drain" on _each_ 0.6 node. When drain finishes (log
   message "Node is drained" appears), stop the process.
2) Convert your storage-conf.xml to the new cassandra.yaml using
   "bin/config-converter".
3) Stand up your cluster with the 0.7 version.
4) Initialize your Keyspace and ColumnFamily definitions using
   "bin/schematool import". _You only need to do
   this to one node_.
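
Step 1 depends on spotting the "Node is drained" log message before stopping each node. The following is a rough Python sketch of that check, assuming the log is available as a list of text lines. The helper name `is_drained` and the sample log lines are invented for illustration; only the message text comes from the notes above.

```python
# Sketch of the "wait for drain" check from step 1.
# Only the message text is taken from the release notes; the rest is assumed.

DRAIN_MESSAGE = "Node is drained"

def is_drained(log_lines):
    """Return True once the drain-complete message appears in the log."""
    return any(DRAIN_MESSAGE in line for line in log_lines)

# Hypothetical log excerpt, made up for this example.
sample_log = [
    "INFO 12:00:01 Starting drain process",
    "INFO 12:00:09 Node is drained",
]
print(is_drained(sample_log))
```

In practice this check would be run against each 0.6 node's log, and the process stopped only after it returns True.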

Thrift API
----------
- Row keys are now bytes: keys stored by versions prior to 0.7.0 will be
  returned as UTF-8 encoded bytes. OrderPreservingPartitioner and
  CollatingOrderPreservingPartitioner continue to expect that keys contain
  UTF-8 encoded strings, but RandomPartitioner now works on any key data.
- i64 timestamps have been replaced with the Clock struct.
- keyspace parameters have been replaced with the per-connection
  set_keyspace method.
- The return type for login() is now AccessLevel.
- The get_string_property() method has been removed.
- The get_string_list_property() method has been removed.
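
Since row keys are now bytes, client code that used string keys has to encode them, and keys meant for the order-preserving partitioners still need to be valid UTF-8. Below is a small Python sketch of both points. The helpers `to_key_bytes` and `is_utf8_key` are hypothetical names, not part of any Cassandra client library.

```python
# Hypothetical client-side helpers; not part of any Cassandra client API.

def to_key_bytes(key):
    """Encode a pre-0.7 style string key as UTF-8 bytes."""
    return key.encode("utf-8")

def is_utf8_key(key):
    """Check whether a bytes key decodes as UTF-8.

    Keys that fail this check would only be suitable for partitioners
    that accept arbitrary key data.
    """
    try:
        key.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(to_key_bytes("user:42"))
print(is_utf8_key(b"\xff\xfe"))
```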

Configuration
-------------
- Configuration file renamed to cassandra.yaml and log4j.properties to
  log4j-server.properties
- The ThriftAddress and ThriftPort directives have been renamed to
  RPCAddress and RPCPort respectively.
- EndPointSnitch was renamed to RackInferringSnitch. A new SimpleSnitch
  has been added.
- RackUnawareStrategy and RackAwareStrategy have been renamed to
  SimpleStrategy and OldNetworkTopologyStrategy, respectively.
- RowWarningThresholdInMB replaced with in_memory_compaction_limit_in_mb
- GCGraceSeconds is now per-ColumnFamily instead of global
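
When reviewing an old 0.6 configuration by hand, the renames listed above can be kept in a simple lookup table. This Python sketch is only a reading aid built from that list; it does not replace "bin/config-converter", and the names `RENAMED` and `new_name` are made up here.

```python
# Lookup table built from the renames listed in the release notes.
# RENAMED and new_name are illustrative names, not Cassandra tooling.

RENAMED = {
    "ThriftAddress": "RPCAddress",
    "ThriftPort": "RPCPort",
    "EndPointSnitch": "RackInferringSnitch",
    "RackUnawareStrategy": "SimpleStrategy",
    "RackAwareStrategy": "OldNetworkTopologyStrategy",
    "RowWarningThresholdInMB": "in_memory_compaction_limit_in_mb",
}

def new_name(old):
    """Return the 0.7 name for a 0.6 identifier, or the input if unchanged."""
    return RENAMED.get(old, old)

for old in ("ThriftPort", "RackAwareStrategy", "ClusterName"):
    print(old, "->", new_name(old))
```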

JMX
---
- StreamingService moved from o.a.c.streaming to o.a.c.service
- GMFD renamed to GOSSIP_STAGE
- {Min,Mean,Max}RowCompactedSize renamed to {Min,Mean,Max}RowSize
  since it no longer has to wait until compaction to be computed

Other
-----
- If extending AbstractType, make sure you follow the singleton pattern
  followed by Cassandra core AbstractType classes: provide a public
  static final variable called 'instance'.

Detailed changes, from: https://svn.apache.org/repos/asf/cassandra/trunk/CHANGES.txt
dev (in development)
* remove cassandra.yaml dependency from Hadoop and Pig (CASSANDRA-1322)
* expose CfDef metadata in describe_keyspaces (CASSANDRA-1633)
* restore use of mmap_index_only option (CASSANDRA-1241)
* dropping a keyspace with no column families generated an error
(CASSANDRA-1378)
* rename RackAwareStrategy to OldNetworkTopologyStrategy, RackUnawareStrategy
to SimpleStrategy, DatacenterShardStrategy to NetworkTopologyStrategy,
AbstractRackAwareSnitch to AbstractNetworkTopologySnitch (CASSANDRA-1392)
* merge StorageProxy.mutate, mutateBlocking (CASSANDRA-1396)
* faster UUIDType, LongType comparisons (CASSANDRA-1386, 1393)
* fix setting read_repair_chance from CLI addColumnFamily (CASSANDRA-1399)
* fix updates to indexed columns (CASSANDRA-1373)
* fix race condition leading to FileNotFoundException (CASSANDRA-1382)
* fix sharded lock hash on index write path (CASSANDRA-1402)
* add support for GT/E, LT/E in subordinate index clauses (CASSANDRA-1401)
* cfId counter got out of sync when CFs were added (CASSANDRA-1403)
* less chatty schema updates (CASSANDRA-1389)
* rename column family mbeans. 'type' will now include either
'IndexColumnFamilies' or 'ColumnFamilies' depending on the CFS type.
(CASSANDRA-1385)
* disallow invalid keyspace and column family names. This includes name that
matches a '^\w+' regex. (CASSANDRA-1377)
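
The name check in CASSANDRA-1377 can be pictured with a short Python sketch. It rests on an assumption the changelog line does not spell out: that a valid keyspace or column family name consists entirely of word characters (letters, digits, underscore). The helper `is_valid_name` is hypothetical, and `re.ASCII` is used here so that `\w` stays limited to ASCII.

```python
import re

# Assumption for this sketch: a valid name is one or more ASCII word characters.
_NAME_PATTERN = re.compile(r"\w+", re.ASCII)

def is_valid_name(name):
    """Return True if the whole name is made of word characters."""
    return _NAME_PATTERN.fullmatch(name) is not None

for candidate in ("Keyspace1", "my-keyspace", ""):
    print(repr(candidate), is_valid_name(candidate))
```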

0.7-beta1
* sstable versioning (CASSANDRA-389)
* switched to slf4j logging (CASSANDRA-625)
* add (optional) expiration time for column (CASSANDRA-699)
* access levels for authentication/authorization (CASSANDRA-900)
* add ReadRepairChance to CF definition (CASSANDRA-930)
* fix heisenbug in system tests, especially common on OS X (CASSANDRA-944)
* convert to byte[] keys internally and all public APIs (CASSANDRA-767)
* ability to alter schema definitions on a live cluster (CASSANDRA-44)
* renamed configuration file to cassandra.xml, and log4j.properties to
log4j-server.properties, which must now be loaded from
the classpath (which is how our scripts in bin/ have always done it)
(CASSANDRA-971)
* change get_count to require a SlicePredicate. create multi_get_count
(CASSANDRA-744)
* re-organized endpointsnitch implementations and added SimpleSnitch
(CASSANDRA-994)
* Added preload_row_cache option (CASSANDRA-946)
* add CRC to commitlog header (CASSANDRA-999)
* removed deprecated batch_insert and get_range_slice methods (CASSANDRA-1065)
* add truncate thrift method (CASSANDRA-531)
* http mini-interface using mx4j (CASSANDRA-1068)
* optimize away copy of sliced row on memtable read path (CASSANDRA-1046)
* replace constant-size 2GB mmaped segments and special casing for index
entries spanning segment boundaries, with SegmentedFile that computes
segments that always contain entire entries/rows (CASSANDRA-1117)
* avoid reading large rows into memory during compaction (CASSANDRA-16)
* added hadoop OutputFormat (CASSANDRA-1101)
* efficient Streaming (no more anticompaction) (CASSANDRA-579)
* split commitlog header into separate file and add size checksum to
mutations (CASSANDRA-1179)
* avoid allocating a new byte[] for each mutation on replay (CASSANDRA-1219)
* revise HH schema to be per-endpoint (CASSANDRA-1142)
* add joining/leaving status to nodetool ring (CASSANDRA-1115)
* allow multiple repair sessions per node (CASSANDRA-1190)
* optimize away MessagingService for local range queries (CASSANDRA-1261)
* make framed transport the default so malformed requests can’t OOM the
server (CASSANDRA-475)
* significantly faster reads from row cache (CASSANDRA-1267)
* take advantage of row cache during range queries (CASSANDRA-1302)
* make GCGraceSeconds a per-ColumnFamily value (CASSANDRA-1276)
* keep persistent row size and column count statistics (CASSANDRA-1155)
* add IntegerType (CASSANDRA-1282)
* page within a single row during hinted handoff (CASSANDRA-1327)
* push DatacenterShardStrategy configuration into keyspace definition,
eliminating datacenter.properties. (CASSANDRA-1066)
* optimize forward slices starting with '' and single-index-block name
queries by skipping the column index (CASSANDRA-1338)
* streaming refactor (CASSANDRA-1189)
* faster comparison for UUID types (CASSANDRA-1043)
* secondary index support (CASSANDRA-749 and subtasks)
