Thread starter: ReneeBK

Cassandra: The Definitive Guide

21
Lisrelchen posted on 2017-4-15 02:07:28
Simple Snitch

By default, Cassandra uses org.apache.cassandra.locator.EndPointSnitch. It operates by simply comparing different octets in the IP addresses of each node. If two hosts have the same value in the second octet of their IP addresses, then they are determined to be in the same data center. If two hosts have the same value in the third octet of their IP addresses, then they are determined to be in the same rack. "Determined to be" really means that Cassandra has to guess based on an assumption of how your servers are located in different VLANs or subnets.
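The inference rule is easy to picture with a toy sketch. Everything below, including the class and method names, is invented for illustration; it is not the snitch's actual implementation, only the octet-comparison idea described above.

public class OctetSnitchSketch {

    // Hosts that share the second octet are assumed to share a data center.
    static String inferDataCenter(String ip) {
        return "DC-" + ip.split("\\.")[1];
    }

    // Hosts that share the third octet are assumed to share a rack.
    static String inferRack(String ip) {
        return "RAC-" + ip.split("\\.")[2];
    }

    public static void main(String[] args) {
        String a = "10.1.2.10";
        String b = "10.1.3.11";
        // Same second octet -> same "data center"; different third octet -> different "rack".
        System.out.println(inferDataCenter(a).equals(inferDataCenter(b))); // true
        System.out.println(inferRack(a).equals(inferRack(b)));             // false
    }
}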

22
Lisrelchen posted on 2017-4-15 02:08:08
PropertyFileSnitch

The org.apache.cassandra.locator.PropertyFileSnitch used to be in contrib, but was moved into the main code base in 0.7. This snitch allows you more control when using a Rack-Aware Strategy by specifying node locations in a standard key/value properties file called cassandra-rack.properties.

This snitch was contributed by Digg, which uses Cassandra and regularly contributes to its development. This snitch helps Cassandra know for certain if two IPs are in the same data center or on the same rack—because you tell it that they are. This is perhaps most useful if you move servers a lot, as operations often need to, or if you have inherited an unwieldy IP scheme.

The default configuration of cassandra-rack.properties looks like this:

# Cassandra Node IP=Data Center:Rack
10.0.0.10=DC1:RAC1
10.0.0.11=DC1:RAC1
10.0.0.12=DC1:RAC2

10.20.114.10=DC2:RAC1
10.20.114.11=DC2:RAC1
10.20.114.15=DC2:RAC2

# default for unknown nodes
default=DC1:r1

23
Lisrelchen posted on 2017-4-15 02:10:18
Basic Write Properties

There are a few basic properties of Cassandra's write ability that are worth noting. First, writing data is very fast in Cassandra, because its design does not require performing disk reads or seeks. The memtables and SSTables save Cassandra from having to perform these operations on writes, which slow down many databases. All writes in Cassandra are append-only.

Because of the database commit log and hinted handoff design, the database is always writeable, and within a column family, writes are always atomic.
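The design described above can be pictured with a deliberately simplified sketch of an append-only write path: append to the commit log, update the in-memory memtable, and flush to an immutable SSTable. Everything here, from the class name to the flush threshold, is invented for illustration; it is a toy model, not Cassandra's actual implementation.

import java.util.*;

public class WritePathSketch {

    private final List<String> commitLog = new ArrayList<>();                      // sequential appends, no seeks
    private final NavigableMap<String, String> memtable = new TreeMap<>();         // sorted, in memory
    private final List<NavigableMap<String, String>> sstables = new ArrayList<>(); // flushed, never modified
    private static final int FLUSH_THRESHOLD = 3;                                  // invented for the sketch

    public void write(String key, String value) {
        commitLog.add(key + "=" + value);   // durable append; no read required before the write
        memtable.put(key, value);           // in-memory update
        if (memtable.size() >= FLUSH_THRESHOLD) {
            sstables.add(new TreeMap<>(memtable)); // write out a sorted, immutable snapshot
            memtable.clear();
        }
    }

    public static void main(String[] args) {
        WritePathSketch db = new WritePathSketch();
        db.write("k1", "a");
        db.write("k2", "b");
        db.write("k3", "c"); // triggers a flush in this toy model
        System.out.println("SSTables flushed: " + db.sstables.size()); // prints 1
    }
}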

24
Lisrelchen posted on 2017-4-15 02:13:09

Example 7-3. SlicePredicateExample.java

package com.cassandraguide.rw;

// imports omitted

public class SlicePredicateExample {

    public static void main(String[] args) throws Exception {
        Connector conn = new Connector();
        Cassandra.Client client = conn.connect();

        // ask for only the named columns "a" and "b"
        SlicePredicate predicate = new SlicePredicate();
        List<byte[]> colNames = new ArrayList<byte[]>();
        colNames.add("a".getBytes());
        colNames.add("b".getBytes());
        predicate.column_names = colNames;

        ColumnParent parent = new ColumnParent("Standard1");

        // read the named columns from row k1 at consistency level ONE
        byte[] key = "k1".getBytes();
        List<ColumnOrSuperColumn> results =
            client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);

        for (ColumnOrSuperColumn cosc : results) {
            Column c = cosc.column;
            System.out.println(new String(c.name, "UTF-8") + " : "
                + new String(c.value, "UTF-8"));
        }

        conn.close();

        System.out.println("All done.");
    }
}

25
Lisrelchen posted on 2017-4-15 02:13:56
Example 7-4. GetRangeSliceExample.java

package com.cassandraguide.rw;

// imports omitted

public class GetRangeSliceExample {

    public static void main(String[] args) throws Exception {
        Connector conn = new Connector();
        Cassandra.Client client = conn.connect();

        System.out.println("Getting Range Slices.");

        SlicePredicate predicate = new SlicePredicate();
        List<byte[]> colNames = new ArrayList<byte[]>();
        colNames.add("a".getBytes());
        colNames.add("b".getBytes());
        predicate.column_names = colNames;

        ColumnParent parent = new ColumnParent("Standard1");

        KeyRange keyRange = new KeyRange();
        keyRange.start_key = "k1".getBytes();
        keyRange.end_key = "k2".getBytes();

        // a key slice is returned
        List<KeySlice> results =
            client.get_range_slices(parent, predicate, keyRange,
                ConsistencyLevel.ONE);

        for (KeySlice keySlice : results) {
            List<ColumnOrSuperColumn> cosc = keySlice.getColumns();

            System.out.println("Current row: " +
                new String(keySlice.getKey()));

            for (int i = 0; i < cosc.size(); i++) {
                Column c = cosc.get(i).getColumn();
                System.out.println(new String(c.name, "UTF-8") + " : "
                    + new String(c.value, "UTF-8"));
            }
        }

        conn.close();

        System.out.println("All done.");
    }
}

26
Lisrelchen posted on 2017-4-15 02:26:26
Example 7-5. MultigetSliceExample.java

package com.cassandraguide.rw;

// imports omitted

public class MultigetSliceExample {

    private static final ConsistencyLevel CL = ConsistencyLevel.ONE;

    private static final String columnFamily = "Standard1";

    public static void main(String[] args) throws UnsupportedEncodingException,
            InvalidRequestException, UnavailableException, TimedOutException,
            TException, NotFoundException {

        Connector conn = new Connector();
        Cassandra.Client client = conn.connect();

        System.out.println("Running Multiget Slice.");

        SlicePredicate predicate = new SlicePredicate();
        List<byte[]> colNames = new ArrayList<byte[]>();
        colNames.add("a".getBytes());
        colNames.add("c".getBytes());
        predicate.column_names = colNames;

        ColumnParent parent = new ColumnParent(columnFamily);

        // instead of one row key, we specify many
        List<byte[]> rowKeys = new ArrayList<byte[]>();
        rowKeys.add("k1".getBytes());
        rowKeys.add("k2".getBytes());

        // instead of a simple list, we get a map, where the keys are row keys
        // and the values are the list of columns returned for each
        Map<byte[], List<ColumnOrSuperColumn>> results =
            client.multiget_slice(rowKeys, parent, predicate, CL);

        for (byte[] key : results.keySet()) {
            List<ColumnOrSuperColumn> row = results.get(key);

            System.out.println("Row " + new String(key) + " --> ");
            for (ColumnOrSuperColumn cosc : row) {
                Column c = cosc.column;
                System.out.println(new String(c.name, "UTF-8") + " : "
                    + new String(c.value, "UTF-8"));
            }
        }

        conn.close();

        System.out.println("All done.");
    }
}

27
Lisrelchen posted on 2017-4-15 02:29:07

Deleting

Let's run an example that deletes some of the data we previously inserted. Note that there is no "delete" operation in Cassandra: the operation is called remove, and even "remove" is really just a write (of a tombstone flag). Because a remove operation is really a tombstone write, you still have to supply a timestamp with the operation: if there are multiple clients writing, the highest timestamp wins—and those writes might include a tombstone or a new value. Cassandra doesn't discriminate here; whichever operation has the highest timestamp will win.

A simple delete looks like this:

Connector conn = new Connector();
Cassandra.Client client = conn.connect();

String columnFamily = "Standard1";
byte[] key = "k2".getBytes(); // this is the row key

Clock clock = new Clock(System.currentTimeMillis());

// identify the column to remove: column family "Standard1", column "b"
ColumnPath colPath = new ColumnPath();
colPath.column_family = columnFamily;
colPath.column = "b".getBytes();

client.remove(key, colPath, clock, ConsistencyLevel.ALL);

System.out.println("Remove done.");

conn.close();

28
Lisrelchen posted on 2017-4-15 02:32:01
Batch Mutates

There were many examples of using batch mutate to perform multiple inserts in Chapter 4, so I won't rehash that here. I'll just present an overview.

To perform many insert or update operations at once, use the batch_mutate method instead of the insert method. Like a batch update in the relational world, the batch_mutate operation allows grouping calls on many keys into a single call in order to save on the cost of network round trips. If batch_mutate fails in the middle of its list of mutations, there will be no rollback, so any updates that have already occurred up to that point will remain intact. In the case of such a failure, the client can retry the batch_mutate operation.
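As a rough sketch of what such a call looks like against the Thrift interface used throughout these examples (reusing the Connector helper and the "Standard1" column family from the earlier posts; the row keys, column values, and the exact field/constructor shapes are illustrative assumptions for the 0.7-era API, not code taken from the book):

Connector conn = new Connector();
Cassandra.Client client = conn.connect();

Clock clock = new Clock(System.currentTimeMillis());

// one Mutation per column to insert or update
Column col = new Column();
col.name = "b".getBytes();
col.value = "batch-written".getBytes();
col.clock = clock;

ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.column = col;

Mutation mutation = new Mutation();
mutation.column_or_supercolumn = cosc;

// row key -> (column family -> list of mutations)
Map<String, List<Mutation>> mutationsByCf = new HashMap<String, List<Mutation>>();
mutationsByCf.put("Standard1", Collections.singletonList(mutation));

Map<byte[], Map<String, List<Mutation>>> mutationMap =
    new HashMap<byte[], Map<String, List<Mutation>>>();
mutationMap.put("k1".getBytes(), mutationsByCf);
mutationMap.put("k2".getBytes(), mutationsByCf); // the same mutation applied to a second row

// everything above goes over the wire in a single round trip;
// if it fails partway through, completed mutations are not rolled back
client.batch_mutate(mutationMap, ConsistencyLevel.ONE);

conn.close();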

29
Lisrelchen posted on 2017-4-15 02:32:56
Running the Word Count Example

Word count is one of the examples given in the MapReduce paper and is the starting point for many who are new to the framework. It takes a body of text and counts the occurrences of each distinct word. Here we provide some code to perform a word count over data contained in Cassandra. A working example of word count is also included in the Cassandra source download.

First we need a Mapper class, shown in Example 12-1.

Example 12-1. The TokenizerMapper.java class

public static class TokenizerMapper extends Mapper<byte[],
    SortedMap<byte[], IColumn>, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  private String columnName;

  public void map(byte[] key, SortedMap<byte[], IColumn> columns, Context context)
      throws IOException, InterruptedException {

    IColumn column = columns.get(columnName.getBytes());
    String value = new String(column.value());
    StringTokenizer itr = new StringTokenizer(value);

    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }

  protected void setup(Context context)
      throws IOException, InterruptedException {

    this.columnName = context.getConfiguration().get("column_name");
  }
}
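The mapper emits a count of 1 for every token it sees; a reducer then sums those counts per distinct word. A minimal sketch of such a summing reducer, using stock Hadoop types (the class name is chosen here for illustration, and imports are omitted as in the example above), looks like this:

public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private IntWritable result = new IntWritable();

  // sum the 1s emitted by TokenizerMapper for each distinct word
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}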

30
Lisrelchen posted on 2017-4-15 02:33:29
Cassandra Hadoop Source Package

Cassandra has a Java source package for Hadoop integration code, called org.apache.cassandra.hadoop. There we find:

ColumnFamilyInputFormat
    The main class we'll use to interact with data stored in Cassandra from Hadoop. It's an extension of Hadoop's InputFormat abstract class.

ConfigHelper
    A helper class to configure Cassandra-specific information such as the server node to point to, the port, and information specific to your MapReduce job.

ColumnFamilySplit
    The extension of Hadoop's InputSplit abstract class that creates splits over our Cassandra data. It also provides Hadoop with the location of the data, so that it may prefer running tasks on nodes where the data is stored.

ColumnFamilyRecordReader
    The layer at which individual records from Cassandra are read. It's an extension of Hadoop's RecordReader abstract class.

There are similar classes for outputting data to Cassandra in the Hadoop package, but at the time of this writing, those classes are still being finalized.
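To show how these classes fit together, here is a rough sketch of wiring the word-count job to read its input from Cassandra. The keyspace, column family, column name, output path, and driver class (WordCount) are assumptions for illustration, and the ConfigHelper method names shifted between 0.7-era releases, so treat this as an approximation rather than a definitive recipe:

Job job = new Job(new Configuration(), "wordcount");
job.setJarByClass(WordCount.class);          // assumed driver class
job.setMapperClass(TokenizerMapper.class);   // the mapper from Example 12-1
job.setReducerClass(IntSumReducer.class);    // a summing reducer, as sketched earlier
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileOutputFormat.setOutputPath(job, new Path("/tmp/word_count_output")); // assumed path

// read input from Cassandra instead of HDFS
job.setInputFormatClass(ColumnFamilyInputFormat.class);
ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1"); // assumed names
SlicePredicate predicate = new SlicePredicate().setColumn_names(
    Arrays.asList("text".getBytes())); // assumed column holding the text to count
ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

job.waitForCompletion(true);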
