Writing MapReduce code for counting the number of records
I want to write a MapReduce job that counts the number of records in a given CSV file. I am not sure what to do in the map phase and what to do in the reduce phase. How should I go about solving this? Can anyone suggest something?
Answer by Arnon Rotem-Gal-Oz for Writing MapReduce code for counting the number of records
- Your map should emit 1 for each record read.
- Your combiner should emit the sum of all the "1"s it got (a subtotal per map).
- Your reducer should emit the grand total number of records; a minimal sketch follows below.
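A minimal sketch of that map/combine/reduce layout, using the newer org.apache.hadoop.mapreduce API (the class and key names here are illustrative, not from the answer):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class RecordCount {

    // Map: emit ("count", 1) for every input record.
    public static class CountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final Text KEY = new Text("count");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(KEY, ONE);
        }
    }

    // Sums the 1s; register it as both combiner (per-map subtotal) and reducer (grand total).
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}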
Answer by Niels Basjes for Writing MapReduce code for counting the number of records
Your mapper must emit a fixed key (just use a Text with the value "count") and a fixed value of 1 (same as you see in the wordcount example).
Then simply use a LongSumReducer as your reducer.
The output of your job will be a record with the key "count", and the value is the number of records you are looking for.
You have the option of (dramatically!) improving the performance by using the same LongSumReducer as a combiner.
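A possible driver wiring for this approach, as a sketch only (the mapper and driver class names are assumptions, not part of the answer), using the new API's LongSumReducer as both combiner and reducer:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;

public class CountDriver {

    // Emit the fixed key "count" and a fixed value of 1 per record.
    public static class CountMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final Text KEY = new Text("count");
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(KEY, ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record count");
        job.setJarByClass(CountDriver.class);
        job.setMapperClass(CountMapper.class);
        // The same summing reducer serves as the combiner for the big speedup.
        job.setCombinerClass(LongSumReducer.class);
        job.setReducerClass(LongSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}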
Answer by user3378430 for Writing MapReduce code for counting the number of records
Use job.getCounters() to retrieve the values that you have incremented per record, after the job completes. If you are using Java to write your MapReduce job, use an enum as the counting mechanism.
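For illustration only (the enum, class, and counter names below are assumptions, not given in the answer), the counter-based approach might look like this:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CounterBasedCount {

    // User-defined counter declared via an enum.
    public enum RecordCounter { RECORDS }

    public static class CountingMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) {
            // No key/value output is emitted; only the counter is incremented.
            context.getCounter(RecordCounter.RECORDS).increment(1);
        }
    }

    // In the driver, after job.waitForCompletion(true) returns
    // (and with job.setNumReduceTasks(0), this can be a map-only job):
    //   long total = job.getCounters()
    //                   .findCounter(RecordCounter.RECORDS).getValue();
}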
Answer by Dheeraj R for Writing MapReduce code for counting the number of records
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class LineCount {

    // Every record is mapped to the same key with a count of 1.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text("Total Lines");

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            output.collect(word, one);
        }
    }

    // Sums the 1s; also registered as the combiner.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LineCount.class);
        conf.setJobName("LineCount");
        conf.setNumReduceTasks(5); // one reducer would suffice, since there is only one key

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
Answer by Marc for Writing MapReduce code for counting the number of records
I'd just use the identity Mapper and the identity Reducer.
These are Mapper.class and Reducer.class. Then just read the map input records counter.
You really don't have to do any coding to get this.
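One sketch of reading that built-in counter once the job has finished (the helper class here is hypothetical; it assumes the new-API Job and Hadoop's TaskCounter enum):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class InputRecordCount {
    // Call after job.waitForCompletion(true) has returned.
    public static long of(Job job) throws Exception {
        return job.getCounters()
                  .findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
    }
}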
Answer by Unmesha SreeVeni for Writing MapReduce code for counting the number of records
Hope I have a better solution than the accepted answer.
Instead of emitting 1 for each record, why not just increment a counter in map() and emit the accumulated count once per map task in cleanup()?
The intermediate reads and writes are reduced, and the reducer only needs to aggregate a list of a few values.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineCntMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    Text keyEmit = new Text("Total Lines");
    IntWritable valEmit = new IntWritable();
    int partialSum = 0;

    // Just count; nothing is written per record.
    public void map(LongWritable key, Text value, Context context) {
        partialSum++;
    }

    // Emit the per-task subtotal once, when the map task finishes.
    public void cleanup(Context context)
            throws IOException, InterruptedException {
        valEmit.set(partialSum);
        context.write(keyEmit, valEmit);
    }
}
You can find full working code here
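For completeness, the matching reducer only needs to sum the handful of per-task subtotals; a minimal sketch (the class name is an assumption):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LineCntReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable v : values) {  // one subtotal per map task
            total += v.get();
        }
        context.write(key, new IntWritable(total));
    }
}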