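JSON: data is stored as key-value pairs; keys within an object should be unique, and a value can be a string, number, boolean, null, object, or array.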
{
"employees": [
{ "firstName":"Bill" , "lastName":"Gates" },
{ "firstName":"George" , "lastName":"Bush" },
{ "firstName":"Thomas" , "lastName":"Carter" }
]
}

Parsing:

com.alibaba.fastjson.JSONObject (from Alibaba's fastjson library) serves as the JSON parser:
import com.alibaba.fastjson.JSONObject;
Parse rawJson, a String holding one line of JSON text:
JSONObject jsonObject = JSONObject.parseObject(rawJson);

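Parsing example:

Converting JSON to CSV in a Hadoop MapReduce mapper. The mapper expects the input to be a JSON array written one object per line; each object has a "name" field and a nested "detail" object with 11 fields, and the mapper emits one '|'-separated line per object (the name followed by the 11 detail values).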
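import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.alibaba.fastjson.JSONObject;

// Class name is illustrative; only the map() method appeared in the original.
public class JsonToCsvMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString().trim();
        // Skip blank lines and the lines holding the outer array's "[" and "]".
        // Note: a record whose text contains '[' or ']' is skipped as well.
        if (line.isEmpty() || line.contains("[") || line.contains("]")) {
            return;
        }
        // Records look like {...}, so drop the trailing comma (the last record has none).
        if (line.endsWith(",")) {
            line = line.substring(0, line.length() - 1);
        }
        JSONObject json = JSONObject.parseObject(line);
        JSONObject detail = JSONObject.parseObject(json.getString("detail"));
        String name = json.getString("name");

        // Keep only records with a non-empty name and exactly 11 detail fields.
        if (name == null || name.isEmpty() || detail == null || detail.size() != 11) {
            return;
        }

        List<String> data = new ArrayList<String>();
        data.add(name);
        // Column order follows detail.keySet(); fastjson's default JSONObject is not
        // insertion-ordered, so check that columns line up across records.
        for (String k : detail.keySet()) {
            String v = detail.getString(k);
            // Strip CR/LF and the '|' delimiter so each record stays on one line.
            data.add(v == null ? "" : v.replaceAll("[\\r\\n|]", ""));
        }

        String csvLine = String.join("|", data);
        // split() takes a regex, so '|' must be escaped; -1 keeps trailing empty fields.
        String[] fields = csvLine.split("\\|", -1);
        if (fields.length != 12) {
            System.err.println("Unexpected column count " + fields.length + ": " + csvLine);
        }

        // With a NullWritable key, TextOutputFormat writes only the CSV line (no byte offset);
        // the job driver must declare NullWritable as the output key class.
        context.write(NullWritable.get(), new Text(csvLine));
    }
}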
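The string handling in the mapper is easy to get wrong, so it helps to check it in isolation. The sketch below is JDK-only, and the class and method names (PipeRecord, sanitize, toLine, splitLine) are hypothetical. It strips CR/LF and the '|' delimiter from each value, joins the columns with '|', and splits with an escaped regex. A plain split("|") is a regex alternation of two empty patterns, so it splits between every character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the mapper's string handling without Hadoop or fastjson.
final class PipeRecord {

    // Remove carriage returns, line feeds and the '|' delimiter so a value cannot
    // break the record onto a new line or shift the columns. null becomes "".
    static String sanitize(String v) {
        return v == null ? "" : v.replaceAll("[\\r\\n|]", "");
    }

    // Join the name and the detail values into one '|'-separated line.
    static String toLine(String name, List<String> values) {
        List<String> cols = new ArrayList<>();
        cols.add(sanitize(name));
        for (String v : values) {
            cols.add(sanitize(v));
        }
        return String.join("|", cols);
    }

    // String.split takes a regex: "|" alone matches the empty string, so the
    // delimiter must be escaped. The -1 limit keeps trailing empty columns.
    static String[] splitLine(String line) {
        return line.split("\\|", -1);
    }
}
```

For example, toLine("Bill", values) followed by splitLine(...) should always give back 1 + values.size() columns, even when the last value is empty.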
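CSV: data is stored as delimited text. Each record occupies one line (terminated by a newline), and the fields within a record are separated by a delimiter, which is usually, but not necessarily, a comma.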
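Because the delimiter is only a convention, a field that itself contains the delimiter, a double quote, or a line break has to be quoted, or it will shift the columns. A minimal sketch, assuming the RFC 4180 convention (wrap such a field in double quotes and double any embedded quote); CsvWriterSketch, quoteIfNeeded and formatRecord are hypothetical names, and the delimiter is a parameter.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: format one CSV record with a configurable delimiter,
// quoting fields RFC 4180 style when they contain special characters.
final class CsvWriterSketch {

    static String quoteIfNeeded(String field, char delimiter) {
        String f = field == null ? "" : field;
        boolean needsQuotes = f.indexOf(delimiter) >= 0
                || f.indexOf('"') >= 0
                || f.indexOf('\n') >= 0
                || f.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return f;
        }
        // Embedded quotes are escaped by doubling them.
        return '"' + f.replace("\"", "\"\"") + '"';
    }

    // One record = one line; fields are joined by the delimiter.
    static String formatRecord(List<String> fields, char delimiter) {
        StringJoiner joiner = new StringJoiner(String.valueOf(delimiter));
        for (String field : fields) {
            joiner.add(quoteIfNeeded(field, delimiter));
        }
        return joiner.toString();
    }
}
```

With ',' as the delimiter, "Gates, Jr." is written as "\"Gates, Jr.\""; with '|' as the delimiter the same field needs no quotes.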

Parsing:

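In Hive, '|'-delimited text (such as the mapper's output) can be read with the default delimited row format, which splits on the delimiter and does not handle quoted fields. These clauses go at the end of a CREATE TABLE statement: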
row format delimited
fields terminated by '|'
stored as textfile;
Alternatively, use the CSV SerDe OpenCSVSerde, which handles quoted fields (note that it reads every column as STRING):
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'escapeChar'='\\',
  'quoteChar'='"',
  'separatorChar'=',')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
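The three SERDEPROPERTIES describe a small state machine. The sketch below shows how such a parser might split one line; parseLine is a hypothetical name and this is not Hive's code (OpenCSVSerde delegates to the opencsv library). The separator ends a field only outside quotes, the quote character toggles quoted mode, a doubled quote inside quotes stands for one literal quote, and the escape character makes the next character literal. It assumes the escape and quote characters differ and does not handle quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a quote-aware CSV line parser; the parameters mirror
// OpenCSVSerde's separatorChar, quoteChar and escapeChar properties.
final class CsvLineParser {

    static List<String> parseLine(String line, char separator, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                // Escape character: take the next character literally.
                current.append(line.charAt(++i));
            } else if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    // Doubled quote inside a quoted field stands for one literal quote.
                    current.append(quote);
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == separator && !inQuotes) {
                // Separator outside quotes ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing separator.
        fields.add(current.toString());
        return fields;
    }
}
```

Calling parseLine(line, ',', '"', '\\') corresponds to the properties above; passing '|' as the separator reads the mapper's output instead.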
In MySQL, the equivalent settings are clauses of LOAD DATA INFILE ('\r\n' matches Windows-style line endings):
fields terminated by ','
optionally enclosed by '"'
lines terminated by '\r\n';