你会Hive表的基本操作吗？

大数据 2023-07-05 17:29:38

54阅读

文中转载微信公众平台「Java互联网大数据与数据库管理」，创作者柯同学们。转截文中请联络Java互联网大数据与数据库管理微信公众号。

1. 创建表

create table句子遵循sql语法习惯性，只不过是Hive的英语的语法更灵便。比如，能够界定表的数据信息文档存储部位，应用的储存文件格式等。

 
  create table if not exists test.user1( 
  name string comment 'name', 
  salary float comment 'salary', 
  address struct<country:string, city:string> comment 'home address' 
  ) 
  comment 'description of the table' 
  partitioned by (age int) 
  row format delimited fields terminated by '\t' 
  stored as orc;

沒有特定external关键词，则为管理方法表，跟mysql一样，if not exists假如表存有则不做实际操作，不然则新创建表。comment能够为其做注解，系统分区为age年纪，列中间分节符是\t，储存文件格式为列式储存orc，储存部位为默认设置部位，即主要参数hive.metastore.warehouse.dir(默认设置：/user/hive/warehouse)特定的hdfs文件目录。

2. 复制表

应用like能够复制一张跟原表结构一样的空表，里边是沒有数据信息的。

 
  create table if not exists test.user2 like test.user1;

3. 查询表结构

根据desc [可选主要参数] tableName指令查询表结构，能够看得出复制的表test.user1与原表test.user1的表构造是一样的。

 
  hive> desc test.user2; 
  OK 
  name                    string                  name                 
  salary                  float                   salary               
  address                 struct<country:string,city:string>  home address         
  age                     int                                          
   
  # Partition Information          
  # col_name                data_type               comment              
   
  age                     int

还可以加formatted，能够见到更为详尽和冗杂的輸出信息内容。

 
  hive> desc formatted test.user2; 
  OK 
  # col_name                data_type               comment              
   
  name                    string                  name                 
  salary                  float                   salary               
  address                 struct<country:string,city:string>  home address         
   
  # Partition Information          
  # col_name                data_type               comment              
   
  age                     int                                          
   
  # Detailed Table Information          
  Database:               test                      
  Owner:                  hdfs                      
  CreateTime:             Mon Dec 21 16:37:57 CST 2020      
  LastAccessTime:         UNKNOWN                   
  Retention:              0                         
  Location:               hdfs://nameservice2/user/hive/warehouse/test.db/user2     
  Table Type:             MANAGED_TABLE             
  Table Parameters:          
      COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"} 
      numFiles                0                    
      numPartitions           0                    
      numRows                 0                    
      rawDataSize             0                    
      totalSize               0                    
      transient_lastDdlTime   1608539877           
   
  # Storage Information          
  SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde     
  InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat   
  OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat      
  Compressed:             No                        
  Num Buckets:            -1                        
  Bucket Columns:         []                        
  Sort Columns:           []                        
  Storage Desc Params:          
      field.delim             \t                   
      serialization.format    \t

4. 删除表

这跟sql中删掉指令drop table是一样的：

 
  drop table if exists table_name;

针对管理方法表(內部表)，立即把表彻底删除了;针对外界表，还必须删掉相匹配的hdfs文档才会完全将这张表删掉掉，为了更好地安全性，一般hadoop群集是打开垃圾回收站作用的，删掉表面表的数据信息就在垃圾回收站，后边假如想修复也是能够修复的，立即从垃圾回收站mv到hive相匹配文件目录就可以。

5. 改动表

大部分表特性能够根据alter table来改动。

5.1 表重新命名

 
  alter table test.user1 rename to test.user3;

5.2 增、修、删系统分区

提升系统分区应用指令alter table table_name add partition(...) location hdfs_path

 
  alter table test.user2 add if not exists 
  partition (age = 101) location '/user/hive/warehouse/test.db/user2/part-0000101' 
  partition (age = 102) location '/user/hive/warehouse/test.db/user2/part-0000102'

改动系统分区也是应用alter table ... set ...指令

 
  alter table test.user2 partition (age = 101) set location '/user/hive/warehouse/test.db/user2/part-0000110'

删除分区指令文件格式是alter table tableName drop if exists partition(...)

 
  alter table test.user2 drop if exists partition(age = 101)

5.3 改动列信息内容

能够对某一字段名开展重新命名，并改动部位、种类或是注解：

改动前：

 
  hive> desc user_log; 
  OK 
  userid                  string                                       
  time                    string                                       
  url                     string

改动字段名time为times，而且应用after把部位放进url以后，原本是在以前的。

 
  alter table test.user_log 
  change column time times string 
  comment 'salaries' 
  after url;

再看来表结构：

 
  hive> desc user_log; 
  OK 
  userid                  string                                       
  url                     string                                       
  times                   string                  salaries

time -> times，部位在url以后。

5.4 提升列

hive也是能够加上列的：

 
  alter table test.user2 add columns ( 
  birth date comment '生日', 
  hobby string comment '喜好' 
  );

5.5 删掉列

删掉列并不是特定列删掉，必须把原来全部列写一遍，要删掉的列清除掉就可以：

 
  hive> desc test.user3; 
  OK 
  name                    string                  name                 
  salary                  float                   salary               
  address                 struct<country:string,city:string>  home address         
  age                     int                                          
   
  # Partition Information          
  # col_name                data_type               comment              
   
  age                     int

假如要删掉列salary，只必须那样写：

 
  alter table test.user3 replace columns( 
  name string, 
  address struct<country:string,city:string> 
  );

这儿会出错：

 
  FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Replacing columns cannot drop columns for table test.user3. SerDe may be incompatible

这张test.user3表有orc文件格式的，不兼容删掉，如果是textfile文件格式，上边这类replace书写是能够删掉列的。一般状况下不容易随便去删掉列的，提升列倒是普遍。

5.6 改动表的特性

能够提升额外的表特性，或是改动特性，可是删不掉特性：

 
  alter table tableName set tblproperties( 
      'key' = 'value' 
  );

举例说明：这儿新创建一张表：

 
  create table t8(time string,country string,province string,city string) 
  row format delimited fields terminated by '#'  
  lines terminated by '\n'  
  stored as textfile;

这条句子将t8表格中的字段名分节符'#'改动成'\t';

 
  alter table t8 set serdepropertyes('field.delim'='\t');

the end

免责声明：本文不代表本站的观点和立场，如有侵权请联系本站删除！本站仅提供信息存储空间服务。

你会Hive表的基本操作吗？

精选推荐

让 Flutter 在鸿蒙系统上跑起来

小冰一口气发布11个AI歌手：仅训练45天媲美专业级歌手

京东智能客服品牌焕新：“言犀”亮相2020京东JDD大会

AI改进建筑施工安全的十种方式

六项任务、多种数据类型，谷歌、DeepMind提出高效Transformer评估基准

第四范式NeurIPS 2020：知识图谱嵌入的自动化

随机推荐

你会Hive表的基本操作吗？

精选推荐

让 Flutter 在鸿蒙系统上跑起来

小冰一口气发布11个AI歌手：仅训练45天 媲美专业级歌手

京东智能客服品牌焕新：“言犀”亮相2020京东JDD大会

AI改进建筑施工安全的十种方式

六项任务、多种数据类型，谷歌、DeepMind提出高效Transformer评估基准

第四范式NeurIPS 2020：知识图谱嵌入的自动化

随机推荐

小冰一口气发布11个AI歌手：仅训练45天媲美专业级歌手